Abstract: Survival analysis is a field of statistics that provides tools to estimate, model, and analyze the times at which events occur. It is widely used in medical and engineering research, for example in reliability studies of machines and equipment. Counting processes are widely used to estimate survival functions, with and without recurrent events. This research discusses counting processes for recurrent events and provides computational tools that facilitate their calculation. The main objective of this work is to develop routines in the R language to estimate these counting processes. The work draws on R packages available online, such as “survival”, “survrec” and “TestSurvRec”. Illustrative examples of counting process calculations for survival analysis are shown and compared with the results obtained with the other R packages.
Keywords: counting processes, survival analysis, recurrent events, R language.
Resumen: El análisis de supervivencia es un campo de la estadística que dispone de herramientas para estimar, modelar y analizar los tiempos de ocurrencias de eventos. Esta área de la estadística es ampliamente utilizada en investigaciones médicas y de ingeniería, como por ejemplo los estudios de confiabilidad de maquinas y equipos. Los procesos de conteo son muy utilizados para estimar las funciones de supervivencia, con y sin eventos recurrentes. Los procesos de conteo para eventos recurrentes se discuten en esta investigación. Este artículo proporciona herramientas computacionales que facilitan el cálculo de los procesos de conteo en dicho análisis. El objetivo central de este trabajo es el desarrollo de algunas rutinas del “lenguaje R” para estimar estos procesos de conteo. Para el desarrollo de este trabajo se utilizaron paquetes del lenguaje disponibles en la red de internet; como por ejemplo, “survival”, “survrec” y “TestSurvRec”. Algunos ejemplos ilustrativos y los cálculos de los procesos de conteo para el análisis de supervivencia se muestran y se comparan con los resultados obtenidos con el otro paquete del lenguaje R.
Palabras clave: procesos de conteo, análisis de supervivencia, eventos recurrentes, lenguaje R.
Design and development of some functions to estimate the counting processes in survival analysis.
Diseño y desarrollo de algunas funciones para estimar los procesos de conteo en el análisis de supervivencia.
Fleming & Harrington [1] and Andersen & Gill [2] detailed the counting process approach to classical survival analysis. Aalen [3], a pioneer of this technique, introduced a martingale approach to survival analysis in which statistical methods can be cast within a unifying counting process framework. These processes refer to a single event; for recurrent events, the authors used doubly indexed processes. Wang & Chang (WC) [4] and Peña et al. [5] developed proposals for survival analysis based on counting processes. WC proposed an estimator of the survival function for the case in which the times of occurrence are correlated. Peña, Strawderman and Hollander (PSH) developed a nonparametric estimator that generalizes the product-limit estimator (the generalized product-limit estimator, GPLE) to the case of recurrent events. See [6] and [7].
This article adopts the basic counting process framework introduced by Aalen [3]. We first introduce some basic concepts.
A counting process is a stochastic process {N(t), t ≥ 0} whose values are nonnegative integers and nondecreasing in t. If s < t, then N(t) − N(s) is the number of events that occurred during the interval (s, t]. The counting processes for the i-th subject are denoted N_i(t) and Y_i(t), with t > 0, where N_i(t) = I{X_i ≤ t, δ_i = 1} and Y_i(t) = I{X_i ≥ t}. Here X_i is the observed time for subject i, δ_i indicates whether the event was observed (δ_i = 1) or censored, and I{·} is the indicator function.
N(t) is the number of events observed up to and including time t, and Y(t) is the at-risk process, indicating whether the subject is at risk at time t. Examples of counting processes include Poisson processes and renewal processes. A counting process is nondecreasing and hence, under the usual integrability conditions, a submartingale. By the Doob-Meyer decomposition it can be written as N(t) = M(t) + A(t), with a martingale M(t) and a predictable increasing process A(t). M(t) is called the martingale associated with the counting process N(t), and the predictable process A(t) is called the cumulative intensity of N(t).
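The single-event definitions above can be illustrated with a short base R sketch. The four observations below are hypothetical values chosen only for illustration; they are not taken from any dataset used in this article.

```r
# Hypothetical single-event data: observed times and event indicators
# (delta = 1 means the event was observed, 0 means censored).
x     <- c(5, 8, 8, 12)
delta <- c(1, 0, 1, 1)

# N_i(t) = I{X_i <= t, delta_i = 1}: has subject i had an observed event by t?
n_i <- function(t) as.integer(x <= t & delta == 1)

# Y_i(t) = I{X_i >= t}: is subject i still at risk just before t?
y_i <- function(t) as.integer(x >= t)

n_i(8)        # 1 0 1 0
sum(n_i(8))   # aggregate N(8) = 2
sum(y_i(8))   # aggregate Y(8) = 3
```

Summing the individual processes over subjects gives the aggregate processes N(t) and Y(t) that enter product-limit type estimators.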
Wang & Chang [4] developed a nonparametric estimator of the marginal survival function for recurrent events, intended for the case in which the times of occurrence are correlated. The estimator is defined through two processes, d* and R*. So,
where S(t) is the survival function, i is the index of an individual or subject, j is the index of an event, T_ij is the time from the (j − 1)-th to the j-th event for subject i, τ_i is the time between the initial event and the end of follow-up for subject i, and C_i is the censoring time. d*(t) is the sum, over subjects with at least one event, of the proportion of each subject's observed uncensored gap times that are equal to t. d*(t) is evaluated at time t as,
K_i* is the number of uncensored recurrent events for unit i, and K_i is the number of recurrent events for subject i. The indicator function I{·} equals one if the condition in braces is true and zero otherwise. So, let
R*(t) is the total mass of the risk set at time t: the sum, over subjects, of the proportion of each subject's observed uncensored gap times that are greater than or equal to t. It is calculated as,
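For reference, the quantities described in this section can be written out as follows. This is a reconstruction from the verbal definitions above and from the Wang & Chang estimator as it is commonly stated, not a transcription of the article's numbered equations. Two points are assumptions: the second term of R*(t), which covers subjects with no uncensored gap time, and the convention that a term with K_i* = 0 in the denominator is read as zero. The product runs over the distinct uncensored gap times u.

$$\hat{S}(t) = \prod_{u \le t} \left[ 1 - \frac{d^{*}(u)}{R^{*}(u)} \right]$$

$$d^{*}(t) = \sum_{i=1}^{n} \frac{I\{K_i^{*} > 0\}}{K_i^{*}} \sum_{j=1}^{K_i^{*}} I\{T_{ij} = t\}$$

$$R^{*}(t) = \sum_{i=1}^{n} \left[ \frac{I\{K_i^{*} > 0\}}{K_i^{*}} \sum_{j=1}^{K_i^{*}} I\{T_{ij} \ge t\} + I\{K_i^{*} = 0\}\, I\{\tau_i \ge t\} \right]$$

For a subject with K_i* > 0 the second term of R*(t) vanishes, so that subject contributes only the fraction of its uncensored gap times that are at least t.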
Peña et al. [5] developed a nonparametric estimator of the inter-event (gap) time survival function for recurrent events under a renewal assumption, that is, under a model in which the gap times are independent and identically distributed (i.i.d.). The authors used two counting processes, N and Y, and considered two time scales: calendar time (s) and gap time (t). So,
where S(s, t) is the estimator of the survival function and N(s, w) is a counting process that represents the number of events observed in the calendar period [0, s] whose gap times are less than or equal to w. It is calculated as,
And,
The indicator function I{·} equals one if the condition in braces is true and zero otherwise. Y(s, w) is the other counting process and represents the number of gap times observed in the calendar period [0, s] that are greater than or equal to w. So,
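For reference, the PSH quantities described in this section can be written out as follows. This is a reconstruction from the verbal definitions above and from the estimator of Peña, Strawderman and Hollander as it is commonly stated, not a transcription of the article's numbered equations. The symbols K_i(s) (number of events of subject i in the calendar period [0, s]) and S_ij (calendar time of the j-th event of subject i, with S_i0 = 0) follow the notation used in the examples; the jump notation ΔN is an assumption introduced here.

$$\hat{S}(s, t) = \prod_{w \le t} \left[ 1 - \frac{\Delta N(s, w)}{Y(s, w)} \right], \qquad \Delta N(s, w) = N(s, w) - N(s, w^{-})$$

$$N(s, t) = \sum_{i=1}^{n} \sum_{j=1}^{K_i(s)} I\{T_{ij} \le t\}$$

$$Y(s, t) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{K_i(s)} I\{T_{ij} \ge t\} + I\{\min(s, \tau_i) - S_{iK_i(s)} \ge t\} \right]$$

The last indicator in Y(s, t) accounts for the censored gap that is still running at calendar time min(s, τ_i).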
We designed two functions in the R language, “Counting.Processes.WC” and “Counting.Processes.PSH”, to calculate the counting processes of the WC and PSH estimators. With these functions, users can also plot the occurrences of an event for a chosen number of units, and can change the total number of units plotted. The routines themselves are not shown in this work, but they can be requested from the author by e-mail. The author has included these functions in the library of a new version of the TestSurvRec package on CRAN.
This function estimates the counting processes of the WC estimator and plots the occurrences of events for a chosen number of study units.
The syntax of this function is,
The arguments are as follows:
yy: the dataset.
dat: the name of the dataset.
xy: number of the first unit of the dataset to include in the study. The default is “1”.
xf: number of the last unit of the dataset to include in the study. The default is “1”.
colevent: colour of the event line on the plot. The default is “blue”.
colcensor: colour of the censoring line on the plot. The default is “red”.
ltyx: type of line used to represent the event. The default is “1”.
lwyx: type of line used to represent the censoring. The default is “1”.
S: calendar time for the study. The default is “1”.
T: gap time for the study. The default is “1”.
This function estimates the counting processes of the PSH estimator and plots the occurrences of events for a chosen number of study units. The syntax of this function is,
The arguments are as follows:
yy: the dataset.
dat: the name of the dataset.
xy: number of the first unit of the dataset to include in the study. The default is “1”.
xf: number of the last unit of the dataset to include in the study. The default is “1”.
colevent: colour of the event line on the plot. The default is “blue”.
colcensor: colour of the censoring line on the plot. The default is “red”.
ltyx: type of line used to represent the event. The default is “1”.
lwyx: type of line used to represent the censoring. The default is “1”.
S: calendar time for the study. The default is “1”.
T: gap time for the study. The default is “1”.
Example Nº 1. For this example, we use the TBCplapy dataset of the TestSurvRec package [8]. We estimate the counting processes of the WC estimator for the patient with id 72. Figure 1 shows the calendar times (in weeks) at which tumours appeared in this patient: S_{72,1} = 8, S_{72,2} = 15, S_{72,3} = 18, S_{72,4} = 20, S_{72,5} = 22, S_{72,6} = 25, S_{72,7} = 38, S_{72,8} = 40 and S_72 = 48. The last value is the patient's total time under study. It follows that the gap times are t_{72,1} = 8, t_{72,2} = 7, t_{72,3} = 3, t_{72,4} = 2, t_{72,5} = 2, t_{72,6} = 3, t_{72,7} = 13 and t_{72,8} = 2, and that the censored gap is C_72 = 8. We show how the counting processes for the WC estimator are calculated,
Estimation of d*_72(t = 3). The calculation uses S = 42,
Number of tumour occurrences for the unit with id = 72: K*_72 = 8 (see Figure 1).
Since I{·} equals one if the condition is true and zero otherwise, and K*_72 > 0 holds, I{K*_72 > 0} = 1, so
Note that two gap times of the unit with id = 72 are equal to 3 (j = 3 and j = 6), then,
d*_72(t = 3) = (1/8) × 2
d*_72(t = 3) = 0.25
R*_72(t = 3) is estimated with equation (4). So, for i = 72,
Then,
R*_72(t = 3) = (1/8) × [5 + 0 × 0],
R*_72(t = 3) = 0.625
This is verified with the R code,
The output is,
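As an independent check that does not rely on the article's functions, the following base R sketch recomputes both quantities for patient 72 from the gap times listed in Example 1. The helper names wc_d and wc_r are hypothetical, and the branch for a subject with no uncensored gap time is an assumption about how such a subject enters the risk set.

```r
# Gap times (weeks) between successive tumours for patient 72, from Example 1.
gaps     <- c(8, 7, 3, 2, 2, 3, 13, 2)
cens_gap <- 8                 # censored gap C_72 after the last tumour
k_star   <- length(gaps)      # K*_72 = 8 uncensored gap times

# Contribution of one subject to d*(t): share of uncensored gaps equal to t.
wc_d <- function(t) {
  if (k_star > 0) sum(gaps == t) / k_star else 0
}

# Contribution of one subject to R*(t): share of uncensored gaps >= t.
# Assumption: with no uncensored gap, the subject counts once while its
# censored time is still >= t.
wc_r <- function(t) {
  if (k_star > 0) sum(gaps >= t) / k_star else as.numeric(cens_gap >= t)
}

wc_d(3)   # 0.25  (two of the eight gaps equal 3)
wc_r(3)   # 0.625 (five of the eight gaps are >= 3)
```

Summing these per-subject contributions over all patients gives d*(t) and R*(t) for the whole dataset.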
Example Nº 2. In this example, we use the TBCplapy dataset of the TestSurvRec package. We estimate the counting processes and the survival function of the WC estimator for all patients in the dataset, using the Plot.Counting.Processes.WC function written in the R language.
Next, we show an extract of the output of the function “Plot.Counting.Processes.WC”. The extract presents the first and the last calculations of the counting processes for the study units, as shown below,
Table 1 presents the estimates of the survival curve S(t) using the WC model. These estimates use equations (1), (2), (3) and (4).
Example Nº 3. For this example, we again use the TBCplapy dataset of the TestSurvRec package. We explain how to estimate the counting processes of the PSH estimator for the patient with id 72. The calculations are shown next. N_i(s, t) is calculated with equation (6), with i = 72, s = 64 and t = 3,
Figure 1 shows that K_72(s = 64) = 8
and
For t = 2 and s = 64,
So, N_72(s = 64, t = 2) = 3
Note that, for t = 2 and s = 64,
So, the same result as with the previous method is obtained and,
Y_i(s, t) is calculated with equation (8). So, with i = 72, t = 3 and s = 64,
From Figure 1, we have K_72(s = 64) = 9 and
so,
Y_72(s = 64, t = 3) = 6
This is verified with the R code,
Output of the R code,
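As an independent check that does not rely on the article's functions, the following base R sketch recomputes the PSH counting processes for patient 72 from the gap times of Example 1. The helper names psh_n and psh_y are hypothetical, and the end of follow-up tau = 48 is taken from the last calendar value listed in Example 1.

```r
# Gap times (weeks) for patient 72 and end of follow-up, from Example 1.
gaps <- c(8, 7, 3, 2, 2, 3, 13, 2)
tau  <- 48
s    <- 64                       # calendar time at which N and Y are evaluated

event_times <- cumsum(gaps)      # calendar times: 8 15 18 20 22 25 38 40
k_s  <- sum(event_times <= s)    # K_72(s): events observed in [0, s]
seen <- gaps[seq_len(k_s)]       # gap times completed by calendar time s
last_event <- if (k_s > 0) event_times[k_s] else 0

# N_i(s, t): completed gap times in [0, s] that are <= t.
psh_n <- function(t) sum(seen <= t)

# Y_i(s, t): completed gap times >= t, plus the running censored gap
# if it has already lasted at least t.
psh_y <- function(t) {
  sum(seen >= t) + as.numeric(min(s, tau) - last_event >= t)
}

psh_n(2)   # 3
psh_n(3)   # 5
psh_y(3)   # 6
```

Because s = 64 exceeds the follow-up of 48 weeks, the censored gap used in psh_y is 48 − 40 = 8 weeks.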
Example Nº 4. For this example, we again use the TBCplapy dataset of the TestSurvRec package. We estimate the counting processes of the PSH estimator for all patients, using the function written in the R language. R code for the PSH estimator:
Next, the output of the function “Plot.Counting.Processes.PSH” is shown. The extract presents the first and the last calculations of the counting processes, as follows,
Table 2 presents the estimates of S(s, t) using the PSH model. These calculations use equations (5), (6), (7) and (8).
We have reviewed two nonparametric models for survival analysis with recurrent events based on counting processes, namely the WC and PSH models. This article provides computational and graphical tools to estimate the counting processes of survival analysis with recurrent events. Estimating the survival function with either model requires a dataset whose structure is similar to that of the datasets in the TestSurvRec package. The calculation of the counting processes is shown in detail, the methods for estimating the survival function are described, and illustrative examples are explained. We hope that this article will be useful to researchers in this area and that these functions will make survival analysis with recurrent events considerably easier.
Thanks to the participants of the first statistics postdoctorate at the UCV for their helpful comments on the revisions of this paper.
http://servicio.bc.uc.edu.ve/ingenieria/revista/v23n1/art01.pdf (pdf)