Survival Analysis




This topic is called ''reliability theory'' or ''reliability analysis'' in engineering, and ''duration analysis'' or ''duration modeling'' in economics.
Death or failure is called an "event" in the survival analysis literature, and so models of death or failure are generically termed ''time-to-event models''.

Survival analysis attempts to answer questions such as: what is the fraction of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?

To answer such questions,
it is necessary to define "lifetime".
In the case of biological survival,
death is unambiguous,
but for mechanical reliability,
failure may not be well-defined,
for there may well be mechanical systems in which failure is partial, a matter of degree, or otherwise not localized in time.
Even in biological problems,
some events (for example, heart attack or other organ failure) may have the same ambiguity.
The theory outlined below assumes well-defined events at specific times;
other cases may be better treated by models which explicitly account for ambiguous events.

The theory of survival presented here also assumes that death or failure happens just once for each subject.
''Recurring event'' or ''repeated event'' models relax that assumption.
The study of recurring events is relevant in systems reliability.

This article is phrased primarily in terms of biological survival,
but this is just a convenience.
An equivalent formulation in terms of mechanical failure can be made by replacing every occurrence of ''death'' with ''failure''.


GENERAL FORMULATION



Survival function


The object of primary interest is the survival function, conventionally denoted ''S'', which is defined as

:S(t) = \Pr(T > t)

where ''t'' is some time, ''T'' is the time of death, and "Pr" stands for probability. That is: the survival function is the probability that the time of death is later than some specified time.
The survival function is also called the ''survivor function'' or ''survivorship function'' in problems of biological survival, and the ''reliability function'' in mechanical survival problems.
In the latter case, the reliability function is denoted ''R''(''t'').
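
As a rough illustration of the definition ''S''(''t'') = Pr(''T'' > ''t''), the sketch below estimates the survival function as the fraction of observed lifetimes that exceed ''t''. It assumes every lifetime is fully observed (no censoring); the function name `empirical_survival` and the sample lifetimes are made up for this example.

```python
def empirical_survival(lifetimes, t):
    """Estimate S(t) = Pr(T > t) as the fraction of lifetimes strictly greater than t.

    Assumes every lifetime is fully observed (no censoring or truncation).
    """
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Hypothetical lifetimes in years (illustrative numbers, not real data).
lifetimes = [2.0, 3.5, 3.5, 5.0, 8.0]

print(empirical_survival(lifetimes, 0.0))  # 1.0: every subject survives past time 0
print(empirical_survival(lifetimes, 3.5))  # 0.4: only 5.0 and 8.0 exceed 3.5
```

The strict inequality mirrors the definition: a subject who dies exactly at time ''t'' is not counted as surviving past ''t''.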

Usually one assumes ''S''(0) = 1,
although it could be less than 1 if there is the possibility of immediate death or failure.
Some survival distributions (for example the Gaussian distribution) have the property that ''S''(''t'') < 1 for all finite ''t'',
but this point can be finessed or ignored;
see the discussion under "Some survival distributions" below.

The survival function must be non-increasing: ''S''(''u'') ≤ ''S''(''t'') if ''u'' > ''t''.
This expresses the notion that survival to a later age is possible only if all earlier ages are attained.
Given this property,
the lifetime distribution function and event density (''F'' and ''f'' below) are well-defined.

Survival probability is usually assumed to approach zero as age increases without bound, i.e., ''S''(''t'') → 0 as ''t'' → ∞,
although the limit could be greater than zero if eternal life is possible.


Lifetime distribution function and event density


Related quantities are defined in terms of the survival function.
The lifetime distribution function, conventionally denoted ''F'', is defined as the complement of the survival function,

:F(t) = \Pr(T \le t) = 1 - S(t)

and the derivative of ''F'' (i.e., the density function of the lifetime distribution) is conventionally denoted ''f'',

:f(t) = \frac{d}{dt} F(t)

''f'' is sometimes called the event density;
it is the rate of death or failure events per unit time.
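
The relations ''F'' = 1 - ''S'' and ''f'' = d''F''/d''t'' can be checked numerically. The sketch below assumes, purely for illustration, an exponential lifetime with rate 0.5; the distribution, the constant `RATE`, and the helper `numeric_density` are choices made for this example and are not part of the theory above.

```python
import math

RATE = 0.5  # assumed rate of an illustrative exponential lifetime


def S(t):
    """Survival function of the assumed exponential lifetime."""
    return math.exp(-RATE * t)


def F(t):
    """Lifetime distribution function, the complement of S."""
    return 1.0 - S(t)


def f(t):
    """Event density of the assumed exponential lifetime (closed form)."""
    return RATE * math.exp(-RATE * t)


def numeric_density(t, h=1e-6):
    """Central-difference approximation to dF/dt."""
    return (F(t + h) - F(t - h)) / (2.0 * h)


# The two values should agree to several decimal places.
print(f(2.0), numeric_density(2.0))
```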


Hazard function and cumulative hazard function


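The hazard function, conventionally denoted \lambda,
is defined as the event rate at time ''t'' conditional on survival until time ''t'' or later,

:\lambda(t)\,dt = \Pr(t \le T < t + dt \mid T \ge t) = \frac{f(t)\,dt}{S(t)} = -\frac{S'(t)\,dt}{S(t)}

The cumulative hazard function, conventionally denoted \Lambda, is the integral of the hazard; when ''S''(0) = 1,

:\Lambda(t) = \int_0^t \lambda(u)\,du = -\log S(t)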
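
A small numerical sketch may help make the hazard concrete. It uses the standard identities \lambda(''t'') = ''f''(''t'') / ''S''(''t'') for the hazard and -log ''S''(''t'') for the cumulative hazard (the latter assuming ''S''(0) = 1). The Weibull lifetime with shape 2 and scale 1 is an assumption chosen only because its hazard has the simple closed form 2''t''; all function names are hypothetical.

```python
import math

SHAPE = 2.0  # assumed Weibull shape parameter; the scale is fixed at 1


def survival(t):
    """S(t) for the assumed Weibull lifetime."""
    return math.exp(-(t ** SHAPE))


def density(t):
    """f(t) = -dS/dt for the assumed Weibull lifetime."""
    return SHAPE * t ** (SHAPE - 1.0) * math.exp(-(t ** SHAPE))


def hazard(t):
    """Event rate at t conditional on survival to t: f(t) / S(t)."""
    return density(t) / survival(t)


def cumulative_hazard(t):
    """Integrated hazard, equal to -log S(t) when S(0) = 1."""
    return -math.log(survival(t))


# For shape 2 the hazard is 2*t and the cumulative hazard is t**2.
print(hazard(1.5), cumulative_hazard(1.5))
```

In this example the hazard grows with ''t'' even though the density eventually falls, because the density is divided by an ever smaller surviving fraction.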



FITTING PARAMETERS TO DATA


Survival models can be usefully viewed as ordinary regression models in which the response variable is time.
However,
computing the likelihood function
(needed for fitting parameters or making other kinds of inferences)
is complicated by missing data problems which are peculiar to time.
The birth and death of a subject may be known,
in which case the lifetime is known.
More generally,
it may be known only that the date of birth was prior to some date:
this is called ''left censoring''.
Also,
it may be known only that the date of death is after some date:
this is called ''right censoring''.
The lifetime may be both right and left censored,
which is sometimes called ''interval censoring''.
It may also happen that subjects with a lifetime less than some threshold may not be observed at all:
this is called ''truncation''.
Note that truncation is different from left censoring,
since for a left censored datum,
we know the subject exists,
but for a truncated datum,
we may be completely unaware of the subject.

There are standard examples of censoring and truncation.
Perhaps the most common is right censoring.
If we examine a group of living subjects,
we know that each one is alive today, but we do not know their future date of death.
Left censoring is also common.
For each subject, we know they are alive today but we may not know their date of birth.
Truncation is also common.
In a so-called ''delayed entry'' study,
subjects are not observed at all until they have reached a certain age.
For example,
people may not be observed until they have reached the age to enter school.
Any deceased subjects in the pre-school age group would be unknown.
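
One possible way to hold such data in a program is to record, for each subject, the interval within which the lifetime is known to lie. The sketch below is an assumption-laden illustration, not a standard API: the class `Observation`, its fields, and the labelling rules are invented for this example. It treats a datum as left censored when only an upper bound on the lifetime is known, and as right censored when only a lower bound is known. Truncated subjects never appear in the data, so they have no representation here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: the lifetime is known to lie between lower and upper."""

    lower: float  # lifetime is known to be at least this
    upper: float  # lifetime is known to be at most this; math.inf if unbounded

    def __post_init__(self):
        if self.lower < 0 or self.lower > self.upper:
            raise ValueError("require 0 <= lower <= upper")

    def kind(self) -> str:
        """Label the datum by what is known about the lifetime."""
        if self.lower == self.upper:
            return "uncensored"          # exact lifetime known
        if self.upper == math.inf:
            return "right censored"      # only a lower bound known
        if self.lower == 0.0:
            return "left censored"       # only an upper bound known
        return "interval censored"       # both bounds known, but not equal


print(Observation(5.0, 5.0).kind())        # uncensored
print(Observation(3.0, math.inf).kind())   # right censored
print(Observation(0.0, 4.0).kind())        # left censored
print(Observation(2.0, 6.0).kind())        # interval censored
```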

The likelihood function for a survival model,
in the presence of censored data,
is formulated as follows.
By definition the likelihood function is the joint probability of the data given the parameters of the model.
It is customary to assume that the data are independent given the parameters.
Then the likelihood function is the product of the likelihood of each datum.
It is convenient to partition the data into four categories:
uncensored, left censored, right censored, and interval censored.
These are denoted "unc.", "l.c.", "r.c.", and "i.c." in the equation below.

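  :<math> L(\theta) = \prod_{i \in \text{unc.}} \Pr(T = T_i \mid \theta) \prod_{i \in \text{l.c.}} \Pr(T < T_i \mid \theta) \prod_{i \in \text{r.c.}} \Pr(T > T_i \mid \theta) \prod_{i \in \text{i.c.}} \Pr(T_{i,l} < T < T_{i,r} \mid \theta) </math>

For an uncensored datum, with <math>T_i</math> equal to the age at death,
  :<math> \Pr(T = T_i \mid \theta) = f(T_i \mid \theta) </math>
For a left censored datum, with the age at death known to be less than <math>T_i</math>,
  :<math> \Pr(T < T_i \mid \theta) = F(T_i \mid \theta) = 1 - S(T_i \mid \theta) </math>
For a right censored datum, with the age at death known to be greater than <math>T_i</math>,
  :<math> \Pr(T > T_i \mid \theta) = 1 - F(T_i \mid \theta) = S(T_i \mid \theta) </math>
For an interval censored datum, with the age at death known to be greater than <math>T_{i,l}</math> and less than <math>T_{i,r}</math>,
  :<math> \Pr(T_{i,l} < T < T_{i,r} \mid \theta) = S(T_{i,l} \mid \theta) - S(T_{i,r} \mid \theta) </math>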
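
The censored-data likelihood can be sketched in code as a sum of log contributions, one per datum: the density for an uncensored datum, 1 - ''S'' for a left censored datum, ''S'' for a right censored datum, and a difference of two survival values for an interval censored datum. The exponential model, the function names, and the sample data below are assumptions made for illustration only.

```python
import math


def exp_survival(t, rate):
    """S(t | rate) for an assumed exponential lifetime model."""
    return math.exp(-rate * t)


def exp_density(t, rate):
    """f(t | rate) for an assumed exponential lifetime model."""
    return rate * math.exp(-rate * t)


def log_likelihood(rate, uncensored=(), left=(), right=(), interval=()):
    """Log-likelihood of independent data split into the four categories.

    uncensored, left, right: sequences of times T_i (left times must be > 0)
    interval: sequence of (T_il, T_ir) pairs with T_il < T_ir
    """
    total = 0.0
    for t in uncensored:            # Pr(T = T_i) -> f(T_i)
        total += math.log(exp_density(t, rate))
    for t in left:                  # Pr(T < T_i) -> 1 - S(T_i)
        total += math.log(1.0 - exp_survival(t, rate))
    for t in right:                 # Pr(T > T_i) -> S(T_i)
        total += math.log(exp_survival(t, rate))
    for lo, hi in interval:         # Pr(T_il < T < T_ir) -> S(T_il) - S(T_ir)
        total += math.log(exp_survival(lo, rate) - exp_survival(hi, rate))
    return total


# Hypothetical data: two observed deaths and one subject still alive at time 4.
deaths = [2.0, 3.0]
alive = [4.0]

# With only uncensored and right censored data, the exponential log-likelihood
# is (number of deaths) * log(rate) - rate * (total observed time), which is
# maximised at rate = 2 / 9 for these data.
for rate in (0.2, 2.0 / 9.0, 0.25):
    print(rate, log_likelihood(rate, uncensored=deaths, right=alive))
```

Working with the logarithm turns the product over data into a sum, which is numerically safer than multiplying many small probabilities.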