### Weibull model

In this section, we define the Weibull model for analysing survival of patients in the context of human health. We confine ourselves to survival times that are the difference between a nominated start time and a declared failure (uncensored data) or a nominated end time (censored data). Let *T* be a nonnegative random variable for a person’s survival time and *t* be a realisation of the random variable *T*. Kleinbaum and Klein (2005) give some reasons for the occurrence of right censoring in survival studies, including termination of the study, dropout, or loss to follow-up. For the censored observations, one could impute the missing survival times or assume that they are event-free. The former is often difficult, especially if the censoring proportion is large, and extreme imputation assumptions (such as assuming that all censored cases fail right after the time of censoring) may distort inferences (Leung et al. 1997; Stajduhar et al. 2009). In this study, we treat all censored cases as event-free regardless of observation time.

Initially, we assume that we observe survival times *t* of patients possibly from a heterogeneous population. The two-parameter Weibull density function for survival time is given by

W(t\mid \alpha ,\gamma )=\alpha \gamma {t}^{\alpha -1}\text{exp}\left(-\gamma {t}^{\alpha}\right),

for *α* > 0 and *γ* > 0, where *α* is a shape parameter and *γ* is a scale parameter (Ibrahim et al. 2001).

Since the logarithm of the Weibull hazard is a linear function of the logarithm of time, it is more convenient to write the model in terms of the parameterisation *λ* = log(*γ*) (Ibrahim et al. 2001), so that:

f(t\mid \alpha ,\lambda )=\alpha {t}^{\alpha -1}\text{exp}\left(\lambda -\text{exp}(\lambda ){t}^{\alpha}\right),

where *t* > 0, *α* > 0 and *λ* = log(*γ*) is unrestricted on the real line (equivalently, *γ* > 0).

The corresponding survival function and the hazard function, using the *λ* parameterisation, are as follows:

\begin{array}{ll}S(t\mid \alpha ,\lambda )& =\text{exp}\left(-\text{exp}(\lambda ){t}^{\alpha}\right),\\ h(t\mid \alpha ,\lambda )& =f(t\mid \alpha ,\lambda )/S(t\mid \alpha ,\lambda )=\alpha \text{exp}(\lambda ){t}^{\alpha -1}.\end{array}
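The three functions in the *λ* parameterisation can be checked numerically. The following Python sketch is illustrative only: the function names and the parameter values are assumptions chosen for the example, not part of the original analysis.

```python
import math


def weibull_pdf(t, alpha, lam):
    """Density f(t | alpha, lambda), with lambda = log(gamma)."""
    return alpha * t ** (alpha - 1) * math.exp(lam - math.exp(lam) * t ** alpha)


def weibull_survival(t, alpha, lam):
    """Survival function S(t | alpha, lambda)."""
    return math.exp(-math.exp(lam) * t ** alpha)


def weibull_hazard(t, alpha, lam):
    """Hazard function h(t | alpha, lambda) = f / S."""
    return alpha * math.exp(lam) * t ** (alpha - 1)


# Illustrative values only (not taken from the case study).
t, alpha, lam = 2.0, 1.5, -1.0
ratio = weibull_pdf(t, alpha, lam) / weibull_survival(t, alpha, lam)

# The density is minus the derivative of the survival function;
# a central difference gives a quick numerical check.
h = 1e-6
numeric_pdf = -(weibull_survival(t + h, alpha, lam)
                - weibull_survival(t - h, alpha, lam)) / (2 * h)
```

Here `ratio` agrees with `weibull_hazard(t, alpha, lam)` and `numeric_pdf` agrees with `weibull_pdf(t, alpha, lam)` up to floating-point error.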

We now assume that we observe possibly right-censored data for *n* subjects; *y* = (*y*_{1},…, *y*_{n}) where *y*_{i} = (*t*_{i}, *δ*_{i}) and *δ*_{i} is an indicator function such that (Marin et al. 2005a):

{\delta}_{i}=\left\{\begin{array}{ll}1,& \phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\text{if the lifetime is uncensored, i.e.,}\phantom{\rule{1em}{0ex}}{T}_{i}={t}_{i}.\\ 0,& \phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\text{if the lifetime is censored, i.e.,}\phantom{\rule{1em}{0ex}}{T}_{i}>{t}_{i}.\end{array}\right.

(3)

Let *x*_{ij} be the *j*^{th} covariate associated with *t*_{i} for *j* = 0, 1, …, *p*, giving *p* + 1 covariates in total. In our case study, *x*_{ij} for *j* = 1, …, *p* indicates the *p* gene expressions from DNA microarray data, and *x*_{i0} indicates the multi-category phenotype covariate. The data structure is as follows:

\left[\begin{array}{c}\text{Survival time}\\ {t}_{1}\\ {t}_{2}\\ \vdots \\ {t}_{n}\end{array}\right]\left[\begin{array}{cccc}\text{Category}& \text{Gene}\phantom{\rule{0.3em}{0ex}}1& \dots & \text{Gene p}\\ {x}_{10}& {x}_{11}& \dots & {x}_{1p}\\ {x}_{20}& {x}_{21}& \dots & {x}_{2p}\\ \vdots & \vdots & \vdots & \vdots \\ {x}_{n0}& {x}_{n1}& \dots & {x}_{\mathit{\text{np}}}\end{array}\right].

The gene expression data can be included in the model through *λ* (Thamrin et al. 2013). Given that *γ* must be positive, one option is to include the covariates as follows:

\begin{array}{ll}{\gamma}_{i}& =\text{exp}({\mathit{x}}_{i}^{\prime}\mathit{\beta}),\phantom{\rule{2.77626pt}{0ex}}\text{so that}\\ {\lambda}_{i}& =\text{log}\left({\gamma}_{i}\right)={\mathit{x}}_{i}^{\prime}\mathit{\beta}.\end{array}

(4)

Thus, the log-likelihood function becomes:

\text{log}L(\alpha ,\mathit{\beta}\mid D)=\sum _{i=1}^{n}\left[{\delta}_{i}\left(\text{log}(\alpha )+(\alpha -1)\text{log}\left({t}_{i}\right)+{\mathit{x}}_{i}^{\prime}\mathit{\beta}\right)-\text{exp}({\mathit{x}}_{i}^{\prime}\mathit{\beta}){t}_{i}^{\alpha}\right].
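The log-likelihood above can be evaluated directly from the data. The following Python sketch is a minimal illustration; the function name, the argument layout (one covariate list per subject) and the toy inputs are assumptions made for the example.

```python
import math


def weibull_log_likelihood(alpha, beta, times, deltas, X):
    """Censored Weibull log-likelihood with lambda_i = x_i' beta.

    times  : observed times t_i
    deltas : 1 if uncensored, 0 if right censored
    X      : one list of covariates per subject (hypothetical layout)
    """
    total = 0.0
    for t_i, delta_i, x_i in zip(times, deltas, X):
        eta = sum(x * b for x, b in zip(x_i, beta))  # x_i' beta
        # Uncensored subjects contribute the log-hazard term.
        total += delta_i * (math.log(alpha) + (alpha - 1) * math.log(t_i) + eta)
        # Every subject contributes the cumulative hazard exp(x_i' beta) * t_i^alpha.
        total -= math.exp(eta) * t_i ** alpha
    return total


# Toy inputs: with alpha = 1 and beta = 0 the model is a unit exponential.
ll_event = weibull_log_likelihood(1.0, [0.0], [1.0], [1], [[1.0]])
ll_censored = weibull_log_likelihood(1.0, [0.0], [2.0], [0], [[1.0]])
```

In the unit exponential case an event at *t* = 1 contributes log *f*(1) = -1 and a censored time *t* = 2 contributes log *S*(2) = -2, which is what the two toy calls return.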

We assume that *α* and *λ* are independent a priori (Marin et al. 2005a), and assign a Gamma prior to *α* and Normal priors to *λ*_{i} and *β*. The priors are given by:

\begin{array}{ll}\alpha & \sim \mathit{\text{Gamma}}({u}_{\alpha},{v}_{\alpha})\phantom{\rule{2em}{0ex}}\\ {\lambda}_{i}& \sim \mathit{\text{Normal}}({\mathit{x}}_{i}^{\prime}\mathit{\beta},{\sigma}^{2})\phantom{\rule{2em}{0ex}}\\ \mathit{\beta}& \sim \mathit{\text{Normal}}(0,\mathit{\Sigma}),\phantom{\rule{2em}{0ex}}\end{array}

and we allow *Σ* to be diagonal with elements {\sigma}_{j}^{2},j=1,2,\dots ,p.

Diffuse priors are represented by large positive values for *σ*^{2}, and small positive values for *u*_{α} and *v*_{α}.

The joint posterior distribution of (*α*, *β*) is given by:

\begin{array}{ll}p(\alpha ,\mathit{\beta}\mid \mathrm{D})& \propto L(\alpha ,\mathit{\beta}\mid \mathrm{D})p(\alpha )p(\mathit{\beta})\\ & \propto {\alpha}^{{\alpha}_{0}+d-1}\text{exp}\left\{\sum _{i=1}^{n}\left({\delta}_{i}{\mathit{x}}_{i}^{\prime}\mathit{\beta}+{\delta}_{i}(\alpha -1)\text{log}\left({t}_{i}\right)-{t}_{i}^{\alpha}\text{exp}\left({\mathit{x}}_{i}^{\prime}\mathit{\beta}\right)\right)\right.\\ & \phantom{\rule{1em}{0ex}}\left.-{b}_{0}\alpha -\frac{1}{2}{(\mathit{\beta}-{\mathit{\mu}}_{0})}^{\prime}{\mathit{\Sigma}}_{0}^{-1}(\mathit{\beta}-{\mathit{\mu}}_{0})\right\},\end{array}

where d=\sum _{i=1}^{n}{\delta}_{i} is the number of uncensored observations. In this expression, *α*_{0} and *b*_{0} are the shape and rate hyperparameters of the Gamma prior on *α*, and *μ*_{0} and *Σ*_{0} are the prior mean and covariance matrix of *β* (that is, *u*_{α}, *v*_{α}, 0 and *Σ* in the prior specification above).

MCMC analysis is performed by sampling from the conditional distributions of the parameters. The conditional distribution of *α* does not have an explicit form, but it can be sampled using MCMC algorithms such as Metropolis-Hastings or slice sampling (Gilks et al. 1996).
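As an illustration of how *α* might be updated, the following Python sketch runs a random-walk Metropolis-Hastings sampler on the log scale for the conditional density of *α* with the *λ*_{i} held fixed. It is a sketch under stated assumptions, not the WinBUGS implementation used in this paper: the proposal, the step size, the hyperparameter values and the synthetic data are all chosen for the example.

```python
import math
import random


def log_conditional_alpha(alpha, times, deltas, lams, u_alpha=0.01, v_alpha=0.01):
    """Unnormalised log conditional of alpha given lambda_i = lams[i].

    Assumes a Gamma(u_alpha, v_alpha) prior in the shape/rate form.
    """
    if alpha <= 0.0:
        return -math.inf
    d = sum(deltas)  # number of uncensored observations
    value = (u_alpha + d - 1.0) * math.log(alpha) - v_alpha * alpha
    for t, delta, lam in zip(times, deltas, lams):
        value += delta * (alpha - 1.0) * math.log(t) - math.exp(lam) * t ** alpha
    return value


def sample_alpha(times, deltas, lams, n_iter=3000, step=0.2, start=1.0, seed=0):
    """Random-walk Metropolis-Hastings on log(alpha); returns all draws."""
    rng = random.Random(seed)
    alpha = start
    current = log_conditional_alpha(alpha, times, deltas, lams)
    draws = []
    for _ in range(n_iter):
        proposal = alpha * math.exp(step * rng.gauss(0.0, 1.0))
        candidate = log_conditional_alpha(proposal, times, deltas, lams)
        # The multiplicative proposal needs the Hastings correction proposal/alpha.
        log_ratio = candidate - current + math.log(proposal) - math.log(alpha)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            alpha, current = proposal, candidate
        draws.append(alpha)
    return draws


# Synthetic, fully observed data with shape 1.5 and gamma = 1 (lambda = 0).
data_rng = random.Random(42)
times = [data_rng.weibullvariate(1.0, 1.5) for _ in range(200)]
deltas = [1] * len(times)
lams = [0.0] * len(times)
draws = sample_alpha(times, deltas, lams)
kept = draws[500:]  # discard an arbitrary burn-in
posterior_mean = sum(kept) / len(kept)
```

With 200 uncensored observations the retained draws should concentrate near the shape value used to simulate the data. In a full sampler this update would alternate with updates of *β*.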

### Weibull mixture model

We define the Weibull mixture model for analysing survival data. A mixture of *K* Weibull densities (Marin et al. 2005a) is defined by

f(t\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma})=\sum _{m=1}^{K}{w}_{m}W(t\mid {\alpha}_{m},{\gamma}_{m}),

(5)

where *α* = (*α*_{1},…, *α*_{K}), *γ* = (*γ*_{1},…, *γ*_{K}) are the parameters of each Weibull distribution and *w* = (*w*_{1},…, *w*_{K}) is a vector of nonnegative weights which sum to one.

The corresponding survival function *S*(*t* ∣ *K*, *w*, *α*, *γ*) and hazard function *h*(*t* ∣ *K*, *w*, *α*, *γ*) are as follows:

\begin{array}{ll}S(t\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma})& =\sum _{m=1}^{K}{w}_{m}\text{exp}\left(-{\gamma}_{m}{t}^{{\alpha}_{m}}\right),\phantom{\rule{2em}{0ex}}\\ h(t\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma})& =f(t\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma})/S(t\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma}).\phantom{\rule{2em}{0ex}}\end{array}
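The mixture density, survival and hazard functions can be written down in a few lines. The Python sketch below is illustrative; the function names and the two-component parameter values are assumptions for the example only.

```python
import math


def weibull_density(t, alpha, gamma):
    """Two-parameter Weibull density W(t | alpha, gamma)."""
    return alpha * gamma * t ** (alpha - 1) * math.exp(-gamma * t ** alpha)


def mixture_density(t, w, alphas, gammas):
    """Mixture density: weighted sum of the K component densities."""
    return sum(wm * weibull_density(t, a, g) for wm, a, g in zip(w, alphas, gammas))


def mixture_survival(t, w, alphas, gammas):
    """Mixture survival: weighted sum of the K component survival functions."""
    return sum(wm * math.exp(-g * t ** a) for wm, a, g in zip(w, alphas, gammas))


def mixture_hazard(t, w, alphas, gammas):
    """Mixture hazard: ratio of mixture density to mixture survival."""
    return mixture_density(t, w, alphas, gammas) / mixture_survival(t, w, alphas, gammas)


# Hypothetical two-component mixture.
w, alphas, gammas = [0.3, 0.7], [0.8, 2.0], [0.5, 0.1]
t, h = 1.7, 1e-6
numeric_density = -(mixture_survival(t + h, w, alphas, gammas)
                    - mixture_survival(t - h, w, alphas, gammas)) / (2 * h)
```

Note that the mixture hazard is not a weighted average of the component hazards; it is the ratio of the mixture density to the mixture survival function, as in the definition above.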

We now assume that we observe possibly right-censored data for *n* patients; *y* = (*y*_{1},…, *y*_{n}) where *y*_{i} = (*t*_{i}, *δ*_{i}) and *δ*_{i} is an indicator function as described in Section “Weibull model”.

Let *x*_{ij} be the *j*^{th} covariate associated with patient *i*, for *j* = 1,2,…, *p*. In our application, *x*_{ij} could indicate, for example, the gene expressions. The covariates can be included in the model as follows (Farcomeni and Nardi 2010):

\text{log}\left({\gamma}_{m}\right)={\mathit{x}}_{i}^{\prime}{\mathit{\beta}}_{m}={\lambda}_{m},

(6)

where *x*_{i} = (*x*_{i1},…, *x*_{ip}), *γ*_{m} = (*γ*_{1m},…, *γ*_{pm}) and *β*_{m} = (*β*_{1m},…, *β*_{pm}), for *i* = 1,2,…, *n* and *m* = 1,2,…, *K*.

Thus, the likelihood function becomes:

L\left(\mathit{w},\mathit{\alpha},\mathit{\gamma}\mid K,{t}_{i},{\delta}_{i},\mathit{x}\right)\propto \prod _{i=1}^{n}\left[f{\left({t}_{i}\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma},\mathit{x}\right)}^{{\delta}_{i}}\times S{\left({t}_{i}\mid K,\mathit{w},\mathit{\alpha},\mathit{\gamma},\mathit{x}\right)}^{1-{\delta}_{i}}\right].

Here, the incomplete information is modelled via the survivor function, which reflects the probability that the patient was alive for a duration greater than *t*_{i}.
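The censored mixture likelihood can be evaluated on the log scale as follows. This Python sketch covers the model without covariates, so every subject shares the same *γ*_{m}; the function name and the toy inputs are assumptions for illustration.

```python
import math


def mixture_log_likelihood(times, deltas, w, alphas, gammas):
    """Log-likelihood of a K-component Weibull mixture with right censoring."""
    total = 0.0
    for t, delta in zip(times, deltas):
        # Weighted component survival terms w_m * exp(-gamma_m * t^alpha_m).
        weighted_surv = [wm * math.exp(-g * t ** a)
                         for wm, a, g in zip(w, alphas, gammas)]
        if delta == 1:
            # Uncensored: mixture density, using f_m = h_m * S_m per component.
            density = sum(ws * a * g * t ** (a - 1)
                          for ws, a, g in zip(weighted_surv, alphas, gammas))
            total += math.log(density)
        else:
            # Censored: mixture survival function.
            total += math.log(sum(weighted_surv))
    return total


# Toy check: one unit-exponential component, an event at t = 2 and a
# censored time at t = 3 give log f(2) + log S(3) = -2 - 3.
toy_value = mixture_log_likelihood([2.0, 3.0], [1, 0], [1.0], [1.0], [1.0])
```

With covariates entering through (6), `gammas` would depend on the subject as well as the component, which would replace the shared list by a per-subject one.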

The following prior distributions are placed on the parameters *w* and *α*:

\begin{array}{ll}\mathit{w}\mid K& \sim \mathit{\text{Dirichlet}}({\varphi}_{1},\dots ,{\varphi}_{K}),{\varphi}_{m}=\varphi ,\forall m=1,2,\dots ,\mathrm{K.}\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}{\alpha}_{m}& \sim \mathit{\text{Gamma}}({u}_{\alpha},{v}_{\alpha}),m=1,2,\dots ,\mathrm{K.}\phantom{\rule{2em}{0ex}}\end{array}

For a model without covariates, we employ the following prior for *γ*_{m}:

{\gamma}_{m}\sim \mathit{\text{Gamma}}({u}_{\gamma},{v}_{\gamma}),m=1,2,\dots ,\mathrm{K.}

We chose small positive values for *u*_{α}, *v*_{α}, *u*_{γ}, *v*_{γ} to express vague prior knowledge about these parameters and we set *ϕ* = 1 (Marin et al. 2005a). For a model with covariates, we employ a multivariate normal prior on *β*_{m}, so that

{\mathit{\beta}}_{m}\mid K\sim N(0,\mathit{\Sigma}),

and we allow *Σ* to be diagonal with elements {\sigma}_{j}^{2},j=1,2,\dots ,p. Again, we express a vaguely informative prior by setting a large positive value for {\sigma}_{j}^{2}. Diagonal matrices were used here, although this convention has changed recently (Bhadra and Mallick 2013), so one may argue that a non-diagonal variance-covariance matrix may be more appropriate.

The model described in this section can be fitted using MCMC sampling with latent values *Z*_{i} to indicate component membership of the *i*^{th} observation (Diebolt and Robert 1994; Robert and Casella 2000). Since *w*_{m} = *Pr*(*Z*_{i} = *m*), we can write *Z*_{i} ∼ *M*(*w*_{1},…, *w*_{K}). In this scheme, the *Z*_{i} are sampled by computing posterior probabilities of membership, and the other parameters are sampled from their full conditional distributions. This was implemented in the WinBUGS software package (Spiegelhalter et al. 2002).
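The allocation step can be sketched as follows. This Python example assumes, as is usual for censored mixtures, that an uncensored observation contributes through the component density and a censored one through the component survival function; the function names are hypothetical and components are labelled 0,…, *K* - 1 rather than 1,…, *K*.

```python
import math
import random


def membership_probabilities(t, delta, w, alphas, gammas):
    """Posterior probabilities Pr(Z_i = m | t_i, delta_i, parameters)."""
    weights = []
    for wm, a, g in zip(w, alphas, gammas):
        surv = math.exp(-g * t ** a)
        # Density for an observed event, survival for a censored time.
        contribution = a * g * t ** (a - 1) * surv if delta == 1 else surv
        weights.append(wm * contribution)
    total = sum(weights)
    return [x / total for x in weights]


def sample_allocation(t, delta, w, alphas, gammas, rng):
    """Draw one component label from the membership probabilities."""
    probs = membership_probabilities(t, delta, w, alphas, gammas)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# A long censored time is far more plausible under the low-rate component.
probs = membership_probabilities(5.0, 0, [0.5, 0.5], [1.0, 1.0], [0.1, 2.0])
label = sample_allocation(5.0, 0, [0.5, 0.5], [1.0, 1.0], [0.1, 2.0], random.Random(1))
```

In a full sampler this draw would be repeated for every observation at each iteration, before the weights and component parameters are updated given the allocations.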

The WinBUGS software (Lunn et al. 2000; Ntzoufras 2009; Spiegelhalter et al. 2002) is an interactive Windows version of the BUGS program for Bayesian analysis of complex statistical models using MCMC techniques.

Label switching, caused by non-identifiability of the mixture components, was dealt with post-MCMC using the reordering algorithm of Marin et al. (2005b). The algorithm proceeded by selecting the permutation of components at each iteration that minimised the vector dot product with the so-called “pivot”, a high density point from the posterior distribution. The MCMC output was then reordered according to each selected permutation. In this paper, the approximate maximum a posteriori (MAP) estimate (i.e. the realisation of parameters corresponding to the MCMC iterate that maximised the unnormalised posterior) was chosen as the pivot.

### Cure model

As in Section “Weibull model”, we observe time to the event of interest for *n* independent subjects, and we let (*t*_{i}, *δ*_{i}) denote the observed time and the event indicator for the *i*-th observation. Let *S*_{1}(*t*) be the survivor function for the entire population, *S*^{∗}(*t*) be the survivor function for the non-cured group in the population, and *π* be the cure rate function. Then the standard cure rate model is given by:

{S}_{1}\left(t\right)=\pi +(1-\pi ){S}^{\ast}\left(t\right).

(7)

Commonly used parametric distributions for *S*^{∗}(*t*) include the exponential and the Weibull.

As in Yakovlev and Tsodikov (1996), Chen et al. (1999) and Ibrahim et al. (2001), for an individual in a population, let *N* denote the number of latent competing causes of the event. Assume that *N* has a Poisson distribution with mean *θ*. Let *Z*_{i}, *i* = 1,…, *N* denote the random time associated with the *i*-th cause, where the *Z*_{i} are independently and identically distributed (i.i.d.) with a common distribution function *F*(*t*) = 1 - *S*(*t*). Also, assume that the *Z*_{i} are independent of *N*. The time to event can be defined by the random variable *Y* = min(*Z*_{i}, 0 ≤ *i* ≤ *N*), where *P*(*Z*_{0} = *∞*) = 1. Hence, the survival function for the population is given by

\begin{array}{ll}{S}_{\mathit{\text{pop}}}\left(t\right)& =P(N=0)+P({Z}_{1}>t,\dots ,{Z}_{N}>t,N\ge 1)\\ & =\text{exp}(-\theta )+\sum _{k=1}^{\infty}{\left[S\left(t\right)\right]}^{k}\frac{{\theta}^{k}}{k!}\text{exp}(-\theta )\\ & =\text{exp}\left(-\theta F\left(t\right)\right).\end{array}

(8)

The corresponding cure fraction in model (8) is {lim}_{t\to \infty}{S}_{\mathit{\text{pop}}}\left(t\right)=P(N=0)=\text{exp}(-\theta )>0. As *θ* → *∞*, the cure fraction tends to 0, whereas as *θ* → 0, the cure fraction tends to 1. The corresponding population density and hazard functions are {f}_{\mathit{\text{pop}}}\left(t\right)=-\frac{d}{\mathit{\text{dt}}}{S}_{\mathit{\text{pop}}}\left(t\right)=\theta f\left(t\right)\text{exp}\left(-\theta F\left(t\right)\right) and *h*_{pop}(*t*) = *θ* *f*(*t*), respectively.
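The closed form in (8) and the cure fraction can be verified numerically. The Python sketch below assumes a Weibull distribution for the latent times in the *λ* parameterisation; the function names and the parameter values are illustrative assumptions.

```python
import math


def weibull_cdf(t, alpha, lam):
    """F(t) = 1 - S(t) for the latent times, with lambda = log(gamma)."""
    return 1.0 - math.exp(-math.exp(lam) * t ** alpha)


def weibull_pdf(t, alpha, lam):
    return alpha * t ** (alpha - 1) * math.exp(lam - math.exp(lam) * t ** alpha)


def pop_survival(t, theta, alpha, lam):
    """Closed form S_pop(t) = exp(-theta * F(t))."""
    return math.exp(-theta * weibull_cdf(t, alpha, lam))


def pop_survival_series(t, theta, alpha, lam, terms=60):
    """Truncated Poisson sum: sum over k of S(t)^k * theta^k / k! * exp(-theta)."""
    s = 1.0 - weibull_cdf(t, alpha, lam)
    total, term = 0.0, math.exp(-theta)  # term for k = 0
    for k in range(terms):
        total += term
        term *= theta * s / (k + 1)
    return total


def pop_density(t, theta, alpha, lam):
    """f_pop(t) = theta * f(t) * exp(-theta * F(t))."""
    return theta * weibull_pdf(t, alpha, lam) * pop_survival(t, theta, alpha, lam)


theta, alpha, lam = 1.2, 1.5, -1.0
closed = pop_survival(2.0, theta, alpha, lam)
series = pop_survival_series(2.0, theta, alpha, lam)
cure_fraction = pop_survival(1e3, theta, alpha, lam)  # S_pop at a very large t
```

For these values `closed` and `series` agree to floating-point accuracy, and `cure_fraction` equals exp(-*θ*).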

The proportional hazards structure with the covariates is modelled through *θ* (Chen et al. 1999; Ibrahim et al. 2001). The population survival function (8) can be written as

{S}_{\mathit{\text{pop}}}\left(t\right)=\text{exp}(-\theta )+\left[1-\text{exp}(-\theta )\right]{S}^{\ast}\left(t\right),

where {S}^{\ast}\left(t\right)=\frac{\text{exp}\left(-\theta F\left(t\right)\right)-\text{exp}(-\theta )}{1-\text{exp}(-\theta )}, and {f}^{\ast}\left(t\right)=\frac{\text{exp}\left(-\theta F\left(t\right)\right)}{1-\text{exp}(-\theta )}\theta f\left(t\right).
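A short numerical check shows that this rewriting has the form of the standard cure rate model (7) with *π* = exp(-*θ*), and that *S*^{∗} behaves as a proper survival function. The Python sketch below is illustrative; the Weibull parameter values and the function names are assumptions.

```python
import math


def weibull_cdf(t, alpha=1.5, lam=-1.0):
    """Latent-time distribution function F(t); parameter values are illustrative."""
    return 1.0 - math.exp(-math.exp(lam) * t ** alpha)


def cure_decomposition(t, theta, cdf=weibull_cdf):
    """Return (pi, S_star, S_pop) at time t."""
    pi = math.exp(-theta)                 # cure fraction
    s_pop = math.exp(-theta * cdf(t))     # population survival
    s_star = (s_pop - pi) / (1.0 - pi)    # survival of the non-cured group
    return pi, s_star, s_pop


theta = 0.8
grid = [0.0, 0.5, 1.0, 2.0, 5.0, 1e3]
values = [cure_decomposition(t, theta) for t in grid]
s_star_values = [s_star for _, s_star, _ in values]
```

On this grid *S*^{∗} starts at 1, decreases monotonically and reaches 0 for very large *t*, while *π* + (1 - *π*)*S*^{∗}(*t*) reproduces *S*_{pop}(*t*) at every point.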

Following Chen et al. (1999) and Ibrahim et al. (2001), we construct the likelihood function. Suppose we have *n* subjects and we assume that the *N*_{i} are independent Poisson random variables with means *θ*_{i}, *i* = 1,…, *n*. Let *Z*_{i1},…, *Z*_{iN} denote the times for the *N*_{i} competing causes, which are unobserved, and which have a cumulative distribution function, *F*(.). In this section, we specify a Weibull distribution as the parametric form for *F*(.). Let *ψ* = (*α*, *λ*)^{′}, where *α* is the shape parameter and *λ* is the scale parameter. We incorporate covariates for the cure rate model through the cure parameter *θ* and we have a different cure rate parameter, *θ*_{i}, for each subject.

Let {\mathit{x}}_{i}^{\prime}=({x}_{i1},\dots ,{x}_{\mathit{\text{ik}}}) denote the *k* × 1 vector of covariates for the *i*-th subject, and let *β* = (*β*_{1},…, *β*_{k}) denote the corresponding vector of regression coefficients. We relate *θ* to the covariates by {\theta}_{i}=\text{exp}({\mathit{x}}_{i}^{\prime}\mathit{\beta}). Let *t*_{i} denote the survival time for subject *i*, which may be right censored, let *C*_{i} be the censoring time, and let *δ*_{i} be the censoring indicator, taking the value 1 if the observed time is a failure time and 0 if it is right censored. The observed data are *D* = (*n*, *t*, *δ*, *X*), where *t* = (*t*_{1},…, *t*_{n})^{′}, *δ* = (*δ*_{1},…, *δ*_{n})^{′} and *X* = (*x*_{1},…, *x*_{n})^{′}. The complete data are given by *D*_{c} = (*n*, *t*, *δ*, *X*, *N*), where *N* = (*N*_{1},…, *N*_{n})^{′}. The complete-data likelihood function of the parameter (*ψ*, *β*) can be written as

\begin{array}{ll}L(\mathit{\psi},\mathit{\beta}\mid {D}_{c})& =\left\{\prod _{i=1}^{n}S{({t}_{i}\mid \mathit{\psi})}^{{N}_{i}-{\delta}_{i}}{\left({N}_{i}f({t}_{i}\mid \mathit{\psi})\right)}^{{\delta}_{i}}\right\}\\ & \phantom{\rule{1em}{0ex}}\times \text{exp}\left\{\sum _{i=1}^{n}\left[{N}_{i}\text{log}\left({\theta}_{i}\right)-\text{log}({N}_{i}!)-{\theta}_{i}\right]\right\}.\end{array}

(9)

Again, we assume independent priors for *β* and *ψ*, where *α* ∼ *Gamma*(*a*_{α}, *b*_{α}), *λ* ∼ *N*(*μ*_{λ}, *Σ*_{λ}) and *β* ∼ *N*(*μ*_{β}, *Σ*_{β}). We also assume *p*(*α*, *λ*) = *p*(*α* ∣ *δ*_{0}, *τ*_{0})*p*(*λ*), with p(\alpha \mid {\delta}_{0},{\tau}_{0})\propto {\alpha}^{{\delta}_{0}-1}\text{exp}(-{\tau}_{0}\alpha ), where the hyperparameters (*δ*_{0}, *τ*_{0}) are specified (Chen et al. 1999; Ibrahim et al. 2001).

Combining these specifications with the likelihood function (9), the joint posterior distribution of (*α*, *λ*, *β*) becomes

\begin{array}{ll}p(\alpha ,\lambda ,\mathit{\beta}\mid D)& \propto \prod _{i=1}^{n}{\left({\theta}_{i}f({t}_{i}\mid \alpha ,\lambda )\right)}^{{\delta}_{i}}\text{exp}\left(-{\theta}_{i}\left(1-S({t}_{i}\mid \alpha ,\lambda )\right)\right)\\ & \phantom{\rule{1em}{0ex}}\times p(\alpha \mid {\delta}_{0},{\tau}_{0})p(\lambda )p(\mathit{\beta}).\end{array}

(10)
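The unnormalised log posterior of the cure model is the quantity that a Metropolis-Hastings or slice sampler would evaluate. The Python sketch below is one way to code it under simplifying assumptions that are not taken from the paper: a zero prior mean and a common prior variance for the elements of *β*, a scalar prior variance for *λ*, and arbitrary default hyperparameter values. All names are hypothetical.

```python
import math


def cure_log_posterior(alpha, lam, beta, times, deltas, X,
                       delta0=1.0, tau0=0.01,
                       mu_lam=0.0, var_lam=1e4, var_beta=1e4):
    """Unnormalised log posterior of (alpha, lambda, beta) in the cure model."""
    if alpha <= 0.0:
        return -math.inf
    gamma = math.exp(lam)
    value = 0.0
    for t, delta, x in zip(times, deltas, X):
        eta = sum(xj * bj for xj, bj in zip(x, beta))  # log(theta_i) = x_i' beta
        cum_hazard = gamma * t ** alpha
        if delta == 1:
            # log(theta_i * f(t_i | alpha, lambda)) for an observed event
            value += eta + math.log(alpha) + (alpha - 1) * math.log(t) + lam - cum_hazard
        # -theta_i * (1 - S(t_i | alpha, lambda)) for every subject
        value -= math.exp(eta) * (1.0 - math.exp(-cum_hazard))
    # Gamma-type prior kernel for alpha, normal priors for lambda and beta.
    value += (delta0 - 1.0) * math.log(alpha) - tau0 * alpha
    value -= 0.5 * (lam - mu_lam) ** 2 / var_lam
    value -= 0.5 * sum(b * b for b in beta) / var_beta
    return value


# One censored subject at t = 1 with alpha = 1, lambda = 0, beta = 0:
# log-likelihood -(1 - exp(-1)), log prior kernel -tau0.
toy = cure_log_posterior(1.0, 0.0, [0.0], [1.0], [0], [[1.0]])
```

Only differences of this function between parameter values matter for the acceptance step, which is why the normalising constant is not needed.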

The joint posterior density of (*α*, *λ*, *β*) in equation (10) is analytically intractable because the required integrals cannot be evaluated in closed form. Hence, inferences are based on MCMC simulation methods. We can use, for example, the Metropolis-Hastings algorithm or slice sampling to simulate samples of *α*, *λ* and *β*. MCMC computations were implemented using the WinBUGS system (Spiegelhalter et al. 2002).