
Modelling survival data to account for model uncertainty: a single model or model averaging?


This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival.


Modelling survival data plays an important role in the application of statistics in medicine and health science. In addition to nonparametric formulations, there are many parametric models available for describing survival. These include models based on a single distribution, such as the Exponential and Weibull; mixture models, based for example on mixtures of distributions; and mixtures of susceptible and insusceptible individuals, the so-called cure models, which account for a fraction of the patients being cured of the disease. Given this wealth of models, the dilemma faced by many practitioners is the choice of a survival model.

The problem of model selection is abundant throughout the literature. This includes both covariate selection and choice of the model itself. Some of the methods are based on a series of significance tests while others fit more comprehensive models; some include prior information; some use analytic or approximate methods of estimation while others use Markov Chain Monte Carlo (MCMC) methods; different approaches use different optimisation or model comparison criteria such as Bayes factors (Raftery 1996). For example, McGrory and Titterington (2007) showed how variational techniques can be used to extend the deviance information criterion (DIC) to include the comparison of mixture models, while Basu and Tiwari (2010) used Bayes factors to compare the various model structures in breast cancer survival data.

Recently, Bonato et al. (2011) proposed Bayesian ensemble methods to obtain better survival prediction in high-dimensional gene expression data. Regardless of the method, the most common approach is to choose a single model based on the adapted optimisation or model choice criterion. However, if a single model is selected, then inferences are conditional on the selected model, and model uncertainty is ignored which often leads to excessively narrow or misleading inferences (Hjort and Claeskens 2003; Raftery et al. 1997). This difficulty can be overcome by combining the information provided by all suitable models into the analysis. The most common way of achieving this is to use a form of model averaging. From a Bayesian point of view, this averaging is applied such that the posterior distribution of the quantity of interest is obtained over the set of suitable models, weighted by the respective posterior model probabilities (Raftery 1996).

Draper (1995) and Raftery (1995) reviewed Bayesian model averaging (BMA) and the cost of ignoring model uncertainty. Madigan and Raftery (1994) also considered BMA by using Occam’s razor and Occam’s window approaches to reduce the number of candidate models. Yuan and Yin (2011) used model averaging procedures to make more robust inferences regarding the dose-finding design for phase I clinical trials. Pramana et al. (2012) focused on the case in which several parametric models are fitted to gene expression data and discussed model averaging techniques for the estimation of dose-response models.

In this paper, we consider the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. The Weibull distribution is a popular parametric distribution for describing survival times (Dodson 1994). Given the variety of shapes that can be described by the probability density function (pdf) and the simple representation of the survival function, the Weibull distribution has been used very effectively for analysing lifetime data, particularly when the data are censored, which is very common in most life testing experiments (Collet 1994; Kundu 2008).

Given the nature of microarray data to describe biological systems and outcomes of patients, and the potential of these covariates to produce more precise inferences about survival, the use of a single parametric distribution to describe survival time may not be adequate. Microarray data may enable the description of several homogeneous subgroups of patients with respect to survival time. This paper therefore also considered a mixture of Weibull models for precise estimation and prediction of survival. Mixture models can be used to describe a population consisting of several disjoint groups, where each group is assigned its own distribution, weighted by the probability of an individual from the overall population belonging to that group. This model thus provides a convenient and flexible mechanism for identification and estimation of distributions which are not well modelled by any standard parametric family (Stephens 1997). In the study considered here, the mixture is assumed to comprise a known number of Weibull distributions, with potentially different parameters. Most approaches to the analysis of time to event data implicitly assume all individuals will experience the event of interest. However, there are situations when a proportion of individuals are not expected to experience the event of interest; that is, those individuals are often referred to as immune, cured or nonsusceptible (Ibrahim et al. 2001). To address this issue, cure rate models are considered, which are survival models incorporating a cure fraction. These models, which can be considered as a form of mixture model with one component degenerating to a point mass, extend the understanding of time to event data by allowing the formulation of more accurate and informative conclusions about the two groups of subjects.

Finally, instead of adopting the usual practice of choosing a single “best” model, where “best” is defined in terms of the probability of the model given the data, a BMA approach was adopted to account for model uncertainty in the prediction of the response. We illustrate the approach using a microarray dataset.

The paper is organised as follows. In Section “Methods”, we define BMA. The three competing models are described in a Bayesian framework in Section “Models”, where the computational approach for estimation is also presented. In Section “Application to gene expression data”, we illustrate the models using a case study. The results are discussed further in Section “Discussion”.


The key elements of BMA were discussed by Raftery (1995). He suggested weighting each model by the posterior model probabilities derived from a Bayesian analysis. Assume that there are $S$ models being considered, indexed by $s = 1, 2, \ldots, S$, each with parameter set $\theta_s$, based on data $D$. Let $\Delta$ be the quantity of interest; this could represent, for example, the posterior predictive distribution of $y$. Hence, the posterior distribution of $\Delta$ given data $D$ (Hoeting et al. 1999) is

$$p(\Delta \mid D) = \sum_{s=1}^{S} p(\Delta \mid S = s, D)\, p(S = s \mid D),$$

where $p(S = s \mid D)$ is the posterior probability of a particular model being true, defined as

$$p(S = s \mid D) = \frac{p(D \mid S = s)\, p(S = s)}{\sum_{s'=1}^{S} p(D \mid S = s')\, p(S = s')}, \quad s = 1, 2, \ldots, S,$$

where

$$p(D \mid S = s) = \int p(D \mid \theta_s, S = s)\, p(\theta_s \mid S = s)\, d\theta_s.$$

Here, $p(D \mid S = s)$ is the marginal likelihood of the data $D$ given model $S = s$, and $p(\theta_s \mid S = s)$ is the prior density of $\theta_s$ given model $S = s$. $p(S = s)$ is the prior probability that model $s$ is the true model (Hoeting et al. 1999).

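Given a model selection problem in which we have to choose between two models, the plausibility of the two models $S_1$ and $S_2$ is assessed by the Bayes factor, the ratio of their marginal likelihoods $p(D \mid S_1) / p(D \mid S_2)$, which equals the ratio of posterior model probabilities when the two models have equal prior probabilities.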

The main drawback of Bayes factors is that they are, in general, difficult to compute. Raftery (1995) proposed using the Bayesian information criterion (BIC) (Schwarz 1978) as an approximation. Buckland et al. (1997) and Claeskens and Hjort (2008) discussed the use of BIC in BMA. Buckland et al. (1997) also proposed simpler methods in which the weights are based on penalised likelihood functions formed from the AIC (Akaike 1973).

The starting point for Burnham and Anderson’s model selection theory is the Kullback-Leibler (KL) information given by Burnham and Anderson (2002) and Claeskens and Hjort (2008):

$$I(f, q) = \int f(x) \log\left( \frac{f(x)}{q(x \mid \theta_s)} \right) dx,$$

where $f$ represents the density function of the true and unknown model, $q$ represents the density function of the model that is used to approximate $f$, and $\theta_s$ is a vector of the unknown parameters to be estimated. The notation $I(f, q)$ denotes the information lost when $q$ is used to approximate $f$, or the distance from $q$ to $f$. For a given set of models, one can compare the KL information for each model and select the model that minimises the information loss across the considered set of models (Burnham and Anderson 2002, 2004). However, in practice $I(f, q)$ cannot be computed since the true model $f$ is unknown. Schwarz (1978) and Burnham and Anderson (2002) made the link between the KL information and likelihood theory, and showed that the expected KL information can be expressed as

$$E(\mathrm{KL}) = -\log p(D \mid \hat{\theta}_s, S = s) + d_s \log(n),$$

where $p(D \mid \hat{\theta}_s, S = s)$ is the likelihood evaluated at the estimate $\hat{\theta}_s$, $d_s$ is the number of parameters in the model and $n$ is the number of uncensored observations in a survival context (Volinsky and Raftery 2000). A Laplace approximation, typically the BIC (Schwarz 1978), can be used to approximate $p(D \mid S = s)$ (Clyde 2000; Hoeting et al. 1999; Jackson et al. 2009; Yuan and Yin 2011):

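$$\log p(D \mid S = s) \approx \log p(D \mid \hat{\theta}_s, S = s) - \frac{d_s}{2} \log(n), \qquad \mathrm{BIC}_s = -2 \log p(D \mid \hat{\theta}_s, S = s) + d_s \log(n),$$

so that $\log p(D \mid S = s) \approx -\tfrac{1}{2} \mathrm{BIC}_s$.

Here $\log p(D \mid \hat{\theta}_s, S = s)$ is the maximised log-likelihood of model $s$, which measures the goodness of fit of the model to the data.

Schwarz (1978) and Burnham and Anderson (2002) proposed the likelihood of the model given the data, using $\hat{\theta}_s$, defined by

$$p(D \mid \hat{\theta}_s, S = s) \propto e^{-0.5 \times \mathrm{BIC}_s}.$$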

The BMA weight for the $s$th model (Jackson et al. 2009; Yuan and Yin 2011) is therefore given by

$$p(S = s \mid D) = \frac{\exp\left(-\tfrac{1}{2} \mathrm{BIC}_s\right) p(S = s)}{\sum_{s'=1}^{S} \exp\left(-\tfrac{1}{2} \mathrm{BIC}_{s'}\right) p(S = s')}.$$

The BMA weight can be interpreted as the weight of evidence that model $s$ is the true model, given a set of $S$ models. When there is no information about prior probabilities, we can let $p(S = s)$ be equal for all candidate models ($1/S$), indicating no prior preference for any of the models (Jackson et al. 2009; Pramana et al. 2012). A smaller BIC value indicates a better model fit, accounting for model complexity, so the model with the largest BMA weight is considered the best model. Thus, $p(S = s \mid D)$ is also an approximation to the posterior probability of model $s$ being correct (Schwarz 1978).
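As an illustration of the weight formula above, the following is a minimal Python sketch that turns a list of BIC values into approximate posterior model probabilities. The function name `bma_weights` and the BIC values are hypothetical and are not taken from the paper; subtracting the largest log-weight before exponentiating is a standard numerical safeguard assumed here, not something the paper describes.

```python
import math

def bma_weights(bic, prior=None):
    """Approximate posterior model probabilities from BIC values."""
    n_models = len(bic)
    if prior is None:
        # Equal prior model probabilities, p(S = s) = 1 / S
        prior = [1.0 / n_models] * n_models
    # Log of the unnormalised weight: -BIC_s / 2 + log p(S = s)
    log_w = [-0.5 * b + math.log(p) for b, p in zip(bic, prior)]
    # Subtract the maximum before exponentiating to avoid underflow
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical BIC values for three candidate models (not from the paper)
weights = bma_weights([1002.0, 1010.0, 1000.0])
print([round(w, 3) for w in weights])
```

Only differences in BIC matter: a model whose BIC is 2 units larger than the best model receives a weight smaller by a factor of exp(-1) under equal priors.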

Let $\tilde{f}_{sj}$ be the $j$th simulated observation from the $s$th model. Then the mean of $f$ from the BMA model, $\bar{f}_{\mathrm{MA}}$, can be calculated as follows:

$$\bar{f}_{\mathrm{MA}} = \sum_{j=1}^{N} \sum_{s=1}^{S} w_s \tilde{f}_{sj} / N,$$

where $N$ is the number of simulated observations and $w_s = p(S = s \mid D)$ is the BMA weight, defined previously.
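The model-averaged mean can be sketched in a few lines of Python. The function name `bma_mean`, the nested-list layout `draws[s][j]` and the numbers are assumptions made for this illustration only.

```python
def bma_mean(draws, weights):
    """Model-averaged mean; draws[s][j] is the j-th simulated value from model s."""
    n_draws = len(draws[0])
    if any(len(d) != n_draws for d in draws):
        raise ValueError("every model needs the same number of simulated values")
    total = 0.0
    for j in range(n_draws):
        # Weighted combination of the j-th simulated value across models
        total += sum(w * d[j] for w, d in zip(weights, draws))
    return total / n_draws

# Hypothetical simulated values from three models and hypothetical BMA weights
draws = [[1.0, 3.0], [2.0, 4.0], [10.0, 10.0]]
weights = [0.5, 0.3, 0.2]
f_bar = bma_mean(draws, weights)
print(f_bar)
```

Because the weights do not depend on `j`, the result equals the weighted average of the per-model means.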


Weibull model

In this section, we define the Weibull model for analysing survival of patients in the context of human health. We confine ourselves to survival times that are the difference between a nominated start time and a declared failure (uncensored data) or a nominated end time (censored data). Let T be a nonnegative random variable for a person’s survival time and t be a realisation of the random variable T. Kleinbaum and Klein (2005) give some reasons for the occurrence of right censoring in survival studies, including termination of the study, dropout, or loss to follow-up. For the censored observations, one could impute the missing survival times or assume that they are event-free. The former is often difficult, especially if the censoring proportion is large, and extreme imputation assumptions (such as all censored cases fail right after the time of censoring) may distort inferences (Leung et al. 1997; Stajduhar et al. 2009). In this study, we treat all censored cases as event-free regardless of observation time.

Initially, we assume that we observe survival times t of patients possibly from a heterogeneous population. The two-parameter Weibull density function for survival time is given by

$$W(t \mid \alpha, \gamma) = \alpha \gamma t^{\alpha - 1} \exp\left(-\gamma t^{\alpha}\right),$$

for α > 0 and γ > 0, where α is a shape parameter and γ is a scale parameter (Ibrahim et al. 2001).

Since the logarithm of the Weibull hazard is a linear function of the logarithm of time, it is more convenient to write the model in terms of the parameterisation λ = log(γ) (Ibrahim et al. 2001), so that:

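$$f(t \mid \alpha, \lambda) = \alpha t^{\alpha - 1} \exp\left(\lambda - \exp(\lambda) t^{\alpha}\right),$$

where $t > 0$, $\alpha > 0$ and $\gamma = \exp(\lambda) > 0$.

The corresponding survival function and hazard function, using the $\lambda$ parameterisation, are as follows:

$$S(t \mid \alpha, \lambda) = \exp\left(-\exp(\lambda) t^{\alpha}\right), \qquad h(t \mid \alpha, \lambda) = f(t \mid \alpha, \lambda) / S(t \mid \alpha, \lambda) = \alpha \exp(\lambda) t^{\alpha - 1}.$$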
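The density, survival and hazard functions in the λ parameterisation translate directly into code. The Python sketch below uses hypothetical function names and parameter values, and simply checks numerically that the hazard equals the ratio of density to survival.

```python
import math

def weibull_pdf(t, alpha, lam):
    """Weibull density with shape alpha and lam = log(gamma)."""
    return alpha * t ** (alpha - 1.0) * math.exp(lam - math.exp(lam) * t ** alpha)

def weibull_surv(t, alpha, lam):
    """Survival function S(t) = exp(-exp(lam) * t^alpha)."""
    return math.exp(-math.exp(lam) * t ** alpha)

def weibull_hazard(t, alpha, lam):
    """Hazard function h(t) = alpha * exp(lam) * t^(alpha - 1)."""
    return alpha * math.exp(lam) * t ** (alpha - 1.0)

# Hypothetical values chosen only for illustration
t, alpha, lam = 2.0, 1.5, -1.0
ratio = weibull_pdf(t, alpha, lam) / weibull_surv(t, alpha, lam)
print(ratio, weibull_hazard(t, alpha, lam))
```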

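We now assume that we observe possibly right-censored data for $n$ subjects, $y = (y_1, \ldots, y_n)$, where $y_i = (t_i, \delta_i)$ and $\delta_i$ is an indicator function such that (Marin et al. 2005a):

$$\delta_i = \begin{cases} 1, & \text{if the lifetime is uncensored, i.e., } T_i = t_i, \\ 0, & \text{if the lifetime is censored, i.e., } T_i > t_i. \end{cases}$$

Let $x_{ij}$ be the $j$th covariate associated with $t_i$, for the $p + 1$ covariates indexed by $j = 0, 1, \ldots, p$. In our case study, $x_{i1}, \ldots, x_{ip}$ indicate the $p$ gene expressions from DNA microarray data, and $x_{i0}$ indicates the multi-category phenotype covariate. The data structure is as follows:

$$\begin{array}{c|c|ccc} \text{Survival time} & \text{Category} & \text{Gene } 1 & \cdots & \text{Gene } p \\ \hline t_1 & x_{10} & x_{11} & \cdots & x_{1p} \\ t_2 & x_{20} & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ t_n & x_{n0} & x_{n1} & \cdots & x_{np} \end{array}$$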

The gene expression data can be included in the model through $\lambda$ (Thamrin et al. 2013). Given that $\gamma$ must be positive, one option is to include the covariates as follows:

$$\gamma_i = \exp(x_i \beta), \quad \text{so that} \quad \lambda_i = \log(\gamma_i) = x_i \beta.$$

Thus, the log-likelihood function becomes:

$$\log L(\alpha, \beta \mid D) = \sum_{i=1}^{n} \left\{ \delta_i \left[ \log(\alpha) + (\alpha - 1) \log(t_i) + x_i \beta \right] - \exp(x_i \beta)\, t_i^{\alpha} \right\}.$$
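The log-likelihood above can be evaluated with a short loop over subjects. The Python sketch below is illustrative only: the function name `weibull_loglik`, the list-of-lists covariate layout and the toy data are assumptions, not part of the paper.

```python
import math

def weibull_loglik(alpha, beta, times, events, X):
    """Right-censored Weibull log-likelihood with covariates entering through lambda_i."""
    loglik = 0.0
    for t, d, x in zip(times, events, X):
        # Linear predictor x_i beta, which equals lambda_i = log(gamma_i)
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        if d == 1:
            # Uncensored subjects contribute the log hazard
            loglik += math.log(alpha) + (alpha - 1.0) * math.log(t) + eta
        # Every subject contributes the log survival term
        loglik -= math.exp(eta) * t ** alpha
    return loglik

# Toy data: one observed death and one censored subject (hypothetical values)
times = [2.0, 3.0]
events = [1, 0]
X = [[1.0, 0.5], [1.0, -0.2]]
value = weibull_loglik(1.5, [-1.0, 0.3], times, events, X)
print(value)
```

An uncensored subject contributes the log density and a censored subject contributes the log survival probability, which is why the exponential term is subtracted for everyone.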

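We assume that $(\alpha, \lambda)$ are independent a priori (Marin et al. 2005a), and assign a Gamma prior to $\alpha$ and Normal priors to $\lambda_i$ and $\beta$. Thus, the priors are now given by:

$$\alpha \sim \mathrm{Gamma}(u_\alpha, v_\alpha), \qquad \lambda_i \sim \mathrm{Normal}(x_i \beta, \sigma^2), \qquad \beta \sim \mathrm{Normal}(0, \Sigma),$$

and we allow $\Sigma$ to be diagonal with elements $\sigma_j^2$, $j = 1, 2, \ldots, p$.

Diffuse priors are represented by large positive values for $\sigma^2$ and small positive values for $u_\alpha$ and $v_\alpha$.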

The joint posterior distribution of (α, β) is given by:

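$$\begin{aligned} p(\alpha, \beta \mid D) &\propto L(\alpha, \beta \mid D)\, p(\alpha)\, p(\beta) \\ &\propto \alpha^{\alpha_0 + d - 1} \exp\left\{ \sum_{i=1}^{n} \left[ \delta_i x_i \beta + \delta_i (\alpha - 1) \log(t_i) - t_i^{\alpha} \exp(x_i \beta) \right] - b_0 \alpha - \frac{1}{2} (\beta - \mu_0)' \Sigma_0^{-1} (\beta - \mu_0) \right\}, \end{aligned}$$

where $d = \sum_{i=1}^{n} \delta_i$.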

MCMC analysis is performed by sampling from the conditional distributions of the parameters. The conditional distribution of α does not have an explicit form, but it can be sampled using MCMC algorithms such as Metropolis-Hastings or slice sampling (Gilks et al. 1996).
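To make the sampling step for α concrete, here is a minimal random-walk Metropolis-Hastings sketch in Python. It is not the implementation used in the paper: it assumes a model without covariates, a fixed and known λ, a Gamma(u, v) prior on α in the shape-rate form, and a Gaussian random walk on log(α). All function names, tuning values and the simulated data are hypothetical.

```python
import math
import random

def log_target(alpha, lam, times, events, u=0.01, v=0.01):
    """Log conditional density of alpha up to a constant (Gamma(u, v) prior assumed)."""
    ll = 0.0
    for t, d in zip(times, events):
        if d == 1:
            ll += math.log(alpha) + (alpha - 1.0) * math.log(t) + lam
        ll -= math.exp(lam) * t ** alpha
    return ll + (u - 1.0) * math.log(alpha) - v * alpha

def sample_alpha(times, events, lam, n_iter=2000, step=0.1, alpha0=1.0, seed=0):
    """Random-walk Metropolis-Hastings on log(alpha); returns draws and acceptance rate."""
    rng = random.Random(seed)
    alpha, draws, accepted = alpha0, [], 0
    current = log_target(alpha, lam, times, events)
    for _ in range(n_iter):
        # Multiplicative proposal keeps alpha positive
        proposal = alpha * math.exp(step * rng.gauss(0.0, 1.0))
        cand = log_target(proposal, lam, times, events)
        # Log acceptance ratio, including the Jacobian of the log transform
        log_ratio = cand - current + math.log(proposal) - math.log(alpha)
        if math.log(rng.random() + 1e-300) < log_ratio:
            alpha, current = proposal, cand
            accepted += 1
        draws.append(alpha)
    return draws, accepted / n_iter

# Simulated uncensored Weibull data with alpha = 1.5 and lambda = 0 (illustration only)
data_rng = random.Random(1)
times = [data_rng.expovariate(1.0) ** (1.0 / 1.5) for _ in range(100)]
events = [1] * len(times)
draws, rate = sample_alpha(times, events, lam=0.0)
post_mean = sum(draws[500:]) / len(draws[500:])
print(round(post_mean, 2), round(rate, 2))
```

In a full sampler this update would alternate with updates of the remaining parameters; slice sampling is an alternative that avoids tuning the step size.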

Weibull mixture model

We define the Weibull mixture model for analysing survival data. A mixture of K Weibull densities (Marin et al. 2005a) is defined by

$$f(t \mid K, w, \alpha, \gamma) = \sum_{m=1}^{K} w_m W(t \mid \alpha_m, \gamma_m),$$

where $\alpha = (\alpha_1, \ldots, \alpha_K)$ and $\gamma = (\gamma_1, \ldots, \gamma_K)$ are the parameters of each Weibull distribution and $w = (w_1, \ldots, w_K)$ is a vector of nonnegative weights which sum to one.

The corresponding survival function $S(t \mid K, w, \alpha, \gamma)$ and hazard function $h(t \mid K, w, \alpha, \gamma)$ are as follows:

$$S(t \mid K, w, \alpha, \gamma) = \sum_{m=1}^{K} w_m \exp\left(-\gamma_m t^{\alpha_m}\right), \qquad h(t \mid K, w, \alpha, \gamma) = f(t \mid K, w, \alpha, \gamma) / S(t \mid K, w, \alpha, \gamma).$$
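The mixture density, survival and hazard functions can be sketched as follows in Python. The function names and parameter values are hypothetical; note that the mixture hazard is the ratio of the mixture density to the mixture survival, not a weighted sum of component hazards.

```python
import math

def weibull_density(t, alpha, gamma):
    """Two-parameter Weibull density W(t | alpha, gamma)."""
    return alpha * gamma * t ** (alpha - 1.0) * math.exp(-gamma * t ** alpha)

def mixture_density(t, w, alpha, gamma):
    """Density of a K-component Weibull mixture."""
    return sum(wm * weibull_density(t, am, gm) for wm, am, gm in zip(w, alpha, gamma))

def mixture_survival(t, w, alpha, gamma):
    """Survival function of a K-component Weibull mixture."""
    return sum(wm * math.exp(-gm * t ** am) for wm, am, gm in zip(w, alpha, gamma))

def mixture_hazard(t, w, alpha, gamma):
    """Hazard of the mixture: density divided by survival."""
    return mixture_density(t, w, alpha, gamma) / mixture_survival(t, w, alpha, gamma)

# Hypothetical two-component mixture
w, alpha, gamma = [0.25, 0.75], [0.8, 2.0], [0.5, 0.1]
print(mixture_survival(1.0, w, alpha, gamma), mixture_hazard(1.0, w, alpha, gamma))
```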

We now assume that we observe possibly right-censored data for $n$ patients, $y = (y_1, \ldots, y_n)$, where $y_i = (t_i, \delta_i)$ and $\delta_i$ is an indicator function as described in Section “Weibull model”.

Let $x_{ij}$ be the $j$th covariate associated with patient $i$, for $j = 1, 2, \ldots, p$. In our application, $x_{ij}$ could indicate, for example, the gene expressions. The covariates can be included in the model as follows (Farcomeni and Nardi 2010):

$$\log(\gamma_m) = x_i \beta_m = \lambda_m,$$

where $x_i = (x_{i1}, \ldots, x_{ip})$, $\gamma_m = (\gamma_{1m}, \ldots, \gamma_{pm})$ and $\beta_m = (\beta_{1m}, \ldots, \beta_{pm})$, for $i = 1, 2, \ldots, n$ and $m = 1, 2, \ldots, K$.

Thus, the likelihood function becomes:

$$L(w, \alpha, \gamma \mid K, t_i, \delta_i, x) = \prod_{i=1}^{n} f(t_i \mid K, w, \alpha, \gamma, x)^{\delta_i} \times S(t_i \mid K, w, \alpha, \gamma, x)^{1 - \delta_i}.$$

Here, the incomplete information is modelled via the survivor function, which reflects the probability that the patient was alive for duration greater than t i .

The following prior distributions are placed on the parameters w and α:

$$w \mid K \sim \mathrm{Dirichlet}(\phi_1, \ldots, \phi_K), \quad \phi_m = \phi, \quad m = 1, 2, \ldots, K, \qquad \alpha_m \sim \mathrm{Gamma}(u_\alpha, v_\alpha), \quad m = 1, 2, \ldots, K.$$

For a model without covariates, we employ the following prior for $\gamma_m$:

$$\gamma_m \sim \mathrm{Gamma}(u_\gamma, v_\gamma), \quad m = 1, 2, \ldots, K.$$

We chose small positive values for $u_\alpha$, $v_\alpha$, $u_\gamma$ and $v_\gamma$ to express vague prior knowledge about these parameters, and we set $\phi = 1$ (Marin et al. 2005a). For a model with covariates, we employ a multivariate normal prior on $\beta_m$, so that

$$\beta_m \mid K \sim N(0, \Sigma),$$

and we allow $\Sigma$ to be diagonal with elements $\sigma_j^2$, $j = 1, 2, \ldots, p$. Again, we express a vaguely informative prior by setting a large positive value for $\sigma_j^2$. Diagonal matrices were used here; however, in light of recent work (Bhadra and Mallick 2013), one may argue that a non-diagonal variance-covariance matrix would be more appropriate.

The model described in this section can be fitted using MCMC sampling with latent values $Z_i$ to indicate component membership of the $i$th observation (Diebolt and Robert 1994; Robert and Casella 2000). Since $w_m = \Pr(Z_i = m)$, we can write $Z_i \sim \mathcal{M}(w_1, \ldots, w_K)$. In this scheme, the $Z_i$ are sampled by computing posterior probabilities of membership, and the other parameters are sampled from their full conditional distributions. This was implemented in the WinBUGS software package (Spiegelhalter et al. 2002).
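The allocation step can be illustrated with a short Python sketch. It assumes that the posterior membership probability of component m for subject i is proportional to the weight times the component density when the time is uncensored, and to the weight times the component survival when it is censored. This is the usual data-augmentation form and is stated here as an assumption, since the paper delegates the step to WinBUGS; the function names and values are hypothetical.

```python
import math
import random

def membership_probs(t, delta, w, alpha, gamma):
    """Posterior probabilities that one subject belongs to each mixture component."""
    contrib = []
    for wm, am, gm in zip(w, alpha, gamma):
        surv = math.exp(-gm * t ** am)
        if delta == 1:
            # Uncensored: weight times component density
            contrib.append(wm * am * gm * t ** (am - 1.0) * surv)
        else:
            # Censored: weight times component survival
            contrib.append(wm * surv)
    total = sum(contrib)
    return [c / total for c in contrib]

def sample_allocation(probs, rng):
    """Draw a component label in 1..K with the given probabilities."""
    u, cum = rng.random(), 0.0
    for m, p in enumerate(probs, start=1):
        cum += p
        if u < cum:
            return m
    return len(probs)

# Hypothetical parameter values for a two-component mixture
probs = membership_probs(4.0, 0, [0.25, 0.75], [0.8, 2.0], [0.5, 0.1])
label = sample_allocation(probs, random.Random(0))
print(probs, label)
```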

The WinBUGS software (Lunn et al. 2000; Ntzoufras 2009; Spiegelhalter et al. 2002) is an interactive Windows version of the BUGS program for Bayesian analysis of complex statistical models using MCMC techniques.

Label switching, caused by non-identifiability of the mixture components, was dealt with post-MCMC using the reordering algorithm of Marin et al. (2005b). The algorithm proceeded by selecting the permutation of components at each iteration that minimised the vector dot product with the so-called “pivot”, a high density point from the posterior distribution. The MCMC output was then reordered according to each selected permutation. In this paper, the approximate maximum a posteriori (MAP) (i.e. the realization of parameters corresponding to the MCMC iterate that maximised the unnormalised posterior) was chosen as the pivot.

Cure model

As in Section “Weibull model”, we observe time to the event of interest for $n$ independent subjects, and we let $(t_i, \delta_i)$ denote the observed time and the event indicator for the $i$th observation. Let $S_1(t)$ be the survivor function for the entire population, $S^{*}(t)$ be the survivor function for the non-cured group in the population, and $\pi$ be the cure rate. Then the standard cure rate model is given by:

$$S_1(t) = \pi + (1 - \pi) S^{*}(t).$$

Commonly used parametric distributions for $S^{*}(t)$ include the Exponential and Weibull.

As in Yakovlev and Tsodikov (1996), Chen et al. (1999) and Ibrahim et al. (2001), for an individual in a population, let $N$ denote the number of latent variables. Assume that $N$ has a Poisson distribution with mean $\theta$. Let $Z_i$, $i = 1, \ldots, N$, denote the random times, where the $Z_i$ are independently and identically distributed (i.i.d.) with a common distribution function $F(t) = 1 - S(t)$. Also, assume that the $Z_i$ are independent of $N$. The time to event can be defined by the random variable $Y = \min(Z_i, 0 \le i \le N)$, where $P(Z_0 = \infty) = 1$. Hence, the survival function for the population is given by

$$S_{\mathrm{pop}}(t) = P(N = 0) + P(Z_1 > t, \ldots, Z_N > t, N \ge 1) = \exp(-\theta) + \sum_{k=1}^{\infty} S(t)^k \frac{\theta^k}{k!} \exp(-\theta) = \exp(-\theta F(t)). \tag{8}$$

The corresponding cure fraction in model (8) is $\lim_{t \to \infty} S_{\mathrm{pop}}(t) = S_{\mathrm{pop}}(\infty) = P(N = 0) = \exp(-\theta) > 0$. As $\theta \to \infty$, the cure fraction tends to 0, whereas as $\theta \to 0$, the cure fraction tends to 1. The corresponding population density and hazard functions are $f_{\mathrm{pop}}(t) = -\frac{d}{dt} S_{\mathrm{pop}}(t) = \theta f(t) \exp(-\theta F(t))$ and $h_{\mathrm{pop}}(t) = \theta f(t)$, respectively.

The proportional hazards structure with the covariates is modelled through θ (Chen et al. 1999; Ibrahim et al. 2001). The population survival function (8) can be written as

$$S_{\mathrm{pop}}(t) = \exp(-\theta) + \left\{ 1 - \exp(-\theta) \right\} S^{*}(t),$$

where

$$S^{*}(t) = \frac{\exp(-\theta F(t)) - \exp(-\theta)}{1 - \exp(-\theta)} \quad \text{and} \quad f^{*}(t) = \frac{\exp(-\theta F(t))}{1 - \exp(-\theta)}\, \theta f(t).$$
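The relationship between the population survival function, the cure fraction and the survival function of the non-cured group can be checked numerically. The Python sketch below assumes a Weibull distribution function F in the λ parameterisation; the function names and parameter values are hypothetical.

```python
import math

def weibull_cdf(t, alpha, lam):
    """Weibull distribution function F(t) = 1 - exp(-exp(lam) * t^alpha)."""
    return 1.0 - math.exp(-math.exp(lam) * t ** alpha)

def pop_survival(t, theta, alpha, lam):
    """Population survival S_pop(t) = exp(-theta * F(t))."""
    return math.exp(-theta * weibull_cdf(t, alpha, lam))

def cure_fraction(theta):
    """Limit of S_pop(t) as t grows: exp(-theta)."""
    return math.exp(-theta)

def noncured_survival(t, theta, alpha, lam):
    """Survival function of the non-cured group."""
    num = math.exp(-theta * weibull_cdf(t, alpha, lam)) - math.exp(-theta)
    return num / (1.0 - math.exp(-theta))

# Hypothetical values: theta = 1.2 corresponds to a cure fraction of about 0.30
theta, alpha, lam, t = 1.2, 1.5, -1.0, 2.0
lhs = pop_survival(t, theta, alpha, lam)
rhs = cure_fraction(theta) + (1.0 - cure_fraction(theta)) * noncured_survival(t, theta, alpha, lam)
print(lhs, rhs)
```

The two printed values agree up to rounding, which is the mixture representation of the population survival function in code form.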

Following Chen et al. (1999) and Ibrahim et al. (2001), we construct the likelihood function. Suppose we have $n$ subjects and we assume that the $N_i$ are independent Poisson random variables with means $\theta_i$, $i = 1, \ldots, n$. Let $Z_{i1}, \ldots, Z_{iN_i}$ denote the times for the $N_i$ competing causes, which are unobserved, and which have a cumulative distribution function $F(\cdot)$. In this section, we specify a parametric form for $F(\cdot)$, namely a Weibull distribution. Let $\psi = (\alpha, \lambda)$, where $\alpha$ is the shape parameter and $\lambda$ is the scale parameter. We incorporate covariates for the cure rate model through the cure parameter $\theta$, and we have a different cure rate parameter, $\theta_i$, for each subject.

Let $x_i = (x_{i1}, \ldots, x_{ik})'$ denote the $k \times 1$ vector of covariates for the $i$th subject, and let $\beta = (\beta_1, \ldots, \beta_k)'$ denote the corresponding vector of regression coefficients. We relate $\theta$ to the covariates by $\theta_i = \exp(x_i' \beta)$. Let $t_i$ denote the survival time for subject $i$, which may be right censored, let $C_i$ be the censoring time, and let $\delta_i$ be the censoring indicator, taking the value 1 if $T_i$ is a failure time and 0 if it is right censored. The observed data are $D = (n, t, \delta, X)$, where $t = (t_1, \ldots, t_n)$, $\delta = (\delta_1, \ldots, \delta_n)$ and $X = (x_1, \ldots, x_n)$. The complete data are given by $D_c = (n, t, \delta, X, N)$, where $N = (N_1, \ldots, N_n)$. The complete-data likelihood function of the parameters $(\psi, \beta)$ can be written as

$$L(\psi, \beta \mid D_c) = \left\{ \prod_{i=1}^{n} S(t_i \mid \psi)^{N_i - \delta_i} \left( N_i f(t_i \mid \psi) \right)^{\delta_i} \right\} \times \exp\left\{ \sum_{i=1}^{n} \left[ N_i \log(\theta_i) - \log(N_i!) - \theta_i \right] \right\}. \tag{9}$$

Again, we assume independent priors for $\beta$ and $\psi$, where $\alpha \sim \mathrm{Gamma}(a_\alpha, b_\alpha)$, $\lambda \sim N(\mu_\lambda, \Sigma_\lambda)$ and $\beta \sim N(\mu_\beta, \Sigma_\beta)$. We also assume $p(\alpha, \lambda) = p(\alpha \mid \delta_0, \tau_0)\, p(\lambda)$, with $p(\alpha \mid \delta_0, \tau_0) \propto \alpha^{\delta_0 - 1} \exp(-\tau_0 \alpha)$, where the hyperparameters $(\delta_0, \tau_0)$ are specified (Chen et al. 1999; Ibrahim et al. 2001).

Combining these specifications with the likelihood function (9), the joint posterior distribution of (α, λ, β) becomes

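$$p(\alpha, \lambda, \beta \mid D) \propto \prod_{i=1}^{n} \left( \theta_i f(t_i \mid \alpha, \lambda) \right)^{\delta_i} \exp\left( -\theta_i \left( 1 - S(t_i \mid \alpha, \lambda) \right) \right) \times p(\alpha \mid \delta_0, \tau_0)\, p(\lambda)\, p(\beta). \tag{10}$$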

The joint posterior density of (α, λ, β) in equation (10) is analytically intractable because its normalising integral has no closed form. Hence, inferences are based on MCMC simulation methods. We can use, for example, the Metropolis-Hastings algorithm or slice sampling to simulate samples of α, λ and β. MCMC computations were implemented using the WinBUGS system (Spiegelhalter et al. 2002).
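The likelihood part of the joint posterior can be evaluated as in the following Python sketch, which any Metropolis-Hastings or slice sampler would call repeatedly after adding the log priors. The function name `cure_loglik`, the data layout and the toy values are assumptions made for illustration; the Weibull is used in the λ parameterisation.

```python
import math

def cure_loglik(alpha, lam, beta, times, events, X):
    """Observed-data log-likelihood of the cure model with theta_i = exp(x_i' beta)."""
    loglik = 0.0
    for t, d, x in zip(times, events, X):
        log_theta = sum(xj * bj for xj, bj in zip(x, beta))
        theta = math.exp(log_theta)
        cum_haz = math.exp(lam) * t ** alpha      # Weibull cumulative hazard
        surv = math.exp(-cum_haz)                 # S(t_i | alpha, lam)
        if d == 1:
            # log of theta_i * f(t_i | alpha, lam) for an observed event
            log_f = math.log(alpha) + (alpha - 1.0) * math.log(t) + lam - cum_haz
            loglik += log_theta + log_f
        # Every subject contributes -theta_i * (1 - S(t_i))
        loglik -= theta * (1.0 - surv)
    return loglik

# Toy data with one event and one censored time (hypothetical values)
value = cure_loglik(1.5, -1.0, [0.2, -0.4], [2.0, 3.0], [1, 0], [[1.0, 0.5], [1.0, 1.0]])
print(value)
```

For a censored subject the contribution reduces to the log of the population survival function, and for an observed event it is the log of the population density.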

Application to gene expression data

DLBCL dataset

We applied the proposed method of model averaging across the three candidate survival models to a dataset containing gene expression of Diffuse Large B-cell Lymphoma (DLBCL). The dataset comprises gene expression measurements and survival times of patients with DLBCL (Rosenwald et al. 2002). DLBCL (Lenz et al. 2008) is a type of cancer of the lymphatic system in adults which can be cured by anthracycline-based chemotherapy in only 35 to 40 percent of patients (Rosenwald et al. 2002). In general, types of this disease are very diverse and their biological properties are largely unknown, meaning that this is a relatively difficult cancer to cure and prevent. Rosenwald et al. (2002) proposed that there are three phenotype subgroups of patients with DLBCL: activated B-like DLBCL, germinal centre (GC) B-like DLBCL and type III DLBCL. The GC B-like DLBCL is less dangerous than the others in the progression of the tumour; the activated B-like DLBCL is more active than the others and the type III DLBCL is the most dangerous in the progression of the tumour (Alizadeh et al. 2000). These groups were defined using microarray experiments and hierarchical clustering. The authors showed that these phenotype subgroups were differentiated from each other by distinct expression of hundreds of different genes and had different survival time patterns. This dataset contains 219 patients with DLBCL, including 138 patient deaths during follow-up. Patients with missing values for a particular microarray element were excluded from all analyses involving that element.

Based on patterns of gene expression in biopsy specimens of the lymphoma, Rosenwald et al. (2002) analysed this dataset to predict the likelihood of patients’ survival after chemotherapy for DLBCL. By using a Cox proportional-hazards model, Rosenwald et al. (2002) identified five individual gene expressions which correlated with the survival after chemotherapy. These gene expressions are germinal center B-cell (GC-B), lymphoma node, proliferation, BMP6 and MHC. In this study, these five gene expressions are used as covariates for estimating survival times based on the three competing models in Section “Models”.


As discussed in Section “Methods”, to account for model uncertainty, a model averaging technique which combines estimates from different survival models was carried out. This was accomplished through a weighted average of the survival models considered in the analysis. First, we calculated the Kaplan-Meier estimates of overall survival according to the gene expression and the relation between the gene expression score and the phenotype subgroups of DLBCL. We confirmed that these phenotypes had different survival time patterns (Figure 1). Following this, we fitted the three models to all gene expression data and to the three phenotype subgroups. We then applied the BMA approach described in Section “Methods”. For each model, we ran the corresponding MCMC algorithm for 100 000 iterations, discarding the first 10 000 iterations as burn-in.
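For readers unfamiliar with the Kaplan-Meier estimate used in the first step, the following is a minimal Python sketch of the product-limit estimator. It is a generic textbook implementation with a hypothetical function name and toy data, not the code used for the analysis.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns (time, survival) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored subjects both leave the risk set
        n_at_risk -= removed
    return curve

# Toy data: event indicator 1 = death, 0 = censored
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
print(curve)
```

Applying the function separately to each phenotype subgroup gives one curve per subgroup, which is the kind of comparison summarised in Figure 1.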

Figure 1 Kaplan-Meier estimates of overall survival according to the gene-expression subgroups.

Table 1 shows the estimated posterior mean of the parameters, the 95% credible intervals (CI), the BIC values and the BMA weights for each of the fitted models for the whole dataset. The BMA weights reflect the relative posterior probability of the models. As can be seen from Table 1, for the Weibull model, there are three genes that substantially describe patients’ survival times, namely GC-B (β 1), lymphoma node (β 2) and MHC (β 5). These three genes have a negative effect on the expected survival time. For the mixture model, GC-B (β 1), lymphoma node (β 2) and proliferation (β 3) accounted for patients’ survival times in the first component. In the second component, GC-B (β 1), lymphoma node (β 2) and MHC signature (β 5) substantially explained patients’ survival times. All these genes have negative effects on the expected survival time for their respective component. For the cure model, four of these genes substantially describe patients’ survival times, namely GC-B (β 1), lymphoma node (β 2), BMP6 (β 4) and MHC (β 5) signature. Three of these, namely GC-B (β 1), lymphoma node (β 2) and MHC signature (β 5), have a negative effect on the expected survival time. Under the cure model, approximately 33.8% of the patients are cured of DLBCL (Figure 2).

Table 1 The estimated posterior mean of the parameters, the 95% credible intervals (CI), the BIC values and the BMA weights for each of the fitted models for the full DLBCL dataset

Figure 2 Box-plots of the cure rates (posterior distribution of π) for the full DLBCL dataset and for each of the three phenotypes (ABC, GCB and Type III).

Table 1 also shows that the cure model has the largest posterior model probability (BMA weight) for the full dataset. To evaluate the model fit, the predicted values under each model were compared with the observed data.

Table 2 shows the 95% credible intervals (CI), BIC values and the BMA weights for each of the models based on phenotype for the DLBCL dataset. For all phenotypes, the mixture model is not favoured: its weight is approximately zero and it has the largest BIC value. In contrast, the BIC values of the other two models are close to each other, suggesting that these two models should be combined to account for the uncertainty in the prediction of survival.

Table 2 The estimated posterior mean of parameters, the 95% CI, BIC values and the BMA weights for each of the models based on phenotype for the DLBCL dataset

From Tables 1 and 2, we can see that the Weibull model is better than a two-component Weibull mixture model.

As can be seen in Figure 3, in the full DLBCL dataset, the predicted curve for the cure model is quite close to the observed data, suggesting a good fit of the data. Specifically, in this model, 94.3% of observed survival times in the dataset fall in the corresponding 95% posterior prediction intervals. As expected, this is quite similar to the result obtained from model averaging (91.9%) (Table 3).

Figure 3 The posterior densities of the three models and the model averaged density for the full DLBCL dataset and each of the three phenotypes. For comparison, the observed data is also represented as a histogram.

Table 3 The percentage of observed values that lie in the corresponding 95% posterior prediction interval for the individual models and the BMA model, based on the full DLBCL dataset and each of the three phenotypes

Furthermore, in the GCB phenotype, the genes corresponding to the BMP6 (β 4) and MHC signature (β 5) in the Weibull model and MHC signature (β 5) in the cure model substantially affect patients’ survival time. In the ABC phenotype, in the Weibull model, with the exception of proliferation (β 3), all genes were involved substantially in the description of patients’ survival and lymphoma node (β 2), BMP6 (β 4) and MHC signature (β 5) are potentially important prognostic factors for predicting survival in the cure model. For the type III phenotype, the GC-B gene (β 1) in both models and only the BMP6 gene (β 4) in the cure model are substantial in explaining the survival times of the patients.

Under the cure model, in the GCB phenotype, approximately 33.2% of the patients are estimated to be cured of DLBCL. In the ABC and type III phenotypes, the respective cure rates are approximately 26.6% and 18.7% (Figure 2).

The posterior predictive densities for the individual models and the model-averaged prediction for the three phenotypes are presented in Figure 3. In comparison to the other models, the mixture model fitted the data poorly for each phenotype. Using model averaging, for the GCB phenotype, 94% of the observed survival times in the dataset lie in the respective 95% prediction intervals. For the other two phenotypes, namely the ABC and the type III, 93.4% and 92.8% of the observed survival times in the dataset are in the corresponding 95% prediction intervals, respectively (Table 3).
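The coverage percentages reported in Table 3 can be computed along the lines of the Python sketch below. It assumes that, for each observed survival time, a set of posterior predictive draws is available, and it uses equal-tailed intervals with linearly interpolated quantiles. These choices, the function names and the toy numbers are assumptions made for illustration; the treatment of censored observations is not addressed here.

```python
import math

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

def interval_coverage(observed, predictive_draws, level=0.95):
    """Percentage of observed values inside their equal-tailed prediction interval."""
    tail = (1.0 - level) / 2.0
    inside = 0
    for y, draws in zip(observed, predictive_draws):
        lower = quantile(draws, tail)
        upper = quantile(draws, 1.0 - tail)
        if lower <= y <= upper:
            inside += 1
    return 100.0 * inside / len(observed)

# Toy example: the same predictive draws 0, 1, ..., 100 for four observations
draws = [float(k) for k in range(101)]
observed = [50.0, 1.0, 99.0, 10.0]
coverage = interval_coverage(observed, [draws] * len(observed))
print(coverage)
```

For the model-averaged prediction, the predictive draws for each observation would be a mixture of draws from the individual models in proportion to their BMA weights.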


This study has adopted a Bayesian model averaging approach to account for model uncertainty in the prediction of survival. The case study that we considered involved lymphoma cancer survival, with covariates given by phenotypes and gene expressions. Here, we proposed three competing models and used BMA to combine these models to account for model uncertainty.

Overall, the results of this study indicate that when the full dataset is used without further grouping, selecting the single model that best fits the data is adequate. The reason is that there is clear support for one model (i.e. only one model has a relatively larger BIC value and dominates based on this criterion). However, the results were different when the model selection process took into account the phenotype subgroups of the patients: a single model appeared to be inadequate. This was because the BIC values for the Weibull and cure models gave them nearly equal weight, indicating the absence of a dominant model under this criterion and the presence of model uncertainty. As shown in this study, BMA can be used to address this problem. The applicability of BMA was also associated with the smaller sample size in each phenotype subgroup (Annest et al. 2009; Volinsky et al. 1997; Yeung et al. 2005).
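The link between BIC values and model weights can be sketched as follows. This illustration assumes the common convention BIC = −2 log L + k log n (smaller is better) and equal prior model probabilities, so that the approximate posterior model probabilities are proportional to exp(−BIC/2); the sign convention used in this study may differ. The function names and the BIC values are hypothetical.

```python
import numpy as np

def bic_to_weights(bic_values):
    """Approximate posterior model probabilities from BIC values.

    Assumes BIC = -2 log L + k log n and equal prior model
    probabilities; both are assumptions of this sketch.
    """
    bic = np.asarray(bic_values, dtype=float)
    delta = bic - bic.min()              # shift by the minimum for numerical stability
    unnormalised = np.exp(-0.5 * delta)
    return unnormalised / unnormalised.sum()

def model_averaged_density(densities, weights):
    """Weighted average of per-model predictive densities on a common grid."""
    densities = np.asarray(densities, dtype=float)  # shape (n_models, n_grid)
    weights = np.asarray(weights, dtype=float)      # shape (n_models,)
    return weights @ densities

# Hypothetical BIC values for the Weibull, mixture and cure models.
weights = bic_to_weights([412.3, 430.1, 412.9])
print(weights)
```

When two BIC values are close, as in this made-up example, their weights are of similar size and no model dominates, which is the situation in which averaging rather than selecting is motivated.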

This study also revealed that, in each phenotype, the expression and number of predictor genes that substantially describe the survival times of the patients varied across models. Overall, across the two favoured models, none of the genes was identified consistently as a substantial predictor of patients’ survival. For example, in the Weibull model, the MHC and BMP genes in the GCB and ABC phenotypes and the GCB genes in the ABC and Type III phenotypes were important predictors of survival. In contrast, in the cure model, BMP was substantially associated with predicted survival in the ABC and Type III phenotypes. For both models, only three genes, i.e. lymphoma node, BMP6 and MHC signature, in the ABC phenotype were highly associated with the survival times of the patients.

This study has indicated that applying BMA to combine competing models addresses the problem of model uncertainty. Comparison of different survival models has allowed the identification and analysis of more detailed relationships between gene expressions in given phenotypes and the survival times of the patients. An advantage of BMA is more accurate and precise prediction of patient survival. However, this study involved only three candidate models; more models could clearly be included in the analysis. This study has also focused on estimating the marginal likelihood p(D | Q_s) using the Laplace approximation. Other approaches are possible. For instance, the marginal likelihood can be estimated using nested sampling (Skilling 2006), where it is viewed as the expectation of the likelihood with respect to the prior. Another generic approach is Chib’s method (Chib 1995), which can be applied to output from the Gibbs sampler. Applying BMA to other datasets and applications is a desirable next step towards obtaining robust predictions.
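For readers unfamiliar with the Laplace approximation mentioned above, the sketch below shows its generic form: the log marginal likelihood is approximated by the log joint density (log-likelihood plus log-prior) at the posterior mode, plus (d/2) log 2π, minus half the log-determinant of the Hessian of the negative log joint at that mode. This is a textbook version under the assumption that the mode and Hessian have already been obtained; it is not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np

def laplace_log_marginal(log_joint_at_mode, neg_log_joint_hessian):
    """Laplace approximation to the log marginal likelihood.

    log_joint_at_mode:      log p(D | theta) + log p(theta) at the posterior mode
    neg_log_joint_hessian:  (d, d) Hessian of the negative log joint density,
                            evaluated at the same mode
    """
    hessian = np.atleast_2d(np.asarray(neg_log_joint_hessian, dtype=float))
    d = hessian.shape[0]
    sign, logdet = np.linalg.slogdet(hessian)
    if sign <= 0:
        raise ValueError("Hessian must be positive definite at the mode")
    return log_joint_at_mode + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
```

The approximation is exact when the joint density is Gaussian in the parameters and is generally more reliable when the posterior is unimodal and the sample size is not too small, which is one reason alternative estimators such as those cited above may be worth considering.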


  • Akaike H: Information theory and an extension of the maximum likelihood principle. In Second Int Symp Inform Theory 1 Edited by: Petrov BN, Csaki F. 1973, 267-281.
  • Alizadeh AA, Eisen MB, Davis RE: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403: 503-511. 10.1038/35000501
  • Annest A, Bumgarner RE, Raftery AE, Yeung KE: Iterative Bayesian model averaging: a method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics 2009, 10(72):1471-2105.
  • Bhadra A, Mallick BK: Joint high dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics 2013, 69: 447-457. 10.1111/biom.12021
  • Basu S, Tiwari RC: Breast cancer survival, competing risk and mixture cure model: a Bayesian analysis. J R Stat Soc 2010, 173(2):307-329. 10.1111/j.1467-985X.2009.00618.x
  • Bonato V, Baladandayuthapani V, Broom BM, Sulman EP, Aldape KD, Do KA: Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics 2011, 27(3):359-367. 10.1093/bioinformatics/btq660
  • Buckland S, Burnham K, Augustin N: Model selection: an integral part of inference. Biometrics 1997, 53: 603-618. 10.2307/2533961
  • Burnham K, Anderson D: Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York; 2002.
  • Burnham K, Anderson D: Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res 2004, 33: 261-304. 10.1177/0049124104268644
  • Chen MH, Ibrahim JG, Sinha D: A new Bayesian model for survival data with a surviving fraction. J Am Stat Assoc 1999, 94: 909-919. 10.1080/01621459.1999.10474196
  • Chib S: Marginal likelihood from the Gibbs output. J Am Stat Assoc 1995, 90: 1313-1321. 10.1080/01621459.1995.10476635
  • Claeskens G, Hjort NL: Model selection and model averaging. Cambridge University Press, New York; 2008.
  • Clyde M: Model uncertainty and health effect studies for particulate matter. Environmetrics 2000, 11: 745-763. 10.1002/1099-095X(200011/12)11:6<745::AID-ENV431>3.0.CO;2-N
  • Collet D: Modelling survival data in medical research. Chapman and Hall, Florida; 1994.
  • Diebolt J, Robert CP: Estimation of finite mixture distributions through Bayesian sampling. J R Stat Soc Ser B 1994, 56: 363-375.
  • Dodson B: Weibull Analysis. American Society Quality, Milwaukee; 1994.
  • Draper D: Assessment and propagation of model uncertainty. J R Stat Soc Ser B 1995, 57: 45-97.
  • Farmomeni A, Nardi A: A two-component Weibull mixture to model early and late mortality in a Bayesian framework. Comput Stat Data Anal 2010, 54: 416-428. 10.1016/j.csda.2009.09.007
  • Gilks WR, Richardson S, Spiegelhalter DJ: Markov chain Monte Carlo in practice. Chapman and Hall, Florida; 1996.
  • Hjort NL, Claeskens G: Frequentist model average estimators (with discussion). J Am Stat Assoc 2003, 98: 879-899. 10.1198/016214503000000828
  • Hoeting J, Madigan D, Raftery AE, Volinsky C: Bayesian model averaging: a tutorial. Stat Sci 1999, 14: 382-417. 10.1214/ss/1009212519
  • Ibrahim JG, Chen MH, Sinha D: Bayesian survival analysis. Springer, New York; 2001.
  • Jackson CH, Thompson SG, Sharples LD: Accounting for uncertainty in health economic decision models by using model averaging. J R Stat Soc Ser A 2009, 172(2):383-404. 10.1111/j.1467-985X.2008.00573.x
  • Kleinbaum DG, Klein M: Survival analysis: a self-learning text. Springer, New York; 2005.
  • Kundu D: Bayesian inference and life testing plan for the Weibull distribution in presence of progressive censoring. Technometrics 2008, 50(2):144-154. 10.1198/004017008000000217
  • Lenz G, Wright GW, Emre NT, Kohlhammer H, Dave SS, Davis RE, Carty S, Lam LT, Shaffer AL, Xiao W, Powell J, Rosenwald A, Ott G, Muller-Hermelink HK, Gascoyne RD, Connors JM, Campo E, Jaffe ES, Delabiei J, Smeland EB, Rimsza LM, Fisher RI, Weisenburger DD, Chan WC, Staudt LM: Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc Natl Acad Sci USA 2008, 105(36):13520-13525. 10.1073/pnas.0804295105
  • Leung KM, Elashoff RM, Afifi AA: Censoring issues in survival analysis. Ann Rev Public Health 1997, 18: 83-104. 10.1146/annurev.publhealth.18.1.83
  • Lunn DJ, Thomas A, Best N, Spiegelhalter D: WinBUGS – A Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 2000, 10: 325-337. 10.1023/A:1008929526011
  • Madigan D, Raftery AE: Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc 1994, 89: 1535-1546. 10.1080/01621459.1994.10476894
  • Marin JM, Bernal MR, Wiper MP: Using Weibull mixture distributions to model heterogeneous survival. Commun Stat Simul Comput 2005a, 34(3):673-684. 10.1081/SAC-200068372
  • Marin JM, Mengersen K, Robert CP: Bayesian modelling and inference on mixtures of distributions. In Handbook of statistics. Edited by: Dey D, Rao CR. Elsevier-Sciences, Amsterdam; 2005b.
  • McGrory CA, Titterington DM: Variational approximations in Bayesian model selection for finite mixture distributions. Comput Stat Data Anal 2007, 51(11):5352-5367. 10.1016/j.csda.2006.07.020
  • Ntzoufras I: Bayesian modelling using WinBUGS. Wiley, New Jersey; 2009.
  • Pramana S, Shkedy Z, Göehlmann HW, Talloen W, Bondt AD, Straetemans R, Lin D, Pinheiro J: Model-based parametric approaches. In Modeling dose-response microarray data in early drug development experiments using R. Edited by: Lin D, Shkedy Z, Yekutieli D, Amaratunga D, Bijnens L. Springer, New York; 2012:231-249.
  • Raftery AE: Bayesian model selection in social research (with discussion). Soc Methodol 1995, 25: 111-163.
  • Raftery AE: Approximate Bayes factors and accounting for model uncertainty in generalized linear models. Biometrika 1996, 83: 251-266. 10.1093/biomet/83.2.251
  • Raftery AE, Madigan D, Hoeting JA: Bayesian model averaging for linear regression models. J Am Stat Assoc 1997, 92(437):179-191. 10.1080/01621459.1997.10473615
  • Robert CP, Casella G: Monte Carlo statistical methods. Springer, New York; 2000.
  • Rosenwald A, Wright G, Wiestner A, Chan WC, Connors JM, Campo E, Gascoyne RD, Grogan TM, Muller HK, Smeland EB, Chiorazzi M, Giltnane JM, Hurt EM, Zhao H, Averett L, Henrickson S, Yang L, Powell J, Wilson WH, Jaffe ES, Simon R, Klausner RD, Montserrat E, Bosch F, Greiner TC, Weisenburger DD, Sanger WG, Dave BJ, Lynch JC, Vose J, et al.: The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma. N Engl J Med 2002, 346(25):1937-1947. 10.1056/NEJMoa012914
  • Schwarz GE: Estimating the dimension of a model. Ann Stat 1978, 6(2):461-464. 10.1214/aos/1176344136
  • Skilling J: Nested sampling for general Bayesian computation. Bayesian Anal 2006, 1: 833-860. 10.1214/06-BA127
  • Spiegelhalter D, Best N, Carlin B, VanderLinde A: Bayesian measures of model complexity and fit. J R Stat Soc Ser B 2002, 64(4):583-639. 10.1111/1467-9868.00353
  • Stephens M: Bayesian methods for mixtures of normal distributions. PhD thesis, The University of Oxford; 1997.
  • Stajduhar I, Basic BD, Bogunovic N: Impact of censoring on learning Bayesian networks in survival modelling. Artif Intell Med 2009, 47: 199-217. 10.1016/j.artmed.2009.08.001
  • Thamrin SA, McGree JM, Mengersen KL: Bayesian Weibull survival model for gene expression data. In Case studies in Bayesian statistical modelling and analysis. Edited by: Alston CL, Mengersen KL, Pettitt AN. Wiley, UK; 2013:171-185.
  • Volinsky CT, Raftery AE: Bayesian information criterion for censored survival models. Biometrics 2000, 56: 256-262. 10.1111/j.0006-341X.2000.00256.x
  • Volinsky C, Madigan D, Raftery AE, Kronmal R: Bayesian model averaging in proportional hazard models: assessing the risk of a stroke. Appl Stat 1997, 46(4):443-448.
  • Yakovlev AY, Tsodikov AD: Stochastic models of tumor latency and their biostatistical applications. World Scientific, Singapore; 1996.
  • Yeung KE, Bumgarner R, Raftery AE: Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray. Bioinformatics 2005, 21: 2394-2402. 10.1093/bioinformatics/bti319
  • Yuan Y, Yin G: Robust EM continual reassessment method in oncology dose finding. J Am Stat Assoc 2011, 108(495):818-831.


The first author acknowledges the Ministry of National Education Indonesia (DIKTI) for providing funding for this research as well as Dr. Christopher Drovandi from Queensland University of Technology for his generous feedback.

Author information


Corresponding authors

Correspondence to Sri Astuti Thamrin or James M. McGree.

Additional information

Competing interests

The authors declare that they have no conflict of interests.

Authors’ contributions

SAT participated in the design of the study, carried out the statistical analysis and drafted the manuscript. KM and JM supervised the design, analysis and write-up of the study. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Thamrin, S.A., McGree, J.M. & Mengersen, K.L. Modelling survival data to account for model uncertainty: a single model or model averaging? SpringerPlus 2, 665 (2013).
