Log-logistic distribution for survival data analysis using MCMC
- Ali A. Al-Shomrani^{1},
- A. I. Shawky^{1},
- Osama H. Arif^{1} and
- Muhammad Aslam^{1}
Received: 18 April 2016
Accepted: 5 October 2016
Published: 12 October 2016
Abstract
This paper focuses on the application of the Markov Chain Monte Carlo (MCMC) technique for estimating the parameters of the log-logistic (LL) distribution based on a complete sample. To find Bayesian estimates for the parameters of the LL model, OpenBUGS, an established software package for Bayesian analysis based on the MCMC technique, is employed. It is assumed that samples are drawn from the posterior density function under a set of independent non-informative priors for the LL parameters. A module was developed and incorporated in OpenBUGS to compute the Bayes estimators of the LL distribution. It is shown that statistically consistent parameter estimates and their respective credible intervals can be constructed through the use of OpenBUGS. Finally, a comparison of the maximum likelihood estimates and the Bayes estimates is carried out using three plots. Additionally, this research establishes that the MCMC technique can be put into practice with little computational effort. A detailed procedure for applying MCMC to estimate the parameters of the LL model is demonstrated using real survival data relating to bladder cancer patients.
Keywords
Log-logistic, Posterior, Non-informative, Module, OpenBUGS, Uniform priors
Background
The log-logistic (LL) distribution (known as the Fisk distribution in economics) has a rather flexible functional form. The LL distribution belongs to the class of parametric survival time models in which the hazard rate initially increases and then decreases, and at times can be hump-shaped. The LL distribution can be used as a suitable substitute for the Weibull distribution. It is in fact a mixture of Gompertz distributions with a Gamma mixing distribution whose mean and variance are both equal to one. As a life testing model the LL distribution has its own standing; it is an increasing failure rate (IFR) model and is also viewed as a weighted exponential distribution.
A review of the literature on this distribution shows that Bain (1974) modeled the LL distribution by a transformation of the well-known logistic variate. The properties of the LL distribution were discussed by Ragab and Green (1984), who also worked on the order statistics of this distribution. Kantam et al. (2001) proposed an acceptance sampling plan using the LL distribution. Kantam et al. (2006) designed an economic acceptance sampling plan using the LL distribution. Kantam and Rao (2002) derived the modified maximum likelihood estimation (MLE) of this distribution. Rosaiah et al. (2007) derived confidence intervals for the LL model using approximations to the ML method. The properties, estimation and testing of the linear failure rate using the exponential and half-logistic distributions have been discussed thoroughly by Rao et al. (2013). Rosaiah et al. (2014) studied the exponential-LL additive failure rate model.
The current research intends to use the LL distribution for modeling survival data and to obtain the MLEs together with the associated probability intervals of the Bayes estimates. It has been noticed that the Bayes estimates may not be computable in a simple closed form under the assumption of independent uniform priors for the parameters. The authors work under the assumption that both parameters of the LL model, shape and scale, are unknown.
The authors develop an algorithm to generate Markov Chain Monte Carlo (MCMC) samples from the posterior density function using the Gibbs sampling technique in the OpenBUGS software. Bayesian estimates of the parameters along with highest posterior density (HPD) credible intervals are constructed. Moreover, estimation of the reliability function is also considered. All statistical computations and functions for the LL model are carried out using the R statistical software; see Lyu (1996), Srivastava and Kumar (2011a, b, c) and Kumar et al. (2012, 2013). Real-life data are considered in order to illustrate how the proposed technique can be applied in an orderly manner in real-life situations.
The remainder of the paper contains six sections: “Model analysis”, “Maximum likelihood estimation (MLE) and information matrix”, “Model validation”, “Bayesian estimation using Markov Chain Monte Carlo (MCMC) method”, “Comparison of MLE estimates and Bayes estimates” and “Conclusion”.
Model analysis
Probability density function (pdf)
Cumulative distribution function (CDF)
The reliability function
The Hazard function
The cumulative hazard function H(x)
The failure rate average (FRA) and conditional survival function (CSF)
An analysis of FRA(x) as a function of x enables us to identify increasing failure rate average (IFRA) and decreasing failure rate average (DFRA) behavior.
The quantile function
The random deviate generation functions
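The functions listed in this section can be sketched in a few lines of Python. The sketch below assumes the standard two-parameter form of the LL distribution with shape α and scale λ, that is F(x) = (x/λ)^α / (1 + (x/λ)^α) for x > 0; this parameterization and all function names are assumptions made for illustration and are not taken from the authors' R or OpenBUGS code.

```python
import math
import random


def ll_pdf(x, alpha, lam):
    """Density (alpha/lam) (x/lam)^(alpha-1) / (1 + (x/lam)^alpha)^2 for x > 0."""
    z = (x / lam) ** alpha
    return (alpha / x) * z / (1.0 + z) ** 2  # algebraically the same expression


def ll_cdf(x, alpha, lam):
    """Cumulative distribution function F(x)."""
    z = (x / lam) ** alpha
    return z / (1.0 + z)


def ll_reliability(x, alpha, lam):
    """Reliability (survival) function R(x) = 1 - F(x)."""
    return 1.0 / (1.0 + (x / lam) ** alpha)


def ll_hazard(x, alpha, lam):
    """Hazard function h(x) = f(x) / R(x)."""
    return ll_pdf(x, alpha, lam) / ll_reliability(x, alpha, lam)


def ll_cum_hazard(x, alpha, lam):
    """Cumulative hazard H(x) = -log R(x) = log(1 + (x/lam)^alpha)."""
    return math.log1p((x / lam) ** alpha)


def ll_fra(x, alpha, lam):
    """Failure rate average FRA(x) = H(x) / x."""
    return ll_cum_hazard(x, alpha, lam) / x


def ll_quantile(p, alpha, lam):
    """Quantile function, the inverse of F, for 0 <= p < 1."""
    return lam * (p / (1.0 - p)) ** (1.0 / alpha)


def ll_random(n, alpha, lam, rng=None):
    """Random deviates by inverse-transform sampling of uniform draws."""
    rng = rng or random.Random()
    return [ll_quantile(rng.random(), alpha, lam) for _ in range(n)]
```

Under this parameterization the median equals the scale parameter, since ll_quantile(0.5, alpha, lam) returns lam for any shape.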
Maximum likelihood estimation (MLE) and information matrix
This section discusses the MLEs of the two-parameter LL model and their large-sample properties, which are used to find approximate confidence intervals based on the MLEs.
Information matrix and asymptotic confidence intervals
Computation of maximum likelihood estimation
To illustrate ML estimation, a data set has been adapted from Lee and Wang (2003). The sample consists of 128 patients with bladder cancer, and the values shown are remission times in months.
0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.52, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 0.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51, 6.54, 8.53, 12.03, 20.28, 2.02, 3.36, 6.76, 12.07, 21.73, 2.07, 3.36, 6.93, 8.65, 12.63, 22.69.
The values calculated for the mean, variance and the coefficient of skewness are \(9.36562, 110.425 \;{\text{and}}\; 3.32567,\) respectively. The positive coefficient of skewness indicates that the data are positively skewed. The coefficient of skewness used here is the unbiased estimator of the population skewness, given by \(\frac{{\sqrt {n\left( {n - 1} \right)} }}{n - 2}\cdot\frac{{\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{3} }}{{\left( {\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } \right)^{3/2} }}.\)
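The skewness formula above translates directly into code. The following is a minimal sketch; the function name is hypothetical and the small samples in the comments are toy inputs, not the bladder cancer data.

```python
import math


def sample_skewness(xs):
    """Adjusted coefficient of skewness: sqrt(n(n-1))/(n-2) * m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


# A symmetric toy sample has zero skewness; a long right tail gives a positive value.
symmetric = sample_skewness([1, 2, 3, 4, 5])
right_skewed = sample_skewness([1, 1, 1, 1, 10])
```

The function needs at least three observations and a non-constant sample, otherwise the factor n - 2 or the second moment is zero.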
Maximum likelihood estimates, standard errors and 95 % confidence intervals
Parameter | MLE | SE | 95 % confidence interval |
---|---|---|---|
Alpha | 1.725158 | 0.1279366 | (1.474407, 1.975909) |
Lambda | 6.089820 | 0.5384165 | (5.034543, 7.145097) |
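The intervals in the table are consistent with the usual large-sample (Wald) construction, estimate ± z × SE with z the 97.5th percentile of the standard normal distribution. The short sketch below recomputes them from the tabulated MLEs and standard errors; the function name is an assumption made for illustration.

```python
from statistics import NormalDist


def wald_interval(estimate, se, level=0.95):
    """Asymptotic normal confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


# MLEs and standard errors as reported in the table above.
alpha_ci = wald_interval(1.725158, 0.1279366)
lambda_ci = wald_interval(6.089820, 0.5384165)
print("alpha :", tuple(round(v, 6) for v in alpha_ci))
print("lambda:", tuple(round(v, 6) for v in lambda_ci))
```

To six decimal places the recomputed limits agree with the tabulated intervals.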
Model validation
Srivastava and Kumar (2011c) suggest that, in order to assess the goodness of fit of the proposed LL model, it is essential to compute the Kolmogorov–Smirnov (K–S) statistic between the empirical distribution function and the fitted LL model. The authors found the fit to be appropriate, since the K–S statistic D = 0.03207318 has a p-value of 0.998, which is far greater than the predetermined level of 0.05. Therefore, it can be confidently asserted that the proposed LL model is appropriate for analyzing the data set.
For model validation, quantile–quantile (Q–Q) and probability–probability (P–P) plots are the most commonly used graphical methods to assess whether the fitted model is in agreement with the given data.
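As an illustration of how the K–S distance D is obtained, the sketch below computes the largest gap between the empirical distribution function and a fitted LL CDF. It assumes the standard shape–scale form F(x) = (x/λ)^α / (1 + (x/λ)^α); the function names are hypothetical, and the p-value calculation is not reproduced here.

```python
def ll_cdf(x, alpha, lam):
    """LL cumulative distribution function (assumed shape-scale form)."""
    z = (x / lam) ** alpha
    return z / (1.0 + z)


def ks_statistic(xs, alpha, lam):
    """Largest distance between the empirical CDF of xs and the fitted LL CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = ll_cdf(x, alpha, lam)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Toy sample placed exactly at the fitted quantiles for p = 0.05, 0.15, ..., 0.95.
toy = [6.0 * (p / (1.0 - p)) ** (1.0 / 1.7) for p in [(i - 0.5) / 10 for i in range(1, 11)]]
d_toy = ks_statistic(toy, 1.7, 6.0)
```

For the toy sample the fitted CDF sits halfway up every step of the empirical CDF, so D equals 0.5/n, the smallest value attainable for a sample of size n.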
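The coordinates behind such plots can be computed as in the sketch below. It assumes the shape–scale LL form and the plotting positions (i − 0.5)/n; both are illustrative choices and the authors may have used different plotting positions.

```python
def ll_cdf(x, alpha, lam):
    """LL cumulative distribution function (assumed shape-scale form)."""
    z = (x / lam) ** alpha
    return z / (1.0 + z)


def ll_quantile(p, alpha, lam):
    """Inverse of ll_cdf for 0 <= p < 1."""
    return lam * (p / (1.0 - p)) ** (1.0 / alpha)


def qq_points(xs, alpha, lam):
    """(theoretical quantile, empirical quantile) pairs for a Q-Q plot."""
    ordered = sorted(xs)
    n = len(ordered)
    return [(ll_quantile((i - 0.5) / n, alpha, lam), x)
            for i, x in enumerate(ordered, start=1)]


def pp_points(xs, alpha, lam):
    """(empirical probability, fitted probability) pairs for a P-P plot."""
    ordered = sorted(xs)
    n = len(ordered)
    return [((i - 0.5) / n, ll_cdf(x, alpha, lam))
            for i, x in enumerate(ordered, start=1)]
```

Points lying close to the 45-degree line in either plot indicate agreement between the fitted model and the data.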
Bayesian estimation using Markov Chain Monte Carlo (MCMC) method
Monte Carlo is a technique based on repeated pseudo-random sampling; it uses an algorithm to generate samples. A Markov chain, on the other hand, is a random process with a countable state space that satisfies the Markov property. According to Chen et al. (2000), the Markov property means that the future state depends only on the present state and not on the past states. The combination of Markov chains and Monte Carlo techniques is commonly referred to as MCMC; see Robert and Casella (2004). Since the advent of computer-friendly software, the application of MCMC in Bayesian estimation has gained currency over the last decade or so. Presently, for applied Bayesian inference, researchers usually work with OpenBUGS (Thomas 2010). It is menu driven and, for existing probability models, contains a modular framework for constructing and evaluating Bayesian probability models that can be extended if the need arises (Lunn et al. 2000).
In OpenBUGS, the Bayesian analysis of a probability model can be executed only for the default probability models. Since the LL model is not among them, a module must be integrated for parameter estimation of the LL model. Recently, some probability models have been integrated in OpenBUGS in order to ease the Bayesian analysis (Kumar et al. 2010). For more details about OpenBUGS modules for some other models, readers are referred to Kumar et al. (2012) and Srivastava and Kumar (2011a, b, c).
Bayesian analysis under uniform priors
The authors initiated two parallel chains and ran them for a sufficiently large number of iterations until convergence was attained. For the current study, convergence was attained at 40,000 iterations with a burn-in of 5000. Finally, a posterior sample of size 7000 was obtained by selecting a thinning interval of five, i.e. every fifth outcome is stored. Thus, we have the posterior sample \(\{\alpha_{1i}, \lambda_{1i}\}\), i = 1, …, 7000 drawn from chain 1 and \(\{\alpha_{2i}, \lambda_{2i}\}\), i = 1, …, 7000 from chain 2. Chain 1 is used for testing convergence, whereas chain 2 is used for displaying the visual summary. Both chains are used for the numerical summary.
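To make the sampling scheme concrete, the sketch below draws from the LL posterior under independent uniform priors with a plain random-walk Metropolis sampler. This is a swapped-in technique for illustration only: it is not the OpenBUGS module or the samplers OpenBUGS selects internally. The prior bounds, step sizes, starting values and function names are all hypothetical, and the standard shape–scale LL density is assumed. The defaults mirror the iteration count, burn-in and thinning interval quoted above.

```python
import math
import random


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def log_posterior(alpha, lam, log_xs, alpha_max, lam_max):
    """LL log-likelihood plus flat priors on (0, alpha_max) and (0, lam_max)."""
    if not (0.0 < alpha < alpha_max and 0.0 < lam < lam_max):
        return -math.inf  # outside the support of the (hypothetical) uniform priors
    log_lam = math.log(lam)
    total = len(log_xs) * (math.log(alpha) - log_lam)
    for lx in log_xs:
        u = lx - log_lam  # log(x / lam)
        total += (alpha - 1.0) * u - 2.0 * softplus(alpha * u)
    return total


def metropolis(xs, n_iter=40000, burn_in=5000, thin=5, start=(1.0, 1.0),
               step=(0.15, 0.5), alpha_max=100.0, lam_max=100.0, seed=1):
    """Random-walk Metropolis draws of (alpha, lam) after burn-in and thinning."""
    rng = random.Random(seed)
    log_xs = [math.log(x) for x in xs]
    alpha, lam = start
    current = log_posterior(alpha, lam, log_xs, alpha_max, lam_max)
    draws = []
    for it in range(n_iter):
        a_new = alpha + rng.gauss(0.0, step[0])
        l_new = lam + rng.gauss(0.0, step[1])
        proposed = log_posterior(a_new, l_new, log_xs, alpha_max, lam_max)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposed - current:
            alpha, lam, current = a_new, l_new, proposed
        # Keep every `thin`-th state once the burn-in period is over.
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws.append((alpha, lam))
    return draws
```

With the default settings, 40,000 iterations minus a burn-in of 5000 and a thinning interval of five leave 7000 stored draws per chain. Running the function twice with different seeds and starting values gives two parallel chains.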
Convergence diagnostics
Simulation draws (chains) were started at initial values for each parameter. Because successive draws are dependent, the first draws were discarded as a burn-in to obtain independent samples. We therefore need to be sure that the chains have converged in the MCMC analysis before making inferences from the posterior distribution. This was checked by several diagnostic analyses as follows.
History (trace) plot
Autocorrelation plot
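An autocorrelation plot displays the sample autocorrelation of a chain at increasing lags; values that fall quickly towards zero indicate that the stored draws are close to independent. A minimal sketch of the quantity being plotted follows, with a hypothetical function name and a toy chain.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at a given non-negative lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain)  # assumes the chain is not constant
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Toy chain that flips sign at every step: strongly negative lag-1 autocorrelation.
toy_chain = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
lag1 = autocorrelation(toy_chain, 1)
```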
Visual summary through Kernel density estimates
Numerical summary
Numerical summaries based on MCMC sample of posterior characteristics for LL model under uniform priors
Characteristics | Chain 1: α | Chain 1: λ | Chain 2: α | Chain 2: λ |
---|---|---|---|---|
Mean | 1.728 | 6.160 | 1.728 | 6.159 |
SD | 0.1271 | 0.5420 | 0.1275 | 0.5473 |
Naive SE | 0.0006791 | 0.0028971 | 0.0006817 | 0.0029254 |
Time-series SE | 0.000845 | 0.003621 | 0.0008483 | 0.0036895 |
Minimum | 1.223 | 4.168 | 1.254 | 4.140 |
2.5th percentile (P_{2.5}) | 1.487 | 5.153 | 1.486 | 5.155 |
First quartile (Q_{1}) | 1.641 | 5.787 | 1.640 | 5.781 |
Median | 1.725 | 6.142 | 1.725 | 6.136 |
Third quartile (Q_{3}) | 1.812 | 6.512 | 1.812 | 6.508 |
97.5th percentile (P_{97.5}) | 1.984 | 7.273 | 1.983 | 7.293 |
Maximum | 2.278 | 9.171 | 2.284 | 8.826 |
95 % credible interval | 1.487, 1.984 | 5.153, 7.273 | 1.486, 1.983 | 5.155, 7.293 |
95 % HPD credible interval | 1.479, 1.796 | 5.093, 7.207 | 1.479, 1.976 | 5.142, 7.273 |
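Several rows of the table can be computed directly from a vector of posterior draws. The sketch below shows one way to obtain the mean, SD, naive SE (SD divided by the square root of the number of draws), the equal-tailed 95 % interval and an HPD interval taken as the shortest window containing the requested share of the sorted draws. The function names and the quantile convention are assumptions; the software used by the authors may apply slightly different conventions, and the time-series SE is not reproduced here.

```python
import math
import statistics


def posterior_summary(draws):
    """Mean, standard deviation and naive standard error of posterior draws."""
    sd = statistics.stdev(draws)
    return {"mean": statistics.fmean(draws),
            "sd": sd,
            "naive_se": sd / math.sqrt(len(draws))}


def equal_tail_95(draws):
    """2.5th and 97.5th percentiles of the draws (linear interpolation)."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def hpd_interval(draws, level=0.95):
    """Shortest interval containing a fraction `level` of the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    k = max(1, math.ceil(level * n - 1e-9))  # number of draws inside the interval
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]
```

For a symmetric, unimodal posterior the two intervals nearly coincide; for a skewed posterior the HPD interval is shorter than the equal-tailed one.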
Running mean (Ergodic mean) plot
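The running (ergodic) mean at iteration t is the average of all draws up to and including t; a chain that has converged shows a running mean that settles to a stable level. A minimal sketch with a hypothetical function name:

```python
def running_mean(chain):
    """Cumulative averages of a chain: element t is the mean of chain[0..t]."""
    means = []
    total = 0.0
    for t, value in enumerate(chain, start=1):
        total += value
        means.append(total / t)
    return means
```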
Brooks–Gelman–Rubin plot
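The idea behind this diagnostic is to compare the variability between parallel chains with the variability within each chain; a ratio close to one suggests that the chains have reached the same distribution. The sketch below computes the classic Gelman–Rubin potential scale reduction factor from equal-length chains. It is shown as an illustration under that assumption, and the statistic plotted by OpenBUGS may be computed differently; the function name is hypothetical.

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor for m >= 2 chains of equal length n >= 2."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Two chains stuck in different regions give a value far above one, whereas chains exploring the same region give a value near one.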
Visual summary using box plots
Comparison of MLE estimates and Bayes estimates
A quantile–quantile (Q–Q) plot of empirical versus theoretical quantiles computed using the MLEs and the Bayes estimates is displayed in Fig. 11. Here too, the green circles depicting the MLEs coincide with the red circles depicting the Bayes estimates.
The estimated reliability function, computed using the Bayes estimates under uniform priors, is displayed in Fig. 12 along with the empirical reliability function.
The visual comparisons in Figs. 10, 11 and 12 show that the MLEs and the Bayes estimates based on uniform priors coincide to a great extent, which suggests a good fit for the proposed LL model.
Conclusion
The present research discussed the two-parameter LL model; MLEs and Bayesian estimates were obtained from a real-life sample, the latter using the Markov Chain Monte Carlo (MCMC) technique in the OpenBUGS software. Bayesian analysis under different sets of priors has been carried out, and the convergence pattern was studied using different diagnostic procedures. A numerical summary based on MCMC samples from the posterior distribution of the LL model has been worked out under non-informative priors. A visual review for different sets of priors, including box plots and kernel density estimates, in comparison with the MLEs has been attempted. The LL model fits the data well whether it is used with the MLEs or with the Bayesian estimates. The proposed methodology has been found suitable for empirical modeling under uniform sets of priors. Although a simulation study is not conducted in the present work, the consistency, a basic study and comparisons of the present estimation with the improved parameter estimation of Reath (2016) will be addressed in future work.
Declarations
Authors’ contributions
All authors contributed extensively in the development and completion of this article. All authors read and approved the final manuscript.
Acknowledgements
This article was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah. The authors, therefore, acknowledge with thanks DSR technical and financial support.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Bain LJ (1974) Analysis for the linear failure rate life-testing distribution. Technometrics 16(4):551–559
- Chen M, Shao Q, Ibrahim JG (2000) Monte Carlo methods in Bayesian computation. Springer, New York
- Kantam RRL, Rao GS (2002) Log-logistic distribution: modified maximum likelihood estimation. Guj Stat Rev 29(1&2):25–36
- Kantam RRL, Rosaiah K, Rao GS (2001) Acceptance sampling based on life test: log-logistic model. J Appl Stat 28(1):121–128
- Kantam RRL, Rao GS, Sriram B (2006) An economic reliability test plan: log-logistic distribution. J Appl Stat 33(3):291–296
- Kumar V, Ligges U, Thomas A (2010) ReliaBUGS user manual: a subsystem in OpenBUGS for some statistical models, version 1.0, OpenBUGS 3.2.1. http://openbugs.info/w/Downloads/
- Kumar R, Srivastava AK, Kumar V (2012) Analysis of Gumbel model for software reliability using Bayesian paradigm. Int J Adv Res Artif Intell (IJARAI) 1(9):39–45
- Kumar R, Srivastava AK, Kumar V (2013) Exponentiated Gumbel model for software reliability data analysis using MCMC simulation method. Int J Comput Appl 62(20):24–32
- Lai CD, Xie M (2006) Stochastic ageing and dependence for reliability. Springer, Berlin
- Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Wiley, New York
- Lee ET, Wang JW (2003) Statistical methods for survival data analysis, 3rd edn. Wiley, New York
- Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS: a Bayesian modeling framework: concepts, structure, and extensibility. Stat Comput 10:325–337
- Lyu MR (1996) Handbook of software reliability engineering. IEEE Computer Society Press, McGraw-Hill, New York
- Ragab A, Green J (1984) On order statistics from the log-logistic distribution and their properties. Commun Stat Theory Methods 13(21):2713–2724
- Rao BS, Nagendram S, Rosaiah K (2013) Exponential-half logistic additive failure rate model. Int J Sci Res 3(5):1–10
- Rausand M, Hoyland A (2004) System reliability theory: models, statistical methods, and applications, 2nd edn. Wiley, New York
- Reath J (2016) Improved parameter estimation of the log-logistic distribution with applications. Master’s Report, Michigan Technological University
- Robert CP, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York
- Rosaiah K, Kantam RRL, Prasad ARS (2007) Confidence intervals in log-logistic model-approximations to ML method. J Indian Stat Assoc 45:109–122
- Rosaiah K, Nagarjuna KM, Kumar DCUS, Rao BS (2014) Exponential-log logistic additive failure rate model. Int J Sci Res Publ 4(3):1–5
- Singh VP, Guo H (1995) Parameter estimation for 2-parameter log-logistic distribution (LLD2) by maximum entropy. Civ Eng Syst 12(4):343–357
- Srivastava AK, Kumar V (2011a) Analysis of software reliability data using exponential power model. Int J Adv Comput Sci Appl 2(2):38–45
- Srivastava AK, Kumar V (2011b) Software reliability data analysis with Marshall–Olkin extended Weibull model using MCMC method for non-informative set of priors. Int J Comput Appl 18(4):31–39
- Srivastava AK, Kumar V (2011c) Markov Chain Monte Carlo methods for Bayesian inference of the Chen model. Int J Comput Inf Syst 2(2):7–14
- Thomas A (2010) OpenBUGS developer manual, version 3.1.2. http://www.openbugs.info/