Bayesian analysis of generalized log-Burr family with R

Abstract

The log-Burr distribution is a generalization of the logistic and extreme value distributions, which are important reliability models. In this paper, a Bayesian approach is used to model reliability data under the log-Burr model using analytic and simulation tools. The Laplace approximation is implemented for approximating the posterior densities of the parameters. Moreover, parallel simulation tools are implemented using the 'LaplacesDemon' package of R.

Introduction

The log-Burr distribution is a generalization of two important reliability models, the logistic distribution and the extreme value distribution. A non-Bayesian analysis of the generalized log-Burr distribution is a very difficult task, whereas the analysis becomes routine in a Bayesian paradigm. In this paper, an attempt has been made with the following objectives:

• To define a Bayesian model, that is, specification of likelihood and prior distribution.

• To write down the R code for approximating posterior densities with Laplace approximation and simulation tools (R Core Team, 2013).

• To illustrate numeric as well as graphic summaries of the posterior densities.

The log location-scale model

The probability density function of a parametric location-scale model for a random variable y on (−∞, ∞) with location parameter μ (−∞ < μ < ∞) and scale parameter σ (σ > 0) is given by

$f(y;\mu,\sigma)=\frac{1}{\sigma}\,f_0\!\left(\frac{y-\mu}{\sigma}\right),\qquad -\infty<y<\infty$
(1)

The corresponding distribution and reliability function for y are

$F_0(y;\mu,\sigma)=\int_{-\infty}^{y} f_0(t)\,dt,\qquad R_0(y;\mu,\sigma)=1-\int_{-\infty}^{y} f_0(t)\,dt=1-F_0(y;\mu,\sigma)$

The standardized random variable z = (y − μ)/σ clearly has density and reliability functions f0(z) and R0(z), respectively, and Equation (1) with μ = 0 and σ = 1 is called the standard form of the distribution.

Common lifetime distributions all have the property that y = log t has a location-scale distribution: the Weibull, log-normal, and log-logistic distributions for t correspond to the extreme value, normal, and logistic distributions for y, respectively. The reliability functions for z = (y − μ)/σ on (−∞, ∞) are, respectively,

$R_0(z)=\exp(-e^{z})\ \text{(extreme value)},\qquad R_0(z)=1-\Phi(z)\ \text{(normal)},\qquad R_0(z)=(1+e^{z})^{-1}\ \text{(logistic)}$

Similarly, any location-scale model (Equation (1)) gives a lifetime distribution by the transformation t=exp(y). In this case the reliability function can be expressed as

$R(t;\mu,\sigma)=R_0\!\left(\frac{\log t-\mu}{\sigma}\right)=R_0'\!\left[\left(\frac{t}{\alpha}\right)^{\beta}\right],$

where α = exp(μ), β = 1/σ, and $R_0'(x)=R_0(\log x)$ is a reliability function defined on (0, ∞) (e.g., Lawless 2003).

The log-Burr distribution is obtained by generalizing a parametric location-scale family (Equation (1)) so that the pdf, cdf, or reliability function includes one or more additional parameters. Such generalized families are useful because they include common two-parameter lifetime distributions as special cases.

The generalized log-Burr family

In the generalized log-Burr family, the standardized variable z = (y − μ)/σ has probability density function

$f_0(z;k)=e^{z}\left(1+\frac{e^{z}}{k}\right)^{-k-1},\qquad -\infty<z<\infty$

and the corresponding reliability function

$R_0(z;k)=\left(1+\frac{e^{z}}{k}\right)^{-k},\qquad -\infty<z<\infty$

where k (> 0) is a shape parameter. The special case k = 1 gives the logistic distribution, and k → ∞ gives the extreme value distribution. Since the generalized log-Burr family includes the log-logistic and Weibull distributions, it allows discrimination between them; it is also a flexible model for fitting lifetime data (e.g., Lawless 2003). Figure 1 shows probability density functions for log-Burr distributions with different values of k.
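These two functions are short enough to check numerically in base R. The helpers f0 and R0 below are our own, not from any package; at k = 1 the reliability reduces to the logistic form (1 + e^z)^(-1), and for large k it approaches the extreme value form exp(−e^z):

```r
# Standardized log-Burr density and reliability, as defined above
f0 <- function(z, k) exp(z) * (1 + exp(z) / k)^(-k - 1)
R0 <- function(z, k) (1 + exp(z) / k)^(-k)

z <- 0.5
R0(z, 1)                               # logistic special case: equals 1/(1 + e^z)
R0(z, 1e6)                             # large k: approaches exp(-e^z)
integrate(f0, -Inf, Inf, k = 2)$value  # the density integrates to 1 for any k > 0
```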

The half-Cauchy prior distribution

The probability density function of half-Cauchy distribution with scale parameter α is given by

$f(x)=\frac{2\alpha}{\pi\,(x^{2}+\alpha^{2})},\qquad x>0,\ \alpha>0.$

The mean and variance of the half-Cauchy distribution do not exist, but its mode is equal to 0. The half-Cauchy distribution with scale α = 25 is a recommended default, noninformative prior distribution for a scale parameter. At this scale, the density of the half-Cauchy is nearly, but not completely, flat (see Figure 2); prior distributions that are not completely flat provide enough information for the numerical approximation algorithm to continue to explore the target density, the posterior distribution. The inverse-gamma is often used as a noninformative prior distribution for a scale parameter; however, it creates problems for scale parameters near zero. Gelman and Hill (2007) recommend the uniform or, if more information is necessary, the half-Cauchy as a better choice. Thus, in this paper, the half-Cauchy distribution with scale parameter α = 25 is used as a noninformative prior distribution.
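The near-flatness is easy to inspect with a one-line base-R implementation of this density (our own helper; a dhalfcauchy function is also provided by the 'LaplacesDemon' package):

```r
# Half-Cauchy density with scale alpha: f(x) = 2*alpha / (pi * (x^2 + alpha^2))
dhalfcauchy <- function(x, alpha = 25) 2 * alpha / (pi * (x^2 + alpha^2))

dhalfcauchy(0)                      # mode: 2 / (25 * pi), about 0.0255
dhalfcauchy(50) / dhalfcauchy(0)    # at x = 50 the density is still 20% of the mode
curve(dhalfcauchy, 0, 100, ylab = "density", main = "Half-Cauchy(25) prior")
```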

The Laplace approximation

Many simple Bayesian analyses based on noninformative prior distributions give results similar to standard non-Bayesian approaches, for example, the posterior t-interval for the normal mean with unknown variance. The extent to which a noninformative prior distribution can be justified as an objective assumption depends on the amount of information available in the data; in simple cases, as the sample size n increases, the influence of the prior distribution on posterior inference decreases. These ideas are sometimes referred to as asymptotic theory because they concern properties that hold in the limit as n becomes large. A remarkable method of asymptotic approximation is the Laplace approximation, which accurately approximates unimodal posterior moments and marginal posterior densities in many cases. In this section we give a brief, informal description of the Laplace approximation method.

Suppose −h(θ) is a smooth, bounded unimodal function with a maximum at $\hat{\theta}$, where θ is a scalar. By Laplace's method (e.g., Tierney and Kadane 1986), the integral

$I=∫f(θ)exp[−nh(θ)]dθ$

can be approximated by

$\hat{I}=f(\hat{\theta})\sqrt{\frac{2\pi}{n}}\,\sigma\exp[-nh(\hat{\theta})],$

where

$\sigma=\left[\left.\frac{\partial^{2}h}{\partial\theta^{2}}\right|_{\hat{\theta}}\right]^{-1/2}.$

As presented in Mosteller and Wallace (1964), Laplace's method expands about $\hat{\theta}$ to obtain:

$I\approx\int f(\hat{\theta})\exp\left\{-n\left[h(\hat{\theta})+(\theta-\hat{\theta})h'(\hat{\theta})+\frac{(\theta-\hat{\theta})^{2}}{2}h''(\hat{\theta})\right]\right\}d\theta.$

Recalling that $h'(\hat{\theta})=0$, we have

$I\approx\int f(\hat{\theta})\exp\left\{-n\left[h(\hat{\theta})+\frac{(\theta-\hat{\theta})^{2}}{2}h''(\hat{\theta})\right]\right\}d\theta=f(\hat{\theta})\exp[-nh(\hat{\theta})]\int\exp\left[-\frac{n(\theta-\hat{\theta})^{2}}{2\sigma^{2}}\right]d\theta=f(\hat{\theta})\sqrt{\frac{2\pi}{n}}\,\sigma\exp[-nh(\hat{\theta})].$

Intuitively, if exp[ −n h(θ)] is very peaked about $θ ̂$, then the integral can be well approximated by the behavior of the integrand near $θ ̂$. More formally, it can be shown that

$I=\hat{I}\left[1+O\!\left(\frac{1}{n}\right)\right].$

To calculate moments of posterior distributions, we need to evaluate expressions such as:

$E[g(\theta)]=\frac{\int g(\theta)\exp[-nh(\theta)]\,d\theta}{\int\exp[-nh(\theta)]\,d\theta},$
(2)

where exp[ −n h(θ)]=L(θ|y)p(θ) (e.g., Tanner 1996).
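The O(1/n) accuracy claim is easy to verify in base R on a toy integral. Take f(θ) = 1 and h(θ) = θ − log θ, so that exp[−nh(θ)] = θ^n e^{−nθ}; the exact integral is Γ(n+1)/n^{n+1}, the mode is θ̂ = 1, and h''(θ̂) = 1. This choice of h is our own illustrative example, not from the sources cited above:

```r
# Laplace approximation of I = ∫ exp(-n h(θ)) dθ with h(θ) = θ - log(θ).
# Exact: ∫ θ^n exp(-nθ) dθ = Γ(n+1) / n^(n+1); mode θ̂ = 1, h(θ̂) = 1, h''(θ̂) = 1.
n <- 50
I.exact   <- gamma(n + 1) / n^(n + 1)
sigma     <- 1                                # [h''(θ̂)]^(-1/2)
I.laplace <- sqrt(2 * pi / n) * sigma * exp(-n * 1)

I.exact / I.laplace                           # ~ 1.0017, the O(1/n) relative error
```

The ratio matches the Stirling-series correction 1 + 1/(12n), illustrating the O(1/n) error term above.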

Fitting of intercept model

Fitting with LaplaceApproximation

The Laplace approximation is a family of asymptotic techniques used to approximate integrals (Statisticat LLC 2013). It accurately approximates unimodal posterior moments and marginal posterior densities in many cases. Here, for fitting the linear regression model we use the function LaplaceApproximation, which implements Laplace's approximation of the integrals involved in the Bayesian analysis of the parameters in the modeling process. This function deterministically maximizes the logarithm of the unnormalized joint posterior density using one of several optimization techniques. The aim of the Laplace approximation is to estimate the posterior mode and variance of each parameter. To obtain the posterior modes of the log-posterior, a number of optimization algorithms are implemented, including the Levenberg-Marquardt (LM) algorithm, which is the default. However, we find that the limited-memory BFGS (L-BFGS) is a better alternative in the Bayesian scenario. The limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm is a quasi-Newton optimization algorithm that compactly approximates the Hessian matrix: rather than storing the dense Hessian matrix, L-BFGS stores only a few vectors that represent the approximation. It may be noted that Newton-Raphson is the last choice, as it is very sensitive to starting values, creates problems when starting values are far from the targets, and calculating and inverting the Hessian matrix can be computationally expensive; it is nevertheless implemented in LaplaceApproximation for the sake of completeness. The main arguments of LaplaceApproximation can be seen by using the function args. The first argument, Model, defines the model to be implemented, which contains the specification of the likelihood and prior.
LaplaceApproximation passes two arguments to the model function, parm and Data, and receives five objects back from it: LP (the logarithm of the unnormalized joint posterior density), Dev (the deviance), Monitor (the monitored variables), yhat (the variables for the posterior predictive checks), and parm, the vector of parameters, which may be constrained in the model function. The argument parm requires a vector of initial values equal in length to the number of parameters, and LaplaceApproximation will attempt to optimize these initial values, where the optimized values are the posterior modes. The Data argument requires data in a list, which must include the variable names and parameter names. The argument sir=TRUE requests the sampling importance resampling algorithm, a bootstrap procedure that draws independent samples with replacement from the posterior sample with unequal sampling probabilities. Unlike sir in the LearnBayes package, here the proposal density is multivariate normal, not t.

Locomotive controls data

Let us introduce a failure times dataset taken from Lawless (2003), so that all the concepts and computations will be discussed around that data. The same data were discussed by Schmee and Nelson (1977). This data set contains the number of thousand miles at which different locomotive controls failed, in a life test involving 96 controls. The test was terminated after 135,000 miles, by which time 37 failures had occurred. The failure times for the 37 failed units are 22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0, 69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5, 93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5, 119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0. In addition, there are 59 censoring times, all equal to 135.0.

Creation of data

The function LaplaceApproximation requires data specified in a list. For the locomotive controls data, the logarithm of failTime will be the response variable. Since the intercept is the only term in the model, a vector of 1's is inserted into the design matrix X; thus, J = 1 indicates a single column of 1's in the matrix. In this case, there are two parameters, beta and log.sigma, which must be specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The number of observations is specified by N. Censoring is also taken into account, where 0 stands for censored and 1 for uncensored values. Finally, all these things are combined in a list as the object MyData.
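Since the article's data-creation chunk is not reproduced here, a base-R reconstruction following the description above might look like the following (the variable names failTime, censor, parm.names, and mon.names follow the text; the list layout is the one described for MyData):

```r
# Locomotive controls data (Lawless 2003): 37 failure times (thousand miles)
# and 59 units censored at 135.0
failTime <- c(22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0,
              69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5,
              93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5,
              119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0,
              rep(135.0, 59))
censor <- c(rep(1, 37), rep(0, 59))   # 1 = failed, 0 = censored
y <- log(failTime)                    # response: log failure time
N <- length(y)                        # 96 controls
J <- 1                                # intercept only
X <- matrix(1, nrow = N, ncol = J)    # design matrix: a single column of 1's
mon.names  <- c("LP", "sigma")
parm.names <- c("beta", "log.sigma")
MyData <- list(J = J, N = N, X = X, censor = censor,
               mon.names = mon.names, parm.names = parm.names, y = y)
```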

Initial values

The function LaplaceApproximation requires a vector of initial values for the parameters. Each initial value is a starting point for the estimation of a parameter. Here, the first parameter, beta, has been set equal to zero, and the remaining parameter, log.sigma, has been set equal to log(1), which is zero. The order of the elements of the initial values must match the order of the parameters. The function GIV (which stands for "Generate Initial Values") may also be used to randomly generate initial values.
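The vector just described can be set up as follows (the GIV alternative, commented out, assumes the 'LaplacesDemon' package is loaded and a Model function is already defined):

```r
# Initial values: beta = 0, log.sigma = log(1) = 0, in parameter order
Initial.Values <- c(rep(0, 1), log(1))

# Alternative, with the LaplacesDemon package loaded:
# Initial.Values <- GIV(Model, MyData)
```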

Model specification

The function LaplaceApproximation can fit any Bayesian model for which the likelihood and prior are specified; it is equally useful for maximum likelihood estimation. To use this method one must specify a model. Thus, for fitting the locomotive controls data, consider that the logarithm of failTime follows a log-Burr distribution, which is often written as

$y \sim \text{Log-Burr}(\mu,\sigma;k),$

and expectation vector μ is equal to the inner product of design matrix X and parameter β

$μ = Xβ.$

Prior probabilities are specified for regression coefficient β and scale parameter σ

$\beta_j \sim N(0,1000),\quad j=1,\ldots,J;\qquad \sigma \sim \mathcal{HC}(25).$

The large variance (small precision) indicates substantial uncertainty about each β, and hence this is a weakly informative prior distribution. Similarly, the half-Cauchy is a weakly informative prior for σ. The Model function contains two arguments, parm and Data, where parm is the set of parameters and Data is the list of data. There are two parameters, beta and sigma, having priors beta.prior and sigma.prior, respectively. The object LL stands for the log-likelihood and LP stands for the log-posterior. The function Model returns the object Modelout, which contains five objects in a list: the log-posterior LP, deviance Dev, monitored variables Monitor, fitted values yhat, and the parameter estimates parm.
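The article's Model chunk is omitted above; the following is a hedged reconstruction from the likelihood and priors just described, writing the log-Burr log-likelihood (with censoring) and the half-Cauchy log-prior directly in base R so that nothing beyond the list MyData is required. Here the shape k is held fixed; k = 1 gives the logistic fit, and a very large k approximates the extreme value (Weibull) fit:

```r
k <- 1   # fixed shape: k = 1 logistic; large k (e.g. 1e6) ~ extreme value

Model <- function(parm, Data) {
  # Parameters
  beta      <- parm[1:Data$J]
  log.sigma <- parm[Data$J + 1]
  sigma     <- exp(log.sigma)
  # Log-priors: beta_j ~ N(0, 1000) (variance 1000), sigma ~ half-Cauchy(25)
  beta.prior  <- sum(dnorm(beta, 0, sqrt(1000), log = TRUE))
  sigma.prior <- log(2 * 25 / (pi * (sigma^2 + 25^2)))
  # Log-likelihood of the log-Burr model with censoring
  mu <- as.vector(Data$X %*% beta)
  z  <- (Data$y - mu) / sigma
  log.f <- -log(sigma) + z - (k + 1) * log1p(exp(z) / k)  # density: failures
  log.R <- -k * log1p(exp(z) / k)                         # reliability: censored
  LL <- sum(Data$censor * log.f + (1 - Data$censor) * log.R)
  # Log-posterior
  LP <- LL + beta.prior + sigma.prior
  Modelout <- list(LP = LP, Dev = -2 * LL, Monitor = c(LP, sigma),
                   yhat = mu, parm = parm)
  return(Modelout)
}
```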

Model fitting

To fit the above specified model, the function LaplaceApproximation is used and its results are assigned to the object Fit. A detailed summary of the results is printed by the function print; it is too long to show here in full, but its relevant parts are summarized in the next section.
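Assuming the 'LaplacesDemon' package is installed and the Model, Initial.Values, and MyData objects described above exist, the call can be sketched as follows (argument defaults vary across package versions, so this is indicative rather than exact):

```r
library(LaplacesDemon)

Fit <- LaplaceApproximation(Model, parm = Initial.Values, Data = MyData,
                            Method = "LBFGS", sir = TRUE)
print(Fit)
```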

Summarizing output

The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following tables. Table 1 presents the analytic results using the Laplace approximation method, while Table 2 presents the simulated results using the sampling importance resampling algorithm. From these posterior summaries it is evident that the posterior mode of the intercept parameter β0 for the logistic distribution is 5.08±0.09 and the posterior mode of log(σ) is −0.96±0.15, while for the Weibull distribution the posterior mode of the intercept parameter β0 is 5.21±0.09 and the posterior mode of log(σ) is −0.85±0.15. The parameters are statistically significant for both distributions. In a practical data analysis, the intercept model is merely a starting point; a more meaningful model is a simple or multiple regression model, which will be discussed in Section 'Fitting of regression model'. Simulation tools are discussed in the next section.

Fitting with LaplacesDemon

We now analyze the same data with the function LaplacesDemon, which is the main function of Laplace's Demon. Given data, a model specification, and initial values, LaplacesDemon maximizes the logarithm of the unnormalized joint posterior density with Markov chain Monte Carlo (MCMC) algorithms, also called samplers, and provides samples of the marginal posterior distributions, deviance, and other monitored variables. Laplace's Demon offers a large number of MCMC algorithms for numerical approximation. Popular families include Gibbs sampling, Metropolis-Hastings (MH), Random-Walk-Metropolis (RWM), slice sampling, Metropolis-within-Gibbs (MWG), Adaptive-Metropolis-within-Gibbs (AMWG), and many others. However, details of the MCMC algorithms are best explored online at http://www.bayesian-inference.com/mcmc, as well as in the "LaplacesDemon Tutorial" vignette. The main arguments of LaplacesDemon can be seen by using the function args. The arguments Model and Data specify the model to be implemented and the list of data; they need not be redefined here, as they were already defined for LaplaceApproximation. Initial.Values requires a vector of initial values equal in length to the number of parameters. The argument Covar=NULL indicates that a variance vector or covariance matrix has not been specified, so the algorithm will begin with its own estimates. The next two arguments, Iterations=100000 and Status=1000, indicate that the LaplacesDemon function will update 100,000 times before completion and report status after every 1,000 iterations. The Thinning argument accepts integers between 1 and the number of iterations, and here indicates that every 100th iteration will be retained while the others are discarded; thinning is performed to reduce autocorrelation and the number of marginal posterior samples. Further, Algorithm requires the abbreviated name of the MCMC algorithm in quotes; in this case RWM is short for Random-Walk-Metropolis. Finally, Specs=NULL is the default argument, and accepts a list of specifications for the MCMC algorithm declared in the Algorithm argument.

Initial values

Laplace's Demon requires a vector of initial values for the parameters. Each initial value will be the starting point for an adaptive chain, or a non-adaptive Markov chain, of a parameter. If all initial values are set to zero, then Laplace's Demon will attempt to optimize the initial values with the LaplaceApproximation function using a resilient backpropagation algorithm. It is therefore better to pass the last fitted object Fit to the function as.initial.values to obtain a vector of initial values from LaplaceApproximation for the LaplacesDemon fit.

Model fitting

Laplace's Demon is stochastic, that is, it involves pseudo-random numbers, so it is better to set a seed with the set.seed function before fitting with LaplacesDemon, so that results can be reproduced. Now, fit the prespecified model with the function LaplacesDemon, assigning its results to the object FitDemon. A summary of the results is printed with the function print, and its relevant parts are summarized in the next section.
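Again assuming the 'LaplacesDemon' package is available and the objects Model, MyData, and Fit exist, the steps just described (reusing the LaplaceApproximation fit for initial values, seeding the generator, and running Random-Walk-Metropolis) can be sketched as:

```r
library(LaplacesDemon)

Initial.Values <- as.initial.values(Fit)   # reuse the LaplaceApproximation fit
set.seed(666)                              # make the MCMC run reproducible
FitDemon <- LaplacesDemon(Model, Data = MyData, Initial.Values,
                          Covar = NULL, Iterations = 100000, Status = 1000,
                          Thinning = 100, Algorithm = "RWM", Specs = NULL)
print(FitDemon)
```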

Summarizing output

LaplacesDemon simulates from the posterior density with Random-Walk-Metropolis, and the results can be seen in the following tables. Table 3 summarizes, in matrix form, the marginal posterior distributions of the parameters over all samples, containing the mean, standard deviation, MCSE (Monte Carlo Standard Error), ESS (Effective Sample Size), and the 2.5%, 50%, and 97.5% quantiles; Table 4 summarizes the simulated results for the stationary samples. The complete picture of the results can also be seen in Figure 3.

Fitting of regression model

Fitting with LaplaceApproximation

Electrical insulating fluid failure times data

Let us introduce a failure times data set of electrical insulating fluid for fitting the regression model, taken from Lawless (2003). The same data set is discussed in Nelson (1972), who described the results of a life test experiment in which specimens of a type of electrical insulating fluid were subjected to a constant voltage stress. The length of time until each specimen failed, or "broke down", was observed. The data give results for seven groups of specimens, tested at voltages ranging from 26 to 38 kilovolts (kV).

Data creation

For fitting the electrical insulating fluid failure times data with LaplaceApproximation, the logarithm of breakdownTime will be the response variable and voltageLevel will be the regressor variable. Since an intercept term will be included, a vector of 1's is inserted into the design matrix X; thus, J = 2 indicates that there are two columns of independent variables in the design matrix, the first for the intercept term and the second for the regressor. In this case, all three parameters, including log.sigma, are specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The total number of observations is specified by N, which is 76. Censoring is not involved here. Thus, all these things are combined into the object MyData, which holds the data in a list.
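The data-creation chunk is again not reproduced in the article; the following sketch shows the list layout with a small hypothetical stand-in for the measurements (the real Nelson (1972) data have N = 76 breakdown times at seven voltage levels; substitute them for the placeholder vectors below):

```r
# Placeholder values for illustration only -- not the Nelson (1972) data
breakdownTime <- c(5.79, 1579.52, 2323.70, 68.85, 108.29, 110.29)
voltageLevel  <- c(26, 26, 26, 28, 28, 28)

y <- log(breakdownTime)        # response: log breakdown time
N <- length(y)
J <- 2                         # intercept + voltage
X <- cbind(1, voltageLevel)    # design matrix
mon.names  <- c("LP", "sigma")
parm.names <- c("beta0", "beta1", "log.sigma")
MyData <- list(J = J, N = N, X = X,
               mon.names = mon.names, parm.names = parm.names, y = y)
```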

Initial values

Each initial value is taken as a starting point for the estimation of a parameter. The first two parameters, the beta parameters, have been set equal to zero, and log.sigma has been set equal to log(1), which is zero.

Model specification

For fitting the regression model with LaplaceApproximation, one must specify a model. Thus, for the failure times of the electrical insulating fluid data, consider that the logarithm of breakdownTime follows a log-Burr distribution. In this Bayesian linear regression with an intercept and one independent variable, the model is specified as

$y \sim \text{Log-Burr}(\mu,\sigma;k),$

and expectation vector μ is an additive, linear function of vector of regression parameters, β, and the design matrix X.

$\mu=\beta_0+\beta_1 X=X\beta.$

Prior probabilities are specified respectively for regression coefficients, β, and scale parameter, σ,

$\beta_j \sim N(0,1000),\quad j=1,\ldots,J;\qquad \sigma \sim \mathcal{HC}(25).$

It is obvious that all the prior densities defined above are weakly informative. Thus, to specify the above-defined model, one must create a function called Model.
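A hedged reconstruction of this Model function, mirroring the intercept model but with J = 2 and no censoring indicator (every observation contributes the density term), might be:

```r
k <- 1   # fixed shape; k = 1 gives the log-logistic fit for t

Model <- function(parm, Data) {
  beta      <- parm[1:Data$J]
  log.sigma <- parm[Data$J + 1]
  sigma     <- exp(log.sigma)
  # Log-priors: beta_j ~ N(0, 1000), sigma ~ half-Cauchy(25)
  beta.prior  <- sum(dnorm(beta, 0, sqrt(1000), log = TRUE))
  sigma.prior <- log(2 * 25 / (pi * (sigma^2 + 25^2)))
  # Log-likelihood: every observation is uncensored here
  mu <- as.vector(Data$X %*% beta)
  z  <- (Data$y - mu) / sigma
  LL <- sum(-log(sigma) + z - (k + 1) * log1p(exp(z) / k))
  LP <- LL + beta.prior + sigma.prior
  list(LP = LP, Dev = -2 * LL, Monitor = c(LP, sigma), yhat = mu, parm = parm)
}
```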

Model fitting

Now, fit the above specified model using LaplaceApproximation, assigning the results to the object Fit; the results are summarized in the next section.

Summarizing output

The relevant summary of results of the fitted regression model using the function LaplaceApproximation can be seen in two tables: Table 5 presents the analytic results using the Laplace approximation method, and Table 6 presents the simulated results using the sampling importance resampling method.

Fitting with LaplacesDemon

In this section, the function LaplacesDemon is used to analyze the same data, that is, the electrical insulating fluid failure times data. This function maximizes the logarithm of the unnormalized joint posterior density with MCMC algorithms, and provides samples of the marginal posterior distributions, deviance, and other monitored variables.

Model fitting

The same model is fitted with the function LaplacesDemon, assigning the results to the object FitDemon. A summary of the results is printed with the function print.

Summarizing output

For this regression model, the function LaplacesDemon simulates from the posterior density with the Random-Walk-Metropolis algorithm, and summaries of the results are reported in the following tables. Table 7 presents the posterior summary over all samples, and Table 8 presents the posterior summary of the stationary samples. The graphical summaries of the results can also be seen in Figure 4.

Discussion and conclusions

In this article, a Bayesian approach is applied to model real reliability data. The generalized log-Burr distribution is used as the Bayesian model to fit the data. Two important techniques, asymptotic approximation and simulation, are implemented using functions of the 'LaplacesDemon' package of R. This package facilitates high-dimensional Bayesian inference, is written entirely in the R environment, and has remarkable provision for user-defined probability models. The main body of the manuscript contains a complete description of the R code for both the intercept and regression models of the log-Burr distribution. The function LaplaceApproximation approximates the results asymptotically, and simulation is performed by the function LaplacesDemon. The results of these two methods are very close to each other for different values of the shape parameter k of the log-Burr distribution. The quality of these approximations is clear from the plots of the posterior densities. It is evident from the summaries of results that the Bayesian approach based on weakly informative priors is simpler to implement than the classical approach. The wealth of information provided in these numeric and graphic summaries is not available in the classical framework (e.g., Lawless 2003). Thus, it is very difficult to analyze these types of data by classical methods, whereas it is quite simple in the Bayesian paradigm using tools like R.

References

1. Gelman A, Hill J: Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press; 2007.

2. Lawless JF: Statistical models and methods for lifetime data. New York: John Wiley & Sons; 2003.

3. Mosteller F, Wallace DL: Inference and disputed authorship: the federalist. Reading: Addison-Wesley; 1964.

4. Nelson WB: Graphical analysis of accelerated life test data with the inverse power law model. IEEE Trans Reliab 1972, R-21: 2-11.

5. R Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. http://www.R-project.org/

6. Schmee J, Nelson W: Estimates and approximate confidence limits for (log) normal life distributions from singly censored samples by maximum likelihood. General Electric C. R. & D. TIS Report 76CRD250. Schenectady, New York; 1977.

7. Statisticat LLC: LaplacesDemon: complete environment for Bayesian inference. R package version 13.03.04; 2013. http://www.bayesian-inference.com/software

8. Tanner MA: Tools for statistical inference. New York: Springer-Verlag; 1996.

9. Tierney L, Kadane JB: Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 1986, 81: 82-86. 10.1080/01621459.1986.10478240

Acknowledgements

The first author would like to thank University Grants Commission (UGC), New Delhi, for financial assistance.

Author information

Correspondence to Md Tanwir Akhtar.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The first author Md Tanwir Akhtar carried out “Bayesian computations and simulations”. Both authors read and approved the final manuscript.
