Bayesian analysis of generalized log-Burr family with R

Log-Burr distribution is a generalization of logistic and extreme value distributions, which are important reliability models. In this paper, Bayesian approach is used to model reliability data for log-Burr model using analytic and simulation tools. Laplace approximation is implemented for approximating posterior densities of the parameters. Moreover, parallel simulation tools are also implemented using ‘LaplacesDemon’ package of R.


Introduction
The log-Burr distribution is a generalization of two important reliability models, that is, logistic distribution and extreme value distribution. The non-Bayesian analysis of generalized log-Burr distribution is a very difficult task, whereas it can be a routine analysis when dealing in a Bayesian paradigm. In this paper, an attempt has been made with the following objectives: • To define a Bayesian model, that is, specification of likelihood and prior distribution. • To write down the R code for approximating posterior densities with Laplace approximation and simulation tools (R Core Team, 2013). • To illustrate numeric as well as graphic summaries of the posterior densities.

The log location-scale model
The probability density function of a parametric locationscale model for a random variable y on (−∞, ∞) with location parameter μ (−∞ < μ < ∞) and scale parameter σ (> 0) is given by The corresponding distribution and reliability function for y are The standardized random variable z = (y − μ)/σ clearly has density and reliability functions f 0 (z) and R 0 (z) respectively, and Equation (1) with μ = 0 and σ = 1 is called the standard form of the distribution. The lifetime distribution, that is, exponential, Weibull, all have the property that y = logt has a location scale distribution: the Weibull, log-normal, and log-logistic distribution for t correspond to extreme value, normal, and logistic distributions for y. The reliability functions for z = (y − μ)/σ on (−∞, ∞) are respectively, Similarly, any location-scale model (Equation (1)) gives a lifetime distribution by the transformation t = exp(y). In this case the reliability function can be expressed as where α = exp(μ), β = 1/σ and R 0 = R 0 (log(x)) is a reliability function defined on (0, ∞) (e.g., Lawless 2003). The log-Burr distribution can be obtained by generalizing a parametric location-scale family of distribution http://www.springerplus.com/content/3/1/185 given by Equation (1), to let pdf, cdf, or reliability function include one or more parameters. This distribution is much useful because they include common two parameter lifetime distributions as special cases.

The generalized log-Burr family
The generalized log-Burr family, for which the standardized variable z = (y − μ)/σ has the probability density function of the form f 0 (z; k) = e z (1 + e z /k) −k−1 − ∞ < z < ∞ and the corresponding reliability function where k (> 0) is a shape parameter. The special case, k = 1 gives the logistic distribution and k → ∞ gives the extreme value distribution. Since the generalized log-Burr family includes log-logistic and Weibull distributions, it allows discrimination between them. It is also a flexible model for fitting the lifetime data (e.g., Lawless 2003). Figure 1 shows probability density functions for log-Burr distributions with different values of k.

The half-Cauchy prior distribution
The probability density function of half-Cauchy distribution with scale parameter α is given by The mean and variance of the half-Cauchy distribution do not exist, but its mode is equal to 0. The half-Cauchy distribution with scale α = 25 is a recommended, default, noninformative prior distribution for a scale parameter. At this scale α = 25, the density of half-Cauchy is nearly flat but not completely (see Figure 2), prior distributions that are not completely flat provide enough information for the numerical approximation algorithm to continue to explore the target density, the posterior distribution. The inverse-gamma is often used as a noninformative prior distribution for scale parameter, however, this model creates problem for scale parameters near zero, Gelman and Hill (2007) recommend that, the uniform, or if more information is necessary the half-Cauchy is a better choice. Thus, in this paper, the half-Cauchy distribution with scale parameter α = 25 is used as a noninformative prior distribution.

The Laplace approximation
Many simple Bayesian analyses based on noninformative prior distribution give similar results to standard non-Bayesian approaches, for example, the posterior t-interval for the normal mean with unknown variance. The extent to which a noninformative prior distribution can be justified as an objective assumption depends on the amount of information available in the data; in the simple cases as the sample size n increases, the influence of the prior distribution on posterior inference decreases. These ideas, sometime referred to as asymptotic approximation theory because they refer to properties that hold in the limit as n becomes large. Thus, a remarkable method of asymptotic approximation is the Laplace approximation which accurately approximates the unimodal posterior moments and marginal posterior densities in many cases. In this section we introduce a brief, informal description of Laplace approximation method. http://www.springerplus.com/content/3/1/185 Suppose −h(θ ) is a smooth, bounded unimodal function , with a maximum atθ , and θ is a scalar. By Laplace's method (e.g., Tierney and Kadane 1986), the integral can be approximated bŷ As presented in Mosteller and Wallace (1964), Laplace's method is to expand aboutθ to obtain: Recalling that h (θ) = 0, we have Intuitively, if exp[−nh(θ )] is very peaked aboutθ , then the integral can be well approximated by the behavior of the integrand nearθ . More formally, it can be shown that To calculate moments of posterior distributions, we need to evaluate expressions such as: where exp[−nh(θ )] = L(θ |y)p(θ ) (e.g., Tanner 1996).

Fitting with LaplaceApproxomation
The Laplace approximation is a family of asymptotic techniques used to approximate integrals (Statisticat LLC 2013). It seems to accurately approximate uni-modal posterior moments and marginal posterior densities in many cases. Here, for fitting of linear regression model we use the function LaplaceApproximation which is an implementation of Laplace's approximations of the integrals involved in the Bayesian analysis of the parameters in the modeling process. This function deterministically maximizes the logarithm of the unnormalized joint posterior density using one of the several optimization techniques. The aim of Laplace approximation is to estimate posterior mode and variance of each parameter.
For getting posterior modes of the log-posteriors, a number of optimization algorithms are implemented. This includes Levenberg-Marquardt (LM) algorithm which is default. However, we find that the Limited-Memory BFGS (L-BFGS) is a better alternative in Bayesian scenario. The limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm is a quasi-Newton optimization algorithm that compactly approximates the Hessian matrix. Rather than storing the dense Hessian matrix, L-BFGS stores only a few vectors that represent the approximation. It may be noted that Newton-Raphson is the last choice as it is very sensitive to the starting values, it creates problems when starting values are far from the targets, and calculating and inverting the Hessian matrix can be computationally expensive, although it is also implemented in LaplaceApproximation for the sake of completion. The main arguments of LaplaceApproximation can be seen by using the function args as function(Model, parm, Data, Interval = 1e-06, Iterations = 100, Method= "LBFGS", Samples= 1000, sir= TRUE, Stop.Tolerance= 1e-05) NULL First argument Model defines the model to be implemented, which contains specification of likelihood and prior. LaplaceApproximation passes two argument to the model function, parm and Data, and receives five arguments from the model function: LP (the logarithm of the unnormalized joined posterior density), Dev (the deviance), Monitor (the monitored variables), yhat (the variables for the posterior predictive checks), and parm, the vector of parameters, which may be constrained in the model function. The argument parm requires a vector of initial values equal in length to the number of parameters, and LaplaceApproximation will attempt to optimize these initial values for the parameters, where the optimized values are the posterior modes. The Data argument requires a listed data which must be include variable names and parameter names. The argument sir=TRUE stands for implementation of sampling importance resampling algorithm, which is a bootstrap procedure to draw independent sample with replacement from the posterior sample with unequal sampling probabilities. Contrary to sir of LearnBayes package, here proposal density is multivariate normal and not t.

Creation of data
The function LaplaceApproximation requires data that is specified in a list. For locomotive controls data the logarithm of failTime will be the response variable. Since intercept is the only term in the model, a vector of 1's is inserted into designed matrix X. Thus, J=1 indicates only column of 1's in the matrix.

Initial values
The function LaplaceApproximation requires a vector of initial values for the parameters. Each initial value is a starting point for the estimation of a parameter. Here, the first parameter, the beta has been set equal to zero, and the remaining parameter, log.sigma, has been set equal to log(1), which is zero. The order of the elements of the initial values must match the order of the parameters. Thus, define a vector of initial values Initial.Values<-c(rep(0,J),log (1)) For initial values the function GIV (which stands for "Generate Initial Values") may also be used to randomly generate initial values.

Model specification
The function LaplaceApproximation can fit any Bayesian model for which likelihood and prior are specified. However, it is equally useful for maximum likelihood estimation. To use this method one must specify a model. Thus, for fitting of the locomotive controls data, consider that the logarithm of failTime follows log-Burr distribution which is often written as and expectation vector μ is equal to the inner product of design matrix X and parameter β μ = Xβ.
Prior probabilities are specified for regression coefficient β and scale parameter σ The large variance and small precision indicate a lot of uncertainty of each β, and is hence a weakly informative prior distribution. Similarly, half-Cauchy is a weakly informative prior for σ .

Model fitting
To fit the above specified model, the function Laplace-Approximation is used and its results are assigned to object Fit. Its summary of results are printed by the function print, which prints detailed summary of results and it is not possible to show here. However, its relevant parts are summarized in the next section.

Summarizing output
The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following tables. Table 1 represents the analytic result using Laplace approximation method while Table 2 represents the simulated results using sampling importance resampling algorithm.
From these posterior summaries, it is obvious that, the posterior mode of intercept parameter β 0 for logistic distribution is 5.08 ± 0.09 whereas posterior mode of log(σ )b is −0.96 ± 0.15, while for Weibull distribution the posterior mode of intercept parameter β 0 is 5.21 ±0.09 whereas posterior mode of log(σ ) is −0.85 ± 0.15. Both the parameters of different distributions are statistically significant also. In a practical data analysis, intercept model is discussed merely as a beginning point. More meaningful model is simple regression model or multiple regression model, which will be discussed in Section 'Fitting of regression model'. Simulation tools are being discussed in the next section.

Fitting with LaplacesDemon
Now we have to analyze the same data with the function LaplacesDemon, which is the main function of

.) NULL
The arguments Model and Data specify the model to be implemented and list of data, which are need not to define here for the function LaplacesDemon as http://www.springerplus.com/content/3/1/185 they are already defined for LaplaceApproximation. Initial.Values requires a vector of initial values equal in length to the number of parameter. The argument Covar= NULL indicates that variance vector or covariance matrix has not been specified, so the algorithm will begin with its own estimates. Next two arguments Iterations= 100000 and Status= 1000 indicates that the LaplacesDemon function will update 10000 times before completion and status is reported after every 1000 iterations. The thinning argument accepts integers between 1 and number of iterations, and indicates that every 100th iteration will be retained, while the others are discarded. Thinning is performed to reduced autocorrelation and the number of marginal posterior samples. Further, the Algorithm requires abbreviated name of the MCMC algorithm in quotes. In this case RWM is short for the Random-Walk-Metropolis. Finally, Specs= Null is default argument, and accepts a list of specifications for the MCMC algorithm declared in the Algorithm argument.

Initial values
Laplace's Demon requires a vector of initial values for the parameters. Each initial value will be the starting point for an adaptive chain, or a non-adaptive Markov chain of a parameter. If all initial values are set to zero, then Laplace's Demon will attempt to optimize the initial values with the LaplaceApproximation function using a resilent backpropagation algorithm. So, it is better to use the last fitted object Fit with the function as.initial.values to get a vector of initial values from the LaplaceApproximation for fitting of LaplacesDemon. Thus, to obtain a vector of initial values the function as.initial.values is used as Initial.Values<-as.initial.values(Fit)

Model fitting
Laplace's Demon is stochastic, or involves pseudo-random numbers, its better to set a seed with set.seed function for pseudo-random number generation before fitting with LaplacesDemon, so results can be reproduced. Now, fit the prespecified model with the function LaplacesDemon, and its results are assigned to the object name FitDemon. Its summary of results are printed with the function print, and its relevant parts are summarized in the next section.

Summarizing output
The LaplacesDemon simulates the data from the posterior density with Random-Walk Metropolis and approximate the results which can be seen in the in the following tables. Table 3 represents the simulated results in a matrix form that summarizes the marginal posterior distributions of the parameters over all samples which contains mean, standard deviation, MCSE (Monte Carlo Standard Error), ESS (Effective Sample Size), and finally 2.5%, 50%, 97.5% quantiles, and Table 4 summarizes the simulated results due to stationary samples. The complete picture of the results can also be seen in Figure 3.

Fitting with LaplaceApproxomation Electrical insulating fluid failure times data
Let us introduce a failure times data set of electrical insulating fluid for fitting of regression model, which is taken from Lawless (2003). The same data set is discussed in Nelson (1972). Nelson (1972) described the results of a life test experiment in which specimen of a type of electrical insulating fluid were subjected to a constant voltage stress. The length of time until each specimen failed, or "broke down" was observed. The data give results for seven groups of specimen, tested as voltage ranging from 26 to 38 kilovolts (kV).

Data creation
For fitting of failure times of electrical insulating fluid data with LaplaceApproximation, the logarithm of breakdownTime will be the response variable and voltageLevel will be the regressor variable. Since an intercept term will be included, a vector of 1's is inserted into the design matrix X. Thus, J = 2 indicates that, there are two columns of independent variables, first column for intercept term and second column for regressor, in the design matrix.

Initial values
The initial value is taken as a starting point for the estimation of a parameter. So the first two parameters, the beta parameters have been set equal to zero, and log.sigma has been set equal to log(1), which is zero. Initial.Values<-c(rep(0,J),log(1))

Model specification
For fitting of the regression model with LaplaceApproximation, must specify a model. Thus, for failure times of electrical insulating fluid data, consider that logarithm of breakdownTime follows log-Burr distribution. In this Bayesian linear regression with an intercept and one independent variable the model is specified as y ∼ Log-Burr(μ, σ ; k), and expectation vector μ is an additive, linear function of vector of regression parameters, β, and the design matrix X.

Model fitting
Now, fit the above specified model using the Laplace-Approximation by assigning the object name Fit, and its results are summarized in the next section.

Summarizing output
The relevant summary of results of the fitted regression model using the function LaplaceApproximation, can easily be seen in these two tables. Table 5 represents the analytic result using Laplace approximation method, and Table 6 represents the simulated results using Sampling Importance Resampling method.

Fitting with LaplacesDemon
In this section, the function LaplcesDemon is used to analyze the same data, that is, electrical insulating fluid failure times data. This function maximizes the logarithm of un-normalized joint posterior density with MCMC algorithms, and provides samples of the marginal posterior distributions, deviance and other monitored variables.

Model fitting
For fitting the same model with the function LaplacesDemon by assigning the object name FitDemon, the R codes are as follows. Its summary of results are printed with the function print.

Summarizing output
The function LaplacesDemon for this regression model, simulates the data from the posterior density with Random-Walk-Metropolis algorithm, and summaries of results are reported in the following tables. Table 7 represents the posterior summary of all samples, and Table 8 represents the posterior summary  of stationary samples. The graphical summaries of the results can also be seen in Figure 4.

Discussion and conclusions
In this article, Bayesian approach is applied to model the real reliability data. The generalized log-Burr distribution is used as a Bayesian model to fit the data, and for the analysis. Two important techniques, that is, asymptotic approximation and simulation method are implemented using the functions of 'LaplacesDemon' package of R. This package facilitates high-dimensional Bayesian inference, posing as its own intellect that is capable of impressive analysis, which is written entirely in R environment and has a remarkable provision for user defined probability model. The main body of the manuscript contains the complete description of R code both for intercept and regression models of log-Burr distribution. The function LaplaceApproximation approximates the results asymptotically and simulation is made by the function LaplacesDemon. Results of these two methods are very close to each other for different values of shape parameter k of log-Burr distribution. The excellency of these approximations seem clear in the plots of posterior densities. It is evident from the summaries of results that the Bayesian approach based on weakly informative priors is simpler to implement than the classical approach. The wealth of information provided in these numeric and graphic summaries are not possible in classical framework (e.g., Lawless 2003). Thus, it is very difficult to analyze these types of data by classical method, whereas it is quite simple in Bayesian paradigm using tools like R. http://www.springerplus.com/content/3/1/185