Bayesian analysis of generalized log-Burr family with R

Akhtar, Md Tanwir; Khan, Athar Ali

doi:10.1186/2193-1801-3-185

Research
Open access
Published: 10 April 2014


SpringerPlus volume 3, Article number: 185 (2014)


Abstract

The log-Burr distribution is a generalization of the logistic and extreme value distributions, which are important reliability models. In this paper, a Bayesian approach is used to model reliability data with the log-Burr model using analytic and simulation tools. The Laplace approximation is implemented for approximating the posterior densities of the parameters. Moreover, parallel simulation tools are also implemented using the ‘LaplacesDemon’ package of R.

Introduction

The log-Burr distribution is a generalization of two important reliability models, the logistic distribution and the extreme value distribution. Non-Bayesian analysis of the generalized log-Burr distribution is a very difficult task, whereas the analysis becomes routine in the Bayesian paradigm. In this paper, an attempt has been made with the following objectives:

To define a Bayesian model, that is, specification of likelihood and prior distribution.
To write down the R code for approximating posterior densities with Laplace approximation and simulation tools (R Core Team, 2013).
To illustrate numeric as well as graphic summaries of the posterior densities.

The log location-scale model

The probability density function of a parametric location-scale model for a random variable y on (−∞,∞) with location parameter μ (−∞<μ<∞) and scale parameter σ (>0) is given by

f (y; μ, σ) = \frac{1}{σ} f_{0} (\frac{y - μ}{σ}), \quad - \infty < y < \infty

(1)

The corresponding distribution and reliability functions for y are

\begin{align} F (y; μ, σ) & = \int_{- \infty}^{y} f (t; μ, σ) dt = F_{0} (\frac{y - μ}{σ}) \\ R (y; μ, σ) & = 1 - F (y; μ, σ) = R_{0} (\frac{y - μ}{σ}) \end{align}

where $F_{0} (z) = \int_{- \infty}^{z} f_{0} (t) dt$ and $R_{0} (z) = 1 - F_{0} (z)$.

The standardized random variable z=(y−μ)/σ clearly has density and reliability functions f₀(z) and R₀(z) respectively, and Equation (1) with μ=0 and σ=1 is called the standard form of the distribution.

Lifetime distributions such as the exponential and Weibull all have the property that y = log t has a location-scale distribution: the Weibull, log-normal, and log-logistic distributions for t correspond to extreme value, normal, and logistic distributions for y. The reliability functions for z=(y−μ)/σ on (−∞,∞) are, respectively,

\begin{align} R_{0} (z) & = \exp (- e^{z}) & \text{extreme value} \\ R_{0} (z) & = 1 - Φ (z) & \text{normal} \\ R_{0} (z) & = {(1 + e^{z})}^{- 1} & \text{logistic} \end{align}

Similarly, any location-scale model (Equation (1)) gives a lifetime distribution by the transformation t=exp(y). In this case the reliability function can be expressed as

\begin{align} R (t; μ, σ) & = R_{0} (\frac{\log t - μ}{σ}) \\ & = R_{0}^{'} [{(\frac{t}{α})}^{β}], \end{align}

where α=exp(μ), β=1/σ and $R_{0}^{'} (x) = R_{0} (\log x)$ is a reliability function defined on (0,∞) (e.g., Lawless 2003).
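The correspondence between lifetime distributions for t and location-scale distributions for y = log t can be checked numerically. The following is a minimal sketch in base R. The values α = 100 and β = 2 and the helper names `R0_ev`, `R0_norm`, and `R0_logis` are illustrative assumptions, not taken from the paper.

```r
# Standard reliability functions R0(z) of the three location-scale families
R0_ev    <- function(z) exp(-exp(z))       # extreme value
R0_norm  <- function(z) 1 - pnorm(z)       # normal
R0_logis <- function(z) 1 / (1 + exp(z))   # logistic

# Assumed lifetime-scale parameters (illustration only)
alpha <- 100
beta  <- 2
mu    <- log(alpha)   # location on the log scale
sigma <- 1 / beta     # scale on the log scale

t <- c(25, 50, 100, 200)
z <- (log(t) - mu) / sigma

# Reliability computed on the lifetime scale t
weibull_R <- pweibull(t, shape = beta, scale = alpha, lower.tail = FALSE)
lnorm_R   <- plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
llogis_R  <- 1 / (1 + (t / alpha)^beta)   # log-logistic reliability

# The same quantities from the standardized log-scale form R0(z)
cbind(t, weibull_R, ev = R0_ev(z), lnorm_R, norm = R0_norm(z),
      llogis_R, logis = R0_logis(z))
```

Each pair of columns agrees, which is the statement R(t; μ, σ) = R₀((log t − μ)/σ) evaluated for the three families.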

The log-Burr distribution can be obtained by generalizing the parametric location-scale family of distributions given by Equation (1), letting the pdf, cdf, or reliability function include one or more additional parameters. This family is useful because it includes the common two-parameter lifetime distributions as special cases.

The generalized log-Burr family

In the generalized log-Burr family, the standardized variable z=(y−μ)/σ has a probability density function of the form

f_{0} (z; k) = e^{z} {(1 + e^{z} / k)}^{- k - 1}, \quad - \infty < z < \infty

and the corresponding reliability function

R_{0} (z; k) = {(1 + e^{z} / k)}^{- k}, \quad - \infty < z < \infty

where k (>0) is a shape parameter. The special case k=1 gives the logistic distribution, and k→∞ gives the extreme value distribution. Since the generalized log-Burr family includes the log-logistic and Weibull distributions, it allows discrimination between them. It is also a flexible model for fitting lifetime data (e.g., Lawless 2003). Figure 1 shows probability density functions for log-Burr distributions with different values of k.
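The two special cases can be verified directly from the formulas above. This is a small base R sketch. The function names `f0_logburr` and `R0_logburr` and the grid of z values are assumptions made for illustration.

```r
# Standardized log-Burr density and reliability function, as defined above
f0_logburr <- function(z, k) exp(z) * (1 + exp(z) / k)^(-k - 1)
R0_logburr <- function(z, k) (1 + exp(z) / k)^(-k)

z <- seq(-4, 3, by = 0.5)

# k = 1 reproduces the standard logistic distribution
gap_logis_pdf <- max(abs(f0_logburr(z, 1) - dlogis(z)))
gap_logis_rel <- max(abs(R0_logburr(z, 1) - plogis(z, lower.tail = FALSE)))

# A very large k approaches the extreme value reliability exp(-exp(z))
gap_ev <- max(abs(R0_logburr(z, 1e6) - exp(-exp(z))))

# The density integrates to one (k = 2 chosen arbitrarily; the mass
# outside [-20, 20] is negligible)
total <- integrate(f0_logburr, lower = -20, upper = 20, k = 2)$value

c(gap_logis_pdf = gap_logis_pdf, gap_logis_rel = gap_logis_rel,
  gap_ev = gap_ev, total = total)
```

The gaps are at the level of rounding error for k = 1 and shrink as k grows for the extreme value limit.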

The half-Cauchy prior distribution

The probability density function of half-Cauchy distribution with scale parameter α is given by

f (x) = \frac{2 α}{π (x^{2} + α^{2})}, x > 0, α > 0 .

The mean and variance of the half-Cauchy distribution do not exist, but its mode is equal to 0. The half-Cauchy distribution with scale α=25 is a recommended default noninformative prior distribution for a scale parameter. At this scale, the density of the half-Cauchy is nearly, but not completely, flat (see Figure 2). Prior distributions that are not completely flat provide enough information for the numerical approximation algorithm to continue to explore the target density, the posterior distribution. The inverse-gamma is often used as a noninformative prior distribution for a scale parameter; however, it creates problems for scale parameters near zero. Gelman and Hill (2007) recommend the uniform or, if more information is necessary, the half-Cauchy as a better choice. Thus, in this paper, the half-Cauchy distribution with scale parameter α=25 is used as a noninformative prior distribution.
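The "nearly flat" behaviour at α=25 can be quantified with a few lines of base R. The helper name `dhalfcauchy_base` is an assumption used here to keep the sketch free of package dependencies. It implements the density formula given above.

```r
# Half-Cauchy density f(x) = 2 * alpha / (pi * (x^2 + alpha^2)), x > 0
dhalfcauchy_base <- function(x, scale = 25) {
  2 * scale / (pi * (x^2 + scale^2))
}

# The density integrates to one over (0, Inf)
total <- integrate(dhalfcauchy_base, lower = 0, upper = Inf)$value

# Flatness near the origin: density at x = 5 relative to the mode at x = 0
ratio_5 <- dhalfcauchy_base(5) / dhalfcauchy_base(0)

# Half of the prior mass lies below the scale parameter
mass_below_scale <- integrate(dhalfcauchy_base, lower = 0, upper = 25)$value

c(total = total, ratio_5 = ratio_5, mass_below_scale = mass_below_scale)
```

The ratio equals 625/650, so over the range 0 to 5 the prior density drops by less than 4 percent. It is close to flat over plausible values of σ, but it is still a proper density.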

The Laplace approximation

Many simple Bayesian analyses based on noninformative prior distributions give similar results to standard non-Bayesian approaches, for example, the posterior t-interval for the normal mean with unknown variance. The extent to which a noninformative prior distribution can be justified as an objective assumption depends on the amount of information available in the data; in simple cases, as the sample size n increases, the influence of the prior distribution on posterior inference decreases. These ideas are sometimes referred to as asymptotic approximation theory because they refer to properties that hold in the limit as n becomes large. A remarkable method of asymptotic approximation is the Laplace approximation, which accurately approximates unimodal posterior moments and marginal posterior densities in many cases. In this section we give a brief, informal description of the Laplace approximation method.

Suppose −h(θ) is a smooth, bounded unimodal function, with a maximum at $\hat{θ}$ , and θ is a scalar. By Laplace’s method (e.g., Tierney and Kadane 1986), the integral

I = \int f (θ) exp [- nh (θ)] dθ

can be approximated by

Î = f (\hat{θ}) \sqrt{\frac{2 π}{n}} σ exp [- nh (\hat{θ})],

where

σ = {[\frac{\partial^{2} h}{\partial θ^{2}} |_{\hat{θ}}]}^{- 1 / 2} .

As presented in Mosteller and Wallace (1964), Laplace’s method is to expand about $\hat{θ}$ to obtain:

I \approx \int f (\hat{θ}) exp (- n [h (\hat{θ}) + (θ - \hat{θ}) h^{'} (\hat{θ}) + \frac{{(θ - \hat{θ})}^{2}}{2} h^{′′} (\hat{θ})]) dθ.

Recalling that $h^{'} (\hat{θ}) = 0$ , we have

\begin{align} I & \approx \int f (\hat{θ}) exp [- n (h (\hat{θ}) + \frac{{(θ - \hat{θ})}^{2}}{2} h^{′′} (\hat{θ}))] dθ \\ & = f (\hat{θ}) exp [- nh (\hat{θ})] \int exp (\frac{- n {(θ - \hat{θ})}^{2}}{2 σ^{2}}) dθ \\ & = f (\hat{θ}) \sqrt{\frac{2 π}{n}} σ exp [- nh (\hat{θ})] . \end{align}

Intuitively, if exp[ −n h(θ)] is very peaked about $\hat{θ}$ , then the integral can be well approximated by the behavior of the integrand near $\hat{θ}$ . More formally, it can be shown that

I = Î [1 + O (\frac{1}{n})] .

To calculate moments of posterior distributions, we need to evaluate expressions such as:

E [g (θ)] = \frac{\int g (θ) exp [- nh (θ)] dθ}{\int exp [- nh (θ)] dθ},

(2)

where exp[ −n h(θ)]=L(θ|y)p(θ) (e.g., Tanner 1996).
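The error rate I = Î[1 + O(1/n)] can be seen in a case where the integral is known in closed form. The sketch below is an illustration chosen here, not an example from the paper. It takes f(θ) = 1 and h(θ) = θ − log θ on (0,∞), for which I = Γ(n+1)/n^{n+1}. The function names are hypothetical.

```r
# h has its minimum at theta_hat = 1, so exp(-n * h) peaks there
h <- function(theta) theta - log(theta)

laplace_integral <- function(n) {
  # Locate theta_hat numerically
  theta_hat <- optimize(h, interval = c(0.01, 10), tol = 1e-8)$minimum
  # Second derivative of h at theta_hat by central differences
  eps <- 1e-4
  h2  <- (h(theta_hat + eps) - 2 * h(theta_hat) + h(theta_hat - eps)) / eps^2
  sigma <- h2^(-1 / 2)
  # I_hat = f(theta_hat) * sqrt(2 * pi / n) * sigma * exp(-n * h(theta_hat))
  sqrt(2 * pi / n) * sigma * exp(-n * h(theta_hat))
}

# Exact value Gamma(n + 1) / n^(n + 1), computed on the log scale
exact_integral <- function(n) exp(lgamma(n + 1) - (n + 1) * log(n))

rel_err <- function(n) abs(laplace_integral(n) - exact_integral(n)) / exact_integral(n)

rel_err_10  <- rel_err(10)
rel_err_100 <- rel_err(100)
c(n10 = rel_err_10, n100 = rel_err_100)
```

The relative error shrinks roughly tenfold when n grows from 10 to 100, in line with the O(1/n) statement.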

Fitting of intercept model

Fitting with LaplaceApproximation

The Laplace approximation is a family of asymptotic techniques used to approximate integrals (Statisticat LLC 2013). In many cases it accurately approximates unimodal posterior moments and marginal posterior densities. Here, for fitting the linear regression model we use the function LaplaceApproximation, which is an implementation of Laplace’s approximation of the integrals involved in the Bayesian analysis of the parameters in the modeling process. This function deterministically maximizes the logarithm of the unnormalized joint posterior density using one of several optimization techniques. The aim of the Laplace approximation is to estimate the posterior mode and variance of each parameter. A number of optimization algorithms are implemented for finding the posterior modes of the log-posterior. These include the Levenberg-Marquardt (LM) algorithm, which is the default. However, we find that Limited-Memory BFGS (L-BFGS) is a better alternative in the Bayesian scenario. The limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm is a quasi-Newton optimization algorithm that compactly approximates the Hessian matrix. Rather than storing the dense Hessian matrix, L-BFGS stores only a few vectors that represent the approximation. Newton-Raphson is the last choice: it is very sensitive to the starting values, it creates problems when the starting values are far from the targets, and calculating and inverting the Hessian matrix can be computationally expensive. It is nevertheless implemented in LaplaceApproximation for the sake of completeness. The main arguments of LaplaceApproximation can be inspected with the function args.

The first argument, Model, defines the model to be implemented, which contains the specification of the likelihood and prior. LaplaceApproximation passes two arguments to the model function, parm and Data, and receives five components from the model function: LP (the logarithm of the unnormalized joint posterior density), Dev (the deviance), Monitor (the monitored variables), yhat (the variables for the posterior predictive checks), and parm, the vector of parameters, which may be constrained in the model function. The argument parm requires a vector of initial values equal in length to the number of parameters, and LaplaceApproximation will attempt to optimize these initial values for the parameters, where the optimized values are the posterior modes. The Data argument requires a list of data, which must include the variable names and parameter names. The argument sir=TRUE requests the sampling importance resampling algorithm, which is a bootstrap procedure that draws independent samples with replacement from the posterior sample with unequal sampling probabilities. Contrary to sir of the LearnBayes package, the proposal density here is multivariate normal and not t.

Locomotive controls data

Let us introduce a failure times dataset taken from Lawless (2003), so that all the concepts and computations will be discussed around that data. The same data were discussed by Schmee and Nelson (1977). This data set contains the number of thousand miles at which different locomotive controls failed, in a life test involving 96 controls. The test was terminated after 135,000 miles, by which time 37 failures had occurred. The failure times for the 37 failed units are 22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0, 69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5, 93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5, 119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0. In addition, there are 59 censoring times, all equal to 135.0.

Creation of data

The function LaplaceApproximation requires data that is specified in a list. For the locomotive controls data, the logarithm of failTime will be the response variable. Since the intercept is the only term in the model, a vector of 1’s is inserted into the design matrix X. Thus, J = 1 indicates that the matrix has a single column of 1’s.

In this case, there are two parameters, beta and log.sigma, which must be specified in the vector parm.names. The log-posterior LP and sigma are included as monitored variables in the vector mon.names. The number of observations is specified by N. Censoring is also taken into account, where 0 stands for censored and 1 for uncensored values. Finally, all these things are combined in a list, the MyData object, at the end of the command.
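The data list described above might be assembled as follows. This is a sketch reconstructed from the description and the values listed in the text, using only base R. The exact object layout in the original listing may differ, and the element name `censor` is an assumption.

```r
# Failure times (thousand miles) of the 37 failed units, followed by the
# 59 units censored at 135.0
failTime <- c(22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0,
              69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5,
              93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5,
              119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0,
              rep(135.0, 59))
censor <- c(rep(1, 37), rep(0, 59))  # 1 = uncensored (failed), 0 = censored

N <- length(failTime)                # number of observations
J <- 1                               # intercept only
y <- log(failTime)                   # response on the log scale
X <- matrix(1, nrow = N, ncol = J)   # design matrix: a single column of 1's

mon.names  <- c("LP", "sigma")       # monitored variables
parm.names <- c("beta", "log.sigma") # parameter names

MyData <- list(J = J, N = N, X = X, censor = censor,
               mon.names = mon.names, parm.names = parm.names, y = y)
str(MyData)
```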

Initial values

The function LaplaceApproximation requires a vector of initial values for the parameters. Each initial value is a starting point for the estimation of a parameter. Here, the first parameter, beta, has been set equal to zero, and the remaining parameter, log.sigma, has been set equal to log(1), which is zero. The order of the elements of the initial values must match the order of the parameters. Thus, the vector of initial values has two elements, both equal to zero.

For initial values the function GIV (which stands for “Generate Initial Values”) may also be used to randomly generate initial values.

Model specification

The function LaplaceApproximation can fit any Bayesian model for which the likelihood and prior are specified. However, it is equally useful for maximum likelihood estimation. To use this method one must specify a model. Thus, for fitting the locomotive controls data, consider that the logarithm of failTime follows a log-Burr distribution, which is often written as

y \sim \text{Log-Burr} (μ, σ; k),

and the expectation vector μ is equal to the product of the design matrix X and the parameter vector β

μ = Xβ.

Prior probabilities are specified for regression coefficient β and scale parameter σ

\begin{align} β_{j} & \sim N (0, 1000), \quad j = 1, \dots, J \\ σ & \sim HC (25) . \end{align}

The large variance, or equivalently the small precision, indicates a lot of uncertainty about each β, and the prior is hence a weakly informative prior distribution. Similarly, the half-Cauchy is a weakly informative prior for σ.

The Model function contains two arguments, that is, parm and Data, where parm is for the set of parameters, and Data is the list of data. There are two parameters beta and sigma having priors beta.prior and sigma.prior, respectively. The object LL stands for loglikelihood and LP stands for logposterior. The function Model returns the object Modelout, which contains five objects in listed form that includes logposterior LP, deviance Dev, monitoring parameters Monitor, fitted values yhat and estimates of parameters parm.
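A model function of the kind just described could look like the sketch below. It is not the authors' listing: the priors are written with base R (`dnorm` with standard deviation √1000 and a hand-coded log half-Cauchy density) so that it runs without any package, the shape k is fixed at 1 inside the function, and the small data list `ToyData` is a five-observation subset of the locomotive data used only to exercise the function.

```r
Model <- function(parm, Data) {
  k <- 1  # shape: k = 1 is the logistic case; a large k approaches extreme value
  ### Parameters
  beta  <- parm[1:Data$J]
  sigma <- exp(parm[Data$J + 1])
  ### Log prior densities: beta_j ~ N(0, 1000), sigma ~ HC(25)
  beta.prior  <- sum(dnorm(beta, mean = 0, sd = sqrt(1000), log = TRUE))
  sigma.prior <- log(2 * 25) - log(pi * (sigma^2 + 25^2))
  ### Log-likelihood with right censoring
  mu <- as.vector(Data$X %*% beta)
  z  <- (Data$y - mu) / sigma
  log.f <- -log(sigma) + z - (k + 1) * log1p(exp(z) / k)  # log density of y
  log.R <- -k * log1p(exp(z) / k)                         # log reliability
  LL <- sum(Data$censor * log.f + (1 - Data$censor) * log.R)
  ### Log-posterior (unnormalized)
  LP <- LL + beta.prior + sigma.prior
  Modelout <- list(LP = LP, Dev = -2 * LL, Monitor = c(LP, sigma),
                   yhat = mu, parm = parm)
  return(Modelout)
}

# Toy subset: three failures and two units censored at 135 (illustration only)
ToyData <- list(J = 1, N = 5, X = matrix(1, nrow = 5, ncol = 1),
                censor = c(1, 1, 1, 0, 0),
                mon.names = c("LP", "sigma"),
                parm.names = c("beta", "log.sigma"),
                y = log(c(22.5, 37.5, 46.0, 135.0, 135.0)))

out <- Model(c(0, 0), ToyData)
out$LP
out$Dev
```

Failed units contribute the log density and censored units the log reliability, which is the standard likelihood for right-censored data. With k = 1 the two terms coincide with the logistic log density and log survival function, which gives a quick correctness check.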

Model fitting

To fit the model specified above, the function LaplaceApproximation is used and its results are assigned to the object Fit. A summary of the results is printed by the function print. The printed summary is too detailed to reproduce here, but its relevant parts are summarized in the next section.
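Putting the pieces together, the fit might be run as in the following sketch. The data and model are repeated in compact form so that the block runs on its own. The call to LaplaceApproximation is only executed when the LaplacesDemon package is installed, and the settings `Iterations = 1000` and `Samples = 1000` are assumptions rather than values reported in the paper. The priors are again written in base R.

```r
# Locomotive controls data from the text: 37 failures, 59 censored at 135
failTime <- c(22.5, 37.5, 46.0, 48.5, 51.5, 53.0, 54.5, 57.5, 66.5, 68.0,
              69.5, 76.5, 77.0, 78.5, 80.0, 81.5, 82.0, 83.0, 84.0, 91.5,
              93.5, 102.5, 107.0, 108.5, 112.5, 113.5, 116.0, 117.0, 118.5,
              119.0, 120.0, 122.5, 123.0, 127.5, 131.0, 132.5, 134.0,
              rep(135.0, 59))
MyData <- list(J = 1, N = 96, X = matrix(1, nrow = 96, ncol = 1),
               censor = c(rep(1, 37), rep(0, 59)),
               mon.names = c("LP", "sigma"),
               parm.names = c("beta", "log.sigma"),
               y = log(failTime))

# Log-Burr model with k = 1 (logistic case), priors N(0, 1000) and HC(25)
Model <- function(parm, Data) {
  k <- 1
  beta  <- parm[1:Data$J]
  sigma <- exp(parm[Data$J + 1])
  prior <- sum(dnorm(beta, 0, sqrt(1000), log = TRUE)) +
    log(2 * 25) - log(pi * (sigma^2 + 25^2))
  mu <- as.vector(Data$X %*% beta)
  z  <- (Data$y - mu) / sigma
  LL <- sum(Data$censor * (-log(sigma) + z - (k + 1) * log1p(exp(z) / k)) +
            (1 - Data$censor) * (-k * log1p(exp(z) / k)))
  LP <- LL + prior
  list(LP = LP, Dev = -2 * LL, Monitor = c(LP, sigma), yhat = mu, parm = parm)
}

Initial.Values <- c(rep(0, MyData$J), log(1))

# Fit only if the package is available; L-BFGS is the optimizer favoured in the text
if (requireNamespace("LaplacesDemon", quietly = TRUE)) {
  library(LaplacesDemon)
  Fit <- LaplaceApproximation(Model, parm = Initial.Values, Data = MyData,
                              Iterations = 1000, Method = "LBFGS",
                              Samples = 1000, sir = TRUE)
  print(Fit)
}
```

Changing `k` inside `Model` to a large value would give the extreme value (Weibull) counterpart of the same fit.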

Summarizing output

The function LaplaceApproximation approximates the posterior density of the fitted model, and posterior summaries can be seen in the following tables. Table 1 presents the analytic results using the Laplace approximation method, while Table 2 presents the simulated results using the sampling importance resampling algorithm. From these posterior summaries, the posterior mode of the intercept parameter β₀ for the logistic distribution is 5.08±0.09 and the posterior mode of log(σ) is −0.96±0.15, while for the Weibull distribution the posterior mode of the intercept parameter β₀ is 5.21±0.09 and the posterior mode of log(σ) is −0.85±0.15. Both parameters are statistically significant under each distribution. In a practical data analysis, the intercept model is discussed merely as a starting point. More meaningful models are the simple regression model and the multiple regression model, which will be discussed in Section ‘Fitting of regression model’. Simulation tools are discussed in the next section.

Table 1 Summary of the analytic approximation using the function LaplaceApproximation. It may be noted that these summaries are based on asymptotic approximation, and hence Mode stands for posterior mode, SD stands for posterior standard deviation, and LB, UB are 2.5% and 97.5% quantiles, respectively


Table 2 Summary matrices of the simulation due to sampling importance resampling algorithm using the function LaplaceApproximation, where Mean stands for posterior mean, SD for posterior standard deviation, MCSE for Monte Carlo standard error, ESS for effective sample size, and LB, Median, UB are 2.5%, 50%, 97.5% quantiles, respectively


Fitting with LaplacesDemon

Now we analyze the same data with the function LaplacesDemon, which is the main function of Laplace’s Demon. Given data, a model specification, and initial values, LaplacesDemon maximizes the logarithm of the unnormalized joint posterior density with Markov chain Monte Carlo (MCMC) algorithms, also called samplers, and provides samples of the marginal posterior distributions, deviance and other monitored variables. Laplace’s Demon offers a large number of MCMC algorithms for numerical approximation. Popular families include Gibbs sampling, Metropolis-Hastings (MH), Random-Walk-Metropolis (RWM), slice sampling, Metropolis-within-Gibbs (MWG), Adaptive-Metropolis-within-Gibbs (AMWG), and many others. Details of the MCMC algorithms are best explored online at http://www.bayesian-inference.com/mcmc, as well as in the “LaplacesDemon Tutorial” vignette. The main arguments of LaplacesDemon can be inspected with the function args.

The arguments Model and Data specify the model to be implemented and the list of data; they need not be defined again for the function LaplacesDemon, as they are already defined for LaplaceApproximation. Initial.Values requires a vector of initial values equal in length to the number of parameters. The argument Covar= NULL indicates that a variance vector or covariance matrix has not been specified, so the algorithm will begin with its own estimates. The next two arguments, Iterations= 100000 and Status= 1000, indicate that the LaplacesDemon function will update 100000 times before completion and that the status is reported after every 1000 iterations. The Thinning argument accepts integers between 1 and the number of iterations, and here indicates that every 100th iteration will be retained, while the others are discarded. Thinning is performed to reduce autocorrelation and the number of marginal posterior samples. Further, the Algorithm argument requires the abbreviated name of the MCMC algorithm in quotes. In this case RWM is short for Random-Walk-Metropolis. Finally, Specs= NULL is the default argument, and accepts a list of specifications for the MCMC algorithm declared in the Algorithm argument.
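To make the roles of Iterations and Thinning concrete, here is a bare-bones random-walk Metropolis sampler in base R. It is a toy illustration and not the package's implementation: the target is a standard normal density chosen for simplicity, and the proposal standard deviation of 2.4 is an assumed tuning value.

```r
set.seed(1)

# Assumed toy target: log density of a standard normal
log_target <- function(theta) dnorm(theta, log = TRUE)

Iterations <- 100000   # total number of updates
Thinning   <- 100      # keep every 100th state

theta    <- 0                                # initial value
kept     <- numeric(Iterations / Thinning)   # thinned samples
accepted <- 0

for (i in seq_len(Iterations)) {
  # Propose a symmetric random-walk step around the current state
  proposal <- theta + rnorm(1, sd = 2.4)
  # Accept with probability min(1, target(proposal) / target(theta))
  if (log(runif(1)) < log_target(proposal) - log_target(theta)) {
    theta    <- proposal
    accepted <- accepted + 1
  }
  # Thinning: store only every Thinning-th iteration
  if (i %% Thinning == 0) kept[i / Thinning] <- theta
}

length(kept)             # 100000 / 100 = 1000 retained samples
accepted / Iterations    # acceptance rate
c(mean = mean(kept), sd = sd(kept))
```

With 100000 iterations and thinning of 100, 1000 draws remain, and consecutive retained draws are far less correlated than consecutive raw states of the chain.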

Initial values

Laplace’s Demon requires a vector of initial values for the parameters. Each initial value will be the starting point for an adaptive chain, or a non-adaptive Markov chain, of a parameter. If all initial values are set to zero, then Laplace’s Demon will attempt to optimize the initial values with the LaplaceApproximation function using a resilient backpropagation algorithm. So, it is better to use the last fitted object Fit with the function as.initial.values to get a vector of initial values from LaplaceApproximation for fitting with LaplacesDemon. Thus, the vector of initial values is obtained by applying as.initial.values to Fit.

Model fitting

Since Laplace’s Demon is stochastic, that is, it involves pseudo-random numbers, it is better to set a seed with the set.seed function before fitting with LaplacesDemon, so that results can be reproduced. Now, fit the prespecified model with the function LaplacesDemon, and assign its results to the object FitDemon. A summary of the results is printed with the function print, and its relevant parts are summarized in the next section.

Summarizing output

The function LaplacesDemon simulates from the posterior density with Random-Walk Metropolis and approximates the results, which can be seen in the following tables. Table 3 presents the simulated results in a matrix form that summarizes the marginal posterior distributions of the parameters over all samples; it contains the mean, standard deviation, MCSE (Monte Carlo Standard Error), ESS (Effective Sample Size), and the 2.5%, 50%, and 97.5% quantiles. Table 4 summarizes the simulated results due to stationary samples. The complete picture of the results can also be seen in Figure 3.

Table 3 Posterior summaries of simulation due to all samples using the function LaplacesDemon


Table 4 Posterior summaries of simulation due to stationary samples using the function LaplacesDemon
