
Exponential-modified discrete Lindley distribution

Abstract

In this study, we consider a series system composed of M stochastically independent components, where M is a random variable having the zero-truncated modified discrete Lindley distribution. This distribution is newly introduced by transforming the original parameter of the discrete Lindley distribution. The properties of the lifetime distribution of this system are examined, and the parameters of this new lifetime distribution are estimated by the method of moments, maximum likelihood and the EM algorithm.

Background

About 400 studies have appeared under the name of the “new lifetime distribution” in the last 5 years. In particular, compound distributions built on the exponential distribution are applicable in fields such as electronics, geology, medicine, biology and actuarial science. Some of these works can be summarized as follows: Adamidis and Loukas (1998) and Adamidis et al. (2005) introduced a two-parameter lifetime distribution with decreasing failure rate by compounding the exponential and geometric distributions. In the same way, the exponential-Poisson (EP) and exponential-logarithmic (EL) distributions were given by Kus (2007) and Tahmasbi and Rezaei (2008), respectively. Chahkandi and Ganjali (2009) introduced the exponential-power series (EPS) distributions. Barreto-Souza and Bakouch (2013) introduced a new three-parameter distribution by compounding the exponential and Poisson–Lindley distributions, named the exponential Poisson–Lindley (EPL) distribution. The exponential-negative binomial distribution was introduced by Hajebi et al. (2013). Furthermore, Gui et al. (2014) considered the Lindley distribution, which can be described as a mixture of the exponential and gamma distributions. This idea helped them to propose a new distribution, named Lindley–Poisson, by compounding the Lindley and Poisson distributions.

Because most of these distributions have a decreasing failure rate, they have an important place in reliability theory. Many lifetime data sets can be modelled by compound distributions. Although these compound distributions are quite complex, they can fit lifetime data better than the standard distributions.

The probability mass function of the discrete Lindley distribution is obtained by discretizing the continuous survival function of the Lindley distribution (Gómez-Déniz and Calderín-Ojeda 2011, Eq. 3; Bakouch et al. 2014, Eq. 3). The discrete distribution given by these authors has a rather complex structure in terms of its parameter. In order to overcome problems in estimating the parameter of the discrete Lindley distribution, we propose a modified discrete Lindley distribution. This modification facilitates the estimation of the parameters, especially with the EM algorithm. We then propose a new lifetime distribution with decreasing hazard rate by compounding the exponential and zero-truncated modified discrete Lindley distributions.

This paper is organized as follows: In “Construction of the model” section, we propose the two-parameter exponential-modified discrete Lindley (EMDL) distribution, by mixing exponential and zero truncated modified discrete Lindley distribution, which exhibits the decreasing failure rate (DFR) property. In “Properties of EMDL distribution” section, we obtain moment generating function, quantile, failure rate, survival and mean residual lifetime functions of the EMDL. In “Inference” section, the estimation of parameters is studied by some methods such as moments, maximum likelihood and EM algorithm. Furthermore, information matrix and observed information matrix are also discussed in this section. The end of this section includes a detailed simulation study to see the performance of Moments (with lower and upper bound approximations), ML and EM estimates. Illustrative examples based on three real data sets are provided in “Applications” section.

Construction of the model

In this section, we first give the definition of the discrete Lindley distribution introduced by Gómez-Déniz and Calderín-Ojeda (2011) and Bakouch et al. (2014). We obtain a simpler discrete distribution by taking \(1-\theta\) instead of \(e^{-\theta }\) in that definition. We then introduce a new lifetime distribution by compounding the exponential and modified discrete Lindley distributions, named the Exponential-Modified Discrete Lindley (EMDL) distribution.

Discrete Lindley distribution

A discrete random variable M is said to have the discrete Lindley distribution with parameter \(\theta >0\) if its probability mass function (p.m.f.) is given by

$$P\left( M=m\right) =\frac{e^{-m\theta }}{1+\theta }\left( \theta \left( 1-2e^{-\theta }\right) +\left( 1-e^{-\theta }\right) \left( 1+\theta m\right) \right) , \quad m=0,1,2,\ldots$$
(1)

The cumulative distribution function of M is given by

$$P\left( M\le m\right) =1-\frac{1+2\theta +\theta m}{1+\theta }e^{-\left( m+1\right) \theta },\ \ \ \ \ \ m=0,1,2,\ldots$$

Modified discrete Lindley distribution

If \(\theta\) is restricted to the interval (0, 1), we can replace \(\exp (-\theta )\) in (1) by \(1-\theta\), its first-degree Taylor expansion. The new discrete distribution is specified by the following probability mass function:

$$P\left( M=m\right) =\frac{{\theta }^2}{1+\theta }{\left( 1-\theta \right) }^m\left( m+2\right),$$
(2)

for \(0<\theta <1\) and \(m=0,1,2,\ldots\). We call this distribution Modified Discrete Lindley (MDL).

Theorem 1

The MDL distribution can be represented as a mixture of geometric and negative binomial distributions with mixing proportion \(\frac{\theta }{1+\theta }\) and common success probability \(\theta\).

Proof

If the p.m.f. in (2) is rewritten in the form

$$P\left( M=m\right) =\frac{\theta }{1+\theta }\left[ \theta {\left( 1-\theta \right) }^m\right] +\frac{1}{1+\theta }\left[ {\theta }^2\left( m+1\right) {\left( 1-\theta \right) }^m\right] =w_1f_1\left( m\right) +w_2f_2\left( m\right),$$

then \(f_1\) is the p.m.f. of a geometric random variable on \(m=0,1,2,\ldots\) with success probability \(\theta\), and \(f_2\) is the p.m.f. of a negative binomial random variable counting the number of failures before the second success, with the same success probability \(\theta\). The quantities \(w_1=\frac{\theta }{1+\theta }\) and \(w_2=\frac{1}{1+\theta }\) are the component probabilities, that is, the mixture weights (Fig. 1). \(\square\)
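The decomposition in Theorem 1 can be checked numerically. The following is a minimal sketch, not part of the original study; the function names are hypothetical, and both component p.m.f.s are taken on \(m=0,1,2,\ldots\) as in the proof.

```python
def mdl_pmf(m, theta):
    """P(M = m) for the modified discrete Lindley distribution, Eq. (2)."""
    return theta ** 2 / (1.0 + theta) * (1.0 - theta) ** m * (m + 2)


def geometric_pmf(m, theta):
    """Geometric p.m.f. on m = 0, 1, 2, ... (failures before the first success)."""
    return theta * (1.0 - theta) ** m


def negbin2_pmf(m, theta):
    """Negative binomial p.m.f. on m = 0, 1, 2, ... (failures before the second success)."""
    return theta ** 2 * (m + 1) * (1.0 - theta) ** m


def mdl_mixture_pmf(m, theta):
    """Mixture form of Theorem 1 with weights theta/(1+theta) and 1/(1+theta)."""
    w1 = theta / (1.0 + theta)
    w2 = 1.0 / (1.0 + theta)
    return w1 * geometric_pmf(m, theta) + w2 * negbin2_pmf(m, theta)
```

Evaluating `mdl_pmf` and `mdl_mixture_pmf` on the same grid of `m` values should give identical results up to rounding, and the p.m.f. should sum to one.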

Fig. 1 P.m.f. of the geometric, negative binomial and modified discrete Lindley distributions

Note that MDL distribution has an increasing hazard rate while a geometric distribution has a constant hazard rate. So, MDL distribution is more useful than geometric distribution for modelling the number of rare events.

When \(\theta\) is close to zero, the p.m.f. of the MDL distribution can take shapes different from that of a geometric distribution. This gives the resulting distribution a thinner right tail than that of a distribution compounded with the exponential distribution. Thus, the proposed compound distribution can be useful for modelling lifetime data such as the time interval between successive earthquakes, the time period of bacteria spreading, or the recovery period of a certain disease.

Exponential modified discrete Lindley distribution

Suppose that M is a zero-truncated MDL random variable with probability mass function \(\pi \left( m\right) =P\left( M=m\vert M>0\right) =\frac{{\theta }^2}{1+2\theta }{\left( 1-\theta \right) }^{m-1}\left( m+2\right)\), \(m=1,2,\ldots\), and that \(X_1,X_2,\ldots, X_M\) are i.i.d. with probability density function \(h\left( x;\beta \right) =\beta e^{-\beta x},\ x>0\). Let \(X=\min \left( X_1,X_2,\ldots ,X_M\right)\). Then \(g\left( x\vert {m};\beta \right) =m\beta e^{-m\beta x}\) and \(g\left( x,m\right) =g\left( x\vert {m}\right) \pi (m)=\frac{\beta {\theta }^2}{1+2\theta }m\left( m+2\right) {\left( 1-\theta \right) }^{m-1}e^{-m\beta x}\).

Thus, we can obtain the marginal probability density function of X as

$$f\left( x;\theta ,\beta \right) =\frac{{\theta }^2}{1+2\theta} \frac{\beta e^{-\beta x}\left( 3-\left( 1-\theta \right) e^{-\beta x}\right) }{{\left( 1-\left( 1-\theta \right) e^{-\beta x }\right)}^3},\quad x>0$$
(3)

where \(\theta \in (0,1)\) and \(\beta >0\). Henceforth, the distribution of the random variable X with the p.d.f. in (3) is called the EMDL distribution. Using the change of variable \(r=\left( 1-\theta \right) e^{-\beta x}\) when integrating (3), the distribution function is found to be

$$F\left( x;\theta ,\beta \right) =1-\left[ \frac{{\theta }^2}{1+2\theta }\frac{e^{-\beta x}\left( 3-2\left( 1-\theta \right) e^{-\beta x}\right) }{{\left( 1-\left( 1-\theta \right) e^{-\beta x}\right) }^2}\right].$$
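As a consistency check on (3) and the distribution function, the closed forms can be compared with the truncated sum of the joint density \(g(x,m)\) over m. This is a minimal sketch with hypothetical function names; the truncation length `terms` is an assumption that is adequate when \(\theta\) is not very close to zero.

```python
import math


def emdl_pdf(x, theta, beta):
    """Closed-form EMDL density, Eq. (3)."""
    r = (1.0 - theta) * math.exp(-beta * x)
    c = theta ** 2 / (1.0 + 2.0 * theta)
    return c * beta * math.exp(-beta * x) * (3.0 - r) / (1.0 - r) ** 3


def emdl_cdf(x, theta, beta):
    """Closed-form EMDL distribution function."""
    u = math.exp(-beta * x)
    r = (1.0 - theta) * u
    c = theta ** 2 / (1.0 + 2.0 * theta)
    return 1.0 - c * u * (3.0 - 2.0 * r) / (1.0 - r) ** 2


def emdl_pdf_series(x, theta, beta, terms=2000):
    """Marginal density as the sum of g(x, m) over m = 1, ..., terms."""
    c = beta * theta ** 2 / (1.0 + 2.0 * theta)
    return c * sum(
        m * (m + 2) * (1.0 - theta) ** (m - 1) * math.exp(-m * beta * x)
        for m in range(1, terms + 1)
    )
```

The closed-form density should agree with the series, and a finite-difference derivative of `emdl_cdf` should reproduce `emdl_pdf`.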

Figure 2 shows different shapes of the p.d.f. of an EMDL random variable for various values of \(\theta\) and \(\beta\).

Fig. 2 P.d.f. of EMDL random variable for different parameter values

Properties of EMDL distribution

In this section we present important characteristics of the EMDL distribution used in mathematical statistics and reliability: the moment generating function and moments, quantiles, and the survival, hazard rate and mean residual life functions. We will also give a relationship with Lomax and Exponential-Poisson distributions.

Moment generating function and moments

Moment generating function of X is given by

$$M\left( t\right) =E\left( e^{tx}\right) =\frac{{\theta }^2}{1+2\theta }\sum _{j=1}^{\infty} j\left( j+2\right) {\left( 1-\theta \right) }^{j-1}\frac{\beta }{\beta j-t}$$

for \(t<\beta\). Hence a closed form of the k-th raw moment of X is

$$E\left( X^k\right) =\frac{\Gamma \left( k+1\right) {\theta }^2}{{\beta }^k\left( 1+2\theta \right) }\sum _{j=1}^{\infty }\frac{\left( j+2\right) }{j^k}{\left( 1-\theta \right) }^{j-1},$$

for \(k=1,2,\ldots\). For \(k>1\), the raw moments can be calculated numerically for given values of \(\theta\), since the infinite series above can be expressed in terms of polylogarithm functions.

First and second raw moments are evaluated respectively as

$$E\left( X\right)= \frac{\theta }{\beta \left( 1+2\theta \right) }\left[ 1-\frac{2\theta \ln {\theta }}{1-\theta} \right],$$
(4)
$$E \left( X^2\right)=\frac{2{\theta }^2}{{\beta }^2\left( 1+2\theta \right) \left( 1-\theta \right) }\left[ -\ln {\theta }+2\sum _{k=1}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k^2}\right]$$
(5)
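The closed forms (4) and (5) can be compared with the general series for \(E(X^k)\). The sketch below uses hypothetical function names and truncates the infinite series after `terms` terms, which is an assumption that works when \(\theta\) is not too close to zero.

```python
import math


def emdl_raw_moment(k, theta, beta, terms=5000):
    """k-th raw moment from the series, truncated after `terms` terms."""
    c = math.gamma(k + 1) * theta ** 2 / (beta ** k * (1.0 + 2.0 * theta))
    return c * sum(
        (j + 2) / j ** k * (1.0 - theta) ** (j - 1) for j in range(1, terms + 1)
    )


def emdl_mean(theta, beta):
    """First raw moment, Eq. (4)."""
    return theta / (beta * (1.0 + 2.0 * theta)) * (
        1.0 - 2.0 * theta * math.log(theta) / (1.0 - theta)
    )


def emdl_second_moment(theta, beta, terms=5000):
    """Second raw moment, Eq. (5), with the inner series summed numerically."""
    series = sum((1.0 - theta) ** k / k ** 2 for k in range(1, terms + 1))
    return (
        2.0 * theta ** 2
        / (beta ** 2 * (1.0 + 2.0 * theta) * (1.0 - theta))
        * (-math.log(theta) + 2.0 * series)
    )
```

For example, `emdl_mean(0.5, 1.5)` and `emdl_raw_moment(1, 0.5, 1.5)` should coincide up to rounding.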

Quantile function

Quantile function of X is obtained simply by inverting \(F(x;\theta ,\beta )=q\) as follows

$$x_q=\frac{log\left( 1-\theta \right) -log\left[ \frac{\left( 2\left( 1-q\right) +3A\left( \theta \right) \right) \,-\,\sqrt{9{A\left( \theta \right) }^2+4\left( 1-q\right) A\left( \theta \right) }}{2\left( 1-q\right) +4A\left( \theta \right) }\right] }\beta$$

where \(0<q<1\) and \(A(\theta )=\frac{{\theta }^2}{(1+2\theta )(1-\theta )}\). In particular, the first quartile of X is

$$x_{0.25}=\frac{log\left( 1-\theta \right) -log\left[ \frac{\left( \frac{3}{2}+3A\left( \theta \right) \right)\, -\,\sqrt{9{A\left( \theta \right) }^2+3A\left( \theta \right) }}{\frac{3}{2}+4A\left( \theta \right) }\right] }{\beta },$$

the median of X is

$$x_{0.5}=\frac{log\left( 1-\theta \right) -log\left[ \frac{\left( 1+3A\left( \theta \right) \right) \,-\,\sqrt{9{A\left( \theta \right) }^2+2A\left( \theta \right) }}{1+4A\left( \theta \right) }\right] }{\beta },$$

and the third quartile of X is

$$x_{0.75}=\frac{log\left( 1-\theta \right) -log\left[ \frac{\left( \frac{1}{2}+3A\left( \theta \right) \right) \,-\,\sqrt{9{A\left( \theta \right) }^2+A\left( \theta \right) }}{\frac{1}{2}+4A\left( \theta \right) }\right] }{\beta }.$$
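The quantile function also gives a simple way to simulate EMDL data by inversion. The following sketch is an illustration under stated assumptions: the function names are hypothetical, and the root r is computed in the rationalised form \(r=2(1-q)/\left[ 2(1-q)+3A+\sqrt{9A^2+4(1-q)A}\right]\), which is algebraically equal to the expression inside the logarithm above but avoids cancellation when q is close to 1.

```python
import math
import random


def emdl_cdf(x, theta, beta):
    """Closed-form EMDL distribution function."""
    u = math.exp(-beta * x)
    r = (1.0 - theta) * u
    c = theta ** 2 / (1.0 + 2.0 * theta)
    return 1.0 - c * u * (3.0 - 2.0 * r) / (1.0 - r) ** 2


def emdl_quantile(q, theta, beta):
    """Quantile x_q for 0 <= q < 1, using the rationalised form of the root r."""
    a = theta ** 2 / ((1.0 + 2.0 * theta) * (1.0 - theta))
    p = 1.0 - q
    r = 2.0 * p / ((2.0 * p + 3.0 * a) + math.sqrt(9.0 * a ** 2 + 4.0 * p * a))
    # Clamp tiny negative values caused by rounding at q = 0.
    return max(0.0, (math.log(1.0 - theta) - math.log(r)) / beta)


def emdl_sample(n, theta, beta, rng):
    """Draw n EMDL variates by inverting the distribution function."""
    return [emdl_quantile(rng.random(), theta, beta) for _ in range(n)]
```

A typical call would be `emdl_sample(50, 0.6, 0.3, random.Random(1))`, where the seed is arbitrary.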

Survival, hazard rate and mean residual life functions

The survival function of X is given by (Fig. 3)

$$S\left( x\right) =\frac{{\theta }^2}{1+2\theta }\left( \frac{e^{-\beta x}\left( 3-2\left( 1-\theta \right) e^{-\beta x}\right) }{{\left( 1-\left( 1-\theta \right) e^{-\beta x}\right) }^2}\right).$$
(6)
Fig. 3 Survival function of EMDL random variable for selected parameter values

From (3) and (6) it is easy to verify that the hazard rate function of X is

$$\begin{aligned} h\left( x\right)&=\,\frac{f\left( x\right) }{S\left( x\right) } =\beta \frac{\left( 3-r\right) }{\left( 1-r\right) \left( 3-2r\right) }=\beta \left[ 1+2\frac{r\left( 2-r\right) }{\left( 1-r\right) \left( 3-2r\right) }\right] \\&=\,\beta \left[ \frac{2}{\left( 1-r\right) }-\frac{3}{\left( 3-2r\right) }\right] \end{aligned}$$
(7)

with \(h\left( 0\right) =\frac{\beta \left( 2+\theta \right) }{\theta \left( 1+2\theta \right) }\ge \beta\) and \(\lim _{x\rightarrow \infty }{h\left( x\right) }=\beta\), where \(r=\left( 1-\theta \right) e^{-\beta x}\). As can be seen from the last two expressions on the right-hand side of (7), h(x) is increasing in r, and r is decreasing in x, so h(x) is a monotonically decreasing function of x bounded from below by \(\beta\) (see Fig. 4).
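The stated properties of the hazard rate can be verified on a grid. This is a small sketch with a hypothetical function name, using the partial-fraction form in (7).

```python
import math


def emdl_hazard(x, theta, beta):
    """Hazard rate of the EMDL distribution, last form in Eq. (7)."""
    r = (1.0 - theta) * math.exp(-beta * x)
    return beta * (2.0 / (1.0 - r) - 3.0 / (3.0 - 2.0 * r))
```

With, say, `theta = 0.3` and `beta = 1.2`, the value at zero should equal \(\beta (2+\theta )/[\theta (1+2\theta )]\), the values should not increase along an increasing grid of x, and the limit for large x should be \(\beta\).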

Fig. 4 Hazard rate function of EMDL random variable for selected parameter values

The mean residual life function of X is given by

$$\begin{aligned} mrl\left( x\right)&=E\left( X-x\vert {X>x}\right) \ \\&=\frac{1}{\beta }\ \frac{r\left( 1-r\right) -2{\left( 1-r\right) }^2ln\left( 1-r\right) }{3r-2r^2} \end{aligned}$$

where \(r=\left( 1-\theta \right) e^{-\beta x}\). Note that \(mrl\left( x\right) \le \frac{1}{\beta }\) holds for \(x>0\). To see this, write \(-ln\left( 1-r\right) =\int _{1-r}^1\frac{1}{z}dz\). Applying the mean value theorem to this integral gives the upper bound \(\frac{r}{1-r}\) for \(-ln\left( 1-r\right)\). Substituting this upper bound into the expression above gives

$$mrl\left( x\right) \le \frac{1}{\beta }\ \frac{3\left( 1-r\right) }{3-2r}\le \frac{1}{\beta }.$$
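The closed form of the mean residual life can be compared with its definition \(\int _x^{\infty }S(t)dt/S(x)\). The sketch below uses hypothetical function names and a plain trapezoidal rule on a truncated range; the truncation point `upper / beta` and the number of steps are assumptions chosen so that the neglected tail is negligible.

```python
import math


def emdl_survival(x, theta, beta):
    """Survival function, Eq. (6)."""
    u = math.exp(-beta * x)
    r = (1.0 - theta) * u
    return theta ** 2 / (1.0 + 2.0 * theta) * u * (3.0 - 2.0 * r) / (1.0 - r) ** 2


def emdl_mrl(x, theta, beta):
    """Closed-form mean residual life."""
    r = (1.0 - theta) * math.exp(-beta * x)
    num = r * (1.0 - r) - 2.0 * (1.0 - r) ** 2 * math.log1p(-r)
    return num / (beta * (3.0 * r - 2.0 * r ** 2))


def emdl_mrl_numeric(x, theta, beta, upper=40.0, steps=20000):
    """Trapezoidal approximation of the integral of S over [x, x + upper/beta], over S(x)."""
    length = upper / beta
    h = length / steps
    total = 0.5 * (emdl_survival(x, theta, beta) + emdl_survival(x + length, theta, beta))
    for i in range(1, steps):
        total += emdl_survival(x + i * h, theta, beta)
    return total * h / emdl_survival(x, theta, beta)
```

At \(x=0\) the closed form reduces to \(E(X)\) in (4), which provides a further check.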

Figure 5 shows mrl(x) for different values of the parameters \(\theta\) and \(\beta\).

Fig. 5 Mrl function of EMDL random variable for different parameter values

Relationship with other distributions

Consider the following transformation of X:

$$Y=\frac{e^{\beta X}-1}{1-\theta }.$$

Then the probability density function of Y can be obtained as

$$f_Y\left( y\right) =\frac{3\theta }{1+2\theta }\left( \frac{\frac{\theta }{1-\theta }}{{\left( y+\frac{\theta }{1-\theta }\right) }^2}\right) +\frac{1-\theta }{1+2\theta }\left( \frac{2{\left( \frac{\theta }{1-\theta }\right) }^2}{{\left( y+\frac{\theta }{1-\theta }\right) }^3}\right)$$

It can easily be seen that the distribution of Y is a mixture of two Lomax distributions with common scale parameter \(\frac{\theta }{1-\theta }\) and shape parameters \(\alpha =1\) and \(\alpha =2\), respectively. Here \(\frac{3\theta }{1+2\theta }\) and \(\frac{1-\theta }{1+2\theta }\) are the weights of the mixture components.
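The mixture representation can be checked through survival functions, since \(P(Y>y)=P(X>x)\) with \(x=\ln \left( 1+(1-\theta )y\right) /\beta\). The following sketch uses hypothetical function names and the Lomax survival function \(\left( \lambda /(y+\lambda )\right) ^{\alpha }\) with scale \(\lambda =\theta /(1-\theta )\).

```python
import math


def emdl_survival(x, theta, beta):
    """Survival function of X, Eq. (6)."""
    u = math.exp(-beta * x)
    r = (1.0 - theta) * u
    return theta ** 2 / (1.0 + 2.0 * theta) * u * (3.0 - 2.0 * r) / (1.0 - r) ** 2


def lomax_survival(y, alpha, lam):
    """Lomax survival function with shape alpha and scale lam."""
    return (lam / (y + lam)) ** alpha


def y_survival_mixture(y, theta):
    """Survival function of Y as a two-component Lomax mixture."""
    lam = theta / (1.0 - theta)
    w1 = 3.0 * theta / (1.0 + 2.0 * theta)
    w2 = (1.0 - theta) / (1.0 + 2.0 * theta)
    return w1 * lomax_survival(y, 1.0, lam) + w2 * lomax_survival(y, 2.0, lam)


def y_survival_from_x(y, theta, beta):
    """Survival function of Y obtained by transforming back to X."""
    x = math.log(1.0 + (1.0 - theta) * y) / beta
    return emdl_survival(x, theta, beta)
```

Both functions should return the same value for any \(y\ge 0\), and the result does not depend on \(\beta\).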

Inference

In this section the estimation of the parameters of the EMDL distribution is studied using the method of moments, maximum likelihood and the EM algorithm. In particular, because the first two moments of the distribution have a very complex structure, we have developed bounds to obtain a solution more easily. The Fisher information matrix and an asymptotic confidence ellipsoid for the parameters \(\theta\) and \(\beta\) are also obtained. A detailed simulation study based on the four estimation methods is given at the end of this section.

Estimation by moments

Let \(X_1,\ X_2,\dots ,{X}_n\) be a random sample from EMDL distribution and \(m_1\) and \(m_2\) represent the first two sample moments. Then from (4) and (5), we will have the following system of equations

$$m_1= \frac{\theta }{\beta \left( 1+2\theta \right) }\left[ 1-\frac{2\theta \ln {\theta }}{1-\theta }\right] ,$$
(8)
$$m_2= \frac{2{\theta }^2}{{\beta }^2\left( 1+2\theta \right) \left( 1-\theta \right) }\left[ -\ln {\theta }+2\ I\left( \theta \right) \right] .$$
(9)

where \(I\left( \theta \right) =\sum _{k=1}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k^2}\).

Moment estimates of \(\theta\) and \(\beta\) can be obtained by solving the equations above. However, Eqs. (8) and (9) have no explicit analytical solution for the parameters. Thus, the estimates must be obtained by numerical procedures such as the Newton-Raphson method. Since \(I\left( \theta \right)\) can only be evaluated through symbolic computation, the calculation takes too long during simulations. Therefore, we derive lower and upper bounds for \(I\left( \theta \right)\).

Theorem 2

For \(\theta \in \left( 0,1\right)\), \(I\left( \theta \right)\) lies between \(\frac{\theta }{1-\theta }\ln {\left( \theta \right) }+\frac{3-\theta }{2}\) and \(\frac{\theta \left( 2-\theta \right) }{2\left( 1-\theta \right) }\ln {\left( \theta \right) }+\frac{7-5\theta }{4}\), i.e.

$$\frac{\theta }{1-\theta }\ln {\left( \theta \right) }+\frac{3-\theta }{2}\le I\left( \theta \right) \le \frac{\theta \left( 2-\theta \right) }{2\left( 1-\theta \right) }\ln {\left( \theta \right) }+\frac{7-5\theta }{4}$$

Proof

(lower bound) Since \(k^2\le k(k+1)\) for all k, \(\frac{{\left( 1-\theta \right) }^k}{k^2}\ge \frac{{\left( 1-\theta \right) }^k}{k(k+1)}\) holds. Keeping the \(k=1\) term of \(I\left( \theta \right)\), which equals \(1-\theta\), and summing this inequality over \(k\ge 2\) gives the following lower bound for \(I\left( \theta \right)\)

$$I\left( \theta \right) \ge \left( 1-\theta \right) +\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k\left( k+1\right) }=\left( 1-\theta \right) +\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}.$$

By the comparison test for infinite series, since \(\sum _{k=1}^{\infty }{\left( 1-\theta \right) }^k\) is a convergent geometric series, the two infinite series on the right-hand side of the inequality above are both convergent. Using Fubini’s theorem for each of these series, we have

$$\begin{aligned} \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}&=\sum _{k=2}^{\infty }\left( \int _{\theta }^1{\left( 1-z\right) }^{k-1}dz\right) =\int _{\theta }^1\left( \sum _{k=2}^{\infty }{\left( 1-z\right) }^{k-1}\right) dz \\&=\theta -\ln {\left( \theta \right) }-1\ \end{aligned}$$
(10)

and

$$\begin{aligned} \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}&=\frac{1}{\left( 1-\theta \right) }\sum _{k=2}^{\infty }\left( \int _{\theta }^1{\left( 1-z\right) }^kdz\right) =\frac{1}{\left( 1-\theta \right) }\int _{\theta }^1\left( \sum _{k=2}^{\infty }{\left( 1-z\right) }^k\right) dz \\&=\frac{-2\ln {\left( \theta \right) }+4\theta -{\theta }^2-3}{2\left( 1-\theta \right) } \end{aligned}$$
(11)

Subtracting (11) from (10) and adding the \(k=1\) term \(\left( 1-\theta \right)\) gives the lower bound for \(I\left( \theta \right)\).

(upper bound) Since \(k^2\ge k^2-1\) for all \(k=2,3,\ldots\), we have \(\frac{1}{k^2}\le \frac{1}{k^2-1}=\frac{1}{2}\left[ \frac{1}{k-1}-\frac{1}{k+1}\right]\), which gives the following upper bound for \(\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k^2}\):

$$\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k^2}\le \frac{1}{2}\left[ \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k-1}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}\right]$$
(12)

Adding and subtracting the term \(\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}\) inside the brackets above gives

$$\left( \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k-1}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}\right) +\left( \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}\right).$$

The first term can be rewritten in the following form

$$\begin{aligned} \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k-1}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}&={\left( 1-\theta \right) }^2+\sum _{k=3}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k-1}-\frac{{\left( 1-\theta \right) }^2}{2}-\sum _{k=3}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}\\&=\frac{{\left( 1-\theta \right) }^2}{2}+\left( 1-\theta \right) \left( \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}\right) \end{aligned}$$

Thus, the bracketed expression in (12) can be written as

$$\frac{{\left( 1-\theta \right) }^2}{2}+\left( 2-\theta \right) \left( \sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k}-\sum _{k=2}^{\infty }\frac{{\left( 1-\theta \right) }^k}{k+1}\right) .$$

Combining the latter with expressions (10) and (11), we have

$$\frac{{\left( 1-\theta \right) }^2}{2}+\left( 2-\theta \right) \left( \frac{\theta }{1-\theta }\ln {\left( \theta \right) }+\frac{1+\theta }{2}\right).$$

Substituting this result for the bracketed expression in (12) and adding the \(k=1\) term \(\left( 1-\theta \right)\) gives the upper bound.
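The two bounds of Theorem 2 are cheap to evaluate and can be compared with a truncated version of the series for \(I(\theta )\). This is a minimal sketch with hypothetical function names; the truncation length is an assumption suited to \(\theta \ge 0.05\).

```python
import math


def i_series(theta, terms=5000):
    """Truncated series for I(theta) = sum over k >= 1 of (1 - theta)^k / k^2."""
    return sum((1.0 - theta) ** k / k ** 2 for k in range(1, terms + 1))


def i_lower(theta):
    """Lower bound of Theorem 2."""
    return theta / (1.0 - theta) * math.log(theta) + (3.0 - theta) / 2.0


def i_upper(theta):
    """Upper bound of Theorem 2."""
    return (
        theta * (2.0 - theta) / (2.0 * (1.0 - theta)) * math.log(theta)
        + (7.0 - 5.0 * theta) / 4.0
    )
```

On a grid such as \(\theta =0.05,0.10,\ldots ,0.95\), the series value should lie between `i_lower(theta)` and `i_upper(theta)`.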

The graph below shows that these bounds are suitable approximations of \(I\left( \theta \right)\), which allows us to compute the moment estimates using these bounds (Fig. 6).

Fig. 6 Lower and upper bounds for \(I(\theta )\)

We now return to the moment estimation problem. Solving Eq. (8) for \(\beta\) and substituting the result into (9) gives the following equation for \(\theta\), in which \(\beta\) cancels:

$$\left( 1+2\theta \right) \left( 1-\theta \right) \frac{-\ln {\left( \theta \right) }+2I\left( \theta \right) }{{\left( 1-\theta -2\theta \ln {\left( \theta \right) }\right) }^2}-\frac{m_2}{2m_1^2}=0.$$

The solution is obtained by putting the lower and upper bounds in place of \(I\left( \theta \right)\) and applying the Newton-Raphson method.
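The procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, bisection is used in place of Newton-Raphson because it needs no derivative, and the equation is solved in the dimensionless form that results from eliminating \(\beta\) between (8) and (9), namely moment ratio \(=m_2/(2m_1^2)\). The sketch assumes that the ratio is decreasing in \(\theta\) on the search interval.

```python
import math


def i_series(theta, terms=5000):
    """Truncated series for I(theta)."""
    return sum((1.0 - theta) ** k / k ** 2 for k in range(1, terms + 1))


def i_lower(theta):
    """Lower bound of Theorem 2."""
    return theta / (1.0 - theta) * math.log(theta) + (3.0 - theta) / 2.0


def i_upper(theta):
    """Upper bound of Theorem 2."""
    return (theta * (2.0 - theta) / (2.0 * (1.0 - theta)) * math.log(theta)
            + (7.0 - 5.0 * theta) / 4.0)


def emdl_mean(theta, beta):
    """First raw moment, Eq. (4)."""
    return theta / (beta * (1.0 + 2.0 * theta)) * (
        1.0 - 2.0 * theta * math.log(theta) / (1.0 - theta))


def emdl_second_moment(theta, beta):
    """Second raw moment, Eq. (5), with I(theta) from the truncated series."""
    return (2.0 * theta ** 2 / (beta ** 2 * (1.0 + 2.0 * theta) * (1.0 - theta))
            * (-math.log(theta) + 2.0 * i_series(theta)))


def moment_ratio(theta, i_func):
    """E(X^2) / (2 E(X)^2) as a function of theta only (beta cancels)."""
    num = (1.0 + 2.0 * theta) * (1.0 - theta) * (-math.log(theta) + 2.0 * i_func(theta))
    den = (1.0 - theta - 2.0 * theta * math.log(theta)) ** 2
    return num / den


def solve_theta(m1, m2, i_func, lo=1e-6, hi=0.999, iters=60):
    """Bisection for moment_ratio(theta) = m2 / (2 m1^2) on [lo, hi]."""
    target = m2 / (2.0 * m1 ** 2)
    if moment_ratio(lo, i_func) <= target or moment_ratio(hi, i_func) >= target:
        raise ValueError("no sign change on [lo, hi]; moment estimate not available")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid, i_func) > target:
            lo = mid  # ratio still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_from_mean(theta, m1):
    """Solve Eq. (8) for beta once theta is known."""
    return theta / (m1 * (1.0 + 2.0 * theta)) * (
        1.0 - 2.0 * theta * math.log(theta) / (1.0 - theta))
```

Passing `i_lower` or `i_upper` as `i_func` gives the two bound-based moment estimates of \(\theta\); `beta_from_mean` then returns the matching estimate of \(\beta\). With population moments as input, the two estimates should fall on either side of the true \(\theta\).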

Estimation by maximum likelihood

Let \(x\ =\ (x_1,\ x_2,\ldots , x_n)\) be an observed sample of size n from the EMDL distribution with parameters \(\theta\) and \(\beta\). The log likelihood \(\ell =\ell (\theta ,\beta ;\ x)\) for \((\theta ,\beta )\) is

$$\begin{aligned} \ell \left( \theta ,\beta ;x\right)&=n\ln {\beta }+n\ln {\left( \frac{{\theta }^2}{1+2\theta }\right) }-\beta \sum _{i=1}^nx_i+\sum _{i=1}^n\ln {\ \left( 3-\left( 1-\theta \right) e^{-\beta x_i}\right) } \\&\quad -3\sum _{i=1}^n\ln {\ \left( 1-\left( 1-\theta \right) e^{-\beta x_i}\right) } \end{aligned}$$
(13)

and subsequently differentiating (13) with respect to \(\theta\) and \(\beta\) yields the likelihood equations for \((\theta ,\beta )\)

$$\begin{aligned} \frac{\partial \ell }{\partial \theta }= & \frac{2n\left( 1+\theta \right) }{\theta \left( 1+2\theta \right) }+\sum _{i=1}^n\frac{e^{-\beta x_i}}{3-\left( 1-\theta \right) e^{-\beta x_i}}-3\sum _{i=1}^n\frac{e^{-\beta x_i}}{1-\left( 1-\theta \right) e^{-\beta x_i}}=0 \\ \frac{\partial \ell }{\partial \beta }= & \frac{n}{\beta }-\sum _{i=1}^nx_i+\sum _{i=1}^n\frac{x_i(1-\theta )e^{-\beta x_i}}{3-\left( 1-\theta \right) e^{-\beta x_i}}-3\sum _{i=1}^n\frac{x_i(1-\theta )e^{-\beta x_i}}{1-\left( 1-\theta \right) e^{-\beta x_i}}=0 \end{aligned}$$

The solution of two equations above does not have a closed form, therefore numerical techniques can be used to solve the above system of equations.
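Before handing the likelihood equations to a numerical solver, it is useful to check the analytic score against finite differences of (13). The sketch below uses hypothetical function names and implements the log likelihood and its two partial derivatives exactly as written above.

```python
import math


def emdl_loglik(theta, beta, xs):
    """Log likelihood (13) for a sample xs."""
    n = len(xs)
    ll = n * math.log(beta) + n * math.log(theta ** 2 / (1.0 + 2.0 * theta)) - beta * sum(xs)
    for x in xs:
        r = (1.0 - theta) * math.exp(-beta * x)
        ll += math.log(3.0 - r) - 3.0 * math.log(1.0 - r)
    return ll


def emdl_score(theta, beta, xs):
    """Partial derivatives of (13) with respect to theta and beta."""
    n = len(xs)
    d_theta = 2.0 * n * (1.0 + theta) / (theta * (1.0 + 2.0 * theta))
    d_beta = n / beta - sum(xs)
    for x in xs:
        e = math.exp(-beta * x)
        r = (1.0 - theta) * e
        d_theta += e / (3.0 - r) - 3.0 * e / (1.0 - r)
        d_beta += x * r / (3.0 - r) - 3.0 * x * r / (1.0 - r)
    return d_theta, d_beta
```

A root finder or a general-purpose optimiser can then be applied to `emdl_score` or `emdl_loglik`; the choice of solver is left open here.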

Below we investigate conditions for the existence of a solution of this system of equations in \(\beta\) and \(\theta\).

Proposition 1

If \(\frac{n}{2}<\sum _{i=1}^{n}e^{-\beta x_{i}}\) , then the equation \(\partial \ell /\partial \theta =0\) has at least one root in \(\left( 0,1\right)\) , where \(\beta\) is the true value of the parameter.

Proof

Let \(\omega \left( \theta \right)\) denote the function on the RHS of the expression \(\partial \ell /\partial \theta\), then it is clear that \(\lim \limits _{\theta \rightarrow 0}\omega \left( \theta \right) =+\infty\) and \(\lim \limits _{\theta \rightarrow 1}\omega \left( \theta \right) =\frac{4n }{3}+\frac{1}{3}\sum _{i=1}^{n}e^{-\beta x_{i}}-3\sum _{i=1}^{n}e^{-\beta x_{i}}\). Therefore, the equation \(\omega \left( \theta \right) =0\) has at least one root in \(\left( 0,1\right)\), if \(\frac{n}{2}-\sum _{i=1}^{n}e^{-\beta x_{i}}<0\). \(\square\)

Proposition 2

If \(\theta\) is the true value of the parameter, the root of the equation \(\partial \ell /\partial \beta =0\) lies in the interval \(\left[ \frac{1}{ \overline{X}}\frac{\theta }{\left( 3-2\theta \right) },~\frac{1}{\overline{X} }\frac{2+\theta }{\left( 1+2\theta \right) }\right]\).

Proof

Let \(\omega \left( \beta \right)\) denote the function on the RHS of the expression \(\partial \ell /\partial \beta\), then

$$\begin{aligned} \omega \left( \beta \right) \le \frac{n}{\beta }-\sum _{i=1}^{n}x_{i}+\sum _{i=1}^{n}\frac{x_{i}(1-\theta )e^{-\beta x_{i}}}{3-(1-\theta )e^{-\beta x_{i}}} \end{aligned}$$

Note that \(3-(1-\theta )e^{-\beta x_{i}}\ge 2+\theta\) and \(e^{-\beta x_{i}}\le 1\). Hence,

$$\begin{aligned} \omega \left( \beta \right) \le \frac{n}{\beta }-\sum _{i=1}^{n}x_{i}+\frac{(1-\theta )}{2+\theta }\sum _{i=1}^{n}x_{i}. \end{aligned}$$

Therefore, \(\omega \left( \beta \right) \le 0\) when \(\beta \ge \frac{1}{ \overline{x}}\frac{2+\theta }{1+2\theta }\). On the other hand,

$$\begin{aligned} \omega \left( \beta \right) \ge \frac{n}{\beta }-\sum _{i=1}^{n}x_{i}-3\sum _{i=1}^{n}\frac{x_{i}(1-\theta )e^{-\beta x_{i}}}{1-(1-\theta )e^{-\beta x_{i}}}. \end{aligned}$$

Noting that \(1-(1-\theta )e^{-\beta x_{i}}\ge \theta\) and \(e^{-\beta x_{i}}\le 1\), we have

$$\begin{aligned} \omega \left( \beta \right) \ge \frac{n}{\beta }-\left( 1+\frac{3\left( 1-\theta \right) }{\theta }\right) \sum _{i=1}^{n}x_{i}. \end{aligned}$$

Therefore, \(\omega \left( \beta \right) \ge 0\) when \(\beta \le \frac{1}{ \overline{x}}\frac{\theta }{3-2\theta }\). Thus, there is at least one root of \(\omega \left( \beta \right) =0\) in the interval \(\left( \frac{1}{ \overline{X}}\frac{\theta }{\left( 3-2\theta \right) },~\frac{1}{\overline{X} }\frac{2+\theta }{\left( 1+2\theta \right) }\right)\). \(\square\)

Recently, the EM algorithm has been used by several authors to find the ML estimates of the parameters of compound distributions. The EM algorithm, which maximizes the complete-data log likelihood, is useful when the observed-data likelihood equations are difficult to solve. It is most helpful for such compound distributions when the equations obtained in the E-step are simple and explicit.

Estimation by EM algorithm

The hypothetical complete-data \((x,m)\) density function is given by

$$\begin{aligned} f\left( x,m;\theta ,\beta \right) =\frac{\beta {\theta }^2}{1+2\theta }m\left( m+2\right) {\left( 1-\theta \right) }^{m-1}e^{-\beta mx} \end{aligned}$$

for \(x\in R_+,\ m=1,2,\ldots ,\ \theta \in \left( 0,1\right) ,\ \beta >0\). Here, \(\theta\) and \(\beta\) are the parameters of the exponential-zero truncated modified discrete Lindley (EMDL) distribution. In the E-step of an EM cycle, we need to compute the conditional expectation of M given \(X=x\). The conditional probability mass function is

$$\begin{aligned} P\left( M=m \vert {x};\theta ,\beta \right) =\frac{{\left( 1-r\right) }^3}{3-r}m\left( m+2\right) r^{m-1},\ \end{aligned}$$
(14)

for \(m=1,2,\ldots\), where \(r=\left( 1-\theta \right) e^{-\beta x}\). By using equation (14), we can find the conditional expectation of M to complete E-step as

$$\begin{aligned} \delta (x;\theta ,\beta )=E\left( M\vert {x};\theta ,\beta \right) =6\left[ \frac{1}{\left( 3-r\right) \left( 1-r\right) }\right] -1.\ \end{aligned}$$

The M-step of each iteration requires maximization of the complete-data likelihood function over \(\left( \theta ,\beta \right)\). Let \(\ell _c\) denote the complete-data log likelihood function, i.e. \(\ln {L} \left( \theta ,\beta ; x, m\right)\). Then

$$\begin{aligned} \ell _c&=n\ln {\beta }+2n\ln {\theta }-n\ln {\left( 1+2\theta \right) }+\sum _{i=1}^n\ln {\left( m_i\left( m_i+2\right) \right) }\\&\quad +\ln {\left( 1-\theta \right) }\sum _{i=1}^n\left( m_i-1\right) -\beta \sum _{i=1}^nm_ix_i.\ \end{aligned}$$

Hence, the likelihood equations are obtained by setting \(\frac{\partial \ell _c}{\partial \theta }=0\) and \(\frac{\partial \ell _c}{\partial \beta }=0\):

$$\begin{aligned} &\frac{2\left( 1-\theta \right) }{\theta }-\frac{2\left( 1-\theta \right) }{1+2\theta }+1=\frac{\sum _{i=1}^nm_i}{n} \\ &\frac{1}{\beta }=\frac{\sum _{i=1}^nm_ix_i }{n} \end{aligned}$$
(15)

The M-step is completed with the missing observations of \(M_i\) replaced by \(\delta (x_i;{\theta }^{\left( t\right) },{\beta }^{\left( t\right) }\ )\). Thus, iterative solution of the system of equations in (15) is given by

$$\begin{aligned}&{\theta }^{\left( t+1\right) } =\frac{\left( 1-\frac{\sum _{i=1}^n{k_i}^{\left( t\right) }}{n}\right) +\sqrt{{\left( 1-\frac{\sum _{i=1}^n{k_i}^{\left( t\right) }}{n}\right) }^2+16\frac{\sum _{i=1}^n{k_i}^{\left( t\right) }}{n}}}{4\frac{\sum _{i=1}^n{k_i}^{\left( t\right) }}{n}}\\&{\beta }^{\left( t+1\right) }=\frac{n}{\sum _{i=1}^nx_i{k_i}^{\left( t\right) }} \end{aligned}$$

where \({k_i}^{\left( t\right) }=\delta (x_i;{\theta }^{\left( t\right) },{\beta }^{\left( t\right) }\ )\) and \({r_i}^{\left( t\right) }=\left( 1-{\theta }^{\left( t\right) }\right) e^{-{\beta }^{\left( t\right) }x_i}\).
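The EM iteration above can be written in a few lines. The following is a minimal sketch, with hypothetical function names, starting values, tolerance and iteration cap; it also includes the observed log likelihood (13) so that the usual monotonicity property of EM can be monitored.

```python
import math


def emdl_loglik(theta, beta, xs):
    """Observed-data log likelihood (13)."""
    n = len(xs)
    ll = n * math.log(beta) + n * math.log(theta ** 2 / (1.0 + 2.0 * theta)) - beta * sum(xs)
    for x in xs:
        r = (1.0 - theta) * math.exp(-beta * x)
        ll += math.log(3.0 - r) - 3.0 * math.log(1.0 - r)
    return ll


def em_step(theta, beta, xs):
    """One EM cycle: E-step via delta(x; theta, beta), M-step via the closed-form updates."""
    n = len(xs)
    ks = []
    for x in xs:
        r = (1.0 - theta) * math.exp(-beta * x)
        ks.append(6.0 / ((3.0 - r) * (1.0 - r)) - 1.0)  # E(M | x)
    k_bar = sum(ks) / n
    theta_new = ((1.0 - k_bar) + math.sqrt((1.0 - k_bar) ** 2 + 16.0 * k_bar)) / (4.0 * k_bar)
    beta_new = n / sum(x * k for x, k in zip(xs, ks))
    return theta_new, beta_new


def em_fit(xs, theta0=0.5, beta0=1.0, tol=1e-10, max_iter=5000):
    """Iterate em_step until both parameters change by less than tol."""
    theta, beta = theta0, beta0
    for _ in range(max_iter):
        theta_new, beta_new = em_step(theta, beta, xs)
        done = abs(theta_new - theta) < tol and abs(beta_new - beta) < tol
        theta, beta = theta_new, beta_new
        if done:
            break
    return theta, beta
```

A call such as `em_fit(xs)` returns the final pair of estimates; evaluating `emdl_loglik` after each `em_step` should give a non-decreasing sequence.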

The information matrix

We first calculate the elements of expected Hessian matrix of \(\ell\) with respect to the distribution of X. According to that, let \(a_{ij}\)’s denote expected values of the second derivatives of \(\ell\) with respect to \(\theta ,\beta\) where \((i,j=1,2)\). Then we have

$$\begin{aligned} a_{11}&=E\left( \frac{{\partial }^2\ell }{\partial {\theta }^2}\right) \\&=\frac{-2n}{{\theta }^2}+\frac{4n}{{\left( 1+2\theta \right) }^2} -\,\frac{n{\theta }^2}{\left( 1+2\theta \right) {\left( 1-\theta \right) }^3}\left( \int _0^{1-\theta }\frac{r^2}{{\left( 3-r\right) \left( 1-r\right) }^3}dr-3\int _0^{1-\theta }\frac{r^2\left( 3-r\right) }{{\left( 1-r\right) }^5}dr\right)\\ &=\frac{-2n}{{\theta }^2}+\frac{4n}{{\left( 1+2\theta \right) }^2} -\,\frac{n{\theta }^2}{\left( 1+2\theta \right) {\left( 1-\theta \right) }^3}\left( \frac{9}{8}ln\left( \frac{2+\theta }{3\theta }\right) +\frac{1-5\theta }{4{\theta }^2}+1-\frac{3}{2}\frac{{\left( 1-\theta \right) }^3\left( 1+\theta \right) }{{\theta }^4}\right) \\ a_{22}&=E\left( \frac{{\partial }^2\ell }{\partial {\beta }^2}\right) =\frac{-n}{{\beta }^2}+\frac{12{n\theta }^2}{{\beta }^2\left( 1+2\theta \right) \left( 1-\theta \right) }\int _0^{1-\theta }\frac{r\left( 2-r\right) }{\left( 3-r\right) {\left( 1-r\right) }^5}{\left( \ln {\left( \frac{r}{1-\theta }\right) }\right) }^2dr\\ a_{12}&=a_{21}=E\left( \frac{{\partial }^2\ell }{\partial \theta \ \partial \beta }\right) =\frac{-12n{\theta }^2}{\beta \left( 1+2\theta \right) {\left( 1-\theta \right) }^2}\int _0^{1-\theta }\frac{r\left( 2-r\right) }{\left( 3-r\right) {\left( 1-r\right) }^5}\ln {\left( \frac{r}{1-\theta }\right) }dr \end{aligned}$$

Thus, Fisher information matrix, \(\ I_n\left( \theta ,\beta \right)\) of sample size n for \(\left( \theta ,\beta \ \right)\) is as follows:

$$\begin{aligned} I_n\left( \theta ,\beta \right) =- \begin{bmatrix} E\left( \frac{\partial ^2 \ell }{\partial {\theta }^2}\right)&\quad E\left( \frac{{\partial }^2 \ell }{\partial \theta \ \partial \beta }\right) \\ E\left( \frac{{\partial }^2 \ell }{\partial \theta \ \partial \beta }\right)&\quad E\left( \frac{{\partial }^2 \ell }{\partial {\beta }^2}\right) \end{bmatrix} = - \begin{bmatrix} a_{11}&\quad a_{12} \\ a_{12}&\quad a_{22} \end{bmatrix} \end{aligned}$$

The inverse of the Fisher information matrix of a single observation, \(I_1^{-1}\left( \theta ,\beta \right)\), is the asymptotic variance-covariance matrix of the ML estimates of \(\left( \theta ,\beta \right)\). Hence, \(\sqrt{n}\) times the estimation error of the maximum likelihood estimator of \(\left( \theta ,\beta \right)\) is asymptotically normal with mean zero and variance-covariance matrix \(I_1^{-1}\left( \theta ,\beta \right)\). Namely,

$$\begin{aligned} \sqrt{n}\left( \left[ \begin{array}{ cc} \hat{\theta } \\ \hat{\beta } \end{array}\right] -\left[ \begin{array}{ cc} \theta \\ \beta \end{array}\right] \right) \sim AN\left( \left[ \begin{array}{ cc} 0 \\ 0 \end{array}\right] ,I_1^{-1}\left( \theta ,\beta \right) \right) . \end{aligned}$$

We simulated 200 data sets of size \(n=50\) from the EMDL distribution with parameters \(\theta =0.6\) and \(\beta =0.3\). Based on the asymptotic normal distribution, a confidence ellipsoid for \(\left( \theta ,\beta \right)\) based on the ML estimates can be drawn at the 95 % confidence level as follows.

Firstly, we present the asymptotic distribution of the ML estimates,

$$\begin{aligned} \sqrt{50}\left( \left[ \begin{array}{ cc} \hat{\theta } \\ \hat{\beta } \end{array}\right] -\left[ \begin{array}{ cc} 0.6 \\ 0.3 \end{array}\right] \right) \sim AN\left( \left[ \begin{array}{ cc} 0 \\ 0 \end{array}\right] ,{\left[ \begin{array}{ cc} 1.7622 & \quad -3.4018 \\ -3.4018 & \quad 10.3494 \end{array}\right] }^{-1}\right) \end{aligned}$$

Now, let \(\mu =\left[ \begin{array}{ cc} 0.6 \\ 0.3 \end{array}\right]\) denote the center of the ellipsoid, and let the observed information matrix be \(\hat{I}\left( \theta ,\beta \ \right) =\left[ \begin{array}{ cc} 1.7622 & -3.4018 \\ -3.4018 & 10.3494 \end{array}\right]\) (note that \(\hat{I}\mathop {\rightarrow }\limits ^{P}I\)). Then the confidence ellipsoid at the 95 % level is defined by \(50{\left( \left[ \begin{array}{ cc} \hat{\theta } \\ \hat{\beta } \end{array}\right] -\mu \right) }^{\prime}\hat{I}\left( \left[ \begin{array}{ cc} \hat{\theta } \\ \hat{\beta } \end{array}\right] -\mu \right) \le 5.99\), where 5.99 is the 95th percentile of the chi-squared distribution with two degrees of freedom (Fig. 7).

Fig. 7 Confidence region for \(\left( \hat{\theta },\hat{\beta }\right)\)
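The confidence region can also be checked numerically. The following Python sketch is an illustration only, not the authors' code: it takes the information matrix, the center \(\left( 0.6,0.3\right)\), \(n=50\) and the critical value 5.99 exactly as quoted in the text, and the function names are hypothetical. It inverts the 2×2 matrix to obtain approximate standard errors and tests whether a given pair of estimates falls inside the 95 % region.

```python
import math

# Values exactly as quoted in the text.
INFO = [[1.7622, -3.4018],
        [-3.4018, 10.3494]]   # observed information for one observation
CENTER = (0.6, 0.3)           # (theta, beta) used in the simulation
N = 50                        # sample size
CHI2_95_2DF = 5.99            # 95th percentile of chi-squared with 2 d.f.


def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def quadratic_form(point, center=CENTER, info=INFO, n=N):
    """Compute n * (point - center)' * info * (point - center)."""
    d0 = point[0] - center[0]
    d1 = point[1] - center[1]
    return n * (info[0][0] * d0 * d0
                + (info[0][1] + info[1][0]) * d0 * d1
                + info[1][1] * d1 * d1)


def in_confidence_region(point, critical=CHI2_95_2DF):
    """True if the point lies inside the 95 % confidence ellipse."""
    return quadratic_form(point) <= critical


# Asymptotic covariance of sqrt(n) * (estimate - true value).
cov = invert_2x2(INFO)
# Approximate standard errors of the estimates at sample size N.
se_theta = math.sqrt(cov[0][0] / N)
se_beta = math.sqrt(cov[1][1] / N)

print("se(theta) ~", round(se_theta, 4), " se(beta) ~", round(se_beta, 4))
print("inside region:", in_confidence_region((0.65, 0.31)))
```

Under these assumptions the approximate standard errors are about 0.18 for \(\hat{\theta }\) and 0.07 for \(\hat{\beta }\), which gives a sense of the size of the ellipse.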

Simulation study

We conducted a simulation study generating 200 samples for each of the sample sizes \(n\ =10, 20, 50, 100\). For every sample size and for different values of \(\theta\) and \(\beta\), we computed the moment estimates (using the lower and upper bounds) and the ML estimates (by Newton-Raphson and by the EM algorithm) of the parameters. For each sample size n, the root mean square errors (RMSE) of the four estimates were also calculated from the generated samples. These results are tabulated in Table 1.
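The RMSE computation over replicated samples can be sketched as follows. This is a minimal Python illustration under stated assumptions: the EMDL sampler and estimators are not reproduced here, so an exponential sample with its rate MLE is used purely as a stand-in, and the function names `rmse` and `simulate_rmse` are hypothetical.

```python
import math
import random


def rmse(estimates, true_value):
    """Root mean square error of replicate estimates around the true value."""
    if not estimates:
        raise ValueError("need at least one estimate")
    squared = sum((e - true_value) ** 2 for e in estimates)
    return math.sqrt(squared / len(estimates))


def simulate_rmse(true_rate, n, replicates=200, seed=0):
    """RMSE of the exponential-rate MLE (a stand-in, NOT the EMDL estimator)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(n / sum(sample))  # MLE of the rate: 1 / sample mean
    return rmse(estimates, true_rate)


# Same design as the study: 200 replicates at each sample size.
for n in (10, 20, 50, 100):
    print(n, round(simulate_rmse(0.3, n), 4))
```

Replacing the stand-in sampler and estimator with EMDL counterparts would give the same kind of per-sample-size RMSE comparison as the one reported in Table 1.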

Table 1 Simulation results for moment, ML, EM estimates for different parameter values

It is observed from Table 1 that when \(\beta >\theta\), the ML estimates of \(\theta\) and \(\beta\) are better than the others with respect to the RMSE. When \(\theta >\beta\), the moment estimates (both bounds) are as good as the ML and EM estimates. For small sample sizes n, the moment estimates are even a little better.

Applications

We illustrate the applicability of the EMDL distribution by considering three data sets that have been examined by many other researchers. The first data set has previously been modeled by the Transmuted Pareto and Lindley distributions; the second and third data sets have been modeled by the Exponential-Poisson (EP) and Exponential-Geometric (EG) distributions. To compare the distributional models, we consider the following criteria for the data sets: K-S (Kolmogorov-Smirnov), \(-2LL\) (−2LogL), AIC (Akaike information criterion) and BIC (Bayesian information criterion).
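For reference, the comparison criteria named above can be computed as in the following Python sketch. It uses the standard definitions \(AIC=-2LL+2k\) and \(BIC=-2LL+k\ln n\), where k is the number of fitted parameters and n the sample size, and the usual one-sample K-S statistic. The function names are hypothetical, and the exponential CDF with made-up data in the usage lines is only a stand-in for a fitted model.

```python
import math


def aic(minus_2ll, k):
    """Akaike information criterion from -2 log-likelihood and k parameters."""
    return minus_2ll + 2 * k


def bic(minus_2ll, k, n):
    """Bayesian information criterion for a sample of size n."""
    return minus_2ll + k * math.log(n)


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic sup |F_n(x) - F(x)|."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Usage with a stand-in model: exponential CDF with an assumed rate of 0.1.
rate = 0.1
toy_data = [1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7]  # illustrative values only
print(ks_statistic(toy_data, lambda x: 1.0 - math.exp(-rate * x)))
print(aic(500.0, 2), bic(500.0, 2, 72))
```

Smaller values of each criterion indicate a better fit, which is how the models are ranked in the tables that follow.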

Data Set 1 The data consist of the exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. There are 72 exceedances for the years 1958–1984, rounded to one decimal place. These data were analyzed by Choulakian and Stephens (2001) and are given in Table 2. Later, the Beta-Pareto distribution was applied to these data by Akinsete et al. (2008). Merovcia and Pukab (2014) compared the Pareto and transmuted Pareto (TP) distributions and showed that the TP distribution is the better model. Bourguignon et al. (2013) proposed the Kumaraswamy (Kw) Pareto distribution (Kw-P). Tahir et al. (2014) proposed the Weibull-Pareto distribution (WP) and compared it with the Beta Exponentiated Pareto (BEP) distribution. Nasiru and Luguterah (2015) proposed a different type of Weibull-Pareto distribution (NWP). Mahmoudi (2011) concluded that the Beta-Generalized Pareto (BGP) distribution fits these data better than the GP, BP, Weibull and Pareto models.

We fit the EMDL distribution to these data and obtain the parameter estimates \(\hat{\theta }=0.7782\), \(\hat{\beta }=0.0695\). According to the model selection criteria (AIC or BIC) tabulated in Table 3, EMDL ranks fifth among the 10 proposed models.

Table 2 Exceedances of Wheaton river flood data
Table 3 Model selection criteria for river flood data

Data Set 2 The data set given in Table 4 contains the time intervals (in days) between coal mine accidents that caused the death of 10 or more men. This data set was first obtained by Maguire et al. (1952). Several models have been fitted to it, for example by Adamidis and Loukas (1998) and Kus (2007), who suggested the Exponential-Geometric (EG) and Exponential-Poisson (EP) distributions, respectively. In addition, Yilmaz et al. (2016) proposed the two-component mixed exponential distribution (2MED) for modeling this data set. Alongside these three models, we fit the EMDL distribution to this data set and obtain the parameter estimates \(\hat{\theta }=\ 0.5239\) and \(\hat{\beta }=\ 0.0025\). Only the K-S statistics and p values, tabulated in Table 5, are available for comparison.

According to Table 5, the EMDL distribution fits better than the EG distribution.

Table 4 The time intervals (in days) between coal mine accidents
Table 5 K-S and p values for EP, EG, 2MED and EMDL

Data Set 3 The data set in Table 6, obtained by Kus (2007), includes the time intervals (in days) between successive earthquakes with magnitudes greater than or equal to 6 Mw. Kus (2007) used this data set to show the applicability of the EP distribution and compared the EG and EP distributions using the K-S statistic. The parameter estimates of the EMDL distribution are \(\hat{\theta }=0.3540\), \(\hat{\beta }=0.0003\). The K-S statistic calculated for EMDL is given in Table 7; according to it, the EMDL distribution gives the best fit to the earthquake data among the three models.

Table 6 Time intervals of the successive earthquakes in North Anatolia fault zone
Table 7 K-S and p values for EP, EG and EMDL

Conclusions

In this paper we have proposed a new lifetime distribution, referred to as the EMDL, which is obtained by compounding the modified discrete Lindley distribution (MDL) and the exponential distribution. Some statistical characteristics of the proposed distribution have been provided, including explicit formulas for the probability density, cumulative distribution, survival, hazard and mean residual life functions, moments and quantiles. We have proposed bounds to solve the moment equations. We have derived the maximum likelihood estimates and EM estimates of the parameters and their asymptotic variance-covariance matrix. Simulation studies have been performed for different parameter values and sample sizes to assess the finite sample behaviour of the moment, ML and EM estimates. The usefulness of the new lifetime distribution has been demonstrated on three data sets. The EMDL distribution fits the third data set, consisting of the times between successive earthquakes in the North Anatolia fault zone, better than the EP and EG distributions.

References

  • Adamidis K, Dimitrakopoulou T, Loukas S (2005) On an extension of the exponential-geometric distribution. Stat Probab Lett 73(3):259–269

  • Adamidis K, Loukas S (1998) A lifetime distribution with decreasing failure rate. Stat Probab Lett 39(1):35–42

  • Akinsete A, Famoye F, Lee C (2008) The beta-Pareto distribution. Statistics 42(6):547–563

  • Bakouch HS, Jazi MA, Nadarajah S (2014) A new discrete distribution. Statistics 48(1):200–240

  • Barreto-Souza W, Bakouch HS (2013) A new lifetime model with decreasing failure rate. Statistics 47(2):465–476

  • Bourguignon M, Silva RB, Zea LM, Cordeiro GM (2013) The Kumaraswamy Pareto distribution. J Stat Theory Appl 12(2):129–144

  • Chahkandi M, Ganjali M (2009) On some lifetime distributions with decreasing failure rate. Comput Stat Data Anal 53(12):4433–4440

  • Choulakian V, Stephens MA (2001) Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 43(4):478–484

  • Gómez-Déniz E, Calderín-Ojeda E (2011) The discrete Lindley distribution: properties and applications. J Stat Comput Simul 81(11):1405–1416

  • Gui W, Zhang S, Lu X (2014) The Lindley-Poisson distribution in lifetime analysis and its properties. Hacet J Math Stat 43(6):1063–1077

  • Hajebi M, Rezaei S, Nadarajah S (2013) An exponential-negative binomial distribution. REVSTAT Stat J 11(2):191–210

  • Kus C (2007) A new lifetime distribution. Comput Stat Data Anal 51(9):4497–4509

  • Maguire BA, Pearson ES, Wynn AHA (1952) The time intervals between industrial accidents. Biometrika 39(1/2):168–180

  • Mahmoudi E (2011) The beta generalized Pareto distribution with application to lifetime data. Math Comput Simul 81(11):2414–2430

  • Merovcia F, Pukab L (2014) Transmuted pareto distribution. ProbStat Forum 07:1–11

  • Nasiru S, Luguterah A (2015) The new Weibull-Pareto distribution. Pak J Stat Oper Res 11(1):103–114

  • Tahir MH, Cordeiro GM, Alzaatreh A, Mansoor M, Zubair M (2016) A new Weibull-Pareto distribution: properties and applications. Commun Stat Simul Comput 45(10):3548–3567

  • Tahmasbi R, Rezaei S (2008) A two-parameter lifetime distribution with decreasing failure rate. Comput Stat Data Anal 52(8):3889–3901

  • Yilmaz M, Potas N, Buyum B (2016) A classical approach to modeling of coal mine data. Chaos, complexity and leadership. Springer, New York, pp 65–73

Authors' contributions

This work was carried out in cooperation among all the authors (MY, MH and SAK). All authors read and approved the final manuscript.

Acknowledgements

This research has not been funded by any entity.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Mehmet Yilmaz.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Yilmaz, M., Hameldarbandi, M. & Acik Kemaloglu, S. Exponential-modified discrete Lindley distribution. SpringerPlus 5, 1660 (2016). https://doi.org/10.1186/s40064-016-3302-2

  • DOI: https://doi.org/10.1186/s40064-016-3302-2
