Exponential-modified discrete Lindley distribution

In this study, we consider a series system composed of M stochastically independent components, where M is a random variable having the zero-truncated modified discrete Lindley distribution. This distribution is newly introduced by transforming the original parameter. We examine the properties of the lifetime distribution of this system and estimate the parameters of the new lifetime distribution by the method of moments, maximum likelihood and the EM algorithm.

distributions. Although these compound distributions are quite complex, the new distributions can fit lifetime data better than the well-known distributions.
The probability mass function of the discrete Lindley distribution is obtained by discretizing the continuous survival function of the Lindley distribution (Gómez-Déniz and Calderín-Ojeda 2011, Eq. 3; Bakouch et al. 2014, Eq. 3). The discrete distribution given by these authors has a rather complex structure in terms of its parameter. To overcome problems in estimating the parameter of the discrete Lindley distribution, we propose a modified discrete Lindley distribution, which makes parameter estimation easier, especially with the EM algorithm. We then propose a new lifetime distribution with a decreasing hazard rate by compounding the exponential and the zero-truncated modified discrete Lindley distributions.
This paper is organized as follows. In the "Construction of the model" section, we propose the two-parameter exponential-modified discrete Lindley (EMDL) distribution, obtained by mixing the exponential and zero-truncated modified discrete Lindley distributions, which has the decreasing failure rate (DFR) property. In the "Properties of EMDL distribution" section, we obtain the moment generating function, quantiles, and the failure rate, survival and mean residual lifetime functions of the EMDL distribution. In the "Inference" section, the parameters are estimated by the method of moments, maximum likelihood and the EM algorithm, and the information matrix and observed information matrix are discussed. The section ends with a detailed simulation study of the performance of the moment (with lower and upper bound approximations), ML and EM estimates. Illustrative examples based on three real data sets are given in the "Applications" section.

Construction of the model
In this section, we first give the definition of the discrete Lindley distribution introduced by Gómez-Déniz and Calderín-Ojeda (2011) and Bakouch et al. (2014). By replacing e^{−θ} with 1 − θ in this definition, we obtain a simpler discrete distribution, the modified discrete Lindley (MDL) distribution. We then introduce a new lifetime distribution, named the exponential-modified discrete Lindley (EMDL) distribution, by compounding the exponential and MDL distributions.

Discrete Lindley distribution
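A discrete random variable M is said to have the discrete Lindley distribution with parameter θ > 0 if its probability mass function (p.m.f.) is given by

$$P(M=m)=\frac{e^{-\theta m}}{1+\theta}\left[\theta\left(1-2e^{-\theta}\right)+\left(1-e^{-\theta}\right)(1+\theta m)\right],\quad m=0,1,2,\ldots \tag{1}$$

The cumulative distribution function of M is given by

$$F(m)=1-\frac{1+\theta(m+2)}{1+\theta}\,e^{-\theta(m+1)},\quad m=0,1,2,\ldots$$

Replacing e^{−θ} with 1 − θ, θ ∈ (0, 1), in (1) gives the p.m.f. of the modified discrete Lindley (MDL) distribution:

$$P(M=m)=\frac{\theta^{2}(1-\theta)^{m}(m+2)}{1+\theta},\quad m=0,1,2,\ldots \tag{2}$$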
Theorem 1 The MDL distribution can be represented as a mixture of geometric and negative binomial distributions with mixing proportion θ/(1 + θ) and common success probability θ.
Proof If the p.m.f. in (2) is rewritten in the form

$$P(M=m)=w_1 f_1(m)+w_2 f_2(m),\qquad f_1(m)=\theta(1-\theta)^{m},\quad f_2(m)=(m+1)\theta^{2}(1-\theta)^{m},$$

then $f_1$ is the p.m.f. of a geometric random variable (the number of failures before the first success) with success probability θ, and $f_2$ is the p.m.f. of a negative binomial random variable counting the number of failures before the second success, with the same success probability θ. The component probabilities $w_1=\theta/(1+\theta)$ and $w_2=1/(1+\theta)$ are the mixture weights (Fig. 1).
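As a quick numerical check of Theorem 1, the sketch below compares the MDL p.m.f. with the geometric/negative binomial mixture term by term. It assumes the MDL p.m.f. has the form $\theta^2(1-\theta)^m(m+2)/(1+\theta)$ on m = 0, 1, 2, ..., that is, the discrete Lindley p.m.f. with e^{−θ} replaced by 1 − θ, and that both mixture components count failures; the function names are illustrative only.

```python
def mdl_pmf(m: int, theta: float) -> float:
    """Assumed MDL p.m.f.: theta^2 (1 - theta)^m (m + 2) / (1 + theta), m = 0, 1, ..."""
    return theta**2 * (1 - theta) ** m * (m + 2) / (1 + theta)


def geometric_pmf(m: int, theta: float) -> float:
    """Number of failures before the first success."""
    return theta * (1 - theta) ** m


def negbin2_pmf(m: int, theta: float) -> float:
    """Number of failures before the second success."""
    return (m + 1) * theta**2 * (1 - theta) ** m


def mixture_pmf(m: int, theta: float) -> float:
    w1 = theta / (1 + theta)  # weight of the geometric component
    w2 = 1 / (1 + theta)      # weight of the negative binomial component
    return w1 * geometric_pmf(m, theta) + w2 * negbin2_pmf(m, theta)


theta = 0.3
max_gap = max(abs(mdl_pmf(m, theta) - mixture_pmf(m, theta)) for m in range(200))
total_mass = sum(mdl_pmf(m, theta) for m in range(2000))
print(f"max |MDL - mixture| = {max_gap:.1e}, total mass = {total_mass:.10f}")
```

The same mixture representation gives a simple sampler for M: draw the component with probability θ/(1 + θ), then count Bernoulli(θ) failures.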
Note that the MDL distribution has an increasing hazard rate, whereas the geometric distribution has a constant hazard rate. The MDL distribution is therefore a more flexible alternative to the geometric distribution for modelling the number of rare events.
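The increasing-hazard claim can be verified directly. Assuming the MDL p.m.f. $\theta^2(1-\theta)^m(m+2)/(1+\theta)$, m = 0, 1, ..., and the discrete hazard $h(m)=P(M=m)/P(M\ge m)$, the mixture form of Theorem 1 gives the survival probabilities in closed form, and the successive differences of h are positive:

$$
\begin{aligned}
P(M\ge m) &= \frac{(1-\theta)^{m}\,[1+(m+1)\theta]}{1+\theta},\qquad
h(m)=\frac{\theta^{2}(m+2)}{1+(m+1)\theta},\\
h(m+1)-h(m) &= \frac{\theta^{2}(1-\theta)}{[1+(m+1)\theta]\,[1+(m+2)\theta]}>0,\qquad
\lim_{m\to\infty}h(m)=\theta .
\end{aligned}
$$

So h(m) increases from $h(0)=2\theta^2/(1+\theta)$ towards θ, which is the constant hazard of the geometric distribution (failures before the first success) with success probability θ.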
When θ is close to zero, the MDL p.m.f. can take shapes different from that of a geometric distribution. This gives the distribution a thinner right tail than a distribution compounded with the exponential distribution. Thus, the proposed compound distribution can be useful for modelling lifetime data such as the time between successive earthquakes, the spreading period of bacteria, or the recovery period of a certain disease.

Exponential modified discrete Lindley distribution
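Suppose that M is a zero-truncated MDL random variable with p.m.f.

$$P(M=m)=\frac{\theta^{2}(1-\theta)^{m-1}(m+2)}{1+2\theta},\quad m=1,2,\ldots,$$

and let $Y_1, Y_2, \ldots$ be independent exponential component lifetimes with rate β, independent of M. The lifetime of the series system is $X=\min(Y_1,\ldots,Y_M)$, so that, given M = m, X has density $m\beta e^{-m\beta x}$. Summing over m, we obtain the marginal probability density function of X as

$$f(x)=\frac{\theta^{2}\beta e^{-\beta x}\left[3-(1-\theta)e^{-\beta x}\right]}{(1+2\theta)\left[1-(1-\theta)e^{-\beta x}\right]^{3}},\quad x>0, \tag{3}$$

where θ ∈ (0, 1) and β > 0. Henceforth, the distribution of a random variable X with the p.d.f. in (3) is called EMDL for short. Changing the variable to r = (1 − θ)e^{−βx} in the integral of (3), the distribution function is found to be

$$F(x)=1-\frac{\theta^{2}e^{-\beta x}\left[3-2(1-\theta)e^{-\beta x}\right]}{(1+2\theta)\left[1-(1-\theta)e^{-\beta x}\right]^{2}},\quad x>0.$$

Yilmaz et al. SpringerPlus (2016) 5:1660

The p.d.f. of the EMDL random variable takes different shapes for various values of θ and β (Fig. 2).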
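To make the construction concrete, the sketch below simulates the series-system lifetime $X=\min(Y_1,\ldots,Y_M)$ directly and compares it with the closed-form survival function. It assumes the zero-truncated MDL p.m.f. $\theta^2(1-\theta)^{m-1}(m+2)/(1+2\theta)$, m ≥ 1, exponential components with rate β, and, with r = (1 − θ)e^{−βx}, the forms $f(x)=\theta^2\beta e^{-\beta x}(3-r)/[(1+2\theta)(1-r)^3]$ and $S(x)=1-F(x)=\theta^2e^{-\beta x}(3-2r)/[(1+2\theta)(1-r)^2]$. Draws of M use the mixture of Theorem 1 and reject M = 0. The quantile function inverts S(x) = 1 − u, which is a quadratic equation in r; all helper names are hypothetical.

```python
import math
import random


def emdl_pdf(x: float, theta: float, beta: float) -> float:
    """EMDL density with r = (1 - theta) exp(-beta x)."""
    e = math.exp(-beta * x)
    r = (1 - theta) * e
    return theta**2 * beta * e * (3 - r) / ((1 + 2 * theta) * (1 - r) ** 3)


def emdl_sf(x: float, theta: float, beta: float) -> float:
    """Survival function S(x) = 1 - F(x)."""
    e = math.exp(-beta * x)
    r = (1 - theta) * e
    return theta**2 * e * (3 - 2 * r) / ((1 + 2 * theta) * (1 - r) ** 2)


def emdl_quantile(u: float, theta: float, beta: float) -> float:
    """Solve S(x) = 1 - u. With c = theta^2 / ((1 + 2 theta)(1 - theta)), S = c r (3 - 2r) / (1 - r)^2,
    so (s + 2c) r^2 - (2s + 3c) r + s = 0 with s = 1 - u; the smaller root is the valid one."""
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    s = 1 - u
    r = 2 * s / ((2 * s + 3 * c) + math.sqrt(c * (4 * s + 9 * c)))  # stable form of the smaller root
    return math.log((1 - theta) / r) / beta


def sample_zt_mdl(theta: float, rng: random.Random) -> int:
    """Zero-truncated MDL draw via the mixture of Theorem 1, rejecting M = 0."""
    while True:
        needed = 1 if rng.random() < theta / (1 + theta) else 2  # successes to wait for
        failures = successes = 0
        while successes < needed:
            if rng.random() < theta:
                successes += 1
            else:
                failures += 1
        if failures > 0:
            return failures


def sample_emdl(theta: float, beta: float, rng: random.Random) -> float:
    """Lifetime of a series system of M i.i.d. Exp(beta) components."""
    m = sample_zt_mdl(theta, rng)
    return min(rng.expovariate(beta) for _ in range(m))


rng = random.Random(2016)
theta, beta, x0 = 0.4, 0.5, 1.5
draws = [sample_emdl(theta, beta, rng) for _ in range(20000)]
empirical = sum(x > x0 for x in draws) / len(draws)
print(f"S({x0}): simulated {empirical:.4f}, closed form {emdl_sf(x0, theta, beta):.4f}")
```

The quantile function also gives the median, `emdl_quantile(0.5, theta, beta)`, and a faster inverse-transform sampler, `emdl_quantile(rng.random(), theta, beta)`, than the literal series-system simulation.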

Properties of EMDL distribution
In this section, we derive the characteristics of the EMDL distribution that are important in mathematical statistics and reliability: the moment generating function and moments, quantiles, and the survival, hazard rate and mean residual life functions. We also give relationships with the Lomax and exponential-Poisson distributions.

Moment generating function and moments
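The moment generating function of X is given by

$$M_X(t)=E\left(e^{tX}\right)=\frac{\theta^{2}}{(1+2\theta)(1-\theta)}\sum_{m=1}^{\infty}(m+2)(1-\theta)^{m}\,\frac{m\beta}{m\beta-t}$$

for t < β. Hence, the k-th raw moment of X is

$$E\left(X^{k}\right)=\frac{k!\,\theta^{2}}{\beta^{k}(1+2\theta)(1-\theta)}\sum_{m=1}^{\infty}\frac{(m+2)(1-\theta)^{m}}{m^{k}}=\frac{k!\,\theta^{2}}{\beta^{k}(1+2\theta)(1-\theta)}\left[\mathrm{Li}_{k-1}(1-\theta)+2\,\mathrm{Li}_{k}(1-\theta)\right]$$

for k = 1, 2, ..., where $\mathrm{Li}_{s}(z)=\sum_{m=1}^{\infty}z^{m}/m^{s}$ denotes the polylogarithm. For k > 1 the series involves $\mathrm{Li}_{k}(1-\theta)$, which has no elementary closed form, so the raw moments are calculated numerically for given values of θ. The first and second raw moments are, respectively,

$$E(X)=\frac{\theta^{2}}{\beta(1+2\theta)(1-\theta)}\left[\frac{1-\theta}{\theta}-2\ln\theta\right], \tag{4}$$

$$E\left(X^{2}\right)=\frac{2\theta^{2}}{\beta^{2}(1+2\theta)(1-\theta)}\left[-\ln\theta+2\sum_{m=1}^{\infty}\frac{(1-\theta)^{m}}{m^{2}}\right]. \tag{5}$$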
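The moment formulas can be cross-checked numerically. The sketch assumes $E(X)=\theta^2[(1-\theta)/\theta-2\ln\theta]/[\beta(1+2\theta)(1-\theta)]$, $E(X^2)=2\theta^2[-\ln\theta+2\,\mathrm{Li}_2(1-\theta)]/[\beta^2(1+2\theta)(1-\theta)]$ and the EMDL survival function S(x) = 1 − F(x), and compares them with the general series for $E(X^k)$ and with $E(X)=\int_0^\infty S(x)\,dx$ and $E(X^2)=2\int_0^\infty xS(x)\,dx$ evaluated by Simpson's rule. $\mathrm{Li}_2$ is computed with Euler's reflection formula $\mathrm{Li}_2(z)+\mathrm{Li}_2(1-z)=\pi^2/6-\ln z\ln(1-z)$, which keeps the series argument at most 1/2; the integration cut-off of 120 is an arbitrary choice for the parameters used.

```python
import math


def dilog(z: float) -> float:
    """Li_2(z) for 0 <= z < 1; Euler's reflection keeps the series argument <= 1/2."""
    if z > 0.5:
        return math.pi**2 / 6 - math.log(z) * math.log1p(-z) - dilog(1 - z)
    total, power, k = 0.0, z, 1
    while power > 1e-18:
        total += power / k**2
        k += 1
        power *= z
    return total


def emdl_mean(theta: float, beta: float) -> float:
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    return c / beta * ((1 - theta) / theta - 2 * math.log(theta))


def emdl_second_moment(theta: float, beta: float) -> float:
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    return 2 * c / beta**2 * (-math.log(theta) + 2 * dilog(1 - theta))


def raw_moment_series(k: int, theta: float, beta: float, terms: int = 4000) -> float:
    """E(X^k) = k! c / beta^k * sum_{m>=1} (m + 2)(1 - theta)^m / m^k, truncated."""
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    series = sum((m + 2) * (1 - theta) ** m / m**k for m in range(1, terms + 1))
    return math.factorial(k) * c / beta**k * series


def emdl_sf(x: float, theta: float, beta: float) -> float:
    e = math.exp(-beta * x)
    r = (1 - theta) * e
    return theta**2 * e * (3 - 2 * r) / ((1 + 2 * theta) * (1 - r) ** 2)


def simpson(g, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return total * h / 3


theta, beta, upper = 0.4, 0.5, 120.0
print(emdl_mean(theta, beta), raw_moment_series(1, theta, beta),
      simpson(lambda x: emdl_sf(x, theta, beta), 0.0, upper))
print(emdl_second_moment(theta, beta), raw_moment_series(2, theta, beta),
      simpson(lambda x: 2 * x * emdl_sf(x, theta, beta), 0.0, upper))
```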

Fig. 3 Survival function of EMDL random variable for selected parameter values
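The mean residual life function of X is

$$\mathrm{mrl}(x)=E(X-x\mid X>x)=\frac{1}{S(x)}\int_{x}^{\infty}S(u)\,du=\frac{(1-r)^{2}}{\beta r(3-2r)}\left[\frac{r}{1-r}-2\ln(1-r)\right],\qquad r=(1-\theta)e^{-\beta x},$$

where S(x) = 1 − F(x). An upper bound follows by writing $-\ln(1-r)=\int_{1-r}^{1}z^{-1}\,dz$; applying the mean value theorem gives $-\ln(1-r)\le r/(1-r)$. Substituting this bound into the expression above yields

$$\mathrm{mrl}(x)\le\frac{3(1-r)}{\beta(3-2r)}<\frac{1}{\beta}.$$

Graphs of mrl(x) for different values of θ and β are given in Fig. 5.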
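The closed form and the bound for the mean residual life can be checked against direct numerical integration. This sketch assumes $\mathrm{mrl}(x)=(1-r)^2[r/(1-r)-2\ln(1-r)]/[\beta r(3-2r)]$ with r = (1 − θ)e^{−βx}, the upper bound $3(1-r)/[\beta(3-2r)]$, and the EMDL survival function; the integration span and grid are arbitrary choices.

```python
import math


def emdl_sf(x: float, theta: float, beta: float) -> float:
    e = math.exp(-beta * x)
    r = (1 - theta) * e
    return theta**2 * e * (3 - 2 * r) / ((1 + 2 * theta) * (1 - r) ** 2)


def emdl_mrl(x: float, theta: float, beta: float) -> float:
    """Closed-form mean residual life."""
    r = (1 - theta) * math.exp(-beta * x)
    return (1 - r) ** 2 / (beta * r * (3 - 2 * r)) * (r / (1 - r) - 2 * math.log1p(-r))


def emdl_mrl_bound(x: float, theta: float, beta: float) -> float:
    """Upper bound from -ln(1 - r) <= r / (1 - r)."""
    r = (1 - theta) * math.exp(-beta * x)
    return 3 * (1 - r) / (beta * (3 - 2 * r))


def mrl_numeric(x: float, theta: float, beta: float, span: float = 150.0, n: int = 20000) -> float:
    """Simpson integral of S over [x, x + span], divided by S(x)."""
    h = span / n
    vals = [emdl_sf(x + i * h, theta, beta) for i in range(n + 1)]
    integral = h / 3 * (vals[0] + vals[-1] + 4 * sum(vals[1:-1:2]) + 2 * sum(vals[2:-1:2]))
    return integral / emdl_sf(x, theta, beta)


theta, beta = 0.4, 0.5
for x in (0.0, 2.0, 5.0, 10.0):
    print(x, emdl_mrl(x, theta, beta), mrl_numeric(x, theta, beta), emdl_mrl_bound(x, theta, beta), 1 / beta)
```

At x = 0 the mean residual life equals E(X), and the printed values should increase towards 1/β, which is consistent with the DFR property.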

Relationship with other distributions
Consider the following transformation of X. Then the probability density function of Y can be obtained as

Fig. 4 Hazard rate function of EMDL random variable for selected parameter values
It can easily be seen that the distribution of Y is a mixture of two Lomax distributions with common scale parameter θ/(1 − θ) and shape parameters α = 1 and α = 2, respectively. The mixture weights are 3θ/(1 + 2θ) and (1 − θ)/(1 + 2θ).

Inference
In this section, the parameters of the EMDL distribution are estimated by the method of moments, maximum likelihood and the EM algorithm. Because the first two moments of the distribution have a very complex structure, we develop bounds that make the moment equations easier to solve. The Fisher information matrix and an asymptotic confidence ellipsoid for the parameters θ and β are also obtained. A detailed simulation study based on four estimation methods is given at the end of the section.

Estimation by moments
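Let $X_1, X_2, \ldots, X_n$ be a random sample from the EMDL distribution, and let $m_1$ and $m_2$ denote the first two sample moments. Then, from (4) and (5), we have the following system of equations:

$$m_1=E(X), \tag{8}$$

$$m_2=E\left(X^{2}\right), \tag{9}$$

with E(X) and E(X²) given by (4) and (5).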

Fig. 5 Mrl function of EMDL random variable for different parameter values
Moment estimates of θ and β can be obtained by solving the equations above. However, Eqs. (8) and (9) have no explicit analytical solution for the parameters, so the estimates must be obtained by numerical procedures such as the Newton-Raphson method. Since I(θ) can only be evaluated symbolically, the calculation takes too long in simulations. Therefore, we derive lower and upper bounds for I(θ).
holds. Summing over k, we obtain the following lower bound for I(θ): By the comparison test for infinite series, since $\sum_{k=1}^{\infty}(1-\theta)^{k}$ is a convergent geometric series, both infinite series on the right-hand side of the inequality above converge. Applying Fubini's theorem to these series, we obtain, respectively, and Subtracting the first term from the second and adding (1 − θ) gives the lower bound for I(θ).

together then we have
Substituting this result into the bracketed expression in (12) and adding the term (1 − θ) gives the upper bound. Figure 6 shows that these bounds are close approximations of I(θ), so we use them to compute the moment estimates. Returning to the moment estimation problem, we solve Eq. (8) for β and substitute it into (9), which gives the following equation in θ alone: Solutions were obtained by substituting the lower and upper bounds for I(θ) and applying the Newton-Raphson method.
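For comparison, the moment equations can also be solved without the bounds by evaluating the dilogarithm series directly; this is a swapped-in alternative to the bound-based approach, not the procedure behind Table 1. Dividing (9) by the square of (8) eliminates β, so θ solves $m_2/m_1^2=E(X^2)/E(X)^2$, a function of θ alone; the sketch finds the root by bisection (rather than Newton-Raphson) and then recovers β from (8). It assumes $E(X)=\theta^2[(1-\theta)/\theta-2\ln\theta]/[\beta(1+2\theta)(1-\theta)]$ and $E(X^2)=2\theta^2[-\ln\theta+2\,\mathrm{Li}_2(1-\theta)]/[\beta^2(1+2\theta)(1-\theta)]$, and that the sample ratio $m_2/m_1^2$ exceeds 2, its limit as θ → 1. The bracket endpoints are arbitrary choices.

```python
import math


def dilog(z: float) -> float:
    """Li_2(z) for 0 <= z < 1 via Euler's reflection formula."""
    if z > 0.5:
        return math.pi**2 / 6 - math.log(z) * math.log1p(-z) - dilog(1 - z)
    total, power, k = 0.0, z, 1
    while power > 1e-18:
        total += power / k**2
        k += 1
        power *= z
    return total


def mean_factor(theta: float) -> float:
    """beta * E(X)."""
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    return c * ((1 - theta) / theta - 2 * math.log(theta))


def second_factor(theta: float) -> float:
    """beta^2 * E(X^2)."""
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    return 2 * c * (-math.log(theta) + 2 * dilog(1 - theta))


def moment_estimates(m1: float, m2: float, lo: float = 1e-6, hi: float = 1 - 1e-9):
    """Bisection on m2 / m1^2 = E(X^2) / E(X)^2 for theta, then beta from the first equation."""
    target = m2 / m1**2

    def gap(theta: float) -> float:
        return second_factor(theta) / mean_factor(theta) ** 2 - target

    if gap(lo) * gap(hi) > 0:
        raise ValueError("m2 / m1^2 is outside the range attainable by the EMDL model")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    return theta, mean_factor(theta) / m1


# Population moments for theta = 0.4, beta = 0.5 should be mapped back to those values.
m1 = mean_factor(0.4) / 0.5
m2 = second_factor(0.4) / 0.5**2
print(moment_estimates(m1, m2))
```

With a real sample, `m1` and `m2` would be the sample means of x and x².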

Estimation by maximum likelihood
Let $x=(x_1, x_2, \ldots, x_n)$ be an observed sample of size n from the EMDL distribution with parameters θ and β. The log-likelihood ℓ = ℓ(θ, β; x) for (θ, β) is

$$\ell=n\left[2\ln\theta+\ln\beta-\ln(1+2\theta)\right]-\beta\sum_{i=1}^{n}x_i+\sum_{i=1}^{n}\ln\left[3-(1-\theta)e^{-\beta x_i}\right]-3\sum_{i=1}^{n}\ln\left[1-(1-\theta)e^{-\beta x_i}\right], \tag{13}$$

and differentiating (13) with respect to θ and β yields the likelihood equations

$$\frac{\partial\ell}{\partial\theta}=\frac{2n}{\theta}-\frac{2n}{1+2\theta}+\sum_{i=1}^{n}\frac{e^{-\beta x_i}}{3-(1-\theta)e^{-\beta x_i}}-3\sum_{i=1}^{n}\frac{e^{-\beta x_i}}{1-(1-\theta)e^{-\beta x_i}}=0,$$

$$\frac{\partial\ell}{\partial\beta}=\frac{n}{\beta}-\sum_{i=1}^{n}x_i+(1-\theta)\sum_{i=1}^{n}\frac{x_i e^{-\beta x_i}}{3-(1-\theta)e^{-\beta x_i}}-3(1-\theta)\sum_{i=1}^{n}\frac{x_i e^{-\beta x_i}}{1-(1-\theta)e^{-\beta x_i}}=0.$$

These equations have no closed-form solution, so numerical techniques are needed to solve them. We next investigate conditions for the solution of this system for β and θ, noting that $1-(1-\theta)e^{-\beta x_i}\ge\theta$ and $e^{-\beta x_i}\le 1$.

In recent years, the EM algorithm has been used by several authors to find ML estimates of the parameters of compound distributions. The EM algorithm, which maximizes the complete-data log-likelihood, is useful when the observed-data likelihood equations are difficult to solve, and it is especially convenient for compound distributions such as this one, where the equations obtained in the E-step are simple and explicit.
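A small consistency check for the log-likelihood (13) and its derivatives: the sketch evaluates ℓ and the two analytic score components and compares them with central finite differences. It assumes $\ell=n[2\ln\theta+\ln\beta-\ln(1+2\theta)]-\beta\sum x_i+\sum\ln[3-(1-\theta)e^{-\beta x_i}]-3\sum\ln[1-(1-\theta)e^{-\beta x_i}]$; the sample values are hypothetical numbers chosen only to exercise the formulas, not data from the paper. Any Newton-type solver could then use `score` directly.

```python
import math


def loglik(theta: float, beta: float, xs) -> float:
    n = len(xs)
    ll = n * (2 * math.log(theta) + math.log(beta) - math.log(1 + 2 * theta)) - beta * sum(xs)
    for x in xs:
        r = (1 - theta) * math.exp(-beta * x)
        ll += math.log(3 - r) - 3 * math.log(1 - r)
    return ll


def score(theta: float, beta: float, xs):
    """Analytic partial derivatives of the log-likelihood with respect to theta and beta."""
    n = len(xs)
    d_theta = 2 * n / theta - 2 * n / (1 + 2 * theta)
    d_beta = n / beta - sum(xs)
    for x in xs:
        e = math.exp(-beta * x)
        r = (1 - theta) * e
        d_theta += e / (3 - r) - 3 * e / (1 - r)
        d_beta += (1 - theta) * x * e / (3 - r) - 3 * (1 - theta) * x * e / (1 - r)
    return d_theta, d_beta


xs = [0.2, 1.1, 0.5, 3.4, 0.05, 2.2, 0.8]  # hypothetical sample, for illustration only
theta, beta, h = 0.6, 0.3, 1e-6
fd_theta = (loglik(theta + h, beta, xs) - loglik(theta - h, beta, xs)) / (2 * h)
fd_beta = (loglik(theta, beta + h, xs) - loglik(theta, beta - h, xs)) / (2 * h)
print(score(theta, beta, xs), (fd_theta, fd_beta))
```

Since $1-(1-\theta)e^{-\beta x}<3-(1-\theta)e^{-\beta x}$, the last two terms of ∂ℓ/∂β sum to a negative number, so any root of the β-equation satisfies $\hat\beta<n/\sum x_i$.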

Estimation by EM algorithm
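The hypothetical complete-data (x, m) density function is given by

$$f(x,m;\theta,\beta)=\frac{\theta^{2}\beta\,m(m+2)(1-\theta)^{m-1}e^{-m\beta x}}{1+2\theta}$$

for $x\in\mathbb{R}^{+}$, m = 1, 2, ..., θ ∈ (0, 1) and β > 0. Here, θ and β are the parameters of the exponential-zero-truncated modified discrete Lindley distribution. In the E-step of each EM cycle, we need the conditional expectation of M given X = x. The conditional probability mass function is

$$P(M=m\mid X=x)=\frac{m(m+2)\,r^{m-1}(1-r)^{3}}{3-r},\quad m=1,2,\ldots, \tag{14}$$

where r = (1 − θ)e^{−βx}. Using (14), the conditional expectation of M that completes the E-step is

$$\delta(x;\theta,\beta)=E(M\mid X=x)=\frac{3+4r-r^{2}}{(1-r)(3-r)}.$$

The M-step of each iteration maximizes the complete-data likelihood over (θ, β). Let $\ell_c=\ln L(\theta,\beta;x,m)$ denote the complete-data log-likelihood; then

$$\ell_c=n\left[2\ln\theta+\ln\beta-\ln(1+2\theta)\right]+\sum_{i=1}^{n}\ln\left[m_i(m_i+2)\right]+\ln(1-\theta)\sum_{i=1}^{n}(m_i-1)-\beta\sum_{i=1}^{n}m_i x_i.$$

Hence, setting $\partial\ell_c/\partial\theta=0$ and $\partial\ell_c/\partial\beta=0$ gives the likelihood equations

$$\frac{2n}{\theta}-\frac{2n}{1+2\theta}-\frac{1}{1-\theta}\sum_{i=1}^{n}(m_i-1)=0,\qquad \frac{n}{\beta}-\sum_{i=1}^{n}m_i x_i=0. \tag{15}$$

The M-step is completed by replacing the missing observations $M_i$ with $\delta(x_i;\theta^{(t)},\beta^{(t)})$. Thus, the iterative solution of the system (15) is

$$\beta^{(t+1)}=\frac{n}{\sum_{i=1}^{n}\delta_i^{(t)}x_i},\qquad \theta^{(t+1)}=\frac{-S^{(t)}+\sqrt{\left(S^{(t)}\right)^{2}+16n\left(n+S^{(t)}\right)}}{4\left(n+S^{(t)}\right)},$$

where $\delta_i^{(t)}=\delta(x_i;\theta^{(t)},\beta^{(t)})$ and $S^{(t)}=\sum_{i=1}^{n}\left(\delta_i^{(t)}-1\right)$; the θ-update is the root in (0, 1) of $2(n+S^{(t)})\theta^{2}+S^{(t)}\theta-2n=0$.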
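The EM cycle can be sketched in a few lines. The sketch assumes the E-step $\delta(x;\theta,\beta)=(3+4r-r^2)/[(1-r)(3-r)]$ with r = (1 − θ)e^{−βx}, and the closed-form M-step $\beta\leftarrow n/\sum\delta_i x_i$, $\theta\leftarrow$ the root in (0, 1) of $2(n+S)\theta^2+S\theta-2n=0$ with $S=\sum(\delta_i-1)$. The starting values, tolerance and simulated sample (drawn by inverting the EMDL distribution function) are illustrative choices. The observed-data log-likelihood is recorded at each step so that the monotone-ascent property of EM can be checked.

```python
import math
import random


def emdl_quantile(u: float, theta: float, beta: float) -> float:
    """Inverse of F, used here only to simulate an EMDL sample."""
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    s = 1 - u
    r = 2 * s / ((2 * s + 3 * c) + math.sqrt(c * (4 * s + 9 * c)))
    return math.log((1 - theta) / r) / beta


def loglik(theta: float, beta: float, xs) -> float:
    """Observed-data log-likelihood, used to monitor the iterations."""
    n = len(xs)
    ll = n * (2 * math.log(theta) + math.log(beta) - math.log(1 + 2 * theta)) - beta * sum(xs)
    for x in xs:
        r = (1 - theta) * math.exp(-beta * x)
        ll += math.log(3 - r) - 3 * math.log(1 - r)
    return ll


def conditional_mean_m(x: float, theta: float, beta: float) -> float:
    """E-step: E(M | X = x) = (3 + 4r - r^2) / ((1 - r)(3 - r))."""
    r = (1 - theta) * math.exp(-beta * x)
    return (3 + 4 * r - r * r) / ((1 - r) * (3 - r))


def em_emdl(xs, theta=0.5, beta=None, tol=1e-10, max_iter=5000):
    n = len(xs)
    if beta is None:
        beta = n / sum(xs)  # hypothetical starting value: the exponential MLE
    trace = [loglik(theta, beta, xs)]
    for _ in range(max_iter):
        deltas = [conditional_mean_m(x, theta, beta) for x in xs]               # E-step
        s = sum(d - 1 for d in deltas)
        new_beta = n / sum(d * x for d, x in zip(deltas, xs))                   # M-step for beta
        new_theta = (-s + math.sqrt(s * s + 16 * n * (n + s))) / (4 * (n + s))  # M-step for theta
        converged = abs(new_theta - theta) < tol and abs(new_beta - beta) < tol
        theta, beta = new_theta, new_beta
        trace.append(loglik(theta, beta, xs))
        if converged:
            break
    return theta, beta, trace


rng = random.Random(7)
xs = [emdl_quantile(rng.random(), 0.4, 0.5) for _ in range(300)]  # simulated EMDL sample
theta_hat, beta_hat, trace = em_emdl(xs)
print(f"theta = {theta_hat:.4f}, beta = {beta_hat:.4f}, iterations = {len(trace) - 1}")
```

At convergence the observed-data score should be close to zero, so the EM fixed point also satisfies the likelihood equations obtained from (13).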

The information matrix
We first calculate the elements of the expected Hessian matrix of ℓ with respect to the distribution of X. Let $a_{ij}$ (i, j = 1, 2) denote the expected values of the second derivatives of ℓ with respect to θ and β. The Fisher information matrix of a sample of size n for (θ, β) is then $I_n(\theta,\beta)=-\left(a_{ij}\right)_{i,j=1,2}$. The inverse of the Fisher information matrix of a single observation, $I_1^{-1}(\theta,\beta)$, is the asymptotic variance-covariance matrix of the ML estimators of (θ, β). Hence, the ML estimator of (θ, β) is asymptotically bivariate normal; namely,

$$\sqrt{n}\begin{pmatrix}\hat{\theta}-\theta\\ \hat{\beta}-\beta\end{pmatrix}\xrightarrow{d}N_2\left(\mathbf{0},\,I_1^{-1}(\theta,\beta)\right).$$

We simulated 200 data sets of size n = 50 from the EMDL distribution with known parameters θ = 0.6 and β = 0.3. By the asymptotic normality, the quadratic form $n(\hat{\theta}-\theta,\hat{\beta}-\beta)\,I_1(\theta,\beta)\,(\hat{\theta}-\theta,\hat{\beta}-\beta)^{T}$ is approximately chi-squared with two degrees of freedom, so the 95% confidence ellipsoid consists of the points where this form does not exceed $\chi^{2}_{2;0.95}$, the 95th percentile (upper 5% critical value) of the chi-squared distribution with two degrees of freedom (Fig. 7).
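When the $a_{ij}$ are awkward to evaluate analytically, $I_1(\theta,\beta)$ can be approximated numerically. This sketch assumes the per-observation score obtained from the likelihood equations and the quantile function obtained by inverting F, and computes the information in two ways: as $E(\text{score}\,\text{score}^T)$ and as $-E(\text{Hessian})$, with the Hessian taken by central differences of the analytic score. Both expectations are integrated over u = F(x) with the midpoint rule, and their agreement (the information identity) serves as a check. It also uses the fact that the chi-squared distribution with two degrees of freedom has survival function $e^{-q/2}$, so its 95th percentile is $-2\ln 0.05$. Grid size and step h are arbitrary choices.

```python
import math


def quantile(u: float, theta: float, beta: float) -> float:
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    s = 1 - u
    r = 2 * s / ((2 * s + 3 * c) + math.sqrt(c * (4 * s + 9 * c)))
    return math.log((1 - theta) / r) / beta


def unit_score(x: float, theta: float, beta: float):
    """Score of ln f(x; theta, beta) for a single observation."""
    e = math.exp(-beta * x)
    r = (1 - theta) * e
    d_theta = 2 / theta - 2 / (1 + 2 * theta) + e / (3 - r) - 3 * e / (1 - r)
    d_beta = 1 / beta - x + (1 - theta) * x * e / (3 - r) - 3 * (1 - theta) * x * e / (1 - r)
    return d_theta, d_beta


def fisher_information(theta: float, beta: float, grid: int = 20000, h: float = 1e-5):
    opg = [[0.0, 0.0], [0.0, 0.0]]       # E(score score^T)
    neg_hess = [[0.0, 0.0], [0.0, 0.0]]  # -E(Hessian)
    for i in range(grid):
        x = quantile((i + 0.5) / grid, theta, beta)  # midpoint rule in u = F(x)
        s = unit_score(x, theta, beta)
        pt, mt = unit_score(x, theta + h, beta), unit_score(x, theta - h, beta)
        pb, mb = unit_score(x, theta, beta + h), unit_score(x, theta, beta - h)
        # hess[a][b] = derivative of score component a with respect to parameter b
        hess = [[(pt[0] - mt[0]) / (2 * h), (pb[0] - mb[0]) / (2 * h)],
                [(pt[1] - mt[1]) / (2 * h), (pb[1] - mb[1]) / (2 * h)]]
        for a in range(2):
            for b in range(2):
                opg[a][b] += s[a] * s[b] / grid
                neg_hess[a][b] -= hess[a][b] / grid
    return opg, neg_hess


CHI2_2_95 = -2 * math.log(0.05)  # 95th percentile of chi-square with 2 df


def in_confidence_ellipse(est, true, n: int, info) -> bool:
    """Check n * d^T I_1 d <= chi2_{2;0.95}, with d = est - true."""
    d = (est[0] - true[0], est[1] - true[1])
    q = n * sum(d[a] * info[a][b] * d[b] for a in range(2) for b in range(2))
    return q <= CHI2_2_95


opg, neg_hess = fisher_information(0.6, 0.3)
print("E(score score^T):", opg)
print("-E(Hessian):     ", neg_hess)
```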

Simulation study
We conducted a simulation study generating 200 samples for each sample size n = 10, 20, 50, 100. For every sample size and several values of θ and β, we computed the moment estimates (using the lower and upper bounds) and the ML estimates (by Newton-Raphson and by the EM algorithm). For each sample size, the root mean square errors (RMSE) of the four estimates were also calculated; the results are given in Table 1. When β > θ, the ML estimates of θ and β have smaller RMSE than the others. When θ > β, the moment estimates (with either bound) are as good as the ML and EM estimates, and for small n they are slightly better.
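The RMSE computation behind such a study can be sketched for the EM estimator alone. The sketch assumes inverse-transform sampling from the EMDL distribution function and the EM updates $\beta\leftarrow n/\sum\delta_i x_i$ and $\theta\leftarrow$ the root in (0, 1) of $2(n+S)\theta^2+S\theta-2n=0$, with $\delta_i=(3+4r_i-r_i^2)/[(1-r_i)(3-r_i)]$ and $S=\sum(\delta_i-1)$. The replication count, sample sizes, starting values and seed are illustrative and far smaller than the 200 replications of the study, so the printed numbers are not a reproduction of Table 1.

```python
import math
import random


def emdl_quantile(u: float, theta: float, beta: float) -> float:
    c = theta**2 / ((1 + 2 * theta) * (1 - theta))
    s = 1 - u
    r = 2 * s / ((2 * s + 3 * c) + math.sqrt(c * (4 * s + 9 * c)))
    return math.log((1 - theta) / r) / beta


def em_estimate(xs, theta: float = 0.5, tol: float = 1e-8, max_iter: int = 1000):
    n = len(xs)
    beta = n / sum(xs)  # hypothetical starting value
    for _ in range(max_iter):
        deltas = []
        for x in xs:
            r = (1 - theta) * math.exp(-beta * x)
            deltas.append((3 + 4 * r - r * r) / ((1 - r) * (3 - r)))  # E(M | X = x)
        s = sum(d - 1 for d in deltas)
        new_beta = n / sum(d * x for d, x in zip(deltas, xs))
        new_theta = (-s + math.sqrt(s * s + 16 * n * (n + s))) / (4 * (n + s))
        converged = abs(new_theta - theta) < tol and abs(new_beta - beta) < tol
        theta, beta = new_theta, new_beta
        if converged:
            break
    return theta, beta


def rmse_study(theta: float, beta: float, n: int, reps: int, seed: int = 1):
    """RMSE of the EM estimates of (theta, beta) over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    sq_theta = sq_beta = 0.0
    for _ in range(reps):
        xs = [emdl_quantile(rng.random(), theta, beta) for _ in range(n)]
        theta_hat, beta_hat = em_estimate(xs)
        sq_theta += (theta_hat - theta) ** 2
        sq_beta += (beta_hat - beta) ** 2
    return math.sqrt(sq_theta / reps), math.sqrt(sq_beta / reps)


for n in (20, 100):
    print(n, rmse_study(0.6, 0.3, n, reps=20))
```

Moment and Newton-Raphson estimators can be slotted into `rmse_study` in place of `em_estimate` to produce the other columns of such a table.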

Applications
We illustrate the applicability of the EMDL distribution with three data sets that have been examined by many other researchers. The first data set has been modelled with the transmuted Pareto and Lindley distributions, and the second and third data sets with the exponential-Poisson (EP) and exponential-geometric (EG) distributions. To compare the models, we use the Kolmogorov-Smirnov (K-S) statistic, −2LL (−2 log-likelihood), AIC (Akaike information criterion) and BIC (Bayesian information criterion) for each data set.
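The comparison criteria are standard: for a fitted model with k parameters, maximized log-likelihood LL and n observations, AIC = −2LL + 2k and BIC = −2LL + k ln n, and the K-S statistic is the largest vertical distance between the empirical and fitted distribution functions. A minimal sketch, with the fitted EMDL distribution function (assumed to be $F(x)=1-\theta^2e^{-\beta x}(3-2r)/[(1+2\theta)(1-r)^2]$, r = (1 − θ)e^{−βx}) passed in as a function; no data from the tables are reproduced here.

```python
import math
from typing import Callable, Sequence


def aic(loglik: float, k: int) -> float:
    return -2 * loglik + 2 * k


def bic(loglik: float, k: int, n: int) -> float:
    return -2 * loglik + k * math.log(n)


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D_n = max over the ordered sample of max(i/n - F(x_(i)), F(x_(i)) - (i - 1)/n)."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def emdl_cdf(theta: float, beta: float) -> Callable[[float], float]:
    """Fitted EMDL distribution function for given parameter estimates."""
    def cdf(x: float) -> float:
        e = math.exp(-beta * x)
        r = (1 - theta) * e
        return 1 - theta**2 * e * (3 - 2 * r) / ((1 + 2 * theta) * (1 - r) ** 2)
    return cdf


# Usage sketch: ks_statistic(sample, emdl_cdf(theta_hat, beta_hat)) for an observed sample,
# and aic(ll, 2), bic(ll, 2, len(sample)) for its maximized log-likelihood ll.
```

Because every EMDL fit has k = 2, AIC and BIC rank it against other two-parameter models purely by −2LL, and against models with more parameters through the penalty terms.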

Data set 1
The data consist of 72 exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in the Yukon Territory, Canada, for the years 1958-1984, rounded to one decimal place. These data were analyzed by Choulakian and Stephens (2001) and are given in Table 2. Later, Akinsete et al. (2008) applied the beta-Pareto distribution to these data. Merovci and Puka (2014) compared the Pareto and transmuted Pareto (TP) distributions and showed that the transmuted Pareto distribution is the better model. Bourguignon et al. (2013) proposed the Kumaraswamy Pareto (Kw-P) distribution. Tahir et al. (2014) proposed the Weibull-Pareto (WP) distribution and compared it with the beta exponentiated Pareto (BEP) distribution. Nasiru and Luguterah (2015) proposed a different type of Weibull-Pareto distribution (NWP). Mahmoudi (2011) concluded that the beta generalized Pareto (BGP) distribution fits these data better than the GP, BP, Weibull and Pareto models.
We fitted the EMDL distribution to these data and obtained the estimates θ̂ = 0.7782 and β̂ = 0.0695. According to the model selection criteria (AIC and BIC) in Table 3, the EMDL distribution ranks fifth among the 10 models compared.