Mixture models for analyzing product reliability data: a case study
 S. Ruhi^{1},
 S. Sarker^{2} and
 M. R. Karim^{2}
Received: 16 July 2015
Accepted: 9 October 2015
Published: 22 October 2015
Abstract
In the case of manufactured products, there are situations where some components of a product are produced over a period of time by collecting items from different vendors, using different raw materials, machines, and manpower. The physical characteristics and the reliabilities of such components may be different, but sometimes it is difficult to distinguish them clearly. In such situations, mixtures of distributions are often used in the analysis of reliability data for these components. Here a twofold Weibull–Weibull mixture model is applied to analyze product reliability data that consist of both failure and censored lifetimes. The Expectation–Maximization (EM) algorithm is used to find the maximum likelihood estimates of the model parameters. As a case study, aircraft component (Windshield) failure data are analyzed, and various characteristics of the mixture model, such as the reliability function, B10 life, mean time to failure, etc., are estimated to assess the reliability of the component. Simulation studies are performed to investigate the properties and uses of the proposed method.
Keywords
Case study, Data analysis, EM algorithm, Mixture model, Reliability
Background
Reliability of a product is defined as the probability that the product will perform its intended function for a specified time period or usage limit under normal operating conditions (Meeker and Escobar 1998; Blischke and Murthy 2000). Because of rapid advances in manufacturing technology, customers expect to purchase products that will be highly sophisticated, reliable and safe. In recent years, many manufacturers have been collecting and analyzing field failure data to enhance the reliability of their products and to improve goodwill and customer satisfaction (Blischke et al. 2011).

The graphical method yields very crude estimates unless it is applied with repeated iteration and the fit is evaluated visually. As such, the estimates can be used as starting points for more sophisticated statistical methods.

The graphical method does not provide any statistical confidence limits for the estimated parameters.
To overcome these drawbacks, here we apply the Expectation–Maximization (EM) algorithm to find the maximum likelihood estimates of the twofold Weibull mixture model and compare the performance of the proposed method with that of the method of Murthy et al. (2004). The performance of the method is evaluated by numerical simulation studies.
The outline of the paper is as follows. “Mixture models” describes the assumed mixture model. “Maximum likelihood estimation of model parameters” explains the parameter estimation method. “Case study: analysis of aircraft Windshield failure data” presents a case study based on aircraft Windshield failure data. “Simulation study” presents a simulation study to investigate the performance of the method, and “Conclusion” concludes the paper. Finally, the Appendix provides the R code used in the paper for estimating the parameters of the model.
Mixture models
Various types of statistical models have been applied extensively in the analysis of failure data for manufactured products. However, there are situations where some components of a product are produced over a period of time by collecting items from different vendors, using different raw materials, machines, and manpower. In such situations, mixtures of distributions are often used in the analysis of reliability data as the physical characteristics and the reliabilities of such components may be different and difficult to distinguish easily and clearly.
Special case: twofold Weibull mixture model
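For reference, a sketch of the model in its usual form, assuming the standard two-parameter Weibull parameterization with shape \(\beta_{j}\) and scale \(\alpha_{j}\) and with the mixing proportion \(p\) attached to the first subpopulation (assumptions chosen to agree with the parameter tables reported in the case study):
\[
F(t) = p\,F_{1}(t) + (1 - p)\,F_{2}(t), \qquad F_{j}(t) = 1 - \exp\left[ -\left( \frac{t}{\alpha_{j}} \right)^{\beta_{j}} \right], \quad j = 1, 2,
\]
\[
R(t) = 1 - F(t) = p\,\exp\left[ -\left( \frac{t}{\alpha_{1}} \right)^{\beta_{1}} \right] + (1 - p)\,\exp\left[ -\left( \frac{t}{\alpha_{2}} \right)^{\beta_{2}} \right],
\]
\[
f(t) = p\,f_{1}(t) + (1 - p)\,f_{2}(t), \qquad f_{j}(t) = \frac{\beta_{j}}{\alpha_{j}} \left( \frac{t}{\alpha_{j}} \right)^{\beta_{j} - 1} \exp\left[ -\left( \frac{t}{\alpha_{j}} \right)^{\beta_{j}} \right],
\]
with \(t \ge 0\), \(\alpha_{j} > 0\), \(\beta_{j} > 0\) and \(0 \le p \le 1\).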
Maximum likelihood estimation of model parameters
Estimation of mixing proportions using EM Algorithm
The Expectation–Maximization (EM) algorithm is an efficient iterative procedure for computing the maximum likelihood estimates (MLEs) of the parameters of a distribution in the presence of missing or hidden data (Dempster et al. 1977; McLachlan and Krishnan 2008). Bordes and Chauveau (2012) discussed several iterative methods based on EM and stochastic EM methodology for estimating parametric or semiparametric mixture models for randomly right-censored lifetime data, provided that the models are identifiable. Here we discuss the EM algorithm for finding the MLEs of the parameters of a general K-fold mixture model with parameters \(\Theta = (p_{1}, \ldots, p_{K}, \theta_{1}, \ldots, \theta_{K})\), where \(p_{j}\) is the mixing proportion and \(\theta_{j}\) is the parameter vector of the density function \(f_{j}\), \(j = 1, 2, \ldots, K\). Let \(y = \left( {t_{1}, \ldots, t_{n}} \right)^{\prime}\) denote the observed random sample obtained from the mixture density. We introduce the unobservable (missing) data vector \(z = \left( {z_{1}^{\prime}, \ldots, z_{n}^{\prime}} \right)^{\prime}\), where \(z_{i}\) is a K-dimensional vector of zero–one indicator variables and \(z_{ij}\) is one or zero according to whether \(t_{i}\) did or did not arise from the jth component of the mixture \((i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, K)\). The EM algorithm handles the unobservable data by working with the current conditional expectation of the complete-data log-likelihood given the observed data. We define the complete-data vector \(x\) as \(x = \left( {y^{\prime}, z^{\prime}} \right)^{\prime}\).
For some distributions, it is possible to obtain closed-form analytical expressions for \(\theta_{j}\). However, in the case of Weibull distributions with \(\theta_{j} = (\alpha_{j}, \beta_{j})\), \(j = 1, 2, \ldots, K\), we have to apply numerical procedures to find the MLEs of the parameters. Here we apply the survreg function with weights (weights > 0) from the survival package in R. The algorithm proceeds by using the newly derived parameters as the guess for the next iteration. The E- and M-steps are iterated until the algorithm converges.

Step 1 Begin with initial guesses \(p_{j}^{(0)}\), \(\alpha_{j}^{(0)}\) and \(\beta_{j}^{(0)}\) for \(j = 1, 2, \ldots, K\).

Step 2 At the mth iteration, using the current values \(p_{j}^{(m)}\), \(\alpha_{j}^{(m)}\) and \(\beta_{j}^{(m)}\) (the initial guesses when m = 0), calculate the conditional expectation of \(z_{ij}\), i.e., \(z_{ij}^{(m)}\), using (15).

Step 3 At the mth iteration, find the MLEs \(p_{j}^{(m+1)}\), \(\alpha_{j}^{(m+1)}\) and \(\beta_{j}^{(m+1)}\) as follows:
 (a) Find the MLE \(p_{j}^{(m+1)}\) using (17).
 (b) Estimate \(\alpha_{j}^{(m+1)}\) and \(\beta_{j}^{(m+1)}\) using the survreg function.

Step 4 Repeat Steps 2 and 3 until the algorithm converges to the desired accuracy.
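The four steps can be illustrated with a minimal Python sketch for the twofold case. It is not the authors' R code, and several points are assumptions: the E-step uses the density for failures and the survival function for censored observations, which is the usual form for right-censored mixtures; the mixing proportions are updated as the averages of the posterior weights; and the hypothetical helper `weighted_weibull_mle` stands in for the weighted `survreg` fit by solving the weighted Weibull profile likelihood equation by bisection. All function and argument names are illustrative.

```python
import math

def weibull_pdf(t, shape, scale):
    z = (t / scale) ** shape
    return (shape / t) * z * math.exp(-z)

def weibull_sf(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def weighted_weibull_mle(times, events, weights, lo=1e-3, hi=50.0):
    """Weighted Weibull MLE under right censoring (stand-in for survreg).

    Solves the profile score equation for the shape by bisection and then
    recovers the scale in closed form. Assumes the root lies in [lo, hi]
    and that the weighted number of failures is positive.
    """
    d = sum(w * e for e, w in zip(events, weights))
    mean_log = sum(w * e * math.log(t)
                   for t, e, w in zip(times, events, weights)) / d

    def score(shape):
        s0 = sum(w * t ** shape for t, w in zip(times, weights))
        s1 = sum(w * t ** shape * math.log(t) for t, w in zip(times, weights))
        return s1 / s0 - 1.0 / shape - mean_log

    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    s0 = sum(w * t ** shape for t, w in zip(times, weights))
    return shape, (s0 / d) ** (1.0 / shape)

def em_weibull_mixture(times, events, p, comp1, comp2, n_iter=200, tol=1e-8):
    """EM for a twofold Weibull mixture; comp1, comp2 are (shape, scale)."""
    probs = [p, 1.0 - p]               # Step 1: initial guesses
    params = [comp1, comp2]
    prev = -math.inf
    loglik = prev
    for _ in range(n_iter):
        # Step 2 (E-step): posterior membership weights z_ij
        z, loglik = [], 0.0
        for t, e in zip(times, events):
            contrib = [probs[j] * (weibull_pdf(t, *params[j]) if e
                                   else weibull_sf(t, *params[j]))
                       for j in range(2)]
            total = sum(contrib)
            loglik += math.log(total)
            z.append([c / total for c in contrib])
        # Step 4: stop when the observed-data log-likelihood stalls
        if loglik - prev < tol:
            break
        prev = loglik
        # Step 3 (M-step): update proportions, then each Weibull component
        for j in range(2):
            w = [zi[j] for zi in z]
            probs[j] = sum(w) / len(times)
            params[j] = weighted_weibull_mle(times, events, w)
    return probs, params, loglik
```

A call such as `em_weibull_mixture(times, events, 0.3, (1.2, 0.5), (3.0, 3.0))` returns the estimated proportions, the (shape, scale) pairs and the log-likelihood from the last E-step. As with any EM fit, the result depends on the starting values, so several initial guesses are worth trying.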
The applications of the EM algorithm are broad because of its flexibility in analyzing incomplete or missing data. For cases in which a complicated likelihood function is difficult to maximize, various extensions and modifications of the EM algorithm have been proposed to simplify the computations; see, e.g., Wei and Tanner (1990), Meng and Rubin (1993) and Liu and Rubin (1994). More detailed theory and applications of the EM algorithm can be found in McLachlan and Krishnan (2008).
Case study: analysis of aircraft Windshield failure data
In this section, as a case study, we analyze a set of aircraft Windshield failure data. We apply a twofold Weibull mixture model to the failure data and estimate various characteristics of the Windshield, such as the reliability function, B10 life, mean time to failure, etc., to assess the reliability of the Windshield.
Aircraft Windshield failure data
Table 1 Windshield failure data (T: time in thousand hours; δ = 1 for a failure, δ = 0 for a censored observation)
T  δ  T  δ  T  δ  T  δ  T  δ 

0.040  1  2.154  1  3.595  1  1.183  0  3.003  0 
0.301  1  2.190  1  3.699  1  1.244  0  3.102  0 
0.309  1  2.194  1  3.779  1  1.249  0  3.304  0 
0.557  1  2.223  1  3.924  1  1.262  0  3.483  0 
0.943  1  2.224  1  4.035  1  1.360  0  3.500  0 
1.070  1  2.229  1  4.121  1  1.436  0  3.622  0 
1.124  1  2.300  1  4.167  1  1.492  0  3.665  0 
1.248  1  2.324  1  4.240  1  1.580  0  3.695  0 
1.281  1  2.349  1  4.255  1  1.719  0  4.015  0 
1.281  1  2.385  1  4.278  1  1.794  0  4.628  0 
1.303  1  2.481  1  4.305  1  1.915  0  4.806  0 
1.432  1  2.610  1  4.376  1  1.920  0  4.881  0 
1.480  1  2.625  1  4.449  1  1.963  0  5.140  0 
1.505  1  2.632  1  4.485  1  1.978  0  
1.506  1  2.646  1  4.570  1  2.053  0  
1.568  1  2.661  1  4.602  1  2.065  0  
1.615  1  2.688  1  4.663  1  2.117  0  
1.619  1  2.823  1  4.694  1  2.137  0  
1.652  1  2.890  1  0.046  0  2.141  0  
1.652  1  2.902  1  0.140  0  2.163  0  
1.757  1  2.934  1  0.150  0  2.183  0  
1.795  1  2.962  1  0.248  0  2.240  0  
1.866  1  2.964  1  0.280  0  2.341  0  
1.876  1  3.000  1  0.313  0  2.435  0  
1.899  1  3.103  1  0.389  0  2.464  0  
1.911  1  3.114  1  0.487  0  2.543  0  
1.912  1  3.117  1  0.622  0  2.560  0  
1.914  1  3.166  1  0.900  0  2.592  0  
1.981  1  3.344  1  0.952  0  2.600  0  
2.010  1  3.376  1  0.996  0  2.670  0  
2.038  1  3.385  1  1.003  0  2.717  0  
2.085  1  3.443  1  1.010  0  2.819  0  
2.089  1  3.467  1  1.085  0  2.820  0  
2.097  1  3.478  1  1.092  0  2.878  0  
2.135  1  3.578  1  1.152  0  2.950  0 
Nonparametric estimate of reliability function
Parametric estimate of reliability function
Table 2 Estimates of parameters of twofold Weibull mixture model
Parameters  Estimates based on WPP  Estimates based on EM algorithm

\(\hat{\beta }_{1}\)  0.429  1.2098
\(\hat{\alpha }_{1}\)  8.230  0.2541
\(\hat{\beta }_{2}\)  2.990  2.7802
\(\hat{\alpha }_{2}\)  3.210  3.4856
\(\hat{p}\)  0.136  0.0176
\(\left( {1 - \hat{p}} \right)\)  0.864  0.9823
From Fig. 2, we observe that the reliability function obtained by the EM algorithm method is much closer to the Kaplan–Meier estimate than the reliability function estimated by the WPP plot method. The plots of the cdfs shown in Fig. 3 lead to the same conclusion. These results indicate that estimation with the EM algorithm procedure is better than the WPP plot procedure.
The values of the adjusted Anderson–Darling (AD) test statistic for the WPP method and the EM algorithm method are 412.5845 and 410.2851, respectively. The smaller value for the EM algorithm method again indicates that it provides a better fit to the data set than the WPP method.
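The Kaplan–Meier benchmark used in this comparison can be computed directly from (T, δ) pairs. The following is a minimal Python sketch of the product-limit estimator, not the authors' code. The function name is illustrative, and the handling of ties (observations censored at a failure time are counted as still at risk at that time) is the usual convention, assumed here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the reliability function.

    times  : observed times (failure or censoring)
    events : 1 for a failure, 0 for a right-censored observation
    Returns a list of (t, R(t)) pairs at the distinct failure times.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Gather all observations tied at time t
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve
```

Plotting the returned step function against a fitted parametric reliability curve gives the kind of visual comparison described for Fig. 2.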
Reliability Characteristics of Windshield Data
Table 3 Estimates of reliability characteristics of Windshield (in thousand hours)
Quantities  EM algorithm method  WPP plot method

MTTF  3.0525  5.5782
B10 life  1.5248  1.3125
B50 life  3.0046  2.9298
Table 3 indicates that the estimates of MTTF obtained from the maximum likelihood method via the EM algorithm and from the WPP plot method are 3.0525 and 5.5782 thousand hours, respectively. The estimate of MTTF obtained by the EM algorithm is very close to the nonparametric estimate of MTTF (3.03549 thousand hours) given in Fig. 1, whereas the WPP method overestimates the MTTF in this case. From the estimates of the B10 life and B50 life, we may conclude according to the EM algorithm method that 10 % of the components fail by approximately 1525 h and 50 % fail by approximately 3005 h.
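Quantities of this kind follow from the fitted mixture in closed or near-closed form. The Python sketch below assumes the standard Weibull parameterization (shape β, scale α) with the mixing proportion attached to the first component. The MTTF is then the weighted sum of the component means α Γ(1 + 1/β), and a Bp life is obtained by solving F(t) = p/100 numerically, here by bisection. The function names are illustrative, the parameter values are the EM estimates from the parameter table of the case study, and no claim is made that the output reproduces every published figure.

```python
import math

def mixture_cdf(t, p, shape1, scale1, shape2, scale2):
    """Cdf of a twofold Weibull mixture at time t >= 0."""
    f1 = 1.0 - math.exp(-((t / scale1) ** shape1))
    f2 = 1.0 - math.exp(-((t / scale2) ** shape2))
    return p * f1 + (1.0 - p) * f2

def mixture_mttf(p, shape1, scale1, shape2, scale2):
    """Mean time to failure: weighted sum of the component means."""
    m1 = scale1 * math.gamma(1.0 + 1.0 / shape1)
    m2 = scale2 * math.gamma(1.0 + 1.0 / shape2)
    return p * m1 + (1.0 - p) * m2

def mixture_quantile(q, p, shape1, scale1, shape2, scale2, upper=100.0):
    """Time by which a fraction q has failed (B10 life for q = 0.10).

    Bisection on the cdf; assumes the quantile lies below `upper`.
    """
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, p, shape1, scale1, shape2, scale2) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# EM estimates from the case study: p, beta1, alpha1, beta2, alpha2
em = (0.0176, 1.2098, 0.2541, 2.7802, 3.4856)
mttf = mixture_mttf(*em)            # in thousand hours
b10 = mixture_quantile(0.10, *em)   # in thousand hours
b50 = mixture_quantile(0.50, *em)
```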
Simulation study
In this section, we use computer simulation to evaluate the performance of the method numerically. Numerically generated twofold mixture data are used to fit the twofold Weibull mixture model and to find the ML estimates of the model parameters under right-censored data. Using the simulated data, the ML estimates of the model parameters and the sample means (SMs) and mean squared errors (MSEs) of the estimates are computed. The simulation code is written in the statistical software package R.
Steps of simulation study

Step 1 We consider a set of true values for the five parameters \(\theta = \{\beta_{1}, \alpha_{1}, \beta_{2}, \alpha_{2}, p\}\) of the twofold Weibull mixture model. Under this set of parameters, we generate \(n = n_{1} + n_{2}\) observations from the twofold Weibull mixture model using the R language (version 3.2.2). A desired percentage (10, 20 or 30 %) of the largest generated observations out of 200 are treated as right-censored observations, and the remaining observations are treated as failure times.

Step 2 Based on the generated right censored data, we estimate the parameters via the EM algorithm assuming that the mixing subpopulations are unknown. The methodology is discussed in “Estimation of mixing proportions using EM algorithm”.

Step 3 Steps 1 and 2 above are repeated 1000 times under two cases:

Case (i) for various percentages of censored observations (10, 20 and 30 %) and

Case (ii) for different sample sizes (n = 200, 400 and 600).
We compute the sample means (SMs) and mean squared errors (MSEs) of the estimates for both Cases (i) and (ii).

Step 4 Summarize and discuss the simulation results based on the 1000 repetitions.
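The data-generation and summary parts of these steps can be sketched in Python as follows; this is an illustration, not the authors' R code. Two points are assumptions: each observation is assigned to a subpopulation by an independent draw with probability p (so the subpopulation sizes are random rather than fixed), and the censored observations are recorded at the largest retained failure time. The function names are illustrative.

```python
import random

def simulate_censored_mixture(n, p, comp1, comp2, cens_frac, rng):
    """Generate n lifetimes from a twofold Weibull mixture and censor the largest.

    comp1, comp2 : (shape, scale) pairs of the two subpopulations
    cens_frac    : fraction of the largest observations treated as censored
    rng          : a random.Random instance
    Returns (times, events) with events 1 = failure, 0 = right censored.
    """
    sample = []
    for _ in range(n):
        shape, scale = comp1 if rng.random() < p else comp2
        # random.weibullvariate takes (scale, shape) in that order
        sample.append(rng.weibullvariate(scale, shape))
    sample.sort()
    n_cens = round(cens_frac * n)
    n_fail = n - n_cens
    threshold = sample[n_fail - 1]          # largest retained failure time
    times = sample[:n_fail] + [threshold] * n_cens
    events = [1] * n_fail + [0] * n_cens
    return times, events

def sample_mean_and_mse(estimates, true_value):
    """Sample mean and mean squared error of repeated estimates."""
    k = len(estimates)
    mean = sum(estimates) / k
    mse = sum((e - true_value) ** 2 for e in estimates) / k
    return mean, mse
```

In a full study, each of the 1000 generated data sets would be passed to the EM fit, and `sample_mean_and_mse` would be applied to the resulting estimates of each parameter in turn.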
Simulation output analysis
Table 4 Sample means of the MLEs for different percentages of censored observations
Parameters  True values  Set01 [N = 200; 10 % cens. obs.]  Set02 [N = 200; 20 % cens. obs.]  Set03 [N = 200; 30 % cens. obs.]
\(\hat{\beta }_{1}\)  3.50  3.7945  3.9587  4.1968
\(\hat{\alpha }_{1}\)  700.00  699.3536  700.1768  705.3863
\(\hat{\beta }_{2}\)  1.20  1.1601  1.1454  1.1709
\(\hat{\alpha }_{2}\)  850.00  944.5603  1055.9218  912.0363
\(\hat{p}\)  0.30  0.3737  0.3692  0.3732
\(\left( {1 - \hat{p}} \right)\)  0.70  0.6263  0.6308  0.6268
Table 5 Sample means of the MLEs for different sample sizes
Parameters  True values  Set04 [N = 200; 20 % cens. obs.]  Set05 [N = 400; 20 % cens. obs.]  Set06 [N = 600; 20 % cens. obs.]
\(\hat{\beta }_{1}\)  3.50  3.9587  3.8619  3.8365
\(\hat{\alpha }_{1}\)  700.00  700.1768  703.0983  702.2486
\(\hat{\beta }_{2}\)  1.20  1.1454  1.1664  1.1850
\(\hat{\alpha }_{2}\)  850.00  1055.9218  891.5291  873.1364
\(\hat{p}\)  0.30  0.3692  0.3398  0.3192
\(\left( {1 - \hat{p}} \right)\)  0.70  0.6308  0.6602  0.6808
Tables 4 and 5 present the summary results of the simulations based on 1000 repetitions under the given true values. In these tables, the first column shows the parameters of the model and the second column shows the true values of the parameters. The remaining columns give the sample means of the MLEs of the parameters obtained by the EM algorithm. For all of the sets, the sample means of the estimated parameters are close to the corresponding true values of the parameters. As the percentage of censored observations decreases (i.e., as the number of failures increases), the sample means of the MLEs move closer to the true values for almost all sets, as expected. Similarly, the sample means of the MLEs move closer to the true values as the sample size increases.
Table 6 Mean squared errors (MSEs) of the MLEs for different percentages of censored observations
Parameters  True values  Set01 [N = 200; 10 % cens. obs.]  Set02 [N = 200; 20 % cens. obs.]  Set03 [N = 200; 30 % cens. obs.]
\(\hat{\beta }_{1}\)  3.50  2.17953  2.6223  4.2156
\(\hat{\alpha }_{1}\)  700.00  3232.01855  4873.0835  9777.0312
\(\hat{\beta }_{2}\)  1.20  0.15011  0.0619  0.0711
\(\hat{\alpha }_{2}\)  850.00  87452.76431  1614229.0410  269062.7034
\(\hat{p}\)  0.30  0.00114  0.0391  0.0500
\(\left( {1 - \hat{p}} \right)\)  0.70  0.03382  0.0391  0.0500
Table 7 Mean squared errors (MSEs) of the MLEs for different sample sizes
Parameters  True values  Set04 [N = 200; 20 % cens. obs.]  Set05 [N = 400; 20 % cens. obs.]  Set06 [N = 600; 20 % cens. obs.]
\(\hat{\beta }_{1}\)  3.50  2.6223  1.9544  1.5033
\(\hat{\alpha }_{1}\)  700.00  4873.0835  1907.8766  1574.6524
\(\hat{\beta }_{2}\)  1.20  0.0619  0.0294  0.0210
\(\hat{\alpha }_{2}\)  850.00  1614229.0410  25655.13287  10622.3852
\(\hat{p}\)  0.30  0.0391  0.0242  0.0159
\(\left( {1 - \hat{p}} \right)\)  0.70  0.0391  0.0242  0.0159
Conclusion
There are situations where variations in product reliability can occur across different component vendors. In such situations, a mixture of distributions can model the variability resulting from parts being bought from K different suppliers, with \(F_{k}(t)\) denoting the failure distribution for parts obtained from supplier k, \(k = 1, 2, \ldots, K\). This paper has applied a twofold Weibull mixture model to analyze product reliability data with failure and censored observations. It has proposed the Expectation–Maximization (EM) algorithm for finding the maximum likelihood estimates of the parameters of the mixture model and compared this method with a method based on Weibull Probability Paper plots. Aircraft component (Windshield) failure data have been analyzed as an example, and the proposed method of estimation was found to perform well. The results would be useful to managers in assessing and predicting the reliability of the component more accurately.
The mixture model considered here is the twofold Weibull mixture model. The proposed method can easily be extended to other mixture models. Future research with other mixture models and with various types of censored data would be of interest.
Declarations
Authors’ contributions
The authors with the consultation of each other have carried out this work and drafted the manuscript together. All authors have read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Blischke WR, Murthy DNP (2000) Reliability modelling, prediction, and optimization. Wiley, New York
 Blischke WR, Karim MR, Murthy DNP (2011) Warranty data collection and analysis. Springer Verlag, London
 Bordes L, Chauveau D (2012) EM and stochastic EM algorithms for reliability mixture models under random censoring. HAL
 Bucar T, Nagode M, Fajdiga M (2004) Reliability approximation using finite Weibull mixture distributions. Reliab Eng Syst Saf 84:241–251
 Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc B 39:1–38
 Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633–648
 McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, New York
 Meeker WQ, Escobar LA (1998) Statistical methods for reliability data. Wiley, New York
 Meng XL, Rubin D (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
 Murthy DNP, Xie M, Jiang R (2004) Weibull model. Wiley, New York
 Ruhi S (2015) Application of mixture models for analyzing reliability data: a case study. Open Access Libr J 2:e1815
 Wei GCG, Tanner MA (1990) A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J Am Stat Assoc 85:699–704