Selecting statistical model and optimum maintenance policy: a case study of hydraulic pump

Introduction: A proper maintenance policy can play a vital role in the effective investigation of product reliability. Every engineered object, such as a product, plant or infrastructure, needs preventive and corrective maintenance.

Case description: In this paper we look at a real case study. It deals with the maintenance of hydraulic pumps used in excavators by a mining company. We obtain the data that the owner had collected, carry out an analysis and build models for pump failures. The data consist of both failure and censored lifetimes of the hydraulic pump.

Discussion and evaluation: Different competitive mixture models are applied to analyze a set of maintenance data of a hydraulic pump. Various characteristics of the mixture models, such as the cumulative distribution function, reliability function and mean time to failure, are estimated to assess the reliability of the pump. The Akaike Information Criterion, the adjusted Anderson–Darling test statistic, the Kolmogorov–Smirnov test statistic and the root mean square error are used to select suitable models from a set of competitive models. Maximum likelihood estimation via the EM algorithm is the main method used for estimating the parameters of the models and the reliability-related quantities.

Conclusions: In this study, it is found that a threefold mixture model (Weibull–Normal–Exponential) fits the hydraulic pump failure data set well. This paper also illustrates how a suitable statistical model can be applied to estimate the optimum maintenance period of a hydraulic pump at minimum cost.

We obtain the data that the owner (mining company) had collected, carry out an analysis and build models for pump failures. The data, given in Murthy et al. (2015), consist of both failure and censored lifetimes of the pump. Murthy et al. (2015) showed that the threefold Weibull mixture distribution is the best distribution for the data among three competing distributions (single Weibull, twofold Weibull mixture and threefold Weibull mixture). In this paper we search for a suitable distribution for the data from a set of competitive mixture models (based on the Weibull, Exponential, Normal and Lognormal distributions). Finally, the selected distribution is used to find the optimum time at which the expected maintenance cost of the pump is a minimum.
The remainder of the article is organized as follows. The "Hydraulic pump failure data" section describes the set of hydraulic pump failure data analyzed in this paper. The "Mixture models for modeling failure data" section presents the mixture models used. The "Parameter estimation" section presents the MLEs of the parameters of the mixture models, obtained by applying the Expectation-Maximization (EM) algorithm. The "Model selection" section describes model selection for the data through graphical and statistical approaches. The "Optimum maintenance cost" section gives a procedure for finding the optimum time at which the expected maintenance cost of the pump is a minimum. Finally, the "Conclusion" section concludes the article with a discussion of the key findings.

Hydraulic pump failure data
The hydraulic pumps considered here are used in excavators by a mining company. In open cut mines, coal and overburden are transported using excavators and dump trucks. An excavator is a complex machine consisting of several systems. The hydraulic system is one of the important systems and comprises several hydraulic pumps (for linear and rotational motions), hydraulic oil filters and several hydraulic lines. A pump is considered to have failed if it cannot provide the required flow rate at the required pressure. The data recorded by the maintenance department consist of the failure times (for units that failed and required Corrective Maintenance action) and service times (for units that had not yet failed and were sent for Preventive Maintenance action) for 102 units, and are presented in Table 1. The column labeled "Age" gives the age (in hours) of the item at the end of the data collection period, and the column labeled "Type" indicates whether the observation is a failure (denoted by 1) or censored (denoted by 0). The data consist of 45 failures and 57 censored ages. A more detailed description of the data can be found in Murthy et al. (2015).

Mixture models for modeling failure data
A variety of statistical models have been developed and studied extensively in the analysis of product failure data (Kalbfleisch and Prentice 1980; Meeker and Escobar 1998; Blischke and Murthy 2000; Lawless 2003; Murthy et al. 2004). The set of mixture models used to analyze the pump failure data given in Table 1 is discussed below.
The cumulative distribution function (cdf) of a general n-fold mixture model involving n sub-populations is given by

$$G(t) = \sum_{i=1}^{n} p_i F_i(t), \tag{1}$$

where $p_i > 0$ and $\sum_{i=1}^{n} p_i = 1$. Here $F_i(t)$ is the cdf of the i-th sub-population and $p_i$ is the mixing probability of the i-th sub-population. The corresponding probability density function (pdf) is given by

$$g(t) = \sum_{i=1}^{n} p_i f_i(t), \tag{2}$$

where $f_i(t)$ is the pdf associated with $F_i(t)$. The reliability function is

$$R(t) = 1 - G(t) = \sum_{i=1}^{n} p_i \left[1 - F_i(t)\right]. \tag{3}$$

The cumulative distribution functions, probability density functions and reliability functions for the various twofold and threefold mixture models can be obtained from Eqs. (1)-(3) by putting n = 2 and n = 3, respectively. Ruhi et al. (2015) applied a twofold Weibull mixture model for analyzing failure data. More literature on the applications of mixture models can be found in Titterington et al. (1985), Mendenhall and Hader (1958), Ahmad and Abdelrahman (1994), and Murthy et al. (2004).
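As an illustration of how the mixture cdf and reliability function in Eqs. (1)-(3) can be evaluated, the following is a minimal Python sketch for a threefold Weibull-Normal-Exponential mixture. It is not the authors' R code. The function names, the mixing probabilities and all distribution parameters are hypothetical values chosen only for illustration, and the Exponential component is assumed to be parameterized by its rate.

```python
import math


def weibull_cdf(t, shape, scale):
    """Weibull cdf with the given shape and scale; zero for t <= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def normal_cdf(t, mean, sd):
    """Normal cdf computed from the error function."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))


def exponential_cdf(t, rate):
    """Exponential cdf (rate parameterization, an assumption here)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def mixture_cdf(t, weights, cdfs):
    """Mixture cdf: weighted sum of the component cdfs."""
    if min(weights) <= 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must be positive and sum to 1")
    return sum(p * cdf(t) for p, cdf in zip(weights, cdfs))


def mixture_reliability(t, weights, cdfs):
    """Mixture reliability function: one minus the mixture cdf."""
    return 1.0 - mixture_cdf(t, weights, cdfs)


# Hypothetical mixing probabilities and parameters (NOT the fitted values).
weights = [0.3, 0.5, 0.2]
cdfs = [
    lambda t: weibull_cdf(t, shape=1.5, scale=5000.0),
    lambda t: normal_cdf(t, mean=8000.0, sd=1500.0),
    lambda t: exponential_cdf(t, rate=1.0 / 2000.0),
]

for age in (1000.0, 5000.0, 10000.0):
    print(age, mixture_cdf(age, weights, cdfs), mixture_reliability(age, weights, cdfs))
```

The pdf in Eq. (2) follows the same pattern, with each component cdf replaced by the corresponding component pdf.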

Parameter estimation
We estimate the parameters of the different mixture models by the maximum likelihood method, using the Expectation-Maximization (EM) algorithm to find the maximum likelihood estimates (MLEs) of the parameters. Details on the application of the EM algorithm to mixture models with censored data can be found in Ateya (2012), Bordes and Chauveau (2012) and Ruhi et al. (2015). Earlier work (Murthy et al. 2015) applied single Weibull, twofold Weibull mixture and threefold Weibull mixture models to this data set and suggested the threefold Weibull mixture model as the best fitting model on the basis of various graphical and statistical approaches. In addition to the threefold Weibull mixture model, here we consider two other threefold mixture models (Weibull-Normal-Exponential and Normal-Lognormal-Weibull) for the data. Our aim is to find out whether any other threefold mixture model fits this data set better than the threefold Weibull mixture model and, if the distribution changes, what the effect on the optimal maintenance policy would be.
The parameters of these three mixture models are estimated by the maximum likelihood method via the Expectation-Maximization (EM) algorithm. All computations in the paper were carried out with code written in R. The code for analyzing the data with the Weibull-Normal-Exponential mixture model is given in the "Appendix". It can be used for the other two models after simple modifications, mainly to the functions dweibull(), pweibull(), dnorm(), pnorm(), dexp() and pexp() and to the parameter vector theta.
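To make the estimation step more concrete, the following is a minimal Python sketch (not the authors' R code from the Appendix) of the censored-data log-likelihood for a Weibull-Normal-Exponential mixture, together with one E-step and the corresponding update of the mixing probabilities. A failure at age t contributes the mixture density and a censored age contributes the mixture reliability. The toy ages, the starting values and all function names are hypothetical, the Exponential component is assumed to use a rate parameter, and the update of the component parameters, which requires numerical maximization, is omitted.

```python
import math


def component_terms(t, failed, theta):
    """Per-component density (failure) or survival (censored) at age t.

    theta = (beta, eta, mu, sigma, delta): Weibull shape and scale,
    Normal mean and sd, Exponential rate (rate form is an assumption).
    """
    beta, eta, mu, sigma, delta = theta
    z = (t - mu) / sigma
    if failed:
        return [
            (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta)),
            math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi)),
            delta * math.exp(-delta * t),
        ]
    return [
        math.exp(-((t / eta) ** beta)),
        0.5 * math.erfc(z / math.sqrt(2.0)),
        math.exp(-delta * t),
    ]


def log_likelihood(ages, types, weights, theta):
    """Log-likelihood with right censoring (type 1 = failure, 0 = censored)."""
    total = 0.0
    for t, d in zip(ages, types):
        terms = component_terms(t, d == 1, theta)
        total += math.log(sum(p * c for p, c in zip(weights, terms)))
    return total


def e_step(ages, types, weights, theta):
    """Posterior probability that each unit belongs to each sub-population."""
    resp = []
    for t, d in zip(ages, types):
        weighted = [p * c for p, c in zip(weights, component_terms(t, d == 1, theta))]
        total = sum(weighted)
        resp.append([w / total for w in weighted])
    return resp


def update_weights(resp):
    """M-step for the mixing probabilities: average responsibility."""
    n = len(resp)
    return [sum(row[i] for row in resp) / n for i in range(len(resp[0]))]


# Toy data and starting values, purely illustrative (NOT the pump data).
ages = [500.0, 1200.0, 3000.0, 4500.0, 6000.0, 7500.0, 8000.0, 9000.0]
types = [1, 1, 0, 1, 0, 1, 0, 0]
weights = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
theta = (1.5, 5000.0, 6000.0, 1500.0, 1.0 / 3000.0)

new_weights = update_weights(e_step(ages, types, weights, theta))
print(log_likelihood(ages, types, weights, theta))
print(log_likelihood(ages, types, new_weights, theta))
```

With the component parameters held fixed, this update of the mixing probabilities cannot decrease the log-likelihood, which gives a simple check on an implementation.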
The MLEs of the parameters are displayed in Table 2. In Table 2, the parameters $p_1$, $p_2$ and $p_3$ represent the mixing probabilities of the 1st, 2nd and 3rd sub-populations, respectively.

Table 2 MLEs of the parameters of assumed models
Threefold mixture models MLEs of parameters

Model selection
This section applies graphical and statistical approaches to select the best fitting model for the data set from the three competitive threefold mixture models listed in Table 2. A relatively straightforward way to select a tentative model is to use a plotting methodology in which the cdfs obtained from the parametric estimates are compared with the empirical distribution function. More detail about this comparison can be found in Blischke et al. (2011). The cdfs of the threefold Weibull, Weibull-Normal-Exponential and Normal-Lognormal-Weibull mixture models are compared with the empirical distribution function (the nonparametric estimate of the cdf obtained from the Kaplan-Meier (KM) estimate), and the results are displayed in Fig. 1. Figure 1 indicates that the cdfs obtained from the three mixture models are approximately the same, except in the right tail, where the cdfs of the Weibull-Normal-Exponential and Normal-Lognormal-Weibull mixture models lie slightly closer to the nonparametric estimate of the cdf than the cdf of the threefold Weibull mixture model does. Hence we may consider both the Weibull-Normal-Exponential and Normal-Lognormal-Weibull mixture models for the data set.
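The nonparametric reference curve used in this comparison can be sketched as follows. This is a minimal Python illustration of the Kaplan-Meier estimate for right-censored data, not the authors' code. The toy ages and types are hypothetical, and the usual convention is assumed that a unit censored at the same age as a failure is still counted as at risk for that failure.

```python
def kaplan_meier(ages, types):
    """Kaplan-Meier survival estimate at each distinct failure age.

    types uses the paper's coding: 1 = failure, 0 = censored.
    Returns a list of (age, survival estimate) pairs.
    """
    data = sorted(zip(ages, types))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [d for a, d in data if a == t]   # all units observed at age t
        failures = sum(tied)
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            steps.append((t, survival))
        at_risk -= len(tied)
        i += len(tied)
    return steps


# Toy data, purely illustrative (NOT the pump data in Table 1).
toy_ages = [2.0, 3.0, 3.0, 5.0, 8.0]
toy_types = [1, 0, 1, 1, 0]

for age, surv in kaplan_meier(toy_ages, toy_types):
    # The empirical cdf plotted against the fitted cdfs is 1 - survival.
    print(age, round(surv, 4), round(1.0 - surv, 4))
```

Plotting one minus the estimated survival against age gives the step function with which the fitted mixture cdfs are compared.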
The statistical approaches provide a more rigorous method for model selection and validation. Various statistics [the adjusted Anderson-Darling (AD*) value, the Kolmogorov-Smirnov (KS) test statistic, the Akaike Information Criterion (AIC) and the root mean square error (RMSE)] are applied for model selection and validation. The estimates of AIC, AD*, the KS test statistic and RMSE for the three competitive models are given in Table 3. From Table 3, we find that the Weibull-Normal-Exponential mixture model has the smallest values of AIC and RMSE, and the Normal-Lognormal-Weibull mixture model has the smallest value of the AD* test statistic among the three mixture models. Hence, according to the values of AIC and RMSE, the Weibull-Normal-Exponential mixture model can be selected as the best of these mixture models for the hydraulic pump failure data.
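Two of these criteria are simple enough to sketch directly. The Python helpers below are an illustration, not the authors' code. AIC is computed from its standard definition, and the RMSE is assumed here to be taken between the fitted cdf and the empirical cdf at the observed ages, since the exact definition is not restated in this section. The parameter count of 7 for the Weibull-Normal-Exponential mixture (five component parameters plus two free mixing probabilities) and all numerical inputs are likewise assumptions.

```python
import math


def aic(log_likelihood_value, n_parameters):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood_value


def rmse(fitted_cdf_values, empirical_cdf_values):
    """Root mean square error between fitted and empirical cdf values."""
    squared = [(f - e) ** 2 for f, e in zip(fitted_cdf_values, empirical_cdf_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical inputs for illustration only (NOT the values in Table 3).
print(aic(-100.0, 7))
print(rmse([0.10, 0.20], [0.10, 0.40]))
```

Because AIC penalizes each extra parameter, it allows models with different numbers of parameters, such as the three threefold mixtures here, to be compared on a common scale.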
We have also applied the Kolmogorov-Smirnov (KS) test statistic as a goodness-of-fit test for these threefold mixture models. At the 5 % level of significance, with n = 102, the critical value of the Kolmogorov-Smirnov one-sample test is $1.36/\sqrt{102} = 0.135$ (Siegel and Castellan 1988). Since the observed values of the KS test statistic for all the threefold mixture models (given in Table 3) are less than the critical value, we cannot reject the null hypothesis, $H_0$, that the observed data are from a population specified by these threefold mixture distributions. We note, however, that among the three mixture models the Weibull-Normal-Exponential mixture model gives the smallest value of the KS test statistic.
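The critical value quoted above is a one-line calculation, shown here as a small Python sketch. The function names are hypothetical, and the observed statistic of 0.08 is a made-up value used only to show the decision rule, not a value from Table 3.

```python
import math


def ks_critical_value_5pct(n):
    """Large-sample 5 % critical value of the one-sample KS test."""
    return 1.36 / math.sqrt(n)


def ks_rejects(statistic, n):
    """True if the null hypothesis is rejected at the 5 % level."""
    return statistic > ks_critical_value_5pct(n)


critical = ks_critical_value_5pct(102)
print(round(critical, 3))     # 0.135
print(ks_rejects(0.08, 102))  # hypothetical statistic below the critical value
```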
Following the earlier analysis of these data, we introduce the following notation:

q: Probability that the pump is scrapped and replaced by a new one under service exchange
1 − q: Probability that the pump is not scrapped and is reconditioned under service exchange
p: Probability that the item used in service exchange is installed correctly
1 − p: Probability that the item used in service exchange is not installed correctly
$F_N(t)$: Failure distribution of a new item installed correctly
$F_R(t)$: Failure distribution of a reconditioned item installed correctly
$F_I(t)$: Failure distribution of an incorrectly installed item (new or reconditioned)

It is easily seen (using the conditional approach) that the time to failure of an item used in service exchange has the distribution function

$$F(t) = p(1-q)F_R(t) + pq\,F_N(t) + (1-p)F_I(t). \tag{4}$$

Note that the MTTF (mean time to failure) for a new item installed correctly > MTTF for a reconditioned item installed correctly > MTTF for an item (new or reconditioned) installed incorrectly. If we select the Weibull (β, η)-Normal (μ, σ)-Exponential (δ) mixture model as the best model for the data, then, comparing it with (4),

$$p_3 = 1 - p, \quad p_1 = (1-q)p \quad \text{and} \quad p_2 = qp \tag{5}$$

and

$$F_3(t;\delta) = F_I(t), \quad F_1(t;\beta,\eta) = F_R(t) \quad \text{and} \quad F_2(t;\mu,\sigma) = F_N(t). \tag{6}$$

Using the estimates of $p_1$, $p_2$ and $p_3$ from Table 2 in Eq. (5), we get the estimates p = 0.8326 and q = 0.6096.
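The relations in Eq. (5) can be inverted to recover p and q from the mixing probabilities: since $p_1 + p_2 = p$, we have $p = 1 - p_3$ and $q = p_2/(p_1 + p_2)$. The Python sketch below illustrates this with a round trip starting from the reported estimates p = 0.8326 and q = 0.6096. The function names are hypothetical, and the mixing probabilities computed here are back-calculated from p and q, not copied from Table 2.

```python
def mixing_probabilities(p, q):
    """Eq. (5): (p1, p2, p3) for reconditioned, new, incorrectly installed."""
    return (1.0 - q) * p, q * p, 1.0 - p


def installation_and_scrap_probabilities(p1, p2, p3):
    """Invert Eq. (5): recover p and q from the mixing probabilities."""
    p = 1.0 - p3
    q = p2 / (p1 + p2)
    return p, q


p1, p2, p3 = mixing_probabilities(0.8326, 0.6096)
print(p1, p2, p3)
print(installation_and_scrap_probabilities(p1, p2, p3))
```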

Optimum maintenance cost
Solving the problem involves building a model, and deciding on the optimal age for PM action requires an objective function. The objective function is the asymptotic expected cost per unit time. Note that every time instant at which an exchanged pump is put into operation can be viewed as a renewal point of a renewal process characterizing the replacements of pumps over time. The time between two successive renewal points defines a cycle. The asymptotic expected cost per unit time is obtained as the ratio of the expected cycle cost (ECC) to the expected cycle length (ECL).
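The time to failure of a pump, X, is a random variable with distribution function F(x), density f(x) and reliability function R(x) = 1 − F(x). A PM action results if X ≥ T, in which case the cycle length is T; this occurs with probability R(T). A CM action results when X < T, and the cycle length is then X. As a result, ECL is given by

$$\mathrm{ECL} = \int_0^T x f(x)\,dx + T R(T) = \int_0^T R(x)\,dx. \tag{7}$$

Let $C_f$ and $C_p$ denote the average cost of a CM and a PM replacement, respectively. We will discuss the derivation of these costs later in the section. As a result, ECC is given by

$$\mathrm{ECC} = C_f F(T) + C_p R(T). \tag{8}$$

From (7) and (8), the asymptotic average cost per unit time is given by

$$J(T; F(\cdot)) = \frac{\mathrm{ECC}}{\mathrm{ECL}} = \frac{C_f F(T) + C_p R(T)}{\int_0^T x f(x)\,dx + T R(T)}. \tag{9}$$

$T^*$, the optimal T, is the value that yields a minimum for $J(T; F(\cdot))$.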
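The minimization of the cost rate, the ratio of ECC to ECL, can be sketched numerically. The Python example below is an illustration under stated assumptions, not the authors' procedure: it uses the form ECC = C_p R(T) + C_f F(T) and ECL = ∫_0^T R(x) dx, evaluates the integral with the trapezoidal rule, and finds the optimal T by a simple grid search. A single Weibull reliability function with hypothetical parameters stands in for the fitted threefold mixture, and the cost values are also hypothetical.

```python
import math


def cost_rate_curve(reliability, c_p, c_f, t_max, step):
    """Cost rate J(T) = ECC / ECL on a grid of PM ages T.

    ECC = c_p * R(T) + c_f * (1 - R(T)); ECL is the integral of R from 0 to T,
    accumulated with the trapezoidal rule.
    """
    curve = []
    integral = 0.0
    r_prev = reliability(0.0)
    for k in range(1, int(round(t_max / step)) + 1):
        t = k * step
        r = reliability(t)
        integral += 0.5 * (r_prev + r) * step
        expected_cycle_cost = c_p * r + c_f * (1.0 - r)
        curve.append((t, expected_cycle_cost / integral))
        r_prev = r
    return curve


def optimal_pm_age(reliability, c_p, c_f, t_max, step):
    """Grid point (T, J(T)) with the smallest cost rate."""
    return min(cost_rate_curve(reliability, c_p, c_f, t_max, step),
               key=lambda point: point[1])


def weibull_reliability(t, shape=2.5, scale=10000.0):
    """Hypothetical stand-in for the fitted mixture reliability function."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical costs: PM replacement 70,000 and CM replacement 140,000.
t_star, j_star = optimal_pm_age(weibull_reliability, 70000.0, 140000.0,
                                t_max=30000.0, step=10.0)
print(t_star, j_star)
```

Replacing `weibull_reliability` with the reliability function of a fitted mixture gives the corresponding optimal PM age for that model. A finer grid or a one-dimensional optimizer could be used in place of the grid search.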
The optimal T depends on the average costs of CM and PM actions. As in the earlier analysis of these data, we use the following additional notation and assumptions.
$C_n$: Sale price of a new pump ($80,000).
$C_r$: Cost (charged by the service agent) of reconditioning a pump under CM or PM action ($60,000).
ξ: Additional cost (due to downtime, loss in revenue, etc.) resulting from a CM action. We look at the values ξ = $70,000, $90,000, $110,000 and $130,000.
A maintenance action involves replacement by a new item or a reconditioned item with probabilities q and (1 − q), respectively. As a result, the average cost of a PM action is $C_p = qC_n + (1 - q)C_r$ and that of a CM action is $C_f = C_p + \xi$. The optimal $T^*$ is obtained using (9) with the threefold mixture cdf $F(t) = G_3(t)$, and the optimal expected cost per unit time is given by $J(T^*; F(\cdot))$, i.e., $J(T^*; G_3(\cdot))$.
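As a worked example of these cost formulas, the short Python sketch below evaluates the average PM and CM costs using the reported estimate q = 0.6096, the sale price of $80,000, the reconditioning cost of $60,000 and the four values of ξ considered. The function and variable names are hypothetical.

```python
def pm_cost(q, c_new, c_recondition):
    """Average cost of a PM action: q * C_n + (1 - q) * C_r."""
    return q * c_new + (1.0 - q) * c_recondition


def cm_cost(c_pm, xi):
    """Average cost of a CM action: C_p plus the additional cost xi."""
    return c_pm + xi


c_p = pm_cost(0.6096, 80000.0, 60000.0)
print(round(c_p, 2))  # approximately 72192
for xi in (70000.0, 90000.0, 110000.0, 130000.0):
    print(xi, round(cm_cost(c_p, xi), 2))
```

The average PM cost is about $72,192, so the average CM cost ranges from about $142,192 to $202,192 over the values of ξ considered.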
The optimal $T^*$ thus depends on the additional cost ξ. The optimal $T^*$ and the optimal expected cost per unit time $J(T^*)$ were estimated for the various values of ξ under each of the three threefold mixture models. The results are given in Table 4, which shows that, for every model, the optimal $T^*$ decreases and the optimal $J(T^*)$ increases as ξ increases, as expected. Table 4 also indicates that the threefold Weibull mixture model gives a slightly larger optimal maintenance period $T^*$ than the other two models, whereas the Weibull-Normal-Exponential model gives a lower maintenance cost than the threefold Weibull mixture model for all ξ.

Conclusion
Proper data management (data collection and analysis) is very important for the effective maintenance of any engineered object. Data are critical for building and selecting suitable statistical models, and the models provide new insights for improving maintenance operations. This paper has dealt with a real case study to illustrate how statistical models can be selected and applied to estimate the optimum maintenance period and cost of a hydraulic pump. We recommend the Weibull-Normal-Exponential mixture model as the best of the three competitive models for the hydraulic pump failure data. This model suggests an optimum maintenance period for the pump that reduces the maintenance cost. Annotated R code is provided for analyzing hydraulic pump failure data with the Weibull-Normal-Exponential mixture model. The code can easily be modified to apply other threefold mixture models.