Piecewise modelling and parameter estimation of repairable system failure rate

Lifetime failure data usually presents a bathtub-shaped failure rate, and random failure phase dominates its life cycle. Relatively large errors always existed when single distribution model is used for analysis and modelling. Therefore, a bathtub-shaped reliability model based on piecewise intensity function was proposed for repairable system. Parameter estimation was studied by using maximum likelihood method in conjunction with two-stage Weibull process interval estimation and Artificial Bee Colony Algorithm. The proposed model and estimation approach fit the failure rate under bathtub curve very well, adequately reflect the duration of random failure phase, and guarantee the feasibility of quantitative analysis. Moreover, the estimated critical point of bathtub curve could be calculated. At last, the proposed model was applied in the reliability analysis of a repairable Computer Numerical Control system, and the validity of the proposed model and its parameter estimation method were verified.

exponentiated exponential distribution (Gupta and Kundu 1999), the exponential-Poisson distribution (Kus 2007), the exponentiated exponential-poisson distribution (Barreto-Souza and Cribari-Neto 2009), those improved models also achieve description of the progressive increasing or decreasing of failure rate. All of which have been used monotonic function to model the failure patterns for the repairable system. However, the lifetime failure data are often non-monotonic, and have a bathtubshaped failure rate, which can be divided into three phases-early failure, random failure, and wear-out failure. For this reason, many scholars studied the phenomenon. Chen (2000) put forward a new two-parameter Weibull distribution model, in which the failure rate function followed the trend of bathtub curve when the shape parameter satisfied a certain condition. Later, Xie et al. (2002) and Zhang (2004) expanded the two-parameter model above by introducing the third parameter, and proposed the three-parameter Weibull distribution models independently. The results indicated that those models can reach high accuracy in fitting the trend of bathtub curve more easily than traditional models. Lai et al. (2003), based on Weibull distribution and Extreme Value Type I distribution, proposed an improved Weibull distribution model, which also described the trend of bathtub curve very well. They also completed the model's parameter estimation by means of Weibull probability plot. El-Gohary et al. (2013) gave a new distribution known as the generalized Gompertz distribution by dealing with a new generalization of exponential. Generalized Gompertz distribution, and generalized exponential distributions, which had bathtub curve failure rate depending upon the shape parameter. And the maximum likelihood estimators of the parameters were derived using a simulations study. Wang et al. (2015) proposed a new four parameter interval life model to describe the bathtub curve. Cordeiro et al. (2014) extended the modified Weibull distribution, and established a five-parameter Weibull distribution. Aiming at the failure censored data of industrial devices, Using Newton-Raphson type algorithm for parameter estimation, finding that the extended model fitted the bathtub curve very well.
The above models fitted the trend of bathtub-shaped failure rate well, but random failure phase was not described in detail, as well as only one critical point of early failure phase and random failure phase was included. In fact, random failure phase is the main part of life cycle for most repairable system. Single distribution model may be inadequate for analysis and modelling as well as relatively errors may occur. Parameter estimation of multi parameter model, because of the unknown parameters, is difficult to solve the problem directly by using the maximum likelihood estimation.
Therefore, a bathtub-shaped reliability model based on piecewise intensity function was proposed for repairable system. According to the characteristics of failure rate at three phases of bathtub curve, using non-homogeneous Poisson process (NHPP) and homogeneous Poisson process (HPP), the failure rate of different phases was fitted respectively. In view of the complexity of parameter estimation, relevant parameter estimation was studied by virtue of maximum likelihood method and artificial bee colony algorithm. At last, the approach was applied and verified by the reliability analysis of a repairable Computer Numerical Control (CNC) system. The remainder of the paper is organized as follows. "Set up piecewise model" section builds up a piecewise model under bathtub curve. "Parameter estimation" section studies the parameter estimation of the model. "Application" section gives an example application for verifying the proposed method. Finally, the paper is concluded.

Set up piecewise model
Currently, there are two main methods for studying reliability of repairable system: Counting Process and Markov Process (Rausand and Høyland 2004). Markov Process is mainly targeted at the multi-state problem of repairable system. Counting Process focuses on two states which are normal operation and fault of the system, and record normal operating time of system, fault occurrence time as well as frequency, which is consistent with the trend of fault rate in this research. Consequently, Counting Process was selected for piecewise reliability modelling of trend of bathtub curve. Process is as follows:

Early failure phase
At early failure phase, intensity function presents a decline trend. Due to the defects in the process of selection, manufacture and assembly of components, failure rate of system is high in the initial phase, but decreases rapidly as the run time increases and maintenance is provided. Therefore, we assume that the repair process at early failure phase is minimal repair, NHPP is used for modelling, intensity function complies with the most common Weibull process, and b 1 < 1, namely: 2. Random failure phase At random failure phase, early defects of a system have been corrected, and failure intensity is tending towards stability. Thus, we assume that the repair process at random failure phase is perfect repair, HPP and exponential distribution are adopted for modelling, and intensity function remains unchanged (as a constant): where, t 0 is the critical point between early failure phase and random failure phase. 3. Wear-out failure phase As run time lapses, the system slowly enters wear-out failure phase due to loss of components and mechanical fatigue, and its intensity function is increasing. Just like early failure phase, we assume that the repair process at wear-out failure phase is minimal repair, NHPP is used for modelling, and intensity function complies with Weibull process. Thus, the system's intensity function is: where, t 1 is the critical point between random failure phase and wear-out failure phase. To sum up, the system's intensity function throughout the life cycle is: Then, the system's cumulative intensity function is:

Parameter estimation
There are a lot of methods for parameter estimation of reliability model, such as maximum likelihood estimation, moment estimation, least square method, and Bayesian method. Among these methods, maximum likelihood estimation is the most widely used owing to its good theoretical basis and high estimation accuracy. Therefore, maximum likelihood estimation was applied for parameter estimation of the proposed reliability model.

Maximum likelihood estimation
First, the likelihood function shall be determined. Based on the reliability model above, the cumulative distribution function of the ith Mean Time Between Failure (MTBF) is: where, S i is the moment when the ith failure occurs. Then, the conditional probability density function of S i is: Equations (4) and (5) are plugged into (7), and we obtain: When S i ≤ t 0 , When the system's failure time data is Type-II censored data, the likelihood function is: where n is the total number of failures, θ = (a1, b1, t0, a2, b2, t1, ρ), S k < t 0 < S k+1 < ⋯ < S l < t 1 < S l+1 .
When the system's failure time data is Type-I censored data, the likelihood function is: Where t s is censoring time, To sum up, the likelihood function is: Where δ is defined as: Given that the specific equation of likelihood function is related to the interval of failure time data where t 0 and t 1 are located, and it is difficult to solve the complicated equation of likelihood function by calculating logarithmic derivative. Hence, artificial bee colony algorithm was used for parameter optimization to maximize likelihood in accordance with the piecewise processing of t 0 and t 1 . The optimization model is as follows: In addition, the range of variable parameter shall be optimized to improve efficiency and accuracy of the model's parameter optimization. First, the value range of t 0 and t 1 , i.e. k and l, shall be determined based on the system's failure time data and trend estimation. Second, the first k time data at early failure phase and the last n-l time data at wear-out failure phase are adopted for interval estimation of Weibull process, and the (17) δ = 0 Type-II censored data 1 Type I censored data According to the system's failure time data and trend estimation to determine the value range of t 0 and t 1 , i.e. k and l Initialize the value of parameters θ=(a 1 , b 1 , t 0 , a 2 , b 2 , t 1 ) Solve: L(S 1 ,S 2 ,…,S n |θ) Max L(S 1 ,S 2 ,…,S n |θ) ?
Get the optimal estimation parameters θ=(a 1 , b 1 , t 0 , a 2 , b 2 , t 1 ) N Y Set: S k <t 0 ≤S K+1, S l <t 1 ≤S l+1 Determine the optimization range of parameters a 1 , b 1 , a 2 and b 2 Piecewise interval estimation of Weibull process Optimize θ and do iteration Fig. 1 Flowchart of parameter estimation of reliability model extreme value of estimation interval is regarded as the optimization range of parameters a 1 , b 1 , a 2 , and b 2 . The flowchart of parameter estimation of reliability model is shown in Fig. 1.

Weibull process interval estimation
Interval estimation includes two parts-point estimation and margin of error that describes estimation accuracy. Therefore, to complete the interval estimation of Weibull process, the point estimation of parameters shall be first performed, followed by the calculation of its margin of error.
1. Point estimation Based on Literature (Ebeling 2004), the maximum likelihood point estimation of Weibull process parameters a and b is as follows: When the failure time data is Type-II censored data: When the failure time data is Type-I censored data: 2. Margin of error Generally, in the case of large sample (≥30), the maximum likelihood point estimation presents consistency and asymptotically normal distribution; In the case of small sample, the logarithm of point estimation of parameter is much closer to normal distribution. Specific equations are as follows: In the case of large sample: In the case of small sample: where D(â) and D(b) are the variance of parameters a and b, respectively. Their value can be obtained using Fisher Information Matrix below (Kijima 1989;Ye 2003): According to normal distribution, the confidence interval under 1−α: In the case of large sample: In the case of small sample: Solving Eq. (25) can obtain the confidence interval for a and b: In the case of large sample: In the case of small sample:

Parameter optimization based on artificial bee colony algorithm
With respect to the proposed model for parameter optimization and the range optimized by Weibull process interval estimation, artificial bee colony algorithm was used for solving optimization problem. Artificial bee colony algorithm is characterized by strong global optimization, rapid rate of convergence, and suitability in solving different problems, compared with other swarm intelligence optimization algorithms such as evolutionary algorithm, artificial immune algorithm, particle swarm optimization, and ant colony algorithm (Chen 2015). The correspondence between bee's search for nectar source and optimization in artificial bee colony algorithm is shown in Table 1.
The algorithm process is as follows:

Bee searches for nectar source Optimization
Location of nectar source Value of all θ = (a 1 , b 1 , t 0 , a 2 , b 2 , t 1 ) Quality of nectar source Fitness value corresponding to all θ, i.e. L(S 1 ,S 2 ,…,S n |θ) Speed of gathering honey Solving speed Maximum fitness max L(S 1 ,S 2 ,…,S n |θ)

Initialization of parameter
First, the two basic parameters in artificial bee colony algorithm shall be initialized. One parameter is the quantity of nectar source (S n ), which represents the number of solution. Besides, S n is also the number of leaders and followers; the other parameter is the limit value of gathering nectar source (limit). When the gathering times of a nectar source exceed the limit, the nectar source will be abandoned.

Initialization of solution space
Random initialization produces the primary S n solutions. Let θ i = (a 1i , b 1i , t 0i , a 2i , b 2i , t 1i ) = (θ i1 , θ i2 , θ i3 , θ i4 , θ i5 , θ i6 ), which represents the location of the ith nectar source. The initialization formula for each value is as follows: where r is the random number between [0, 1], and LB j and UB j are the value range of θ j , i.e. minimum and maximum. Later, the leaders start to gather these nectar sources at random, and relevant fitness value is calculated.

Stage of leaders
When the leaders decide to gather a nectar source at random, they will search new nectar sources at random around the nectar source. Their search is in line with the equation below: The fitness value corresponding to new nectar sources θ new is then calculated and compared with previous nectar sources to select the solution with higher fitness.

Stage of followers
In terms of followers, their follow probability is calculated based on normalization and fitness value of each nectar source. The equation is as follows: where, fit i represents the fitness value of nectar source at θ i . Larger fit i indicates greater probability that the followers select the nectar source. When the followers select a nectar source, they also search new nectar sources at random around the nectar source based on Eq. (29) just as the leaders do, and then choose a better nectar source by comparing with the nectar source they follow.

Stage of scouters
At this stage, if a nectar source θ i is still not improved after being gathered limit times, it will be abandoned. The leaders here will become scouters, and search a new nectar source at random based on Eq. (28).

Iterations
The nectar source with the maximum fitness is recorded. Whether iterations reach the maximum set point is judged. If the maximum set point is reached, the algorithm ends and the optimal nectar source, namely optimal solution, is output. Otherwise, return to (3) to continue the cycle.

Application
The proposed model was applied in reliability analysis of a repairable CNC system containing servo drive unit. In order to test the reliability of CNC system, a long-term multi-sample test (Type-I censored) under the same environment was carried out, and the environment of test laboratory as shown in Fig. 2. This test has been taken to record the failures of CNC system in the past two years. The failure time data of 18 sets of CNC system with 50 failures were recorded, and processed by the Total Test Time method (Barlow and Campo 1975), as shown in Table 2.
Based on the data above, the system's failure trend estimation and test were carried out, and TTT diagram (Bergman 1979) was drawn, as shown in Fig. 3. Where, the total number of failures n = 50.
As can be seen in Fig. 3, the TTT scatter diagram is concave under the diagonal of unit square at the beginning, indicating that the failure rate presents a decline at early phase. Later, the diagram fluctuates slightly along a straight line. The failure rate remains  unchanged, and the system is gradually stabilized. After a period of time, the diagram looks like a convex under the diagonal of unit square, indicating that the failure rate increases. Meanwhile, Laplace test (Louit et al. 2009) and Anderson-Darling test (Pulcini 2001) were used for trend test of failure data in Table 2. The significance level was set as α = 0.05. The results of statistical tests are shown in Table 3.
The results of trend test are consistent with the estimation output of TTT diagram. The system failure time data follows the trend of bathtub curve, and undergoes three phases-early failure, random failure and wear-out failure. Then, the system reliability model was set up using the piecewise function above, and parameter estimation was performed.
First, the optimum ranges of t 0 and t 1 were processed in segment, and within the scope of (0, n) select the values of k and l in turn. But in order to improve calculation efficiency, by observing the total failure time data and the inflection point trend of TTT diagram, the range of values for k and l can be reduced and make a preliminary judgment. So, the value of k may be 6, 7, 8, and 9, while the value of l may be 33, 34, 35, and 36. And, the value range of t 0 and t 1 is S k < t 0 < S k+1 < S l < t 1 < S l+1 .
Second, parameter optimization was conducted in accordance with different k and l. Based on the failure time sequence S 1 , S 2 , S 3 ,…, S k (considered as Type-II censored data), interval estimation of minimal repair model at early failure phase was conducted. In the condition that the given confidence was 99.73 %, the confidence interval was a 1 ∈ [a 1min , a 1max ], b 1 ∈ [b 1min , b 1max ]. Similarly, based on the failure time sequence S l+1 ,…,  S 50 , t s (considered as Type-I censored data), interval estimation of minimal repair model at wear-out failure phase was conducted. In the condition that the given confidence was 99.73 %, the confidence interval was a 2 ∈ [a 2min , a 2max ], b 2 ∈ [b 2min , b 2max ]. The improved optimization model was: Artificial bee colony algorithm was then used for parameter optimization of the optimization model above. The algorithm's iterations, size of bee colony, and limit were set. The optimal parameter in the case of k and l was θ = (a 1 , b 1 , t 0 , a 2 , b 2 , t 1 ).
The goodness of fit of the two-stage Weibull process above was tested using Cramer-Von Mises test (Ebeling 2004). The given significance level was α = 0.05. The test results are shown in Table 4.
In conclusion, 18 sets of CNC system intensity functions were obtained: Meanwhile, the results suggest that the critical point between early failure phase and random failure phase of a single system was located at about t 0 /18 = 920 h, while the critical point between random failure phase and wear-out failure phase was located at about t 1 /18 = 9347 h.
The intensity function of a single system was: The curve of intensity function is shown in Fig. 4. Based the instantaneous MTBF(t) = 1/ω(t), we found that the MTBF of the system at random failure phase was calculated as MTBF = 1/(1.7893 × 10 −4 ) = 5589 h. Therefore, the example above proves that the proposed piecewise reliability model can fit the failure rate under bathtub curve very well and obtain two critical points of bathtub curve. It provides an important basis for reliability analysis, design and test.

Conclusions
A modelling study on the phenomenon that the failure rate of lifetime failure data presents a bathtub curve was carried out for repairable system. The following achievements were made: 1. Considering the characteristics of failure rate at three phases of bathtub curve, and combining homogeneous Poisson process and non-homogeneous Poisson process during Counting Process, a bathtub-shaped reliability model based on piecewise intensity function was proposed. Moreover, with respect to the difficulty of common parameter estimation methods in solving equations, maximum likelihood estimation and artificial bee colony algorithm were combined to estimate the model-related parameters and guarantee the feasibility of the model in application. 2. Given that extensive range and excessive time consumption of parameter optimization during parameter estimation, the interval estimation of two-stage Weibull process was studied, and the range of model optimization was improved. As a result, the efficiency of parameter optimization was substantially increased. 3. The proposed model, as well as estimation approach, fits the failure rate under bathtub curve very well and adequately reflects the duration of random failure phase. Besides, two critical points of bathtub curve are obtained. 4. At last, the practical use of the proposed model is demonstrated by a set of failure data of a repairable CNC system, and the validity of the proposed model and its parameter estimation method was verified. The study will be useful for reliability analysis of repairable systems and provide an important basis for relevant reliability research.