Empirical mode decomposition with missing values
 Donghoh Kim^{1} and
 Hee-Seok Oh^{2}
Received: 2 May 2016
Accepted: 17 November 2016
Published: 25 November 2016
Abstract
This paper considers an improvement of empirical mode decomposition (EMD) in the presence of missing data. EMD has been widely used to decompose nonlinear and nonstationary signals into components, called intrinsic mode functions, according to their intrinsic frequencies. However, the conventional EMD may not be efficient when missing values are present. This paper proposes a modified EMD procedure based on a novel combination of empirical mode decomposition and the self-consistency concept. Self-consistency provides an effective imputation method for missing data, and hence the proposed EMD procedure produces stable decomposition results. Simulation studies and an image analysis demonstrate that the proposed method produces substantially effective results.
Keywords
Empirical mode decomposition; Imputation; Missing; Multiscale method; Self-consistency
Background
Multiscale methods for decomposing a signal into several significant modes with simple forms have been widely studied for the last two decades. Since the decomposition procedure reduces the complexity of a signal and, at the same time, enhances the interpretability of its components, the information embedded in a signal can be easily recovered.
Spectral analysis (Priestley 1981) and wavelet analysis (Mallat 2009; Daubechies 1992; Vidakovic 1999) are popular multiscale methods for signal decomposition. Huang et al. (1998) proposed a data-adaptive procedure called empirical mode decomposition (EMD), and Daubechies et al. (2011) proposed an alternative to EMD, termed synchrosqueezed wavelet transforms, which is based on reassignment of wavelet coefficients. These multiscale decomposition methods implicitly assume that a signal is observed at equally spaced time points. Since the local behavior of a signal can evolve over time, the local information observed at equally spaced time points is useful for identifying the amount of variation at different scales and time locations and for extracting each superimposed component. However, missing values are quite common in many signals. In practice, a large number of missing values may occur at random; examples include an intermittent wireless signal caused by the malfunction of a network device (a one-dimensional signal) and a partial fingerprint due to incomplete touch on a digital scanner (a two-dimensional image). The problem we address in this paper is that, when some observations are missing, most multiscale decomposition methods produce ineffective outcomes. In particular, since EMD depends on the behavior of local extrema, missing local extrema cause severely distorted results. A brief review of the EMD procedure will reveal this aspect more clearly.
In the literature, there have been many studies aimed at enhancing the performance of the conventional EMD. Boudraa and Cexus (2007) separated the high-frequency components using a filtering method. Wu and Huang (2009) developed the ensemble EMD (EEMD) by averaging the simulated signals. Variants of EEMD have been proposed by several authors. The complementary ensemble EMD (CEEMD) (Yeh et al. 2010) was introduced by adding pairs of positive and negative noises to a signal and applying EEMD. Torres et al. (2011) proposed the complete ensemble EMD with adaptive noise (CEEMDAN), in which EEMD is applied at each stage of the decomposition by adding noise to the signal and to the residue after each intrinsic mode function (IMF) extraction. The improved complete ensemble EMD (ICEEMD) (Colominas et al. 2014) controlled the noise level between the added noise and the residue in the CEEMDAN process. Xu et al. (2009) proposed a hybrid extrema estimation algorithm based on Fourier interpolation to decompose signals with a lower sampling rate. Diop et al. (2010) suggested a PDE-based approach to compute envelopes. Barnhart et al. (2011, 2012) provided a methodology for discontinuous data by applying EMD to each individual continuous data segment and by adapting a mirroring approach for the discontinuous data gaps. Kim et al. (2012a) introduced the statistical EMD, which smooths local extrema instead of interpolating them. Komaty et al. (2014) suggested a signal-filtering approach based on a combination of EMD and a similarity measure for noise removal. Park et al. (2015) applied a quantile smoothing method to the signal itself instead of interpolating the local extrema of the signal for sifting. The extension of EMD to two-dimensional images has been developed by several authors. Damerval et al. (2005) employed a moving window and Nunes et al. (2005) used a morphological operation for two-dimensional extrema detection. Bhuiyan et al. (2008) proposed an order-statistics filter method for envelope estimation of a two-dimensional image. Kim et al. (2012b) proposed a two-dimensional EMD through smoothing-based sifting of two-dimensional local extrema.
As observed in the aforementioned EMD procedure, when missing values are present, EMD produces distorted decomposition results for two reasons: (1) when the observations are not equally spaced, it is difficult to capture the local behavior of a signal in a balanced way, and (2) if local extrema in particular are missing, the sifting fails to capture the upper and lower envelopes properly.
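Reason (2) can be illustrated with a small numerical sketch. The code below is an illustration added here, not part of the proposed method: it assumes a pure sine wave as the test signal, and the helper `local_maxima` is a hypothetical name. It deletes the samples around each true maximum and compares the largest detected maximum before and after deletion.

```python
import numpy as np

def local_maxima(x):
    """Indices of interior local maxima (a tie with the right neighbour counts once)."""
    i = np.arange(1, len(x) - 1)
    return i[(x[i] > x[i - 1]) & (x[i] >= x[i + 1])]

t = np.linspace(0, 1, 201)
s = np.sin(2 * np.pi * 5 * t)            # every true maximum equals 1

peaks = local_maxima(s)                  # maxima found on the complete signal
keep = np.ones(len(t), dtype=bool)
for k in range(-3, 4):                   # drop 7 samples centred on each maximum
    keep[peaks + k] = False

s_obs = s[keep]                          # what remains after the values go missing
peaks_obs = local_maxima(s_obs)

# The maxima detected from the incomplete signal sit well below the true
# maxima, so an upper envelope built from them underestimates the signal.
print(s[peaks].max(), s_obs[peaks_obs].max())
```

In this toy setting the number of detected maxima is unchanged, but their height drops from 1 to about 0.81, which is the kind of envelope distortion described above.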
To improve the EMD algorithm in the presence of missing values, we propose a new method that adapts the concept of self-consistency: it recursively imputes missing values and decomposes the imputed signal efficiently under the EMD framework. For practical implementation, we provide a modified EMD algorithm that consists of two alternating steps, imputation and decomposition. In addition, we discuss some practical aspects of the algorithm, such as the fitting method, the selection of the smoothing parameter in the fitting method, and the choice of initial values. Furthermore, we extend the proposed method to two-dimensional signals with missing values, an extension that is useful for image decomposition.
The rest of the paper is organized as follows. In “Methods” section, we briefly review the self-consistency principle and propose a new method, together with a practical algorithm, for signal decomposition in the presence of missing values. To evaluate the empirical performance of the proposed method, simulation studies for one-dimensional signals are conducted in “Numerical study” section, where a real data example is also presented. Furthermore, in “Extension to image” section, the extension to two-dimensional signals is discussed. Lastly, conclusions are given in “Conclusions” section.
Methods
Review: self-consistency
Tarpey and Flury (1996) introduced self-consistency as a fundamental concept in statistics, inspired by the work of Hastie and Stuetzle (1989) on principal curves.
Definition 1
(Tarpey and Flury 1996) A random signal f is self-consistent for g if \(E(f \mid g)=g\) almost surely.
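As a quick illustration of this condition (an example added here, not taken from Tarpey and Flury), suppose that \(f=g+\varepsilon\) for some noise term \(\varepsilon\) satisfying \(E(\varepsilon \mid g)=0\); the symbol \(\varepsilon\) is introduced only for this example. Then$$\begin{aligned} E(f \mid g)=E(g \mid g)+E(\varepsilon \mid g)=g, \end{aligned}$$so the condition in Definition 1 holds.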
Proposed algorithm
 1. Iterate, until convergence, the following alternating steps for \(\ell =1,2,\ldots\):
  1.1. Imputation Step: Fit \({\hat{f}}^{(\ell -1)}(t_o)\) at the observed locations \(t_o\), impute by prediction \({\hat{s}}^{(\ell )}(t_m):={\hat{f}}^{(\ell -1)}(t_m)\) at the missing locations \(t_m\), and construct the complete data \({\hat{s}}^{(\ell )}(t_c):=(s(t_o),{\hat{s}}^{(\ell )}(t_m))\).
  1.2. Decomposition Step: Apply the EMD procedure to \({\hat{s}}^{(\ell )}(t_c)\) and obtain$$\begin{aligned} {\hat{s}}^{(\ell )}(t_c)=\sum _{i=1}^{n} {\widehat{{\mathrm{imf}}}}_{i}^{(\ell )}(t_c) + {\hat{r}}^{(\ell )}(t_c). \end{aligned}$$
  1.3. The iteration stops if \(\frac{\Vert {\hat{s}}^{(\ell + 1)}-{\hat{s}}^{(\ell )}\Vert }{\Vert {\hat{s}}^{(\ell )}\Vert } \le \delta\) for some tolerance level \(\delta > 0\).
 2. Take the converged IMFs as the final IMFs.
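The alternating steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the fit is a Gaussian kernel smoother applied to the current completed signal (a stand-in for smoothing splines), the decomposition is a toy EMD whose envelopes are linear interpolants of the extrema rather than cubic splines, the initial values are simply the mean of the observed data, and all function names and default values (`kernel_fit`, `simple_emd`, `emd_with_missing`, `h`, `delta`) are hypothetical.

```python
import numpy as np

def kernel_fit(t, s, h):
    """Gaussian-kernel fit at every location t (stand-in for smoothing splines)."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return (w @ s) / w.sum(axis=1)

def simple_emd(x, n_sift=10, max_imf=5):
    """Toy EMD: a fixed number of sifts, linear envelopes through interior extrema."""
    idx = np.arange(len(x))
    i = idx[1:-1]
    imfs, residue = [], x.astype(float).copy()
    for _ in range(max_imf):
        cand = residue.copy()
        for _ in range(n_sift):
            mx = i[(cand[i] > cand[i - 1]) & (cand[i] >= cand[i + 1])]
            mn = i[(cand[i] < cand[i - 1]) & (cand[i] <= cand[i + 1])]
            if len(mx) < 2 or len(mn) < 2:
                return imfs, residue          # too few extrema: stop decomposing
            upper = np.interp(idx, mx, cand[mx])
            lower = np.interp(idx, mn, cand[mn])
            cand = cand - 0.5 * (upper + lower)
        imfs.append(cand)
        residue = residue - cand
    return imfs, residue

def emd_with_missing(t, s, observed, h=0.01, delta=1e-6, max_iter=500):
    """Alternate imputation and decomposition until the relative change is below delta."""
    s_hat = np.where(observed, s, s[observed].mean())   # crude initial values
    for _ in range(max_iter):
        f_hat = kernel_fit(t, s_hat, h)                  # imputation step: fit
        s_new = np.where(observed, s, f_hat)             # keep observed, predict missing
        imfs, residue = simple_emd(s_new)                # decomposition step
        change = np.linalg.norm(s_new - s_hat) / np.linalg.norm(s_hat)
        s_hat = s_new
        if change <= delta:                              # stopping rule
            break
    return s_hat, imfs, residue

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
truth = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)
observed = rng.random(200) > 0.2                         # roughly 20% missing at random
s = np.where(observed, truth, np.nan)
s_hat, imfs, residue = emd_with_missing(t, s, observed)
```

Observed values are never modified; only the entries at the missing locations are updated from one iteration to the next, and the IMFs from the last iteration are returned.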

For the fitting method, we consider various nonparametric function estimation methods such as smoothing splines, kernel smoothing, and local polynomial regression. The asymptotic results on the equivalent kernel described in Silverman (1984) support the fact that both a spline-type estimator and a kernel smoother, including the local polynomial regression estimator, can be written as$$\begin{aligned} {\hat{f}}(t)=\frac{1}{n}\sum _{i=1}^nw(t,t_i)s(t_i), \end{aligned}$$where \(w(t,t_i)\) denote weights at time point \(t_i\). In this study, we employ smoothing splines with a smoothing parameter chosen by generalized cross-validation.
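To make the linear-smoother form and the generalized cross-validation (GCV) choice concrete, the sketch below uses a Gaussian kernel smoother in place of smoothing splines, which is an assumption made here for brevity. Each row of the smoother matrix holds the normalized weights \(\frac{1}{n}w(t,t_i)\), and the bandwidth is picked from a small grid by minimizing the usual GCV score for a linear smoother, the mean squared residual divided by \((1-\mathrm{tr}(S)/n)^2\). The function names, the test signal, and the grid are hypothetical.

```python
import numpy as np

def smoother_matrix(t_eval, t_obs, h):
    """Rows hold (1/n) * w(t, t_i): Gaussian-kernel weights normalized to sum to one."""
    k = np.exp(-0.5 * ((t_eval[:, None] - t_obs[None, :]) / h) ** 2)
    return k / k.sum(axis=1, keepdims=True)

def gcv_score(t_obs, s_obs, h):
    """GCV for a linear smoother: mean squared residual / (1 - tr(S)/n)^2."""
    S = smoother_matrix(t_obs, t_obs, h)
    resid = s_obs - S @ s_obs
    n = len(s_obs)
    return np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
s = np.sin(2 * np.pi * 3 * t) + 0.2 * rng.standard_normal(200)

grid = np.array([0.005, 0.01, 0.02, 0.05, 0.1, 0.2])     # candidate bandwidths
scores = np.array([gcv_score(t, s, h) for h in grid])
h_best = grid[np.argmin(scores)]

# Prediction at arbitrary (e.g. missing) locations uses the same weights.
t_new = np.array([0.123, 0.456])
f_new = smoother_matrix(t_new, t, h_best) @ s
```

Because the estimator is linear in the data, prediction at a missing location only requires the row of weights for that location.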

In this study, we use the local mean values as \({\hat{s}}^{(0)}(t_m)\) for the choice of initial values.
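The text does not spell out how the local mean is computed, so the following is only one possible reading: each missing value is initialized with the mean of its k nearest observed values, where nearness is measured by index distance and ties go to the earlier sample. The function name `local_mean_init` and the default `k=2` are assumptions.

```python
import numpy as np

def local_mean_init(s, observed, k=2):
    """Fill each missing entry with the mean of its k nearest observed values."""
    s0 = s.astype(float).copy()
    obs_idx = np.flatnonzero(observed)
    for m in np.flatnonzero(~observed):
        # stable sort: ties in distance go to the earlier (left) sample
        order = np.argsort(np.abs(obs_idx - m), kind="stable")
        nearest = obs_idx[order[:k]]
        s0[m] = s[nearest].mean()
    return s0

s = np.array([1.0, np.nan, 3.0, 4.0, np.nan, np.nan, 7.0])
observed = ~np.isnan(s)
s0 = local_mean_init(s, observed)
# s0 is [1.0, 2.0, 3.0, 4.0, 3.5, 5.5, 7.0]
```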

Note that the imputation step does not depend on the dimension of s. Thus, it can be easily extended to two-dimensional images. The only modification required is to replace the one-dimensional smoothing method in the imputation step with a two-dimensional method such as thin-plate smoothing splines.
Numerical study
Simulation study
 1. EMD.obs: the conventional EMD algorithm with observed data,
 2. EMD.self: the proposed EMD algorithm described in “Proposed algorithm” section, and
 3. EMD.com: the conventional EMD algorithm with imaginary complete data.
To evaluate how the proposed algorithm performs as the percentage of missing data varies, we consider five different missing percentages: 10, 20, 30, 40 and 50%. In addition, we consider two missing patterns: (a) missing at random with missing locations selected randomly from the inner 90% of the time domain, so that no values are missing near the boundaries; and (b) missing at random with missing locations selected randomly over the entire domain, including the boundaries.
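The two missing patterns can be generated as in the sketch below. It assumes that "inside 90% of the time domain" means the central 90% of the sample indices, with 5% trimmed from each end; the function name `missing_mask` and the sample size are hypothetical.

```python
import numpy as np

def missing_mask(n, pct, interior_only, rng):
    """Boolean mask of length n with pct percent of entries marked missing (True)."""
    n_miss = int(round(n * pct / 100))
    if interior_only:
        trim = int(round(0.05 * n))           # pattern (a): exclude 5% at each end
        candidates = np.arange(trim, n - trim)
    else:
        candidates = np.arange(n)             # pattern (b): entire domain
    idx = rng.choice(candidates, size=n_miss, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

rng = np.random.default_rng(0)
masks_a = {p: missing_mask(200, p, True, rng) for p in (10, 20, 30, 40, 50)}
masks_b = {p: missing_mask(200, p, False, rng) for p in (10, 20, 30, 40, 50)}
```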
Figures 3 and 4 show box plots of MSE values for two missing patterns. The proposed EMD.self is comparable to EMD.com when the missing data percentage is up to 50%. Even when missing exists near the boundary, EMD.self works well due to effective imputation by the proposed algorithm.
Signal | \(imf_1\) EMD.obs | \(imf_1\) EMD.self | \(imf_2\) EMD.obs | \(imf_2\) EMD.self | \(imf_3\) EMD.obs | \(imf_3\) EMD.self
Noise-free \(S_1\) | 0.0030 | 0.0010 | 0.0035 | 0.0026 | 0.0045 | 0.0040
Noisy \(S_1\) | 0.0134 | 0.0061 | 0.0141 | 0.0071 | 0.0095 | 0.0052
Real data example
Extension to image
Bidimensional EMD for two-dimensional signals such as images has been proposed in several studies (Damerval et al. 2005; Nunes et al. 2005; Bhuiyan et al. 2008; Kim et al. 2012b). To construct the upper and lower envelopes of a two-dimensional signal, interpolation is performed on scattered, sparse extrema. Missing values therefore aggravate the insufficient sampling rate and make it harder to estimate candidate IMFs. The two-dimensional extension is straightforward: recursively impute the two-dimensional missing values with a thin-plate spline and decompose the imputed image with an existing two-dimensional EMD procedure.
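The thin-plate imputation of an image can be sketched with plain NumPy. The code below is an illustration under assumptions, not the authors' implementation: it performs a single thin-plate smoothing spline fit to the observed pixels and predicts the missing ones, leaving out both the recursion and the two-dimensional EMD step; the smoothing parameter `lam` is fixed by hand, and the function names are hypothetical.

```python
import numpy as np

def tps_kernel(a, b):
    """Thin-plate basis r^2 log r between point sets a (m, 2) and b (n, 2)."""
    r2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    out = np.zeros_like(r2)
    pos = r2 > 0
    out[pos] = 0.5 * r2[pos] * np.log(r2[pos])    # r^2 log r = 0.5 r^2 log(r^2)
    return out

def tps_impute(image, observed, lam=1e-3):
    """Fill missing pixels with predictions from a thin-plate smoothing spline."""
    rr, cc = np.indices(image.shape)
    pts = np.column_stack([rr.ravel(), cc.ravel()]).astype(float)
    obs = observed.ravel()
    xo, zo = pts[obs], image.ravel()[obs]
    n = len(zo)
    P = np.column_stack([np.ones(n), xo])         # linear part: 1, row, column
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(xo, xo) + lam * np.eye(n)
    A[:n, n:] = P
    A[n:, :n] = P.T
    coef = np.linalg.solve(A, np.concatenate([zo, np.zeros(3)]))
    w, c = coef[:n], coef[n:]
    xm = pts[~obs]
    pred = tps_kernel(xm, xo) @ w + np.column_stack([np.ones(len(xm)), xm]) @ c
    filled = image.astype(float).copy()
    filled[~observed] = pred                      # observed pixels stay untouched
    return filled

rows, cols = np.indices((8, 8))
plane = 1.0 + 2.0 * rows + 3.0 * cols             # a linear test image
observed = np.ones((8, 8), dtype=bool)
observed[[1, 3, 4, 6], [2, 5, 1, 6]] = False      # four missing pixels
image = np.where(observed, plane, np.nan)
filled = tps_impute(image, observed)
```

A linear image is used as the check because a thin-plate spline reproduces linear trends exactly, so the imputed pixels should match the true plane up to rounding error. The dense linear system grows with the number of observed pixels, so this form is only practical for small images or patches.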
Conclusions
In this paper, we have proposed a modified empirical mode decomposition to deal with missing data problems. The proposed method is based on imputation using the self-consistency principle. We have presented an effective algorithm for implementing the proposed method. The empirical performance of the proposed method has been evaluated through various numerical experiments, including both one- and two-dimensional settings. Results from these experiments illustrate that the proposed method possesses good empirical properties.
Declarations
Authors' contributions
HO introduced the initial idea and provided the theoretical background. DK set up the methodological procedure and designed the numerical studies. Both authors contributed to writing the final manuscript. Both authors read and approved the final manuscript.
Acknowledgements
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03930463), and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (2015R1D1A1A01056854 and 20110030811).
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Barnhart BL, Eichinger WE, Prueger JH (2012) Introducing an Ogive method for discontinuous data. Agric For Meteorol 162–163:58–62
 Barnhart BL, Nandage HKW, Eichinger WE (2011) Assessing discontinuous data using ensemble empirical mode decomposition. Adv Adapt Data Anal 3:483–491
 Bhuiyan SMA, Adhami RR, Khan JF (2008) Fast and adaptive bidimensional empirical mode decomposition using order-statistics filter based envelope estimation. EURASIP J Adv Signal Process 2008:728356
 Boudraa AO, Cexus JC (2007) EMD-based signal filtering. IEEE Trans Instrum Meas 56:2196–2202
 Colominas MA, Schlotthauer G, Torres ME (2014) Improved complete ensemble EMD: a suitable tool for biomedical signal processing. Biomed Signal Process Control 14:19–29
 Damerval C, Meignen S, Perrier V (2005) A fast algorithm for bidimensional EMD. IEEE Signal Process Lett 12:701–704
 Daubechies I (1992) Ten lectures on wavelets. SIAM, Philadelphia
 Daubechies I, Lu J, Wu HT (2011) Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Appl Comput Harmonic Anal 30:243–261
 Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood estimation from incomplete data via EM algorithm. J R Stat Soc B 39:1–38 (with discussion)
 Diop EHS, Alexandre R, Boudraa AO (2010) Analysis of intrinsic mode functions: a PDE approach. IEEE Signal Process Lett 17:398–401
 Hastie T, Stuetzle W (1989) Principal curves. J Am Stat Assoc 84:502–516
 Huang NE, Shen Z, Long SR, Wu ML, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH (1998) The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc R Soc Lond A 454:903–995
 Kim D, Kim KO, Oh HS (2012a) Extending the scope of empirical mode decomposition using smoothing. EURASIP J Adv Signal Process 2012:168
 Kim D, Park M, Oh HS (2012b) Bidimensional statistical empirical mode decomposition. IEEE Signal Process Lett 19:191–194
 Komaty A, Boudraa AO, Auiger B, Daré-Emzivat D (2014) EMD-based filtering using similarity measure between probability density functions of IMFs. IEEE Trans Instrum Meas 63:27–34
 Lee TCM, Meng XL (2007) Self consistency: a general recipe for wavelet estimation with irregularly-spaced and/or incomplete data. arXiv:0701196
 Mallat S (2009) A wavelet tour of signal processing. Academic Press, New York
 Nunes JC, Guyot S, Delchelle E (2005) Texture analysis based on local analysis of the bidimensional empirical mode decomposition. Mach Vis Appl 16:177–188
 Park M, Kim D, Oh HS (2015) Quantile-based empirical mode decomposition: an efficient way to decompose noisy signals. IEEE Trans Instrum Meas 64:1802–1813
 Priestley MB (1981) Spectral analysis and time series. Academic Press, New York
 Silverman BW (1984) Spline smoothing: the equivalent variable kernel method. Ann Stat 12:898–916
 Tarpey T, Flury B (1996) Self-consistency: a fundamental concept in statistics. Stat Sci 11:229–243
 Torres ME, Colominas MA, Schlotthauer G, Flandrin P (2011) A complete ensemble empirical mode decomposition with adaptive noise. In: Proceedings of 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4144–4147
 Vidakovic B (1999) Statistical modeling by wavelets. Wiley, New York
 Wu Z, Huang NE (2009) Ensemble empirical mode decomposition: a noise assisted data analysis method. Adv Adapt Data Anal 1:1–41
 Xu Z, Huang B, Zhang F (2009) Improvement of empirical mode decomposition under low sampling rate. Signal Process 89:2296–2303
 Yeh JR, Shieh JS, Huang NE (2010) Complementary ensemble empirical mode decomposition: a novel noise enhanced data analysis method. Adv Adapt Data Anal 2:135–156