Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals
SpringerPlus volume 5, Article number: 1580 (2016)
Abstract
Existing electroencephalogram (EEG) peak classification models, such as the Dumpala, Acir, Liu, and Dingle peak models, employ different sets of features. However, none of these models is guaranteed to perform well across applications, because their performance is problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, the angle modulated simulated Kalman filter (AMSKF), is employed as the feature selector, and a neural network with random weights is used as the classifier within the proposed AMSKF technique. In the experiment, 11,781 peak candidate samples are employed for validation. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects: (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results show that the proposed AMSKF feature selector is able to find the best combination of features and performs on par with existing related studies of epileptic EEG event classification.
Background
The use of electroencephalogram (EEG) signals for measurement has attracted growing research interest in various applications such as brain-computer interfaces (Nicolas-Alonso and Gomez-Gil 2012), human–machine interfaces (Ramli et al. 2015), diagnosing and monitoring epilepsy (Acir 2005), and tracking eye gaze (Adam et al. 2014). Advanced processing methods now allow EEG signals to be used efficiently in a wide range of applications.
In general, a peak point is a point that holds the highest value at a specific time and location in an EEG signal. Peak points can be observed in EEG signals because the brain responds to human activities. Sources of such responses include eye movements, epilepsy, and event-related potentials. However, EEG signals are also very sensitive to noise from the heartbeat, the EEG electrodes, and body movements. This noise generates a large number of false peaks in the signals and makes the classification of the desired peak points difficult. Moreover, the problem is aggravated because the peak amplitude differs from one subject to another, varying from 600 to 1100 µV (Iwasaki et al. 2005), which results in a high variance of the peak features in the collected data.
At present, researchers have used several combinations of peak features based on the time-domain characteristics of peaks in EEG signals (Dumpala et al. 1982; Acir et al. 2005; Acir and Guzelis 2004; Liu et al. 2002; Dingle et al. 1993). These peak features are derived from different amplitudes, widths, and slopes. For instance, the peak-to-peak amplitudes of the first and second half waves, the peak width, the ascending peak slope at the first half wave, and the descending peak slope at the second half wave can all be used as peak features. Features are selected so that only relevant ones are used for classification. The selected combinations, however, are problem dependent and work efficiently only for a specific application. Determining the best and most general combination of peak features in EEG signals therefore remains an open problem.
To avoid the slow, iterative learning of conventional neural network learning algorithms (i.e., gradient descent and Levenberg-Marquardt), a neural network with random weights (NNRW) is employed as a classifier. The NNRW is a fast, simple, and non-iterative learning algorithm for a single layer feedforward neural network (SLFN). It was first introduced by Schmidt (1992). The NNRW network consists of three layers: input, hidden, and output. The learning concept of NNRW is that the input weights and the biases of the hidden layer are chosen randomly within a specific interval, whereas the output weights are estimated by the Moore–Penrose generalized inverse method (Rao and Mit 1971). The input weights are assigned randomly between −1 and 1, and the biases in the hidden layer are assigned randomly between 0 and 1. Both settings follow the parameters suggested by Cao et al. (2015). A similar concept was further developed by Pao and Takefuji (1992) and is known as random vector functional-link (RVFL) nets. Variations of extended RVFL were introduced to establish the theoretical results of the RVFL concept (Pao et al. 1994; Igelnik and Pao 1995).
Population-based metaheuristic optimization algorithms provide satisfactory solutions in a relatively short time. These algorithms are also efficient and effective at solving large and complex real-world problems and can be applied to almost any optimization problem (Xiong et al. 2015). A variety of metaheuristic optimization algorithms have been invented, such as the genetic algorithm (Hooker 1995), simulated annealing (Johnson et al. 1989), particle swarm optimization (Kennedy and Eberhart 1995), ant colony optimization (Dorigo et al. 1996), big bang-big crunch optimization (Erol and Eksin 2006), the intelligent water drops algorithm (Shah-Hosseini 2007), honey bee mating optimization (Marinakis et al. 2011), the firefly algorithm (Yang 2010b), the gravitational search algorithm (Rashedi et al. 2009), harmony search optimization (Yang 2009), the bat algorithm (Yang 2010a), and the black hole algorithm (Hatamlou 2013). These optimization algorithms have already been applied as effective feature selection techniques in various real-world applications such as power systems (Ahila et al. 2015), manufacturing (Zhang et al. 2015), and medicine (Bababdani and Mousavi 2013; Adam et al. 2014).
Recently, Ibrahim et al. (2015) introduced a new metaheuristic optimization algorithm inspired by the state estimation process of the Kalman filter, named the simulated Kalman filter (SKF) algorithm. The Kalman filter consists of three main processes: state prediction, state measurement, and state estimation. In the SKF algorithm, each agent acts as an individual Kalman filter and holds a state vector. Through the prediction, measurement, and estimation processes, new states are estimated and the locations of the agents are updated. These processes are repeated until the maximum number of iterations is reached. According to the experimental results of Ibrahim et al. (2015), the SKF algorithm is able to find near-optimal solutions efficiently, and its performance is comparable to that of the gravitational search algorithm and the black hole algorithm on unimodal optimization problems. The original SKF algorithm, however, cannot be used for discrete optimization problems. To address this limitation, Md Yusof et al. (2016) introduced the angle modulated SKF (AMSKF) algorithm. Because AMSKF can solve discrete problems, it is employed as the feature selection method in this study.
The key contributions of this study are as follows: (1) to employ a recently introduced population-based metaheuristic optimization algorithm, AMSKF, for feature selection in EEG signal peak classification, (2) to employ, for the first time, the NNRW in a peak detection algorithm for classification and feature selection, (3) to propose a new generalized peak model for EEG signal peak classification based on the features selected by AMSKF, and (4) to apply the proposed AMSKF model to epileptic EEG signals. For benchmarking purposes, four existing peak models are considered. The experimental results show that the new combination of peak features produced by the proposed AMSKF technique achieves better accuracy than the NNRW with the conventional peak models.
Data descriptions
Eye event-related EEG data
The peak candidate data of the eye event-related signals were collected from three different event-related EEG signals that produce peaks. The first is labelled as single eye blink signals, the second as double eye blink signals, and the third as eye movement signals. The single and double eye blink EEG signals were recorded using the g.USBamp biological signal acquisition system, while the eye movement EEG signals were recorded using the g.MOBIlab portable biological signal acquisition system. The scalp electrodes for all three signals were placed according to the international 10–20 electrode placement system. The sampling frequency was set to 256 Hz.
The single blink and double blink signals were recorded from channel F9. The reference electrode was located on the ear and the ground electrode on channel AFz, so only three electrodes were used in total. The F9 electrode is positioned to detect EEG peaks associated with the brain response to commanded single and double eye blinks. Single means the eyes blink once, while double means the eyes blink twice. The signals on channel F9 in which eye blinks produce peaks are archived as raw data for analysis.
The eye movement signals were recorded from channels C3 and C4. Channel Cz was used as the reference and the ground electrode was located on channel FPz, so only four electrodes were used in total. The C3 and C4 electrodes are positioned to detect EEG peaks associated with the brain response to commanded horizontal eye gaze directions. The signals on channels C3 and C4 in which eye gaze changes produce peaks are archived as raw data for analysis.
Figure 1a–c shows three different EEG signals that were named as a single eye blink, double eye blink, and eye movement signals. The dotted red vertical lines show the actual peak point location, as manually assigned by a researcher. The descriptions of those EEG signals are tabulated in Table 1.
The single eye blink data comprise 30 signals, each 10 s long with 2560 sampling points, and each containing two known peak points and various additional signal patterns. The additional signal patterns are edge transitions that represent eye movements. The known peak pattern in this signal represents a single eye blink. The peak pattern of a single eye blink is useful as an additional feature for controlling an electric wheelchair (Lin and Yang 2012). The total numbers of training and testing sampling points are 38,400 and 38,400, respectively. Among all sampling points, 3238 locations are identified as peak candidates, of which 60 are true peaks and 3178 are false peaks.
The double eye blink data comprise five signals, each 80 s long with 20,480 sampling points, and each containing eight known peak points and some additional signal patterns. The additional signal patterns are edge transitions that represent horizontal eye movements. The signals occasionally contain a single eye blink peak. The total numbers of training and testing sampling points are 51,200 and 51,200, respectively. Among all sampling points, 4662 locations are identified as peak candidates, of which 40 are true peaks and 4622 are false peaks.
Figure 1c shows the eye movement signals. The eye movement data comprise 40 signals from channels C3 and C4, each 10 s long with 2560 sampling points, and each containing one known actual peak point location. The known peak pattern in this signal represents the horizontal eye gaze direction, either to the left or to the right. In total, the data collection for this signal is 400 s long and has 102,400 sampling points. Among the 102,400 sampling points, 3881 peak candidate locations were recognized, of which 40 are known actual peak points and the remaining 3841 are known non-peak points.
From the collected raw data of the three EEG signals, 11,781 peak candidate samples with their associated features were archived as EEG data for the experiments. Of these 11,781 peak candidate samples, 140 were assigned as true peaks and the other 11,641 were assigned as false peaks.
Epileptic EEG data
The second dataset used in this study is publicly available in the Bonn University EEG database (Andrzejak et al. 2001). The EEG recordings were made using the standard 10–20 electrode placement system. The database has five sets, named set A, set B, set C, set D, and set E. Each set contains 100 EEG segments that were selected from continuous multichannel EEG recordings after removing muscle activity and eye movement artifacts. Each EEG segment consists of 4097 sampling points and lasts about 23.6 s. Sets A and B consist of EEG segments taken from surface EEG recordings of five healthy subjects, who were relaxed and awake with eyes open (A) and eyes closed (B), respectively. Sets C, D, and E were taken from an EEG archive of presurgical diagnosis. Segments in set D were recorded from the epileptogenic zone, and those in set C from the hippocampal formation of the opposite hemisphere of the brain. Sets C and D contain only activity measured during seizure-free intervals, whereas set E contains only epileptic events. The data were recorded with a 128-channel amplifier system and digitized at a 173.61 Hz sampling rate with 12-bit A/D resolution. A bandpass filter with a pass band of 0.53–40 Hz (12 dB/oct) was used to select the desired EEG band. In this study, only set A and set E were used. Set A represents non-epileptic peak events, while set E represents epileptic peak events.
From the raw EEG data of the two sets (set A and set E), 20,000 peak candidate samples with their associated features were archived as EEG data for the experiments. Of these 20,000 samples, 10,000 were assigned as epileptic peak events from set E and the other 10,000 as non-epileptic peak events from set A; 100 peak candidate samples were randomly selected from each segment of both sets. A four-fold cross-validation process is used to produce four groups of EEG data. The class distribution of the peak candidate samples and events is summarized in Table 2.
Methods
The method for peak detection consists of three main processes: (1) feature extraction, (2) feature selection, and (3) classification. In the feature extraction stage, the three-point sliding window method (Dumpala et al. 1982; Billauer 2012) is employed to identify all possible peak candidates. The AMSKF feature selector is then used to select the best combination of features for the peak candidates. Finally, all identified peak candidates with their selected features are classified by the NNRW classifier. The NNRW was chosen as the classifier for two reasons: (1) it provides fast learning speed, and (2) this fast learning keeps the computational cost of the proposed AMSKF technique low, because the classifier is retrained at every fitness evaluation.
Feature extraction
To the best of our knowledge, only four time-domain models have typically been used for peak classification in various event-related signals (Dumpala et al. 1982; Acir and Guzelis 2004; Liu et al. 2002; Dingle et al. 1993). Each of these existing peak models (i.e., the Dumpala, Acir, Liu, and Dingle models) has its own associated features. All 16 peak features of the existing models can be calculated from the eight parameter points defined in Fig. 2.
After the ith candidate peak point, \(PP_{i}\), and the two associated valley points, \(VP1_{i}\) and \(VP2_{i}\), are identified using the three-point sliding window method (Dumpala et al. 1982; Billauer 2012), the other five parameter points can be identified: the half point at the first half wave (\(HP1_{i}\)), the half point at the second half wave (\(HP2_{i}\)), the turning point at the first half wave (\(TP1_{i}\)), the turning point at the second half wave (\(TP2_{i}\)), and the moving average curve point [\(MAC(PP_{i})\)]. For example, the half point at the first half wave is the point located midway between \(PP_{i}\) and \(VP1_{i}\), while the half point at the second half wave is the point located midway between \(PP_{i}\) and \(VP2_{i}\). A turning point is recognized when the slope decreases by more than 50 % compared to the slope at the preceding point. The \(MAC(PP_{i})\) point is located at the intersection between the \(PP_{i}\) and \(MAC(PP_{i})\) points.
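The candidate search described above can be sketched as follows. This is a minimal illustration only: the function name `find_peak_candidates` is hypothetical, and the exact comparison rule (strictly higher than the left neighbour, not lower than the right one) is an assumption, since the text does not spell out how ties and plateaus are handled.

```python
import numpy as np

def find_peak_candidates(signal):
    """Return (PP, VP1, VP2) index triples found with a three-point window.

    Assumption: a sample is a candidate peak when it is higher than its left
    neighbour and not lower than its right neighbour; valleys mirror this rule.
    """
    x = np.asarray(signal, dtype=float)
    inner = range(1, len(x) - 1)
    peaks = [i for i in inner if x[i - 1] < x[i] >= x[i + 1]]
    valleys = [i for i in inner if x[i - 1] > x[i] <= x[i + 1]]

    triples = []
    for p in peaks:
        left = [v for v in valleys if v < p]    # valleys before the peak
        right = [v for v in valleys if v > p]   # valleys after the peak
        if left and right:                      # keep peaks bounded on both sides
            triples.append((p, left[-1], right[0]))
    return triples
```

Candidates without a valley on both sides are dropped here because the amplitude, width, and slope features all need both half waves; that choice is also an assumption of the sketch.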
After all eight parameter points are identified, the 16 peak features are calculated using the equations listed in Table 3. The peak features fall into three groups, namely amplitude, width, and slope: five amplitudes (\(f_{1}\) to \(f_{5}\)), seven widths (\(f_{6}\) to \(f_{12}\)), and four slopes (\(f_{13}\) to \(f_{16}\)). The descriptions of all 16 features are also given in Table 3.
Table 4 lists the different peak models with their associated features. The Dingle model uses four features: \(f_{5}\), \(f_{6}\), \(f_{13}\), and \(f_{14}\). The Dumpala model also uses four features: \(f_{1}\), \(f_{6}\), \(f_{13}\), and \(f_{14}\). The Acir model consists of six features: \(f_{1}\), \(f_{2}\), \(f_{7}\), \(f_{8}\), \(f_{13}\), and \(f_{14}\). The considerably more complex model of Liu et al. (2002) entails 11 features: \(f_{1}\), \(f_{2}\), \(f_{3}\), \(f_{4}\), \(f_{6}\), \(f_{9}\), \(f_{12}\), \(f_{13}\), \(f_{14}\), \(f_{15}\), and \(f_{16}\).
Neural network with random weights (NNRW) classifier
The NNRW classifier has recently gained attention as a fast-learning and well-generalizing technique for classification (Cao et al. 2016; Lang et al. 2015). The fundamental aspect of this method is that the NNRW can be represented as a linear system (Schmidt 1992). The linear system of NNRW is mathematically modeled as \(H\beta = T\), where \(\beta\) is the L × m matrix of output weights, T is the N × m matrix of target outputs, and m is the number of output neurons. The \(\beta\) and T matrices are denoted as
\[\beta = \begin{bmatrix} \beta_{1}^{T} \\ \vdots \\ \beta_{L}^{T} \end{bmatrix}_{L \times m} \quad (1)\]
and
\[T = \begin{bmatrix} t_{1}^{T} \\ \vdots \\ t_{N}^{T} \end{bmatrix}_{N \times m} \quad (2)\]
respectively. The output function of the NNRW classifier for a given unknown sample x can be mathematically described as \(fc(x) = h(x)\beta\). The output matrix of the hidden layer, H, is calculated as follows:
\[H = \begin{bmatrix} g(x_{1} a_{1} + b_{1}) & \cdots & g(x_{1} a_{L} + b_{L}) \\ \vdots & \ddots & \vdots \\ g(x_{N} a_{1} + b_{1}) & \cdots & g(x_{N} a_{L} + b_{L}) \end{bmatrix}_{N \times L} \quad (3)\]
where g is the activation function of the hidden neurons, x is the N × d matrix of inputs with rows \(x_{j}\), a is the d × L matrix of random input weights with columns \(a_{i}\), b is the 1 × L matrix of random biases \(b_{i}\) in the hidden layer, N is the number of distinct samples, L is the number of hidden neurons (L = 1000 in this study), and d is the number of inputs (which depends on the number of selected features in this study). The ith column of H is the output of the ith hidden neuron with respect to the inputs \(x_{1}, x_{2}, \ldots, x_{N}\). The sigmoidal function \(g(x) = 1/(1 + e^{-x})\) was used in this study as the activation function in the hidden layer for normalization, while a linear function is used in the neurons of the output layer.
To find the least-squares solution \(\beta\) of the linear system \(H\beta = T\), the minimum-norm least-squares problem is posed as follows:
\[\min_{\beta} \left\| H\beta - T \right\| \quad (4)\]
It is well known that the smallest-norm least-squares solution of Eq. (4) is
\[\hat{\beta} = H^{+} T \quad (5)\]
where \(H^{+}\) is the Moore–Penrose pseudoinverse of H. The training stages of the NNRW classifier are summarized as follows:

Stage 1 Assign randomly the input weights, a _{ i } and biases in the hidden neurons, b _{ i }.

Stage 2 Calculate the output matrix of the hidden layer, H.

Stage 3 Calculate the output weights, \(\beta = H^{ + } T\).
In the output layer, two neurons are used to classify a sample into one of two classes: class 1 and class 0. With more than one output neuron (m > 1), the predicted class label is the index of the output neuron that has the maximum output value. The predicted class label of a given unknown sample x is defined as follows:
\[label(x) = \arg\max_{i \in \{1, \ldots, m\}} fc_{i}(x) \quad (6)\]
where \(fc_{i}(x)\) is the output of the ith output neuron.
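The three training stages and the maximum-output labelling rule can be sketched with NumPy as below. This is an illustrative sketch rather than the authors' implementation: the function names `train_nnrw` and `predict_nnrw` and the `seed` argument are hypothetical, and the weight and bias intervals are the ones quoted in the Background section.

```python
import numpy as np

def _hidden_output(X, a, b):
    # Sigmoid activation g(z) = 1 / (1 + exp(-z)) applied to X a + b (N x L).
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))

def train_nnrw(X, T, n_hidden=1000, seed=None):
    """X: N x d inputs, T: N x m one-hot targets. Returns (a, b, beta)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Stage 1: random input weights in [-1, 1] and hidden biases in [0, 1].
    a = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(0.0, 1.0, size=(1, n_hidden))
    # Stage 2: hidden-layer output matrix H.
    H = _hidden_output(X, a, b)
    # Stage 3: output weights from the Moore-Penrose pseudoinverse of H.
    beta = np.linalg.pinv(H) @ T
    return a, b, beta

def predict_nnrw(X, a, b, beta):
    # The predicted label is the index of the largest output neuron.
    return np.argmax(_hidden_output(X, a, b) @ beta, axis=1)
```

Because training reduces to one pseudoinverse, no iterative weight updates are needed, which is what makes the classifier cheap enough to retrain inside a feature selection loop.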
The performance of the classifier is evaluated using a four-fold cross-validation process. The four-fold cross-validation accuracy of the classifier is computed using the G-mean (Guo et al. 2008), which is calculated as follows:
\[Gmean = \sqrt{TPR \times TNR} \quad (7)\]
\[TPR = \frac{TP}{TP + FN} \quad (8)\]
\[TNR = \frac{TN}{TN + FP} \quad (9)\]
where a true peak (TP) is a correctly detected apex point of a peak candidate, a true non-peak (TN) is a correctly detected non-peak point of a peak candidate, a false peak (FP) is a non-peak point of a peak candidate incorrectly designated as a peak, a false non-peak (FN) is a true peak point of a peak candidate incorrectly detected as a non-peak, TPR is the true peak rate, and TNR is the true non-peak rate.
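As a small worked illustration of this measure, the sketch below computes the G-mean from the four counts. The function name `g_mean` and the convention of returning 0 for an empty class are assumptions made for the example.

```python
import math

def g_mean(tp, tn, fp, fn):
    """Geometric mean of the true peak rate and the true non-peak rate."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true peak rate
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # true non-peak rate
    return math.sqrt(tpr * tnr)

# A classifier that labels every candidate as a false peak scores 0,
# however many non-peaks it gets right.
print(g_mean(tp=0, tn=100, fp=0, fn=10))   # 0.0
```

This is why the measure suits the eye event-related data: with only 140 true peaks among 11,781 candidates, a classifier that rejects every candidate would reach about 98.8 % plain accuracy but a G-mean of 0.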
Simulated Kalman filter (SKF) for continuous optimization problems
The SKF algorithm (Ibrahim et al. 2015) was originally designed for solving continuous optimization problems. The algorithm follows the steps shown in Fig. 3: (1) generate an initial population, (2) evaluate the fitness function for each agent, (3) update the best solution among the agents at the current iteration, \(X_{best}(t)\), and the best solution found so far, \(X_{true}\), (4) perform state prediction, measurement, and estimation, and (5) terminate based on a stopping criterion.
In the initialization step, the SKF parameters are set: the initial value of the error covariance estimate, P(0), the process noise value, Q, and the measurement noise value, R. Further settings, such as the number of agents, n, and the maximum number of iterations, \(t_{\max}\), are also determined. The state values of each agent are assigned randomly within a specific interval.
Next, the fitness function is evaluated to obtain the initial solution of every agent. The best fitness value among the agents at iteration t, \(X_{best}(t)\), is \(\max_{i \in 1, \ldots, n} fit_{i}(X(t))\) for a maximization problem or \(\min_{i \in 1, \ldots, n} fit_{i}(X(t))\) for a minimization problem.
At every iteration t, \(X_{best}(t)\) is compared with the best solution found so far, \(X_{true}\), which is updated accordingly. For a maximization problem, \(X_{true}\) is updated only when \(X_{best}(t)\) at the current iteration is greater than \(X_{true}\), whereas for a minimization problem, \(X_{true}\) is updated only when \(X_{best}(t)\) at the current iteration is lower than \(X_{true}\).
Referring to Fig. 4, the next steps are state prediction, measurement, and estimation. The state prediction uses the following equations:
\[X_{i}(t|t-1) = X_{i}(t-1) \quad (10)\]
\[P(t|t-1) = P(t-1) + Q \quad (11)\]
where \(X_{i}(t-1)\) and \(X_{i}(t|t-1)\) are the previous state and the transition state, respectively, and \(P(t-1)\) and \(P(t|t-1)\) are the previous error covariance estimate and the transition error covariance estimate, respectively.
In the state measurement step, the measurement \(Z_{i}(t)\), which provides feedback to the estimation process, is computed as follows:
\[Z_{i}(t) = X_{i}(t|t-1) + \sin(rand \times 2\pi) \times \left| X_{i}(t|t-1) - X_{true} \right| \quad (12)\]
In Eq. (12), the \(\sin(rand \times 2\pi)\) term provides the stochastic element of the SKF algorithm, and \(rand\) is a uniformly distributed random number in the range [0, 1].
Next, the Kalman gain, K(t), is computed from the transition error covariance estimate, \(P(t|t-1)\), and the measurement noise value, R, as follows:
\[K(t) = \frac{P(t|t-1)}{P(t|t-1) + R} \quad (13)\]
\[X_{i}(t) = X_{i}(t|t-1) + K(t) \times \left( Z_{i}(t) - X_{i}(t|t-1) \right) \quad (14)\]
\[P(t) = \left( 1 - K(t) \right) \times P(t|t-1) \quad (15)\]
Here, the next state, \(X_{i}(t)\), is estimated using Eq. (14) and the error covariance is updated using Eq. (15). Finally, these processes are repeated until the maximum number of iterations is reached.
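The prediction, measurement, and estimation loop described above can be sketched for a minimization problem as follows. This is a simplified illustration, not the reference implementation: the name `skf_minimize`, the use of a single scalar error covariance shared by all agents, and drawing a separate random number per state dimension are assumptions of the sketch. The default parameter values mirror those reported later for the AMSKF experiment.

```python
import numpy as np

def skf_minimize(fitness, dim, bounds, n_agents=10, t_max=500,
                 P0=10000.0, Q=0.5, R=0.5, seed=None):
    """Minimise `fitness` over `dim` continuous variables with an SKF-style loop."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    X = rng.uniform(low, high, size=(n_agents, dim))   # random initial states
    P = P0                                             # error covariance estimate
    x_true, f_true = None, np.inf                      # best solution so far

    for _ in range(t_max):
        fit = np.array([fitness(x) for x in X])
        best = int(np.argmin(fit))                     # X_best(t)
        if fit[best] < f_true:                         # update X_true
            f_true, x_true = float(fit[best]), X[best].copy()

        # Prediction: the state is carried over, the covariance grows by Q.
        X_pred = X
        P_pred = P + Q
        # Measurement: random perturbation scaled by the distance to X_true.
        r = rng.random(size=X.shape)
        Z = X_pred + np.sin(r * 2.0 * np.pi) * np.abs(X_pred - x_true)
        # Estimation: Kalman gain, state update, covariance update.
        K = P_pred / (P_pred + R)
        X = X_pred + K * (Z - X_pred)
        P = (1.0 - K) * P_pred

    return x_true, f_true
```

For example, `skf_minimize(lambda x: float(np.sum(x ** 2)), dim=2, bounds=(-5.0, 5.0), seed=0)` searches for the minimum of a sphere function.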
Angle modulated simulated Kalman filter (AMSKF) for discrete optimization problems
To solve discrete optimization problems, the angle modulation concept is embedded into the SKF algorithm (Md Yusof et al. 2016). Referring to Fig. 4, the two additional angle modulation steps are as follows. After the initialization step, a continuous signal, g(x), with four coefficient parameters (a, b, c, and d) is generated for each agent. The state of the ith agent in the population at iteration t is therefore denoted as \(X_{i}(t) = \left\{ a_{i}, b_{i}, c_{i}, d_{i} \right\}\). As mentioned before, the state values a, b, c, and d are assigned randomly at the initial stage. The function g(x) with the four coefficient parameters is defined as follows:
\[g(x) = \sin\left( 2\pi (x - a) \times b \times \cos\left( 2\pi (x - a) \times c \right) \right) + d \quad (16)\]
An example plot of g(x) for a = 0, b = 1, c = 1, and d = 0 is given in Fig. 5. The signal is then sampled with sampling time T to generate a bit string of length n. Bit 1 is generated when the sampled g(x) value is greater than 0, while bit 0 is generated when it is lower than 0. The length of the bit string depends on the given problem. For example, if the full feature set has 100 features, the bit string has length 100. The bit string of each agent is used to calculate that agent's fitness value. AMSKF then follows the same steps as SKF until it returns the final solution. With the angle modulation approach, the AMSKF algorithm tunes only the four coefficient parameters to obtain the best solution, regardless of the length of the bit string.
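The bit-string generation step can be sketched as below, using the standard angle modulation generating function \(g(x) = \sin(2\pi (x - a)\, b \cos(2\pi (x - a)\, c)) + d\). The function name `angle_modulated_bits`, sampling at x = 0, T, 2T, and so on, and mapping a value of exactly 0 to bit 0 are assumptions of the sketch, since the text only covers values above and below 0.

```python
import numpy as np

def angle_modulated_bits(a, b, c, d, n_bits, T=1.0):
    """Sample g(x) at n_bits evenly spaced points and threshold it at 0."""
    x = np.arange(n_bits) * T                      # sampling instants
    g = np.sin(2.0 * np.pi * (x - a) * b
               * np.cos(2.0 * np.pi * (x - a) * c)) + d
    return (g > 0).astype(int)                     # bit 1 where g(x) > 0

# One agent state {a, b, c, d} expands to a 16-bit feature mask.
mask = angle_modulated_bits(0.3, 0.7, 1.2, 0.1, n_bits=16, T=0.25)
print(mask)
```

The appeal of the encoding is that the search space stays four-dimensional whether the mask has 16 bits or 100.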
The proposed AMSKF feature selection algorithm
The proposed feature selection algorithm for EEG signal peak detection is based on the AMSKF algorithm, and the NNRW classifier is employed for peak classification. The combination of both methods is illustrated in the flowchart in Fig. 6.
As shown in Fig. 6, the proposed AMSKF technique begins with the initialization of a population and the calculation of the g(x) function. The maximum number of iterations was set to 500 and the number of agents to 10. The initial value of the error covariance estimate, P, the process noise value, Q, and the measurement noise value, R, are 10,000, 0.5, and 0.5, respectively. To employ the AMSKF algorithm for feature selection in EEG peak classification, a 16-bit string is generated, since the selection of each feature is determined by one bit. If AMSKF assigns bit value 1 to the ith feature, that feature is selected; otherwise, it is not selected.
To calculate the fitness function, the selected features are used to prepare the training and validation sets, as shown in Fig. 6. The classifier is first trained on the training set, and the trained classifier is then tested on the validation set. The detection performance on the training and validation sets is computed using the G-mean (Guo et al. 2008). The G-mean of the validation set is used as the fitness value for the AMSKF algorithm.
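The fitness evaluation for one bit string can be sketched as a wrapper around any classifier. This is a hedged illustration: `fitness_from_bits`, `g_mean_from_labels`, and the `train` and `predict` callables are hypothetical names, and returning the worst fitness for an empty feature subset is an assumption, since the text does not say how that case is handled.

```python
import numpy as np

def g_mean_from_labels(y_true, y_pred):
    """G-mean computed from label vectors (1 = true peak, 0 = false peak)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == 1, y_true == 0
    tpr = np.mean(y_pred[pos] == 1) if pos.any() else 0.0
    tnr = np.mean(y_pred[neg] == 0) if neg.any() else 0.0
    return float(np.sqrt(tpr * tnr))

def fitness_from_bits(bits, X_train, y_train, X_val, y_val, train, predict):
    """Train on the selected feature columns and score the validation set."""
    mask = np.asarray(bits, dtype=bool)
    if not mask.any():
        return 0.0                      # assumption: no features, worst fitness
    model = train(X_train[:, mask], y_train)
    return g_mean_from_labels(y_val, predict(model, X_val[:, mask]))
```

In the setting described here, `train` and `predict` would wrap the NNRW classifier and `bits` would be the 16-bit string produced for an agent.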
In Fig. 6, after the fitness value is calculated, the process continues with the following steps: update \(X_{best}(t)\) and \(X_{true}\), state prediction, state measurement, and state estimation. New 16-bit solutions are then generated, and these steps are repeated until the maximum number of iterations is reached. Finally, the best peak model associated with the NNRW is obtained.
Experimental results and discussions
In this section, three main experiments are reported. The first experiment investigated the classification performance of the NNRW alone under various numbers of hidden neurons, and also evaluated its performance with each of the four existing peak models. The optimum number of hidden neurons found in this experiment was then used for the proposed AMSKF technique. The second experiment studied the capability of the proposed AMSKF technique to find the best combination of peak features. The first and second experiments were conducted on the eye event-related EEG data. The third experiment applied the best combination of peak features to the classification of epileptic EEG events.
Performance of NNRW under various number of hidden neurons
One advantage of the NNRW classifier is that its learning algorithm is simpler than those of conventional neural network classifiers (i.e., gradient descent, Levenberg-Marquardt, and particle swarm optimization-based learning algorithms). As a result, the NNRW classifier can be trained with a very large number of hidden neurons. However, the optimal number of hidden neurons must first be identified to give the NNRW classifier better generalization ability. To find it, an experiment is executed by varying the number of hidden neurons from 100 to 1200 in steps of 100.
To prepare the experiment data for the NNRW classifier, the EEG dataset is randomly divided into four groups by a four-fold cross-validation process, with the two-class ratio distributed equally across the groups. Each group is used in turn as the testing set while the other three groups are combined into a training set. The mean of the testing results over the four groups is calculated. This experiment is repeated 30 times, and the means of the training and testing results are reported in Table 5.
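The class-balanced four-way split can be sketched as follows. The function name `stratified_folds` is hypothetical, and shuffling each class separately before dealing it into k groups is one assumed way of keeping the two-class ratio equal across groups.

```python
import numpy as np

def stratified_folds(labels, k=4, seed=None):
    """Split sample indices into k groups with the same class ratio."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        # Shuffle the indices of this class and deal them out evenly.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for j, part in enumerate(np.array_split(idx, k)):
            folds[j].extend(part.tolist())
    return [np.array(sorted(f)) for f in folds]

# Each group serves once as the testing set; the other three form the training set.
folds = stratified_folds([1] * 8 + [0] * 40, k=4, seed=0)
for test_idx in folds:
    train_idx = np.concatenate([f for f in folds if f is not test_idx])
```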
The variation of the testing accuracy with the number of hidden neurons is illustrated in Fig. 7. Referring to Fig. 7, the testing accuracy of all four peak models increased up to 1200 neurons. Three peak models (i.e., the Dumpala, Acir, and Liu models), but not the Dingle model, offer their optimal accuracy when the number of hidden neurons is between 900 and 1200. Hence, the number of hidden neurons for our experiment was set to 1000. The results in Fig. 7 also indicate that selecting the best combination of features is necessary to obtain the best and most general performance in EEG signal peak classification.
Experimental results for AMSKF feature selection algorithm
To prepare the experiment data of the proposed AMSKF feature selection algorithm, the fourfold crossvalidation process is used to produce four groups of EEG data: each group consists of training and testing sets. Next, the training set is randomly divided into two: training and validation sets. Both datasets are equally distributed to the twoclass ratio. The ratio size of training and validation was set to 0.5:0.5. The testing set is utilized as unseen EEG data. After all four groups are evaluated by the algorithm, the maximum value of testing results from the four groups is measured and the best peak model is recorded. This entire fourfold cross validation process is repeated 30 times to obtain the final statistical results (e.g., average, maximum, minimum, and standard deviation) for this experiment.
Table 6 shows the experimental results of 30 independent runs of the proposed AMSKF feature selection algorithm using the EEG data collected from the three recorded EEG signals (i.e., single eye blink, double eye blink, and eye movement signals). For every run, Table 6 gives the best peak model with the highest training, validation, and testing accuracies for the NNRW classifier. In this experiment, the best generalized peak model is chosen based on the maximum testing accuracy over the 30 runs.
In Table 6, the feature set of the best peak model is found to be f _{1}, f _{2}, f _{7}, f _{8}, f _{9}, f _{10}, f _{11}, f _{12}, f _{13}, f _{14}, and f _{15}, with a testing accuracy of 72.7 %. Of these features, two are peak amplitudes (f _{1} and f _{2}), six are peak widths (f _{7}, f _{8}, f _{9}, f _{10}, f _{11}, and f _{12}), and three are peak slopes (f _{13}, f _{14}, and f _{15}). Over the 30 runs, the average, maximum, minimum, and standard deviation of the testing accuracy are 61.7, 72.7, 53, and 4.1 %, respectively.
The results in Table 6 show that a higher fitness value on the validation set does not necessarily produce the best classification accuracy on the testing set. Likewise, a feature set with a shorter feature subset length does not necessarily give better performance. These results indicate that peak classification of event-related EEG signals is highly problem dependent.
In this experiment, the proposed AMSKF algorithm was executed for a maximum of 500 iterations. To observe the convergence of the proposed AMSKF, one example run is taken from this experiment, as illustrated in Fig. 8. From Fig. 8, it can be seen that the AMSKF algorithm converges within 20 iterations.
To evaluate the effectiveness of the proposed algorithm and of the selected best combination of features, the testing classification accuracy of the four existing peak detection models is compared with that of the proposed AMSKF model. The comparison is presented in Table 7. For a fair evaluation, the four existing peak models with their associated features are run with the same NNRW parameter settings as the proposed AMSKF technique.
The experimental results in Table 6 are obtained from the experiment in the “Performance of NNRW under various number of hidden neurons” section, with the number of hidden neurons of the NNRW set to 1000. The performance of the best combination of features is taken from the maximum testing accuracy in Table 6. As seen from Table 7, the performance of the best combination of features produced by the AMSKF algorithm exceeds that of the four existing models.
In Table 7, it can be seen that there is a large gap between the training and testing accuracies. The proposed AMSKF model achieves a testing accuracy of only 73 %. In this study, the ratio between true peaks and false peaks is 140:11,461, which means that the dataset is extremely imbalanced. In this case, the conventional NNRW classifier may fail to offer high accuracy. Another contributing factor is that the collected EEG data are affected by various noises and that the peak feature values differ widely from one subject to another. This leads to a high variance in the peak features, and as a consequence the NNRW classifier may fail to separate true peaks from false peaks correctly.
The results of the peak models are further analyzed using the nonparametric Friedman statistical test. The statistical analysis is required to demonstrate whether the average testing accuracies of the five models differ significantly. The experiments follow statistical procedures designed especially for multiple N × N comparisons, with the five models evaluated in the KEEL data mining system (Alcala-Fdez et al. 2009).
Table 8 shows the average Friedman rankings of the Dumpala, Acir, Liu, Dingle, and AMSKF models. The statistical results show that the lowest average ranking is obtained by the AMSKF model, which therefore ranks first among the five models for the EEG data. The NNRW with the Acir model ranks second, the NNRW with the Dumpala model ranks third, the NNRW with the Liu model ranks fourth, and the NNRW with the Dingle model ranks fifth.
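The average ranking used by the Friedman test can be illustrated with a short sketch. The accuracies below are hypothetical placeholders rather than values from Table 8, and the helper names `ranks_descending` and `average_ranks` are assumptions made for this example. Within each run the model with the highest accuracy receives rank 1, tied models share the average of their ranks, and the ranks are then averaged over the runs.

```python
def ranks_descending(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def average_ranks(results):
    """results maps model name -> list of per-run accuracies (equal lengths)."""
    names = list(results)
    n_runs = len(results[names[0]])
    totals = {name: 0.0 for name in names}
    for run in range(n_runs):
        run_ranks = ranks_descending([results[name][run] for name in names])
        for name, r in zip(names, run_ranks):
            totals[name] += r
    return {name: totals[name] / n_runs for name in names}

# Hypothetical testing accuracies (%) for three runs; NOT the paper's data.
toy = {
    "Dumpala": [55.0, 57.0, 54.0],
    "Acir":    [58.0, 57.0, 59.0],
    "AMSKF":   [62.0, 65.0, 61.0],
}
avg = average_ranks(toy)
best = min(avg, key=avg.get)  # lowest average rank = first place
```

With these toy numbers the model with the lowest average rank is the one that wins every run, which mirrors how the lowest average ranking in Table 8 is read as first place.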
Next, the unadjusted p values and the adjusted p values for the Nemenyi, Holm, Shaffer, and Bergmann-Hommel tests for N × N comparisons are presented in Table 9 for all ten possible pairs of peak models. A p value below 0.05 indicates that the two peak models in the pair differ significantly in testing accuracy. The p values below 0.05 are marked in italic font.
From Table 9, it can be observed that the unadjusted p values and the adjusted p values for the Holm, Shaffer, and Bergmann-Hommel tests allow nine hypotheses to be rejected, whereas the Nemenyi test allows only seven hypotheses to be rejected. Based on the unadjusted p values and the adjusted p values for the Nemenyi, Holm, Shaffer, and Bergmann-Hommel tests, the AMSKF model shows significantly better performance than the other models.
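To make the difference between unadjusted and adjusted p values concrete, the sketch below applies the standard Holm step-down adjustment to a list of raw p values. The input values are hypothetical and are not taken from Table 9, the function name `holm_adjust` is an assumption, and the KEEL procedures used in this study may differ in implementation detail.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for four pairwise comparisons (not from Table 9).
raw = [0.001, 0.04, 0.03, 0.2]
adj = holm_adjust(raw)
rejected = [p < 0.05 for p in adj]
```

In this toy case three raw p values fall below 0.05 but only one hypothesis is still rejected after adjustment, which is why the number of rejected hypotheses can differ between the unadjusted and the adjusted columns of such a table.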
Application of the proposed AMSKF model to epileptic and non-epileptic EEG event classification
Two classes of EEG events are considered: epileptic and non-epileptic events. One hundred non-epileptic events are taken from set A and 100 epileptic peak events from set E. Each EEG event is a segment of 4097 sampling points with a duration of about 23.6 s. The best combination of peak features and the trained NNRW classifier with 500 hidden neurons are used to perform the classification. To distinguish between epileptic and non-epileptic events, a voting method is used. An event is recognized as epileptic when more than 50 peaks are identified within it, and as non-epileptic when fewer than 50 peaks are identified.
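The voting rule can be written down in a few lines. The sketch below is illustrative only: the function name `classify_event` and the encoding of classifier outputs as 0/1 flags are assumptions, and because the text does not say how an event with exactly 50 peaks is handled, the sketch assumes it is labeled non-epileptic.

```python
def classify_event(peak_flags, threshold=50):
    """Label an event from per-candidate classifier outputs (1 = peak, 0 = not)."""
    n_peaks = sum(1 for flag in peak_flags if flag == 1)
    # Assumption: exactly `threshold` peaks counts as non-epileptic.
    return "epileptic" if n_peaks > threshold else "non-epileptic"

# Hypothetical classifier outputs for two events.
event_many_peaks = [1] * 51 + [0] * 100
event_few_peaks = [1] * 12 + [0] * 200
labels = [classify_event(event_many_peaks), classify_event(event_few_peaks)]
```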
Table 10 shows the confusion matrix of epileptic and non-epileptic event classification using the proposed AMSKF model. It can be observed that the AMSKF model obtains a total accuracy of 98 %, with a rate of 100 % for non-epileptic events and 96 % for epileptic events. Four epileptic events are misclassified.
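The reported rates follow directly from the event counts. As a worked check, the sketch below rebuilds the counts implied by the text (100 events per class, four epileptic events misclassified) and recomputes the per-class rates and the total accuracy. The dictionary layout and variable names are assumptions made for this example.

```python
# Keys are (actual class, predicted class); counts are those implied by the text.
confusion = {
    ("non-epileptic", "non-epileptic"): 100,
    ("non-epileptic", "epileptic"): 0,
    ("epileptic", "epileptic"): 96,
    ("epileptic", "non-epileptic"): 4,
}

def class_rate(matrix, cls):
    """Fraction of events of class `cls` that were predicted as `cls`."""
    total = sum(n for (actual, _), n in matrix.items() if actual == cls)
    return matrix[(cls, cls)] / total

correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
total_accuracy = correct / sum(confusion.values())  # (100 + 96) / 200
non_epileptic_rate = class_rate(confusion, "non-epileptic")
epileptic_rate = class_rate(confusion, "epileptic")
```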
Performance comparisons were carried out to assess the proposed method. Table 11 gives the classification accuracy of this study and of the existing methods on the Bonn University EEG database. Referring to Table 11, the classification accuracy of this study using the NNRW method is lower than that of the AIRS-PCA-FFT and Wavelet-ANFIS methods. However, the classification accuracy of the NNRW using the AMSKF model is higher than that of the remaining methods.
An example of epileptic and non-epileptic event classification is illustrated in Fig. 9. As can be seen, more than 50 peaks (red dots) have been identified in the epileptic segment (right side), within the region from 4000 to 8000 sampling points. Figure 10 shows an example of a misclassified epileptic event in record S083. The number of detected peaks is clearly lower than 50. Consequently, the actual epileptic event is classified as a non-epileptic event.
Conclusions and future works
In this study, a new generalized peak model for EEG signal peak classification has been identified using a novel AMSKF feature selection approach. The proposed algorithm considered 11,781 peak candidate samples of real EEG data, collected from 30 healthy subjects who were instructed to perform single eye blinks, double eye blinks, and horizontal eye gaze movements. The detection performance of the NNRW with four different peak detection models and with the new AMSKF model was compared. In general, the experimental results showed that the accuracy of the NNRW with the new AMSKF model is better than that of the NNRW with the other models. The statistical analysis showed that the detection performance of the NNRW with the new AMSKF model is significantly better in terms of testing accuracy than that of the other models.
A published EEG database from Bonn University was selected to evaluate the proposed method and, at the same time, to apply the relevant combination of peak features to epileptic EEG signals. From set A and set E of the published EEG database, 20,000 peak candidate samples consisting of epileptic and non-epileptic peak points were archived as EEG data for analysis. The major finding of this study is that the proposed generalized AMSKF model with the NNRW classifier performs on par with the existing methods.
This study may provide a significant contribution to medical diagnostics, human–machine interface (HMI), brain-computer interface (BCI), and harmonic detection in digital and audio signal processing, as these applications share a common peak detection problem. For example, an EEG peak in response to a change of horizontal eye gaze direction might be useful for patients with locked-in syndrome or other disabilities for controlling the direction of a computer cursor in BCI applications (Belkacem et al. 2014). This approach might also be translatable to EEG-based command of the movement of a robotic arm or wheelchair in HMI applications (Postelnicu et al. 2011; Ramli et al. 2015; Aziz et al. 2014).
References
Acir N (2005) Automated system for detection of epileptiform patterns in EEG by using a modified RBFN classifier. Expert Syst Appl 29(2):455–462. doi:10.1016/j.eswa.2005.04.040
Acir N, Guzelis C (2004) Automatic spike detection in EEG by a two-stage procedure based on support vector machines. Comput Biol Med 34(7):561–575. doi:10.1016/j.compbiomed.2003.08.003
Acir N, Oztura I, Kuntalp M, Baklan B, Guzelis C (2005) Automatic detection of epileptiform events in EEG by a three-stage procedure based on artificial neural networks. IEEE Trans Bio Med Eng 52(1):30–40. doi:10.1109/TBME.2004.839630
Adam A, Shapiai MI, Mohd Tumari MZ, Mohamad MS, Mubin M (2014) Feature selection and classifier parameters estimation for EEG signals peak detection using particle swarm optimization. Sci World J 2014:973063. doi:10.1155/2014/973063
Ahila R, Sadasivam V, Manimala K (2015) An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances. Appl Soft Comput 32:23–37. doi:10.1016/j.asoc.2015.03.036
Alcala-Fdez J, Sanchez L, Garcia S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM, Fernandez JC, Herrera F (2009) KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307–318. doi:10.1007/s005000080323y
Andrzejak RG, Lehnertz K, Mormann F, Rieke C, David P, Elger CE (2001) Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 64(6 Pt 1):061907. doi:10.1103/PhysRevE.64.061907
Aziz F, Arof H, Mokhtar N, Mubin M (2014) HMM based automated wheelchair navigation using EOG traces in EEG. J Neural Eng 11(5):056018. doi:10.1088/17412560/11/5/056018
Bababdani BM, Mousavi M (2013) Gravitational search algorithm: a new feature selection method for QSAR study of anticancer potency of imidazo[4,5-b]pyridine derivatives. Chemometr Intell Lab 122:1–11. doi:10.1016/j.chemolab.2012.12.002
Belkacem AN, Hirose H, Yoshimura N, Shin D, Koike Y (2014) Classification of four eye directions from EEG signals for eye-movement-based communication systems. J Med Biol Eng 34(6):581–588. doi:10.5405/jmbe.1596
Billauer E (2012) peakdet: Peak detection using MATLAB. http://billauer.co.il/peakdet.html
Cao FL, Ye HL, Wang DH (2015) A probabilistic learning algorithm for robust modeling using neural networks with random weights. Inf Sci 313(C):62–78. doi:10.1016/j.ins.2015.03.039
Cao FL, Wang DH, Zhu HY, Wang YG (2016) An iterative learning algorithm for feedforward neural networks with random weights. Inf Sci 328:546–557. doi:10.1016/j.ins.2015.09.002
Dingle AA, Jones RD, Carroll GJ, Fright WR (1993) A multistage system to detect epileptiform activity in the EEG. IEEE Trans Biomed Eng. doi:10.1109/10.250582
Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern B Cybern 26(1):29–41. doi:10.1109/3477.484436
Dumpala SR, Reddy SN, Sarna SK (1982) An algorithm for the detection of peaks in biological signals. Comput Programs Biomed 14(3):249–256. doi:10.1016/0010468X(82)900307
Erol OK, Eksin I (2006) A new optimization method: big bang big crunch. Adv Eng Softw 37(2):106–111. doi:10.1016/j.advengsoft.2005.04.005
Guler I, Ubeyli ED (2005) Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients. J Neurosci Methods 148(2):113–121. doi:10.1016/j.jneumeth.2005.04.013
Guler NF, Ubeyli ED, Guler I (2005) Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Syst Appl 29(3):506–514. doi:10.1016/j.eswa.2005.04.011
Guo X, Yin Y, Dong C, Yang G, Zhou G (2008) On the class imbalance problem. In: Fourth international conference on natural computation (ICNC 08), Jinan, China, 25–27 August 2008. pp 192–201. doi:10.1109/ICNC.2008.871
Hatamlou A (2013) Black hole: a new heuristic optimization approach for data clustering. Inf Sci 222:175–184. doi:10.1016/j.ins.2012.08.023
Hooker CA (1995) Adaptation in natural and artificial systems—Holland, Jh. Philos Psychol 8(3):287–299. doi:10.1080/09515089508573159
Ibrahim Z, Abdul Aziz H, Abdul Aziz A, Razali S, Shapiai MI, Nawawi SW, Mohamad MS (2015) A Kalman filter approach for solving unimodal optimization problems. ICIC Express Lett 9(12):3415–3422
Igelnik B, Pao YH (1995) Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw 6(6):1320–1329. doi:10.1109/72.471375
Iwasaki M, Kellinghaus C, Alexopoulos AV, Burgess RC, Kumar AN, Han YH, Luders HO, Leigh RJ (2005) Effects of eyelid closure, blinks, and eye movements on the electroencephalogram. Clin Neurophysiol 116(4):878–885. doi:10.1016/j.clinph.2004.11.001
Johnson DS, Aragon CR, Mcgeoch LA, Schevon C (1989) Optimization by simulated annealing—an experimental evaluation. 1. Graph partitioning. Oper Res 37(6):865–892. doi:10.1287/opre.37.6.865
Kannathal N, Choo ML, Acharya UR, Sadasivan PK (2005) Entropies for detection of epilepsy in EEG. Comput Methods Programs Biomed 80(3):187–194. doi:10.1016/j.cmpb.2005.06.012
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks (ICW), Perth, Western Australia, 27 November–1 December 1995, pp 1942–1948
Lang K, Zhang M, Yuan Y (2015) Improved neural networks with random weights for short-term load forecasting. PLoS ONE 10(12):e0143175. doi:10.1371/journal.pone.0143175
Lin JS, Yang WC (2012) Wireless brain-computer interface for electric wheelchairs with EEG and eye-blinking signals. Int J Innov Comput Inf Control 8(9):6011–6024
Liu HS, Zhang T, Yang FS (2002) A multistage, multimethod approach for automatic detection and classification of epileptiform EEG. IEEE Trans Bio Med Eng 49(12 Pt 2):1557–1566. doi:10.1109/TBME.2002.805477
Marinakis Y, Marinaki M, Dounias G (2011) Honey bees mating optimization algorithm for the Euclidean traveling salesman problem. Inf Sci 181(20):4684–4698. doi:10.1016/j.ins.2010.06.032
Md Yusof Z, Ibrahim Z, Ibrahim I, Mohd Azmi KZ, Abd Aziz NA, Abd Aziz NH, Mohamad MS (2016) Angle modulated simulated Kalman filter algorithm for combinatorial optimization problems. ARPN J Eng Appl Sci 11(7):4854–4859
Nicolas-Alonso LF, Gomez-Gil J (2012) Brain computer interfaces, a review. Sensors (Basel) 12(2):1211–1279. doi:10.3390/s120201211
Pao YH, Takefuji Y (1992) Functional-link net computing: theory, system architecture, and functionalities. Computer 25(5):76–79. doi:10.1109/2.144401
Pao YH, Park GH, Sobajic DJ (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. doi:10.1016/09252312(94)900531
Polat K, Gunes S (2008) Artificial immune recognition system with fuzzy resource allocation mechanism classifier, principal component analysis and FFT method based new hybrid automated identification system for classification of EEG signals. Expert Syst Appl 34(3):2039–2048. doi:10.1016/j.eswa.2007.02.009
Postelnicu CC, Talaba D, Toma MI (2011) Controlling a robotic arm by brainwaves and eye movement. In: Technological innovation for sustainability, vol 349. IFIP advances in information and communication technology, pp 157–164. doi:10.1007/9783642191701_17
Ramli R, Arof H, Ibrahim F, Mokhtar N, Idris MYI (2015) Using finite state machine and a hybrid of EEG signal and EOG artifacts for an asynchronous wheelchair navigation. Expert Syst Appl 42(5):2451–2463. doi:10.1016/j.eswa.2014.10.052
Rao CR, Mitra SK (1971) Generalized inverse of matrices and its applications. Wiley, New York
Rashedi E, Nezamabadi-Pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci 179(13):2232–2248. doi:10.1016/j.ins.2009.03.004
Subasi A (2007) EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst Appl 32(4):1084–1093. doi:10.1016/j.eswa.2006.02.005
Schmidt WF (1992) Feed forward neural networks with random weights. In: 11th IAPR international conference on pattern recognition methodology and systems, The Hague, IEEE, pp 1–4. doi:10.1109/ICPR.1992.201708
Shah-Hosseini H (2007) Problem solving by intelligent water drops. In: 2007 IEEE congress on evolutionary computation, vol 1–10, proceedings, pp 3226–3231
Xiong N, Molina D, Ortiz ML, Herrera F (2015) A walk into metaheuristics for engineering optimization: principles, methods and recent trends. Int J Comput Int Sys 8(4):606–636. doi:10.1080/18756891.2015.1046324
Yang XS (2009) Harmony search as a metaheuristic algorithm. In: Geem Z (ed) Music-inspired harmony search algorithm, vol 191. Studies in computational intelligence. Springer, Berlin, pp 1–14. doi:10.1007/9783642001857_1
Yang XS (2010a) A new metaheuristic bat-inspired algorithm. In: González J, Pelta D, Cruz C, Terrazas G, Krasnogor N (eds) Nature inspired cooperative strategies for optimization (NICSO 2010), vol 284. Studies in computational intelligence. Springer, Berlin, pp 65–74. doi:10.1007/9783642125386_6
Yang XS (2010b) Firefly algorithm, levy flights and global optimization. In: Research and development in intelligent systems XXVI, pp 209–218. doi:10.1007/9781848829831_15
Zhang XL, Chen W, Wang BJ, Chen XF (2015) Intelligent fault diagnosis of rotating machinery using support vector machine with ant colony algorithm for synchronous feature selection and parameter optimization. Neurocomputing 167:260–279. doi:10.1016/j.neucom.2015.04.069
Authors’ contributions
AA conceived the study, participated in the design of the algorithm, collected the data, conducted the experiments, performed the statistical analysis, and drafted the manuscript. ZI participated in the design and coordination of the study and helped to draft the manuscript. NM prepared the laboratory facilities, provided financing, and participated in the design of the study. MIS contributed to the design of the study, manuscript preparation, manuscript editing, and the experimental facilities. IS contributed to financing the publication fees. MB contributed to the laboratory facilities. All authors read and approved the final manuscript.
Acknowledgements
This research is funded by High Impact Research Fund (UM.C/HIR/MOHE/ENG/16 Account code: D00001616001), Matching Grant (Q.K130000.3043.00M79), Internal UMP Grant (GRS1503120) awarded by Ministry of Higher Education Malaysia to University of Malaya, Universiti Teknologi Malaysia, and Universiti Malaysia Pahang, respectively. This research is also funded in part by the Artificial Intelligence Research Unit (AiRU) of Universiti Malaysia Sabah (UMS). The first author would like to thank the Ministry of Education Malaysia for supporting his study by awarding him a MyPhD scholarship.
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
The eye event-related EEG signals were obtained in the Applied Control and Robotic (ACR) Laboratory, Department of Electrical Engineering, Faculty of Engineering, University of Malaya, Malaysia. Thirty healthy subjects, all undergraduate or postgraduate students in the Faculty of Engineering, took part voluntarily in the data collection sessions. All subjects were asked to sign a consent form in advance. The experimental protocol was approved by the medical ethics committee of the University of Malaya Medical Centre.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Neural network with random weights (NNRW)
 Kalman filtering
 Simulated Kalman filter (SKF)
 Electroencephalogram (EEG)
 Peak detection algorithm
 Pattern recognition