Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals

In the existing electroencephalogram (EEG) signals peak classification research, the existing models, such as Dumpala, Acir, Liu, and Dingle peak models, employ different set of features. However, all these models may not be able to offer good performance for various applications and it is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, namely as angle modulated simulated Kalman filter (AMSKF) will be employed as feature selector. Also, the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 samples of peak candidate are employed in this study for the validation purpose. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects; (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results have shown that the proposed AMSKF feature selector is able to find the best combination of features and performs at par with the existing related studies of epileptic EEG events classification.

various noises in EEG signals generates a large number of false peaks in the signals and makes the classification of desired peak points difficult. Moreover, this problem could be worse because the amplitude of peaks of the signals is different from one subject to another, which can vary from 600 to 1100 µV (Iwasaki et al. 2005), resulting a high variance value of peak features in data collection.
At present, researchers have used several combinations of peak features based on a time-domain characteristic of the peak in EEG signals (Dumpala et al. 1982;Acir et al. 2005;Acir and Guzelis 2004;Liu et al. 2002;Dingle et al. 1993). Those peak features were obtained from different amplitudes, widths, and slopes. For instance, the peak-to-peak amplitude of the first and second half waves, peak width, ascending peak slopes at the first half wave, and descending peak slope at the second half wave, can be used as the peak features. The peak features are selected to make sure that only relevant features are used for classification. The combinations of the selected features, however, are problem dependent and only efficiently used for a specific application. Furthermore, to properly determine the best and generalized combination of peak features in EEG signals are still open problems for further research.
To avoid the slow learning speed and iteratively learning problems of conventional neural networks learning algorithm (i.e., gradient descent and Levenberg-Marquart), a neural network with random weights (NNRW) is employed as a classifier. The NNRW is a fast, simple, and non-iterative learning algorithm of a single layer feedforward neural network (SLFN). The NNRW was firstly introduced by Schmidt (1992). The network of NNRW consists of three layers that are input, hidden, and output layers. The learning concept of NNRW is that the input weights and the biases at the hidden layer in the network are chosen randomly with a specific interval, whereas the output weights are estimated by the Moore-Penrose generalized inverse method (Rao and Mit 1971). The input weights are assigned randomly between −1 and 1. Also, the biases in the hidden layer are assigned randomly between 0 and 1. Both parameters follow the setup parameters that have been suggested by Cao et al. (2015). A similar concept of NNRW was further developed by Pao and Takefuji (1992), knowingly as random vector functional-link (RVFL) nets. Variations of extended RVFL were introduced to establish the theoretical results of the RVFL concept (Pao et al. 1994;Igelnik and Pao 1995).
Population-based metaheuristic optimization algorithms provide a satisfactory solution in a relatively shorter time. These algorithms are also efficient and effective to solve large and complex real-world problems and can be applied to solve almost any optimization problems (Xiong et al. 2015). A variety population-based metaheuristic optimization algorithms have been invented, such as genetic algorithm (Hooker 1995), simulated annealing (Johnson et al. 1989), particle swarm optimization (Kennedy and Eberhart 1995), ant colony optimization (Dorigo et al. 1996), big bang-big crunch optimization (Erol and Eksin 2006), intelligent water drops algorithm (Shah-Hosseini 2007), honey bee mating optimization (Marinakis et al. 2011), firefly algorithm (Yang 2010b), gravitational search algorithm (Rashedi et al. 2009), harmonic search optimization (Yang 2009), bat algorithm (Yang 2010a), and black hole algorithm (Hatamlou 2013). So far, those optimization algorithms have been already applied as an effective technique for feature selection in various real-world applications such as power system (Ahila et al. 2015), manufacturing , and medical (Bababdani and Mousavi 2013;Adam et al. 2014).
Recently, a new metaheuristic optimization algorithm has been introduced by Ibrahim et al. (2015) that is inspired by the state estimation process of Kalman filter. The new optimizer is namely as a simulated Kalman filter (SKF) algorithm. The principle of Kalman filter consists of the following main processes: states prediction, state measurement, and state estimation. In the SKF algorithm, each agent acts as an individual Kalman filter and holds a vector state. Through the prediction, measurement, and estimation state processes, new states are estimated and new locations of agents are updated. The processes are iteratively looped until it reaches the maximum iteration. Regarding the final experimental results by Ibrahim et al. (2015), the SKF algorithm has the capability to find efficiently the most optimal solution and the performance are comparable to gravitational search algorithm and black hole algorithm for unimodal optimization problems. The original SKF algorithm, however, cannot be used for solving discrete optimization problems. To solve this problem, Md Yusof et al. (2016) have introduced an angle modulated SKF (AMSKF) algorithm. Based on the capability of the AMSKF algorithm for solving discrete problems, AMSKF is employed as a feature selection method in this study.
The key contributions of this study are expressed as follows: (1) to employ a recently introduced population-based metaheuristic optimization algorithm for feature selection in EEG signals peak classification using AMSKF, (2) to firstly employ the NNRW into peak detection algorithm for classification and feature selection, (3) to propose a new generalized peak model for EEG signals peak classification based on the features selected by AMSKF, and (4) to apply the proposed method of AMSKF model on epileptic EEG signals. For the benchmarking purpose, four existing peak models are considered. The experimental results show the new combination of peak features that are produced by the proposed AMSKF technique performs better accuracy compared to the NNRW with conventional peak models.

Eye event-related EEG data
The peak candidate data of eye event-related were collected from three different eventrelated EEG signals that producing peaks. The first peak event-related is labelled as single eye blink signals. The second peak event-related is labelled as double eye blink signals. The third peak event-related is labelled as eye movement signals. The first and second peaks event-related of EEG signals recording were conducted using the g.USBamp biological signals acquisition system. While, the third peak event-related of EEG signals recording were conducted using the g.MOBIlab portable biological signals acquisition system. The scalp electrodes arrangement of the three different signals is placed using the 10-20 international electrode placement system. The sampling frequency for those signals was set to 256 Hz.
The single blink and double blink signals were recorded from F9 channel. The reference electrode was located on the ear. The ground electrode was located on channel AFz. In total, only three electrodes were used. The electrodes from the F9 channels are positioned for detecting EEG peaks associated with the brain response of commanded single and double eye blink. Single means the eye are blinking once while double means the eye are blinking twice. The eyes blink that produces some peaks in the signals on channel F9 is archived as raw data for analysis.
The eye movement signals were recorded from C3 and C4 channels. The channel CZ was used as a reference. The ground electrode was located on FPz channel. In total, only four electrodes were used. The electrodes from the C3 and C4 channels are positioned for detecting EEG peaks associated with the brain response of commanded horizontal eye gaze direction. The eye gaze directions that produce some peaks in the signals on channels C3 and C4 are archived as raw data for analysis. Figure 1a-c shows three different EEG signals that were named as a single eye blink, double eye blink, and eye movement signals. The dotted red vertical lines show the actual peak point location, as manually assigned by a researcher. The descriptions of those EEG signals are tabulated in Table 1.
The single eye blink signals have 30 signals, 10-s length per signal, 2560 sampling points per signal, and each signal containing two known peak points and various additional signal patterns. The additional signal patterns are the edge transitions which represent the eye movements. The known peak pattern in this signal represents a single eye blink. The peak pattern of a single eye blink is useful as an additional feature for controlling an electric wheelchair (Lin and Yang 2012). The total training and testing sampling points are 38,400 and 38,400, respectively. From the total sampling points, 3238 sampling point locations are identified as the locations of peak candidates, 60 sampling point locations are identified as the locations of true peaks, and 3178 sampling point locations are identified as the locations of false peaks.
The double eye blink signals have five signals, 80-s length per signal, 20,480 sampling points per signal, and each signal containing eight known peak points and some additional signal patterns. The additional signal patterns are the edge transitions that represent the horizontal eye movements. The signals occasionally contain a peak of the single eye blink. The total training and testing sampling points are 51,200 and 51,200, respectively. From the total sampling points, 4662 sampling point locations are identified as the locations of peak candidates, 40 sampling point locations are identified as the locations of true peaks, and 4622 sampling point locations are identified as the locations of false peaks. Figure 1c shows the eye movement signals. The eye movement signals have 40 signals of C3 and C4 channels, 10-s length per signal, 2560 sampling points per signal, and each signal containing one known actual peak point location. The known peak pattern in this signal represents the horizontal eye gaze direction, either to the left or the right. In total, the data collection of this signal has 40-s length and 102,400 sampling points. From 102,400 sampling points, 3881 candidate peak locations were recognized where the known actual peak point locations are 40 and the remaining sampling points are the known actual non-peak point location.
From the collected raw data of the three EEG signals, 11,781 peak candidate samples with their associated features were archived as EEG data for experiments. From 11,781 peak candidate samples, 140 were assigned as true peaks and the other 11,461 were assigned as false peaks.

Epileptic EEG data
The second data used in this study is available and published in Bonn University EEG database (Andrzejak et al. 2001). The EEG recording was prepared using standard 10-20 electrode placement system. The datasets have five different sets, which are named as set A, set B, set C, set D, and set E. Each set contains 100 EEG segments that were selected from continuous multi-channel EEG recordings after removing muscle activity or eye movement artifacts. Each EEG segment consists of 4097 sampling points and the duration is about 23.6 s. Sets A and B consist of EEG segments taken from surface EEG recording collected from five healthy subjects. Subjects were relaxed in an awaken state with eyes open (A) and eyes closed (B), respectively. Sets C, D, and E were taken from EEG archive of presurgical diagnosis. Segments in set D were recorded from the epileptogenic zone. Set C is recorded from hippocampal formation of opposite hemisphere of brain. Sets C and D contain only activity measured during epileptic-free intervals. Set E contains only epileptic events. Data is recorded within 128-channel amplifier system and digitized at 173.61 Hz sampling rate and 12 bit A/D resolution. To select the EEG signal of desired band a band-pass filter having a pass band of 0.53-40 Hz (12 dB/oct) was used.
In this study, only set A and set E were used. Set A represents as non-epileptic peak events while set E denotes as epileptic peak events. From the collected EEG raw data of the two sets EEG signals (set A and set E), 20,000 peak candidate samples with their associated features were archived as EEG data for experiments. From 20,000 peak candidate samples, 10,000 were assigned as epileptic peaks event from set E. The other 10,000 were assigned as non-epileptic peaks event from set A. 100 peak candidate samples were randomly selected from each segment of both set. The four-fold cross-validation process is used to produce four groups of EEG data. The class distribution of the peak candidate sample and event is summarized in Table 2.

Methods
The methods for peak detection consist of three main processes: (1) feature extraction, (2) feature selection, and (3) classification. In feature extraction stage, three-points sliding window method (Dumpala et al. 1982;Billauer 2012) is employed to identify all possible peak candidates. The AMSKF feature selector is used to select the best combination of features for all possible peak candidates. All identified peak candidates with the selected associated features are then classified by the NNRW classifier. The choice of classification method was supported by two reasons: (1) the NNRW provides fast learning speed. (2) The fast learning speed capability in the proposed AMSKF technique can minimize the computational complexity.

Feature extraction
So far, to the best of our knowledge, only four models in the time domain analysis have typically been used in various event-related signals for peak classification (e.g., Dumpala et al. 1982;Acir and Guzelis 2004;Liu et al. 2002;Dingle et al. 1993). In general, all existing peak models (i.e., Dumpala, Acir, Liu, and Dingle models) have their associated features. All 16 peak features of the existing models can be calculated using the defined eight parameter points as shown in Fig. 2. After the ith candidate peak point, PP i , and the two associated valley points, VP1 i and VP2 i , are identified using three-points sliding window method (Dumpala et al. 1982;Billauer 2012), the other five parameter points {i.e., the half point at first half wave (HP1 i ), the half point at second half wave (HP2 i ), the turning point at first half wave (TP1 i ), the turning point at second half wave (TP2 i ), and the moving average curve point [MAC(PP i )]} can be identified. For example, the half point at first half wave can be defined as the point located in the middle between the PP i and VP1 i while the half point at the second half wave as the point based in the midst between the PP i and VP2 i . The turning point can be recognized when the slope decreases more than 50 % as compared to the slope of the preceding point. The MAC(PP i ) point is located at the intersection between the PP i and MAC(PP i ) points.
After all eights parameter points are identified, 16 peak features are then calculated based on the listed equation in Table 3. All peak features can be categorized into three groups, namely amplitude, width, and slope, resulting in five different amplitudes (i.e., f 1 , f 2 , f 3 , f 4 , f 5 ), seven different widths (i.e., f 6 , f 7 , f 8 , f 9 , f 10 , f 11 , f 12 ), and four different slopes (i.e., f 13 , f 14 , f 15 , f 16 ). The descriptions of all the 16 features are also explained in Table 3. Table 4 presents the list of different peak models with their associated features. The

Neural network with random weights (NNRW) classifier
The NNRW classifier has recently gained attention as a fast learning and generalized technique for classification (Cao et al. 2016;Lang et al. 2015). The fundamental aspect of this method is that the NNRW can be represented as a linear system (Schmidt 1992). The linear system of NNRW is mathematically modeled as H β = T where β is the L × m matrix of output weights and T is the N × m matrix of target outputs. m is the number of output neurons. The β and T matrixes are denoted as  Amplitudes Peak-to-peak amplitude of the first half wave Amplitude between the magnitude of peak and the magnitude of valley at the first half wave Peak-to-peak amplitude of the second half wave Amplitude between the magnitude of peak and the magnitude of valley of the second half wave Turning point amplitude of the first half wave Amplitude between the magnitude of peak and the magnitude of turning point at the first half wave Turning point amplitude at the second half wave Amplitude between the magnitude of peak and the magnitude of turning point at the second half wave Moving average amplitude Slope between a peak point and valley point at the first half wave Peak slope at the second half wave Slope between a peak point and valley point at the second half wave Turning point slope at the first half wave The slope between peak point and turning point at the first half wave Turning point slope at the second half wave The slope between peak point and turning point at the second half wave and respectively. The output function of NNRW classifier of a given unknown sample, x can be mathematically described as fc(x) = h(x)β. The output matrix of the hidden layer, H, is calculated as follows: where g is an activation function of the hidden neuron, x is the N × L matrix of inputs, a is the d × L matrix of random input weights, b is the 1 × L matrix of random biases in the hidden layer, N is an arbitrary distinct sample, L is the number of hidden neurons (L = 1000 in this study), and d is the number of inputs (where d depends on the number of the selected features in this study). The ith column of H is the output of the ith hidden neuron with respect to inputs x 1 , x 2 , until x d . The sigmoidal function g(x) = 1 (1 + e −x ) was used in this study as an activation function in the hidden layer for normalization while a linear function is located inside the neuron in the output layer.
To find the least square solution, β of the linear system, H β = T, the minimum-norm least-squares solution is computed as follows: It is well known that the smallest norm least-squares solution of Eq. (4) is where H + is the Moore-Penrose pseudo-inverse of H. The summary of the training stages of the NNRW classifier is listed as follows: Stage 1 Assign randomly the input weights, a i and biases in the hidden neurons, b i . Stage 2 Calculate the output matrix of the hidden layer, H. Stage 3 Calculate the output weights, β = H + T.
In the output layer, two neurons are used in the network to classify the output into two classes (output): class 1 and class 0. For two classes (m > 1), the predicted class label is the ith number of the output neurons which the maximum value of output neuron. The predicted class label of a given unknown sample x is defined as follows. ( The performance of the classifier is evaluated using a four-fold cross-validation process. The four-fold cross-validation accuracy of the classifier is computed using Gmean (Guo et al. 2008). The Gmean is calculated as follows: where any true peak (TP) is the correctly detected apex point of a peak candidate, a true non-peak (TN) is any correctly detected non-peak point of a peak candidate, a false peak (FP) is an incorrectly designated non-peak point of a peak candidate, a false non-peak (FN) is any incorrectly detected true peak point of peak candidate, TPR is the true peak rate, and TNR is the true non-peak rate.

Simulated Kalman filter (SKF) for continuous optimization problems
The SKF algorithm  was originally invented for solving continuous optimization problems. The algorithm follows several steps as shown in Fig. 3: (1) generate an initial population, (2) calculation of the fitness evaluation function for each agent, (3) update the best fitness value among agents at every iteration (X best ) and the best solution compared to the current X best (X true ), (4) perform state prediction, measurement, and estimation, and (5) perform termination based on a stopping criterion.
In the initialization step, several initial SKF parameters such as the initial value of error covariance estimate, P(0), the process noise value, Q, and the measurement noise value, R, are initialized. Further settings, such as, the number of n agents and a maximum number of iterations, t max , are also determined. The states values of each agent are given randomly within a specific interval.
Next, the fitness evaluation function is computed to obtain initial solutions for every agent. The best fitness value among each agent at every iteration t, X best (t) can be either in the maximization problem, max i∈ 1,...,n fit((X(t)) or minimization problem min i∈ 1,...,n fit((X(t)).
The X best (t) value at every iteration t is compared and the best among the X best (t) value, which is X true is updated. For a maximization problem, X true is only updated when X best (t) at current iteration is greater than X true . Whereas, for a minimization problem, X true is only updated when X best (t) at current iteration is lower than X true .
Referring to Fig. 4, the next following steps including the state prediction, measurement, and estimation. The state prediction follows the following equations: where, X i (t − 1) and X i (t|t − 1) are the previous state and transition state, respectively. P(t|t − 1) and P(t − 1) are previous error covariant estimate and transition error covariant estimate, respectively.
In the state measurement step, the following equation, Z i (t), is used, which gives some feedbacks to the estimation process.
In Eq. (12), the sin (rand × 2π ) term offers the stochastic element of SKF algorithm which having a random probability distribution to the measurement value and rand is a uniformly distributed random number in the range of [0 1].
Next, the Kalman gain, K(t), is computed based on the calculated value of the transition error covariant estimate, P(t|t − 1) and the measurement noise value, R. The equation of K(t) is given as follows.
Here, the equation for estimating the next state, X i (t), is given in Eq. (14) and the error covariant is updated based on Eq. (15). Finally, the processes are iteratively looped until the maximum number of iteration is reached.

Angle modulated simulated Kalman filter (AMSKF) for discrete optimization problems
For solving discrete optimization problems, the angle modulated concept is embedded into SKF algorithm (Md Yusof et al. 2016). Referring to Fig. 4, additional two steps of the angle modulated into SKF are described as follows. After the initialization step, the continuous signals, g(x) with four coefficient parameters (a, b, c, and d) are generated for each agent. So, the state of the ith agent in a population at iteration t is denoted as X i (t) = a i , b i , c i , d i . As mentioned before, the state values which are a, b, c, and d are given randomly in an initial stage. The function g(x) with the four coefficient parameters is defined as follows, An example plot of function, g(x) for the case of a = 0, b = 1, c = 1, and d = 0 is given in Fig. 5. From the signals, the sampling time, T, is chosen to generate a bit string of length n in the next step. The bit 1 is generated when g(x) value is greater than 0 while, the bit 0 is generated when g(x) value is lower than 0. The length of the bit string (15) P(t) = (1 − K (t)) × P(t|t − 1) depends on the given problem. For example, if the length of the full feature set is 100, so the length of the bit string is 100. The generated bit string of each agent is employed to calculate the fitness value for each agent. Then, AMSKF follows similar steps as SKF until it returns the final solution. Using the angle modulated approach, the AMSKF algorithm only tunes the four coefficient parameters for getting the best solution.

The proposed AMSKF feature selection algorithm
The proposed feature selection algorithm for EEG signals peak detection is based on AMSKF algorithm. Also, the NNRW classifier is employed for peak classification. The combination of both methods is illustrated in the flowchart as shown in Fig. 6.
From Fig. 6, the proposed AMSKF technique begins with initialization of a population and then calculation of a g(x) function. The maximum number of iteration was set to 500 and the number of agents was set to 10. The initial value of the error covariance estimate, P, process noise value, Q, and measurement noise value, R, are 10,000, 0.5, and 0.5, respectively. To employ AMSKF algorithm for feature selection in EEG peak classification, a total of 16-bit string is generated since the selection of one feature is determined by one-bit value. If AMSKF assigns bit value 1 to an ith feature, the ith feature is selected. Otherwise, the ith feature is not selected.
In the calculation process of the fitness evaluation function, the selected features are used to prepare the training and validation sets, as shown in Fig. 6. To calculate the fitness evaluation function, at first, the classifier has to be trained by the given training data. Then, the trained classifier is tested using the validation set. The detection performance of the training and validation sets are computed based on Gmean (Guo et al. 2008). The Gmean of validation set is set as fitness value for AMSKF algorithm.
In Fig. 6, after fitness value is calculated, the process continues to the next following processes; update X best (t) and X true , state measurement, state prediction, and state estimation. Next, new 16 bits solutions are determined and those processes are looped until maximum iteration is reached. Finally, the best peak model associated with the NNRW was obtained.

Experimental results and discussions
In this section, three main experiments were conducted. The first experiment aimed to investigate the classification performance of the individual NNRW under various number of hidden neurons. This experiment was also evaluated the performance of the individual NNRW over the four existing peak models. The optimum number of hidden neurons was selected to perform the experiment of the proposed AMSKF technique. The second experiment was assigned to study the search capability of the proposed AMSKF technique to find the best combination of peak features. The first and second experiments were conducted on eye event-related EEG data. The third experiment was Fig. 6 Flowchart of the proposed AMSKF feature selection algorithm conducted to apply the best combination of peak features on epileptic EEG classification events application.

Performance of NNRW under various number of hidden neurons
One advantage of the NNRW classifier is that the learning algorithm is less difficult than other conventional neural network classifier (i.e., gradient descent, Levenberg-Marquart, and particle swarm optimization-based learning algorithms). So that, with an enormous number of hidden neurons is possible to perform using the NNRW classifier. However, the optimal number of neurons of the NNRW classifier is required to be firstly identified for offering better generalization ability of the NNRW classifier. To find the optimal number of hidden neuron, an experiment is executed by varying the number of hidden neuron from 100 to 1200 in steps of 100.
To prepare the experiment data of the individual NNRW classifier, the EEG dataset are randomly divided into four groups, equally distributes the two-class ratio, by fourfold cross-validation process. Every group alternately assigned as the testing set and the other three groups are combined to be a training set. The mean value of testing results from the four groups is calculated. This experiment is repeated 30 times, so that the mean of the training and testing results can be measured as shown in Table 5.
The variation of testing accuracy with respect to a different number of hidden neurons is graphically illustrated in Fig. 7. Referring to Fig. 3, the testing accuracy of all four peak models increased up to 1200 neurons. Three peak models (e.g., Dumpala, Acir, and Liu models) except Dingle model offer the optimal accuracy when the numbers of hidden neurons are between 900 and 1200. Hence, the number of hidden neurons for our experiment was set to 1000. The final results in Fig. 7 indicate that the selection of the best combination features is necessary for providing the best and generalizes performance in EEG signals peak classification.

Experimental results for AMSKF feature selection algorithm
To prepare the experiment data of the proposed AMSKF feature selection algorithm, the four-fold cross-validation process is used to produce four groups of EEG data: each group consists of training and testing sets. Next, the training set is randomly divided into two: training and validation sets. Both datasets are equally distributed to the two-class ratio. The ratio size of training and validation was set to 0.5:0.5. The testing set is utilized as unseen EEG data. After all four groups are evaluated by the algorithm, the maximum value of testing results from the four groups is measured and the best peak model is recorded. This entire four-fold cross validation process is repeated 30 times to obtain the final statistical results (e.g., average, maximum, minimum, and standard deviation) for this experiment. Table 6 shows the 30 independent runs experimental results of the proposed AMSKF feature selection algorithm using the EEG data that is collected from the three recorded EEG signals (i.e., single eye blink, double eye blink, and eye movement signals). Table 6 gives the best peak model with the highest training, validation, and testing accuracies for the NNRW classifier at every run. In this experiment, the best-generalized peak model is chosen based on the maximum accuracy of testing data over 30 runs.
In Table 6, it is found that the feature set of the best peak model is f 1 , f 2 , f 7 , f 8 , f 9 , f 10 , f 11 , f 12 , f 13 , f 14 , and f 15 , with 72.7 % of testing accuracy. From those associated features, two of features are peak amplitudes (e.g., f 1 and f 2 ), six of features are peak widths (e.g., f 7 , f 8 , f 9 , f 10 , f 11 , and f 12 ), and three of features are peak slopes (e.g., f 13 , f 14 , and f 15 ). For overall of testing accuracy, the average, maximum, minimum, and STDEV over 30 runs are 61.7, 72.7, 53, and 4.1 %, respectively.  Table 6 Best testing results over 30 runs using the proposed AMSKF feature selection algorithm on eye event-related EEG data The results in Table 6 show that the higher value of fitness of validation set cannot produce the best classification accuracy of testing set as expected. Also, the feature set that contain lower feature subset length cannot give better performance. These results have exhibited that the peak event-related EEG signals are very problem dependant.
In this experiment, the proposed AMSKF algorithm was iteratively executed with maximum 500 iterations. To observe the result of convergence of the proposed AMSKF, one example is taken from this experiment, as illustrated in Fig. 8. From Fig. 8, it can be seen that the AMSKF algorithm can reach convergence within 20 iterations.
To evaluate the effectiveness of the proposed algorithm and the selected best combination of features, some comparisons are performed regarding percentage of the testing classification accuracy between the results of the existing four peak detection models and with the proposed AMSKF model. The comparison results are comparatively presented in Table 7. For a fair performance evaluation, the four existing peak models with their associated features are performed using the similar parameters setting of the NNRW of the proposed AMSKF technique.
The experimental results in Table 6 are obtained from the experiment in "Performance of NNRW under various number of hidden neurons" section, with the hidden neuron of the NNRW is 1000. The performance of the best combination of features is taken from the maximum testing accuracy in Table 6. As seen from Table 7, the performance of the best combination of features that are produced by AMSKF algorithm exceeds the performance of the other existing four models.
In Table 7, it can be seen that there is a large different value between training and testing accuracies. The proposed method of the AMSKF model has only achieved 73 % of testing accuracy. In this study, the ratio between true peak and false peak is 140:11,461. This means the dataset has extremely imbalanced dataset ratio. In this case, the conventional NNRW classifier may fail to offer high accuracy of performance for imbalanced dataset problem. Other contributing factor is the collected EEG data is affected by various noises and the peak features have a large different value from one subject to another subject. This factor is the cause to the high variation of peak features. The consequent of this factor is that the NNRW classifier may fail to correctly classify the true peak and false peak.
The results of the peak models are further analyzed by using nonparametric Friedman statistical analysis. The statistical analysis is required to demonstrate the significant  Table 8 shows the average ranking of Friedman's test of the Dumpala, Acir, Liu, Dingle, and AMSKF models. The statistical results show that the lowest average ranking is obtained by AMSKF model that represents ranking first among the five models for EEG data. While, the NNRW with Acir model ranking second, the NNRW with Dumpala model ranking third, the NNRW with Liu model ranking fourth, and the NNRW with Dingle model ranking fifth.
Next, p values for unadjusted values and adjusted p values for Nemenyi, Holm's, Shaffer, and Bergmann-Hommel test for N × N comparisons for all possible ten pairs of model with the peak models are presented in Table 9. The p values below 0.05 represent that the particular peak model differ significantly in testing accuracy. The p values below 0.05 were marked with the italic font.
From Table 9, it can be observed that p values for unadjusted values and adjusted p values for Holm's, Shaffer and Bergmann-Hommel offer for eliminating nine hypotheses. However, Nemenyi lets for eliminating only seven hypotheses. Based on unadjusted p values and adjusted p values for Nemenyi, Holm's, Shaffer, and Bergmann-Hommel test, the AMSKF model revealed significantly better performance than other models.

Application of the proposed AMSKF model to epileptic and non-epileptic EEG event classification
Two EEG events have been assigned which are epileptic and non-epileptic events. 100 non-epileptic events are collected from set A while 100 epileptic peak events from set E. Each EEG event is a segment that consists of 4097 sampling points and the duration is about 23.6 s. The best combination of peak feature and the trained NNRW classifier with 500 hidden neurons are used to perform the classification. To distinguish between epileptic and non-epileptic events, the voting method is used. The epileptic event is recognized when more than 50 peaks are identified in within an event. Whereas, the nonepileptic event is recognized when lower than 50 peaks are identified. Table 10 demonstrates the confusion matrix of epileptic and non-epileptic event classification using the proposed AMSKF model. It can be observed that the AMSKF model obtains 98 % of total accuracy, with 100 % of the non-epileptic event rate, and 96 % of the epileptic event rate. There are four misclassifications of epileptic event.
The performance comparisons have been done to observe the efficiency of the proposed method. Table 11 gives the classification accuracy of this study and the existing methods on Bonn University EEG database. Referring to Table 11, the classification accuracy of this study using the NNRW method is lower than AIRS-PCA-FFT and Wavelet-ANFIS methods. However, the classification accuracy of the NNRW using AMSKF model is higher than other methods.
An example of epileptic and non-epileptic events classification is illustrated in Fig. 9. As can be seen that, there are more than 50 peaks (red dotted) have been identified in epileptic segment (the right side) within the region from 4000 to 8000 sampling points. Figure 10 shows an example of misclassification of epileptic event in record S083. The number of detected peaks obviously can be seen is lower than 50. Consequently, the actual epileptic event is classified as non-epileptic event.

Conclusions and future works
In this study, a new generalized peak model for EEG signals peak classification has been identified using a novel AMSKF feature selection approach. The proposed algorithm considered 11,781 peak candidate samples of real EEG data, which were collected from 30 healthy subjects instructed to direct their single eye blink, double eye blink, and horizontal eye gaze. The detection performance of the NNRW with four different peak detection models and new AMSKF model are compared. In general, the experimental results showed that the accuracy of the NNRW with new AMSKF model is better than the NNRW with other models. The statistical analysis showed that the detection Table 9 Adjusted p value for N × N comparisons of peak models over 30runs The p values below 0.05 were marked with the italic font  performance of the NNRW with the new AMSKF model is significantly better in terms of testing accuracy compared to other models. A published EEG database from Bonn University was selected to evaluate the proposed method and at the same time applied the relevant combination of peak features