
Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine


A computer-aided diagnosis system for pathological brain detection (PBD) is important for helping physicians interpret and analyze medical images. In this paper, we propose a novel automatic PBD method to distinguish pathological brains from healthy brains in magnetic resonance imaging scans. The proposed method simplifies the PBD problem to a binary classification task. We extracted the wavelet packet Tsallis entropy (WPTE) from each brain image. The WPTE is the Tsallis entropy of the coefficients of the discrete wavelet packet transform. Then, the features were submitted to the fuzzy support vector machine (FSVM). We tested the proposed diagnosis method on three benchmark datasets of different sizes. Ten runs of K-fold stratified cross validation were carried out. The results demonstrated that the proposed WPTE + FSVM method outperformed 17 state-of-the-art methods with respect to classification accuracy. The WPTE is superior to the discrete wavelet transform, the Tsallis entropy performs better than the Shannon entropy, and the FSVM outperforms the standard SVM. In closing, the proposed “WPTE + FSVM” method is effective in PBD.


Pathological brain detection (PBD) is of essential importance. It can help physicians make decisions and avoid wrong judgements on subjects. Magnetic resonance imaging (MRI) offers high resolution on soft tissues in the subjects’ brains and generates massive datasets (Zhang et al. 2015a). At present, there are numerous works on using brain MR images to solve PBD problems (Goh et al. 2014; Yu et al. 2015b).

Recent computer-aided diagnosis (CAD) systems for PBD are of two types (LaViolette et al. 2014): those that distinguish pathological from healthy brains, and those that differentiate degrees of severity. In this study, we focus on the former. A promising approach is to use the discrete wavelet transform (DWT), which allows simultaneous analysis in both the time and frequency domains (Lee et al. 2013; Dong et al. 2014; Zhang et al. 2015c; Yu et al. 2015c). DWT and its variants achieved good results; however, the DWT is translation-variant, so the coefficients behave unpredictably if the input signal is translated slightly. In the PBD problem, the subject’s head usually moves slightly during the scan, which causes translation of the MR images.

Another problem is the classifier. Current scholars tend to use either an artificial neural network (ANN) or a support vector machine (SVM). Nevertheless, both are sensitive to outliers and noise. That is, if the training set contains noisy points or outliers, the classifier still treats them as being as important as normal data.

We suggest three improvements to address the above problems. First, we employed the discrete version of the wavelet packet transform (WPT), which is an extension of the standard discrete wavelet transform (DWT). Second, we introduced Tsallis entropy (TE) to replace Shannon entropy (SE). Third, we introduced the fuzzy support vector machine (FSVM), which combines the SVM with a fuzzy logic approach (Ashkezari et al. 2013) and has the advantage of reducing the effect of outliers and noise.

The rest of the paper is organized as follows. “State-of-the-art” presents the state of the art. “Materials” introduces the materials used in this study. “Feature extraction” discusses the features. “Classifier” gives the classifier. “Implementation and experiments” shows the implementation of the whole method, and designs the experiments. “Results and discussions” contains the results and discussions. “Conclusion and future research” offers the conclusion and future research. We explain the nomenclature in “Abbreviations” at the end of the paper.


State-of-the-art

Chaplot et al. (2006) were the first to address the PBD problem. They used the approximation coefficients from the DWT, and utilized the support vector machine (SVM) and the self-organizing map (SOM). El-Dahshan et al. (2010) extracted all coefficients of all subbands of a three-level discrete wavelet transform (DWT). Then, they reduced the number of features by principal component analysis (PCA). Finally, two classifiers, K-nearest neighbors (KNN) and feed-forward back-propagation ANN (FP-ANN), were employed. Wu and Wang (2011) followed El-Dahshan’s method, but suggested using a feed-forward neural network (FNN) as the classifier, trained by scaled chaotic artificial bee colony (SCABC). Dong et al. (2011) proposed employing the scaled conjugate gradient (SCG) method in place of SCABC. Zhang and Wu (2012) suggested utilizing the kernel support vector machine (KSVM) with three kernels: homogeneous polynomial, inhomogeneous polynomial, and radial basis function (RBF). Das et al. (2013) developed a novel method combining the Ripplet transform (RT), principal component analysis (PCA), and the least square support vector machine (LS-SVM). Their five-fold cross validation results showed promising classification accuracies. Saritha et al. (2013) proposed a novel wavelet-entropy (WE) feature, and employed spider-web plots (SWP) to further reduce the features. Afterwards, they used the probabilistic neural network (PNN). Yu et al. (2015d) commented on Saritha’s paper and stated that dropping the SWP yields the same results. Zhang et al. (2013) suggested using particle swarm optimization to train the KSVM. Padma and Sukanesh (2014) used combined wavelet statistical texture features to segment and classify AD benign and malignant tumor slices. El-Dahshan et al. (2014) used the feedback pulse-coupled neural network for image segmentation, the DWT for feature extraction, the PCA for reducing the dimensionality of the wavelet coefficients, and the FBPNN to classify inputs as normal or abnormal.
Wang et al. (2014) used a kernel support vector machine decision tree. Zhou et al. (2015) used wavelet-entropy as the feature space and then employed a Naive Bayes classifier (NBC). Their results over 64 images showed a sensitivity of 94.50 %, a specificity of 91.70 %, and an overall accuracy of 92.60 %. Damodharan and Raghavan (2015) combined tissue segmentation and a neural network for brain tumor detection. Yang et al. (2015) selected wavelet-energy as the features, and introduced biogeography-based optimization (BBO) to train the SVM. Their method reached 97.78 % accuracy on 90 T2-weighted MR brain images. Nazir et al. (2015) suggested using filters to remove noise, and extracted color moments as mean features. Finally, they achieved an overall accuracy of 91.8 %. Dong et al. (2015) suggested using a 3D eigenbrain method to detect subjects and brain regions related to AD. The accuracy reached 92.36 ± 0.94. Harikumar and Kumar (2015) analyzed the performance of ANNs for the classification of medical images, using wavelets as the feature extractor. Their classification accuracy reached 96 %. Wang et al. (2015a) suggested using the stationary wavelet transform (SWT) to replace the DWT, and then proposed a Hybridization of Particle swarm optimization and Artificial bee colony (HPA) algorithm to train the classifier. Farzan et al. (2015) used the longitudinal percentage of brain volume changes (PBVC) in a two-year follow-up, and its intermediate counterparts at the early 6-month and late 18-month points, as features. Their experiments obtained an accuracy of 91.7 %. Munteanu et al. (2015) employed proton magnetic resonance spectroscopy (MRS) data, with the aim of detecting MCI and AD. They used a single-layer perceptron with only two spectroscopic voxel volumes obtained in the left hippocampus, with an AUROC value of 0.866. Zhang et al. (2015d) combined wavelet entropy with Hu moment invariants (HMI), giving 14 features in total. They also used GEPSVM as the classifier.


Materials

Magnetic resonance brain image dataset

Three benchmark magnetic resonance brain image datasets with different numbers of images, D-66, D-160, and D-255, were downloaded from the website of Harvard University. The data contain T2-weighted images obtained along the axial plane, all of size 256 × 256. These three datasets are commonly used in PBD tests. Besides healthy brain images, D-66 and D-160 consist of 7 types of brain diseases: AD, AD plus visual agnosia, glioma, meningioma, sarcoma, Huntington’s disease (HD), and Pick’s disease (PiD). D-255 introduces four other diseases: cerebral toxoplasmosis, subdural hematoma (SDH), multiple sclerosis (MS), and herpes encephalitis. Figure 1 shows samples of brain MR images.

Fig. 1 Sample of magnetic resonance brain image dataset. a Healthy brain, b Meningioma, c Glioma, d Sarcoma, e SDH, f PiD, g AD, h HD, i AD with visual agnosia, j Herpes encephalitis, k Cerebral toxoplasmosis, l MS

The costs of the two kinds of misclassification are different. Predicting a pathological brain as healthy is very serious, because it defers the necessary treatment, whereas mispredicting a healthy brain as pathological can be corrected by a second check with other techniques. Hence, we intentionally created three imbalanced datasets, which contain more pathological brains than usual, so that the PBD system is biased toward detecting pathological ones, with the aim of addressing this cost-sensitive task.

Statistical setting

Cross validation (CV) is commonly used for statistical testing. Stratification is embedded in CV so that each fold has nearly the same class distribution. In this work, six-fold stratified CV (SCV) was utilized for the smallest dataset (D-66), and five-fold SCV for the other datasets (D-160 and D-255). Table 1 lists the SCV setting of all datasets.

Table 1 SCV setting of our datasets
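The stratified fold assignment described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' Matlab implementation; the function name `stratified_folds` and the class counts in the example are hypothetical and are not taken from Table 1.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so that class ratios stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds;
        # the offset spreads any remainder evenly across folds.
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds

# Hypothetical label vector (0 = healthy, 1 = pathological) for a 66-image set.
labels = [0] * 18 + [1] * 48
folds = stratified_folds(labels, k=6)
for fold in folds:
    print(len(fold), dict(Counter(labels[i] for i in fold)))
```

In each of the K iterations, one fold would serve as the validation set and the remaining K − 1 folds as the training set; repeating the whole procedure with different seeds corresponds to multiple runs of K-fold SCV.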

Feature extraction

Co-registration was unnecessary, since many publications on PBD did not use it and still reported excellent classification results, comparable to those obtained with co-registration (Ribbens et al. 2014; Schwarz and Kasparek 2014).

Wavelet packet transform

Compared to the standard discrete wavelet transform (DWT), the wavelet packet transform (WPT) is an extension in which the signal is passed through more filters. The DWT calculates each level by passing only the previous approximation coefficients to quadrature mirror filters (QMF). The WPT, in contrast, passes all coefficients (both approximation and detail) through QMF to create a full binary tree. Therefore, more features can be generated by WPT at different levels to obtain more information. The mathematical equation of WPT is given below

$$S_{p}^{m,d} = \int_{ - \infty }^{\infty } {x(t)\psi_{m} (2^{ - d} t - p){\text{d}}t}$$

where m represents the index of the channel, p the position parameter, d the decomposition level, ψ the wavelet function, and S the decomposition coefficients. \(2^{d}\) sequences are yielded at level d. The decomposition equations of the next level are

$$S_{k}^{2m,d + 1} = \sum\limits_{p \in Z} {h(p - 2k)S_{p}^{m,d} }$$
$$S_{k}^{2m + 1,d + 1} = \sum\limits_{p \in Z} {l(p - 2k)S_{p}^{m,d} }$$

For a d-level decomposition, DWT produces (3d + 1) coefficient sets, while the WPT produces \(2^{d}\) different coefficient sets. Note that the total number of coefficients of WPT is still the same as that of DWT, because of the downsampling process (Fig. 2).

Fig. 2 Flowchart of 2-level 1D-WPT
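The difference between the two decomposition trees can be illustrated with a small 1D sketch. It assumes the orthonormal Haar filter pair and a signal whose length is divisible by \(2^{d}\); the function names are illustrative only. For a 1D signal the DWT yields d + 1 coefficient sets (the 3d + 1 count applies to 2D images), while the WPT yields \(2^{d}\) sets, and both keep the total number of coefficients unchanged.

```python
import math

def haar_step(signal):
    """One analysis step: return (approximation, detail), each half as long."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet(signal, levels):
    """WPT: every node (approximation and detail) is split again."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            next_nodes.extend(haar_step(node))
        nodes = next_nodes
    return nodes

def dwt(signal, levels):
    """DWT: only the approximation branch is split again."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
wpt_nodes = wavelet_packet(x, levels=2)
dwt_nodes = dwt(x, levels=2)
print(len(wpt_nodes), [len(n) for n in wpt_nodes])  # 4 sets of 2 coefficients
print(len(dwt_nodes), [len(n) for n in dwt_nodes])  # 3 sets: 2, 2 and 4 coefficients
```

Because the Haar pair used here is orthonormal, the sum of squared coefficients over all terminal nodes equals the energy of the input signal in both trees.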

Shannon and Tsallis entropy

Shannon entropy (SE) is defined as a measure of uncertainty regarding the information content (IC):

$$E = - \sum\limits_{k = 1}^{Z} {p_{k} \log_{2} (p_{k} )}$$

where E represents the entropy, Z the total number of grey levels, k the grey level, and \(p_{k}\) the probability of k. Shannon entropy can only describe scenarios with simple effective microscopic interactions and short-ranged microscopic memory (Campos 2010). Assume a physical system can be broken down into two independent subsystems X and Y; then the Shannon entropy (SE) has the additivity property

$$E(X + Y) = E(X) + E(Y)$$

Nevertheless, realistic scenarios usually involve long-time memory and long-range interactions; therefore, Tsallis (2009) proposed a generalization of SE. He termed it Tsallis entropy (TE), which has the following form

$$E_{q} = \frac{{\sum\limits_{k = 1}^{Z} {(p_{k} )^{q} } - 1}}{1 - q}$$

where q is a real number representing the degree of nonextensivity. For two statistically independent subsystems X and Y, the Tsallis entropy (TE) of the joint system satisfies (Zhang and Wu 2011)

$$E_{q} (X + Y) = E_{q} (X) + E_{q} (Y) + (1 - q) \times E_{q} (X) \times E_{q} (Y)$$

This equation is the pseudo-additivity rule. Further, three different entropies can be deduced by assigning different values to q, as listed in Table 2 (Tsallis 2011). In this study, TE was employed to extract features from the 16 subbands of WPT coefficients of MR brain images.

Table 2 Properties of TE change with q
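The pseudo-additivity rule can be checked numerically with a short sketch. The two distributions below are arbitrary illustrative values. Note one assumption made here: the q → 1 limit of the TE formula is the Shannon entropy with the natural logarithm, which differs from the base-2 definition given earlier by a constant factor of ln 2.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats (natural logarithm)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

def tsallis_entropy(p, q):
    """E_q = (sum_k p_k^q - 1) / (1 - q); falls back to Shannon (nats) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (sum(pk ** q for pk in p if pk > 0) - 1.0) / (1.0 - q)

# Two arbitrary distributions for independent subsystems X and Y
px = [0.5, 0.3, 0.2]
py = [0.6, 0.4]
# Joint distribution of the independent pair: p(x, y) = p(x) * p(y)
pxy = [a * b for a in px for b in py]

q = 0.8
lhs = tsallis_entropy(pxy, q)
ex, ey = tsallis_entropy(px, q), tsallis_entropy(py, q)
rhs = ex + ey + (1 - q) * ex * ey
print(abs(lhs - rhs) < 1e-12)  # pseudo-additivity holds up to rounding

# As q approaches 1, TE approaches the Shannon entropy (in nats)
print(tsallis_entropy(px, 1.0 + 1e-6), shannon_entropy(px))
```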

Wavelet packet Tsallis entropy

We employed both Shannon entropy (SE) and Tsallis entropy (TE) to extract features from the wavelet-packet decomposition coefficients. The final extracted features were dubbed Wavelet Packet Tsallis Entropy (WPTE), which reduces to Wavelet Packet Shannon Entropy (WPSE) when q equals 1. The pseudocode of feature extraction is listed in Table 3.

Table 3 Pseudocode of WPTE
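A rough sketch of how such a feature vector could be computed is given below. Several choices are assumptions made for illustration rather than details confirmed by the text: the Haar filter pair, a separable row-then-column 2D decomposition, and estimating the probabilities \(p_{k}\) from a 16-bin histogram of each subband. With two levels, the sketch yields 16 subbands and hence 16 features.

```python
import math

def haar_pairs(seq):
    """Orthonormal Haar low-pass and high-pass outputs, each half as long."""
    s = 1.0 / math.sqrt(2.0)
    lo = [(seq[i] + seq[i + 1]) * s for i in range(0, len(seq), 2)]
    hi = [(seq[i] - seq[i + 1]) * s for i in range(0, len(seq), 2)]
    return lo, hi

def split_2d(block):
    """Split a 2D block into four half-size subbands (rows first, then columns)."""
    row_lo, row_hi = zip(*(haar_pairs(row) for row in block))
    subbands = []
    for half in (row_lo, row_hi):
        cols = list(zip(*half))  # transpose so that columns can be filtered
        col_lo, col_hi = zip(*(haar_pairs(list(c)) for c in cols))
        subbands.append([list(r) for r in zip(*col_lo)])  # transpose back
        subbands.append([list(r) for r in zip(*col_hi)])
    return subbands

def wavelet_packet_2d(image, levels=2):
    """Full quadtree: every subband is split again, giving 4**levels subbands."""
    nodes = [image]
    for _ in range(levels):
        nodes = [sub for node in nodes for sub in split_2d(node)]
    return nodes

def tsallis_entropy(values, q, bins=16):
    """Tsallis entropy of a histogram of the coefficients (bin count assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # one occupied bin, p = 1, so the entropy is zero
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    p = [c / len(values) for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(pk * math.log(pk) for pk in p)  # Shannon limit, in nats
    return (sum(pk ** q for pk in p) - 1.0) / (1.0 - q)

def wpte_features(image, q=0.8, levels=2):
    """One entropy value per terminal subband of the wavelet packet tree."""
    return [tsallis_entropy([v for row in sb for v in row], q)
            for sb in wavelet_packet_2d(image, levels)]

# Tiny synthetic 8x8 array (hypothetical data) just to exercise the pipeline
image = [[float((r * 3 + c * 5) % 7) for c in range(8)] for r in range(8)]
features = wpte_features(image)
print(len(features))  # 16 features for a 2-level decomposition
```

The image sides must be divisible by \(2^{d}\) for a d-level decomposition, which holds for the 256 × 256 images used here. Calling `wpte_features` with q = 1 gives the corresponding WPSE-style features, up to the choice of logarithm base.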


Classifier

Support vector machine

Suppose there are N training samples, each a p-dimensional vector belonging to one of two classes (−1 or +1), and the goal is to create a (p − 1)-dimensional hyperplane. Assume the dataset takes the form (Wang et al. 2014)

$$\left\{ {(x_{n} ,y_{n} )|x_{n} \in {\mathbb{R}}^{p} ,y_{n} \in \{ + 1, -1\} } \right\},n = 1,2,3,..,N$$

where \(y_{n}\) takes the value −1 for class −1, or +1 for class +1, and \(x_{n}\) denotes a training point that is a p-dimensional vector (Zhang et al. 2013). The maximum-margin hyperplane that separates the two classes is the desired SVM. Since any hyperplane has the form \({\mathbf{wx}} - {\mathbf{b}} = 0\), we need to select the optimal b and w that maximize the distance between the two parallel hyperplanes while still separating the data of the two classes.

$$\begin{array} {l} \mathop {\hbox{min} }\limits_{{{\mathbf{b}},{\mathbf{w}}}} \frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} \hfill \\ {\text{s}} . {\text{t}} .\, { }y_{n} \left( {{\mathbf{w}}x_{n} - {\mathbf{b}}} \right) \ge 1, \,\, n = 1,2,3, \ldots ,N \hfill \\ \end{array}$$

A vector of nonnegative slack variables \(\xi = (\xi_{1}, \ldots, \xi_{n}, \ldots, \xi_{N})\) is utilized to measure the degree of misclassification of each sample \(x_{n}\) (the distance between the margin and a vector \(x_{n}\) lying on the wrong side). The optimal hyperplane can be deduced by solving:

$$\begin{array}{*{20}l} \mathop {\hbox{min} }\limits_{{{\mathbf{w}},\xi ,{\mathbf{b}}}} \frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} + Ce^{T} {\varvec{\upxi}} \hfill \\ s.t. \, \left\{ {\begin{array}{*{20}c} {y_{n} \left( {{\mathbf{w}}^{T} x_{n} - {\mathbf{b}}} \right) \ge 1 - \xi_{n} } \\ {\xi_{n} \ge 0} \\ \end{array} } \right., \,\, n = 1, \ldots ,N \hfill \\ \end{array}$$

where C represents the error penalty and e an N-dimensional vector of ones. Therefore, the optimization becomes a trade-off between a large margin and a small error penalty. The constrained optimization problem can be solved using Lagrange multipliers as

$$\mathop {\hbox{min} }\limits_{{{\mathbf{w}},\xi ,{\mathbf{b}}}} \mathop {\hbox{max} }\limits_{\alpha ,\beta } \left\{ {\frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} + Ce^{T} {\varvec{\upxi}} - \sum\limits_{n = 1}^{N} {\alpha_{n} \left[ {y_{n} \left( {{\mathbf{w}}^{T} x_{n} - {\mathbf{b}}} \right) - 1 + \xi_{n} } \right] - \sum\limits_{n = 1}^{N} {\beta_{n} \xi_{n} } } } \right\}$$

The min–max problem is not easy to solve directly, so the dual form is commonly used instead:

$$\begin{array}{l} \mathop {\hbox{max} }\limits_{\alpha } \sum\limits_{n = 1}^{N} {\alpha_{n} } - \frac{1}{2}\sum\limits_{n = 1}^{N} {\sum\limits_{m = 1}^{N} {\alpha_{m} \alpha_{n} y_{m} y_{n} x_{m}^{T} x_{n} } } \hfill \\ {\text{s}} . {\text{t}} .\left\{ {\begin{array}{*{20}c} {0 \le \alpha_{n} \le C} \\ {\sum\limits_{n = 1}^{N} {\alpha_{n} y_{n} } = 0} \\ \end{array} , \,\, n = 1, \ldots ,N} \right. \hfill \\ \end{array}$$

The key advantage of the dual form is that the slack variables \(\xi_{n}\) vanish from the dual problem, with the constant C appearing only as an additional constraint on the Lagrange multipliers.

Fuzzy SVM

Fuzzy SVM (FSVM) is more effective than the standard SVM in predicting or classifying real-world data, in which some training points are less important than others. We would like the meaningful training points to be classified correctly, while meaningless points such as noise or outliers are given less weight (Lin and Wang 2002).

FSVM applies a fuzzy membership function (FMF) s to each training point (Xian 2010), so that the training set is transformed into a fuzzy set, which can be expressed as

$$\left\{ {(x_{n} ,s_{n} ,y_{n} )|x_{n} \in {\mathbb{R}}^{p} ,\,0 < s_{n} \le 1,y_{n} \in \{ + 1, - 1\} } \right\},n = 1, \ldots ,N$$

where \(s_{n}\) is the attitude of the corresponding training point toward one class and \((1 - s_{n})\) is the attitude of meaninglessness. The optimal hyperplane problem of FSVM is defined as:

$$\begin{array}{l} \mathop {\hbox{min} }\limits_{{{\mathbf{w}},\xi ,{\mathbf{b}}}} \frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} + C{\mathbf{s}}^{T} {\varvec{\upxi}} \hfill \\ {\text{s}} . {\text{t}} . { }\left\{ {\begin{array}{*{20}c} {y_{n} \left( {{\mathbf{w}}^{T} x_{n} - {\mathbf{b}}} \right) \ge 1 - \xi_{n} } \\ {\xi_{n} \ge 0} \\ \end{array} } \right., \, n = 1, \ldots ,N \hfill \\ \end{array}$$

where \({\mathbf{s}} = (s_{1}, s_{2}, \ldots, s_{N})\) represents the fuzzy membership vector. A smaller \(s_{n}\) reduces the effect of the parameter \(\xi_{n}\), such that the corresponding point \(x_{n}\) is treated as less important. In a similar way, we construct the Lagrangian

$$\mathop {\hbox{min} }\limits_{{{\mathbf{w}},\xi ,{\mathbf{b}}}} \mathop {\hbox{max} }\limits_{\alpha ,\beta } \left\{ {\frac{1}{2}\left\| {\mathbf{w}} \right\|^{2} + C{\mathbf{s}}^{T} {\varvec{\upxi}} - \sum\limits_{n = 1}^{N} {\beta_{n} \xi_{n} } - \sum\limits_{n = 1}^{N} {\alpha_{n} \left[ {y_{n} \left( {{\mathbf{w}}^{T} x_{n} - {\mathbf{b}}} \right) - 1 + \xi_{n} } \right]} } \right\}$$

Again, the dual form is used to transform this min–max problem to

$$\begin{aligned} \mathop {\hbox{max} }\limits_{\alpha } \sum\limits_{n = 1}^{N} {\alpha_{n} } - \frac{1}{2}\sum\limits_{n = 1}^{N} {\sum\limits_{m = 1}^{N} {\alpha_{m} \alpha_{n} y_{m} y_{n} x_{m}^{T} x_{n} } } \hfill \\ s.t.\left\{ {\begin{array}{*{20}c} {0 \le \alpha_{n} \le s_{n} C} \\ {\sum\limits_{n = 1}^{N} {\alpha_{n} y_{n} } = 0} \\ \end{array} , \,\, n = 1, \ldots ,N} \right. \hfill \\ \end{aligned}$$
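As a brief sketch of the step from the Lagrangian to the dual (a standard derivation written in the notation above, not reproduced from the source), let L denote the expression in braces in the Lagrangian; the symbol L is introduced only for this sketch. Setting the partial derivatives of L with respect to the primal variables to zero gives

$$\frac{\partial L}{\partial {\mathbf{w}}} = 0 \Rightarrow {\mathbf{w}} = \sum\limits_{n = 1}^{N} {\alpha_{n} y_{n} x_{n} } ,\quad \frac{\partial L}{\partial {\mathbf{b}}} = 0 \Rightarrow \sum\limits_{n = 1}^{N} {\alpha_{n} y_{n} } = 0,\quad \frac{\partial L}{\partial \xi_{n} } = 0 \Rightarrow s_{n} C - \alpha_{n} - \beta_{n} = 0$$

Since \(\alpha_{n} \ge 0\) and \(\beta_{n} \ge 0\), the last condition yields the box constraint \(0 \le \alpha_{n} \le s_{n} C\). Substituting the three conditions back into L removes \({\mathbf{w}}\), \({\mathbf{b}}\), and \(\xi\):

$$L = \frac{1}{2}{\mathbf{w}}^{T} {\mathbf{w}} - {\mathbf{w}}^{T} {\mathbf{w}} + \sum\limits_{n = 1}^{N} {\alpha_{n} } = \sum\limits_{n = 1}^{N} {\alpha_{n} } - \frac{1}{2}\sum\limits_{n = 1}^{N} {\sum\limits_{m = 1}^{N} {\alpha_{m} \alpha_{n} y_{m} y_{n} x_{m}^{T} x_{n} } }$$

Setting \(s_{n} = 1\) for all n recovers the bound \(0 \le \alpha_{n} \le C\) of the standard SVM, so the only difference between the two dual problems is the per-sample upper bound on \(\alpha_{n}\).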

Fuzzy membership

We set the FMF as a function of the distance between a point and its class center. Denote the mean of class +1 by \(x_{+}\) and the mean of class −1 by \(x_{-}\). Then the radii of the two classes are

$$r_{ - } = \mathop {\hbox{max} }\limits_{{\{ x_{n} :y = - 1\} }} \left| {x_{ - } - x_{n} } \right|$$
$$r_{ + } = \mathop {\hbox{max} }\limits_{{\{ x_{n} :y = + 1\} }} \left| {x_{ + } - x_{n} } \right|$$

The fuzzy membership \(s_{n}\) is defined as a function of the radius and mean of each class (Lin and Wang 2002)

$$s_{n} = \left\{ {\begin{array}{*{20}c} {1 - \left| {x_{ + } - x_{n} } \right|/(r_{ + } + \delta )} & {y_{n} = + 1} \\ {1 - \left| {x_{ - } - x_{n} } \right|/(r_{ - } + \delta )} & {y_{n} = - 1} \\ \end{array} } \right.$$

where δ > 0 is used to guarantee \(s_{n} > 0\).
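The membership rule can be sketched in a few lines. The sketch assumes the Euclidean norm for |·| and uses small made-up 2D points; the function name and the value of δ are illustrative only.

```python
import math

def fuzzy_memberships(points, labels, delta=1e-3):
    """s_n = 1 - |x_c - x_n| / (r_c + delta), where c is the class of point n."""
    memberships = [0.0] * len(points)
    dim = len(points[0])
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Class center: mean of all points in the class
        center = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        dist = {i: math.dist(points[i], center) for i in idx}
        # Class radius: largest distance from the center within the class
        radius = max(dist.values())
        for i in idx:
            memberships[i] = 1.0 - dist[i] / (radius + delta)
    return memberships

# Hypothetical toy data: the fourth point is an outlier of class +1
points = [(0, 0), (1, 0), (0, 1), (10, 10), (5, 5), (6, 5), (5, 6)]
labels = [1, 1, 1, 1, -1, -1, -1]
s = fuzzy_memberships(points, labels)
for point, label, value in zip(points, labels, s):
    print(point, label, round(value, 4))
```

By construction, the point farthest from its class center receives a membership close to zero, so its slack is penalized least and its multiplier bound \(s_{n} C\) is smallest. In libraries that accept per-sample weights which rescale C (for example, the `sample_weight` argument of `SVC.fit` in scikit-learn), these memberships could presumably be passed as weights to approximate an FSVM.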

Implementation and experiments


Figure 3 shows the diagram of the proposed PBD system. In the offline learning phase, the users select the optimal q (denoted q*) and train the classifier. In the online prediction phase, the users obtain the prediction result for each query image.

Fig. 3 Diagram of the proposed PBD system

Experiment design

In this study, we developed four different methods: “WPSE + SVM”, “WPSE + FSVM”, “WPTE + SVM”, and “WPTE + FSVM”. Theoretically, the last one should perform the best, since WPSE is a special case of WPTE, and FSVM is an extension of SVM with the additional ability to reduce the influence of noise and outliers.

We need to verify this by experiments. In this work, we designed five tasks. (1) We compared DWT with WPT on a healthy brain and a pathological brain, using a 2-level Haar wavelet decomposition. (2) We compared the proposed WPSE and WPTE features with traditional DWT and “DWT + PCA”, all with SVM as the classifier. (3) We compared the four proposed methods, to check whether FSVM is superior to SVM. (4) We selected the best of the proposed methods, and compared it with state-of-the-art approaches. (5) We used grid search to find the optimal value of the parameter q.

Results and discussions

The experiments were carried out on an IBM machine with a 3 GHz Core i3 processor and 8 GB of random access memory (RAM), running the Windows 7 operating system (OS). The algorithm was developed in-house on Matlab 2014a (The Mathworks ©).

WPT versus DWT

In the first experiment, we compared DWT with WPT on a healthy brain and an Alzheimer’s disease brain, respectively (Fig. 4). In Fig. 4, the second column shows the original image, the third column the DWT decomposition results, and the final column the WPT results. A pink colormap is employed for a better view.

Fig. 4 Decomposition comparison between DWT and WPT

Feature comparison

In the second experiment, we compared the proposed WPSE and WPTE (q is set to 0.8; please refer to “Optimal parameter q”) with two types of traditional features: (i) DWT and (ii) DWT + PCA. Note that Chaplot et al. (2006) proposed the DWT + SVM method, and Zhang and Wu (2012) proposed the DWT + PCA + SVM method. For a fair comparison, we chose the same classifier, SVM, for all features.

Table 4 shows that “WPSE + SVM” achieves accuracies of 98.64, 97.12, and 97.02 % on D-66, D-160, and D-255, respectively. “WPTE + SVM” achieves accuracies of 99.09, 98.94, and 98.39 % over the three datasets. These results are better than those obtained by either “DWT + SVM” (Chaplot et al. 2006) or “DWT + PCA + SVM” (Zhang and Wu 2012). Therefore, we can conclude that WPSE and WPTE outperform the traditional feature extraction methods “DWT” and “DWT + PCA”. In particular, WPTE is better than WPSE. The reason is three-fold: (1) TE is a generalization of traditional SE (Tsallis 2014), and TE has been successfully applied to brain images (Amaral-Silva et al. 2014; Venkatesan and Parthiban 2014; Khader and Ben Hamza 2011). (2) The combination of TE and the wavelet transform has been shown to perform better than either TE or DWT alone in other applications (Hussain 2014; Liu et al. 2014; Chen and Li 2014). (3) Brain images entail long-range interactions and fractal-type structure, because of the self-similarity observed in brain structures imaged at a finite resolution, which can be readily captured by the corresponding wavelet packet coefficients. In summary, there are similarities at different spatial scales in brain images, which make WPTE more suitable than WPSE for describing brains.

Table 4 Feature comparison with SVM as classifier (K-fold SCV)

Classifier comparison

To compare the classification performance of SVM and FSVM, we set the features as WPSE and WPTE (q = 0.8). Then, we applied both SVM and FSVM for classification. The results of 10 runs of K-fold SCV are listed in Table 5.

Table 5 SVM versus FSVM (10xK-fold SCV)

The results in Table 5 show that “WPSE + FSVM” obtains accuracies of 99.85, 99.69, and 98.94 % over the three datasets, which are higher than those obtained by “WPSE + SVM”. Similarly, the classification accuracy of “WPTE + FSVM” is higher than that of “WPTE + SVM”; that is, accuracy increases after SVM is replaced with FSVM. The reason is that FSVM applies an FMF to each training point, so FSVM can reduce the influence of noise and outliers. In addition, “WPTE + FSVM” performs the best among all four proposed approaches. It is used as the default proposed method in the following text.

Comparison with state-of-the-art

We compared the best proposed method (WPTE + FSVM), with 17 recent proposed methods, which consist of DWT + SOM (Chaplot et al. 2006), DWT + SVM (Chaplot et al. 2006), DWT + SVM + RBF (Chaplot et al. 2006), DWT + SVM + POLY (Chaplot et al. 2006), DWT + PCA + KNN (El-Dahshan et al. 2010), DWT + PCA + FP-ANN (El-Dahshan et al. 2010), DWT + PCA + SCG-FNN (Dong et al. 2011), DWT + PCA + SVM (Zhang and Wu 2012), DWT + PCA + SVM + RBF (Zhang and Wu 2012), DWT + PCA + SVM + IPOL (Zhang and Wu 2012), DWT + PCA + SVM + HPOL (Zhang and Wu 2012), RT + PCA + LS-SVM (Das et al. 2013), DWT + SE + SWP + PNN (Saritha et al. 2013), PCNN + DWT + PCA + BPNN (El-Dahshan et al. 2014), SWT + PCA + IABAP-FNN (Wang et al. 2015a), SWT + PCA + ABC-SPSO-FNN (Wang et al. 2015a), and WE + HMI + GEPSVM (Zhang et al. 2015d).

We averaged the results of 10 runs of K-fold SCV. The comparison results are listed in Table 6, in which the results of some earlier approaches, which were run five times in their original papers, were taken from the literature (Das et al. 2013). We ran our experiment ten times to obtain more robust results than a five-run experiment.

Table 6 Classification comparison

The value of q was again set to 0.8 (the reason can be found in “Optimal parameter q”). The regularization constant C was obtained via the grid-search method.

Table 6 shows the proposed “WPTE + FSVM” performed better than existing state-of-the-art methods, obtaining perfect classification for the first two datasets and an accuracy of 99.49 % for D-255. This demonstrated the effectiveness of FSVM, which can reduce the effect of noise and outliers in the training points, yielding a more reliable hyperplane than standard SVM. The second best classifier is “RT + PCA + LS-SVM” (Das et al. 2013) that achieved 99.39 % for D-255.

Finally, the average evaluations based on 10 runs of the proposed WPTE + FSVM method are listed in Table 7. For D-66 and D-160, WPTE + FSVM yielded perfect classification. For D-255, its performance decreased slightly, with a sensitivity of 99.50 %, a specificity of 99.43 %, a precision of 99.91 %, and an accuracy of 99.49 %.

Table 7 Average evaluation of WPTE + FSVM method based on 10 runs
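For reference, the four evaluation measures can be computed from a confusion matrix as in the sketch below. It assumes that the pathological class is treated as the positive class; the counts in the example are hypothetical and are not taken from Table 7.

```python
def evaluate(tp, fn, tn, fp):
    """Sensitivity, specificity, precision and accuracy from confusion counts.

    tp / fn: pathological images predicted as pathological / healthy
    tn / fp: healthy images predicted as healthy / pathological
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts for a single run on a 255-image dataset
metrics = evaluate(tp=219, fn=1, tn=34, fp=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} %")
```

Averaging these measures over the 10 runs would give figures of the kind reported in the table.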

Optimal parameter q

The parameter q influences the extracted features, so it also influences classification performance. Its value should be no more than 1, since the brain image is subextensive, containing complicated regions. In this final experiment, we varied the value of q over the set [0.1, 0.2, 0.3, …, 0.9, 1] (note that q = 1 reduces WPTE to WPSE), and ran the offline training for each value. We recorded the average accuracy over 10 runs on the dataset D-255 by the proposed “WPTE + FSVM”. The results are shown in Fig. 5 and Table 8.

Fig. 5 Effect of q on average accuracy

Table 8 The average accuracy changes with the value of q

Figure 5 shows that the value of q has a slight but discernible effect on the average accuracy over 10 runs. As q increases to 0.8, the accuracy increases gradually to its highest value. As q increases further to 1, the average accuracy decreases sharply. The result again validates that WPTE (q = 0.8) is better than WPSE (q = 1).

This optimal result (q = 0.8) is identical to that reported in three recent studies: Sturzbecher et al. (2009), Cabella et al. (2009), and Zhang et al. (2015b). Furthermore, Diniz et al. (2010) found q = 1.5 for gray matter (GM), 0.1 for white matter (WM), and 0.2 for cerebrospinal fluid (CSF). Here we treat the whole brain as a single entity, so we must assign a single value to q. The optimal q of 0.8 can be regarded as an average of the best q values of GM, WM, and CSF.

Discussion on the proposed method

There were three reasons for using WPT, TE, and FSVM. (1) WPT yields more features than DWT does. (2) Entropy can efficiently represent the complexity of subband coefficients, and TE is a better feature descriptor for brain structures than SE. (3) FSVM applies an FMF to each training point, so it can reduce the influence of noise and outliers.

The contributions of this work center on three points: (i) we employed WPTE, which offers a better information description than WPSE; (ii) we employed FSVM, which can deal with noise and outliers better than the plain SVM; and (iii) we showed that the proposed “WPTE + FSVM” approach obtained an average accuracy superior to that of 17 state-of-the-art approaches.

Conclusion and future research

In this study, we treated PBD as a binary classification problem between pathological and healthy brains. To solve it, we proposed a novel feature, WPTE, which uses WPT to replace the traditional DWT and TE to replace the traditional SE, and fed WPTE into FSVM. The experiments showed that the proposed “WPTE + FSVM” method yielded superior performance to state-of-the-art methods.

Future work should focus on the following five aspects: (i) we will include other imaging techniques, such as DTI, fMRI and MRSI; (ii) the classification performance may increase by using other advanced variants of SVMs, such as GEPSVM (Yu et al. 2015a) and Twin SVM (Jayadeva et al. 2007); (iii) we will check the effect produced by other wavelet families and other decomposition levels; (iv) we will try to develop a fine-grid search to replace the coarse-grid search technique; (v) swarm intelligence methods (Wang et al. 2015b) will be employed to train the weights of classifiers.



Abbreviations

(A)(BP)(F)NN: (artificial) (back-propagation) (feed-forward) neural network
(D)W(P)T: (discrete) wavelet (packet) transform
(K)(F)SVM: (kernel) (fuzzy) support vector machine
(G)(W)M: (gray) (white) matter
(S)(T)E: (Shannon) (Tsallis) entropy
ABC: artificial bee colony
AD: Alzheimer’s disease
BBO: biogeography-based optimization
CAD: computer-aided diagnosis
CSF: cerebrospinal fluid
FMF: fuzzy membership function
HD: Huntington’s disease
KNN: K-nearest neighbor
MR(I): magnetic resonance (imaging)
MS: multiple sclerosis
PiD: Pick’s disease
PBD: pathological brain detection
PNN: probabilistic neural network
QMF: quadrature mirror filter
RT: ripplet transform
SCV: stratified cross validation
SDH: subdural hematoma
SOM: self-organizing map
WP(S)(T)E: wavelet packet (Shannon) (Tsallis) entropy


  1. Amaral-Silva H, Wichert-Ana L, Murta LO, Romualdo-Suzuki L, Itikawa E, Bussato GF, Azevedo-Marques P (2014) The superiority of Tsallis Entropy over traditional cost functions for brain MRI and SPECT registration. Entropy 16(3):1632–1651. doi:10.3390/e16031632

  2. Ashkezari AD, Ma H, Saha TK, Ekanayake C (2013) Application of fuzzy support vector machine for determining the health index of the insulation system of in-service power transformers. IEEE Trans Dielectr Electr Insul 20(3):965–973

  3. Cabella BCT, Sturzbecher MJ, de Araujo DB, Neves UPC (2009) Generalized relative entropy in functional magnetic resonance imaging. Phys A 388(1):41–50. doi:10.1016/j.physa.2008.09.029

  4. Campos D (2010) Real and spurious contributions for the Shannon, Rényi and Tsallis entropies. Physica A 389(18):3761–3768

  5. Chaplot S, Patnaik LM, Jagannathan NR (2006) Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed Signal Process Control 1(1):86–92. doi:10.1016/j.bspc.2006.05.002

  6. Chen JK, Li GQ (2014) Tsallis wavelet entropy and its application in power signal analysis. Entropy 16(6):3009–3025. doi:10.3390/e16063009

  7. Damodharan S, Raghavan D (2015) Combining tissue segmentation and neural network for brain tumor detection. Int Arab J Inf Technol 12(1):42–52

  8. Das S, Chowdhury M, Kundu MK (2013) Brain MR image classification using multiscale geometric analysis of ripplet. Prog Electromagn Res-Pier 137:1–17. doi:10.2528/pier13010105

  9. Diniz PRB, Murta LO, Brum DG, de Araujo DB, Santos AC (2010) Brain tissue segmentation using q-entropy in multiple sclerosis magnetic resonance images. Brazilian J Med Biol Res 43(1):77–84. doi:10.1590/s0100-879x2009007500019

  10. Dong Z, Wu L, Wang S, Zhang Y (2011) A hybrid method for MRI brain image classification. Expert Syst Appl 38(8):10049–10053

  11. Dong Z, Zhang Y, Liu F, Duan Y, Kangarlu A, Peterson BS (2014) Improving the spectral resolution and spectral fitting of 1H MRSI data from human calf muscle by the SPREAD technique. NMR Biomed 27(11):1325–1332

  12. Dong Z, Phillips P, Wang S, Ji G, Yang J, Yuan T-F (2015) Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci 66(9):1–15

  13. El-Dahshan ESA, Hosny T, Salem ABM (2010) Hybrid intelligent techniques for MRI brain images classification. Digit Signal Proc 20(2):433–441. doi:10.1016/j.dsp.2009.07.002

  14. El-Dahshan ESA, Mohsen HM, Revett K, Salem ABM (2014) Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Expert Syst Appl 41(11):5526–5545. doi:10.1016/j.eswa.2014.01.021

  15. Farzan A, Mashohor S, Ramli AR, Mahmud R (2015) Boosting diagnosis accuracy of Alzheimer’s disease using high dimensional recognition of longitudinal brain atrophy patterns. Behav Brain Res 290:124–130. doi:10.1016/j.bbr.2015.04.010

  16. Goh S, Dong Z, Zhang Y, DiMauro S, Peterson BS (2014) Mitochondrial dysfunction as a neurobiological subtype of autism spectrum disorder: evidence from brain imaging. JAMA Psychiatry 71(6):665–671. doi:10.1001/jamapsychiatry.2014.179

  17. Harikumar R, Kumar BV (2015) Performance analysis of neural networks for classification of medical images with wavelets as a feature extractor. Int J Imaging Syst Technol 25(1):33–40. doi:10.1002/ima.22118

  18. Hussain M (2014) Mammogram enhancement using lifting dyadic wavelet transform and normalized Tsallis entropy. J Comput Sci Technol 29(6):1048–1057. doi:10.1007/s11390-014-1489-7

  19. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910. doi:10.1109/tpami.2007.1068

  20. Khader M, Ben Hamza A (2011) Nonrigid image registration using an entropic similarity. IEEE Trans Inf Technol Biomed 15(5):681–690. doi:10.1109/titb.2011.2159806

  21. LaViolette PS, Daun MK, Paulson ES, Schmainda KM (2014) Effect of contrast leakage on the detection of abnormal brain tumor vasculature in high-grade glioma. J Neurooncol 116(3):543–549. doi:10.1007/s11060-013-1318-9

  22. Lee SH, Lee CK, Park JB, Choi YH (2013) Diagnostic method for insulated power cables based on wavelet energy. IEICE Electronics Express 10(12):335–335. doi:10.1587/elex.10.20130335

  23. Lin C-F, Wang S-D (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471. doi:10.1109/72.991432

  24. Liu ZG, Hu QL, Cui Y, Zhang QG (2014) A new detection approach of transient disturbances combining wavelet packet and Tsallis entropy. Neurocomputing 142:393–407. doi:10.1016/j.neucom.2014.04.020

  25. Munteanu CR, Fernandez-Lozano C, Abad VM, Fernandez SP, Alvarez-Linera J, Hernandez-Tamames JA, Pazos A (2015) Classification of mild cognitive impairment and Alzheimer’s Disease with machine-learning techniques using H-1 magnetic resonance spectroscopy data. Expert Syst Appl 42(15–16):6205–6214. doi:10.1016/j.eswa.2015.03.011

  26. Nazir M, Wahid F, Khan SA (2015) A simple and intelligent approach for brain MRI classification. J Intell Fuzzy Syst 28(3):1127–1135. doi:10.3233/ifs-141396

  27. Padma A, Sukanesh R (2014) Segmentation and classification of brain CT images using combined wavelet statistical texture features. Arab J Sci Eng 39(2):767–776. doi:10.1007/s13369-013-0649-3

  28. Ribbens A, Hermans J, Maes F, Vandermeulen D, Suetens P, Alzheimer’s Disease Neuroimaging Initiative (2014) Unsupervised segmentation, clustering, and groupwise registration of heterogeneous populations of brain MR images. IEEE Trans Med Imaging 33(2):201–224. doi:10.1109/tmi.2013.2270114

  29. Saritha M, Joseph KP, Mathew AT (2013) Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network. Pattern Recogn Lett 34(16):2151–2156. doi:10.1016/j.patrec.2013.08.017

  30. Schwarz D, Kasparek T (2014) Brain morphometry of MR images for automated classification of first-episode schizophrenia. Inf Fusion 19:97–102. doi:10.1016/j.inffus.2013.02.002

  31. Sturzbecher MJ, Tedeschi W, Cabella BCT, Baffa O, Neves UPC, De Araujo DB (2009) Non-extensive entropy and the extraction of BOLD spatial information in event-related functional MRI. Phys Med Biol 54(1):161–174. doi:10.1088/0031-9155/54/1/011

  32. Tsallis C (2009) Nonadditive entropy: the concept and its use. European Phys J A 40(3):257–266. doi:10.1140/epja/i2009-10799-0

  33. Tsallis C (2011) The nonadditive entropy S-q and its applications in physics and elsewhere: some remarks. Entropy 13(10):1765–1804. doi:10.3390/e13101765

  34. Tsallis C (2014) An introduction to nonadditive entropies and a thermostatistical approach to inanimate and living matter. Contemp Phys 55(3):179–197. doi:10.1080/00107514.2014.900977

  35. Venkatesan AS, Parthiban L (2014) A novel nature inspired fuzzy Tsallis entropy segmentation of magnetic resonance images. Neuroquantology 12(2):221–229

  36. Wang S, Dong Z, Ji G, Zhang Y (2014) Classification of Alzheimer disease based on structural magnetic resonance imaging by kernel support vector machine decision tree. Prog Electromagn Res 144:171–184

  37. Wang S, Zhang Y, Dong Z, Du S, Ji G, Yan J, Yang J, Wang Q, Feng C, Phillips P (2015a) Feed-forward neural network optimized by hybridization of PSO and ABC for abnormal brain detection. Int J Imaging Syst Technol 25(2):153–164. doi:10.1002/ima.22132

  38. Wang S, Zhang Y, Ji G, Yang J, Wu J, Wei L (2015b) Fruit classification by wavelet-entropy and feedforward neural network trained by fitness-scaled chaotic ABC and biogeography-based optimization. Entropy 17(8):5711–5728

  39. Wu L, Wang S (2011) Magnetic resonance brain image classification by an improved artificial bee colony algorithm. Prog Electromagn Res 116:65–79

  40. Xian G-m (2010) An identification method of malignant and benign liver tumors from ultrasonography based on GLCM texture features and fuzzy SVM. Expert Syst Appl 37(10):6737–6741

  41. Yang G, Zhang Y, Yang J, Ji G, Dong Z, Wang S, Feng C, Wang Q (2015) Automated classification of brain images using wavelet-energy and biogeography-based optimization. Multimedia Tools Appl. doi:10.1007/s11042-015-2649-7

  42. Yu D-Z, Zheng C-D, Ai J-L, Shui H-W, Gen L-J, Zheng Z, Ji Q-Y (2015a) MR brain image classification via stationary wavelet transform and generalized eigenvalue proximal SVM. J Med Imaging Health Inform 5(7):1–9

  43. Yu D, Shui H, Gen L, Zheng C (2015b) Exponential wavelet iterative shrinkage thresholding algorithm with random shift for compressed sensing magnetic resonance imaging. IEEJ Transact Electr Electron Eng 10(1):116–117. doi:10.1002/tee.22059

  44. Yu D, Shui H, Zheng C, Phillip P, Ji G, Yang J (2015c) Pathological brain detection in magnetic resonance imaging scanning by wavelet entropy and hybridization of biogeography-based optimization and particle swarm optimization. Prog Electromagn Res 152:41–58

  45. Yu D, Zheng C, Gen L, Shui H (2015d) Effect of spider-web-plot in MR brain image classification. Pattern Recogn Lett 62:14–16. doi:10.1016/j.patrec.2015.04.016

  46. Zhang Y, Wu L (2011) Optimal multi-level thresholding based on maximum Tsallis entropy via an artificial bee colony approach. Entropy 13(4):841–859

  47. Zhang Y, Wu L (2012) An MR brain images classifier via principal component analysis and kernel support vector machine. Prog Electromagn Res 130:369–388

  48. Zhang Y, Wang S, Ji G, Dong Z (2013) An MR brain images classifier system via particle swarm optimization and kernel support vector machine. Sci World J 2013:9. doi:10.1155/2013/130134

  49. Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J (2015a) Exponential wavelet iterative shrinkage thresholding algorithm for compressed sensing magnetic resonance imaging. Inf Sci 322:115–132. doi:10.1016/j.ins.2015.06.017

  50. Zhang Y, Dong Z, Wang S, Ji G, Yang J (2015b) Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM). Entropy 17(4):1795–1813

  51. Zhang Y, Wang S, Phillips P, Dong Z, Ji G, Yang J (2015c) Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed Signal Process Control 21:58–73

  52. Zhang Y, Wang S, Sun P, Phillips P (2015d) Pathological brain detection based on wavelet entropy and Hu moment invariants. Bio-Med Mater Eng 26(s1):1283–1290

  53. Zhou X, Wang S, Xu W, Ji G, Phillips P, Sun P, Zhang Y (2015) Detection of pathological brain in MRI scanning based on wavelet-entropy and naive bayes classifier. In: Ortuño F, Rojas I (eds) Bioinformatics and Biomedical Engineering, vol 9043. Lecture Notes in Computer Science. Springer International Publishing, Granada, pp 201–209. doi:10.1007/978-3-319-16483-0_20

Authors’ contributions

YDZ and SHW conceived the study. YDZ and XJY designed the model. SHW and ZCD acquired the data. YDZ, GL and TFY analyzed the data. GL and PP interpreted the data. YDZ and ZCD developed the program. YDZ, SHW, and TFY wrote the draft. All authors gave critical revisions and approved the submission. All authors read and approved the final manuscript.


This paper was supported by NSFC (610011024, 61273243, 51407095), Natural Science Foundation of Jiangsu Province (BK20150983), Program of Natural Science Research of Jiangsu Higher Education Institutions (13KJB460011, 14KJB520021), Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing (BM2013006), Key Supporting Science and Technology Program (Industry) of Jiangsu Province (BE2012201, BE2014009-3, BE2013012-2), Special Funds for Scientific and Technological Achievement Transformation Project in Jiangsu Province (BA2013058), and Nanjing Normal University Research Foundation for Talented Scholars (2013119XGQ0061, 2014119XGQ0080).

Authors’ information

Prof. Dr. YD Zhang is affiliated with the School of Computer Science and Technology, Nanjing Normal University. He has published over 60 papers in “Alzheimer’s & Dementia (IF: 12.407)”, “JAMA Psychiatry (IF: 12.008)”, “Information Sciences (IF: 4.038)”, “NMR in Biomedicine (IF: 3.044)”, “Knowledge-Based Systems (IF: 2.947)”, “Food Research International (IF: 2.818)”, “Journal of Food Engineering (IF: 2.771)”, “Basic & Clinical Pharmacology & Toxicology (IF: 2.377)”, “Sensors (IF: 2.245)”, “Expert Systems with Applications (IF: 2.240)”, “Frontiers in Computational Neuroscience (IF: 2.201)”, etc. He was listed in “2014 Most Cited Chinese Researchers (Computer Science)” released by Elsevier. Two of his papers were included in “ESI Highly Cited Papers”. Dr. Zhang is now an editor of “Scientia Iranica (IF: 1.025)”, “Journal of Computer Information Systems (IF: 0.722)”, “Fundamenta Informaticae (IF: 0.717)”, and “Maejo International Journal of Science and Technology (IF: 0.367)”. He is an associate editor of “Neurocomputing (IF: 2.083)”. In the past, he served as the guest editor-in-chief of “Mathematical Problems in Engineering (IF: 0.762)”, “Sensors (IF: 2.245)”, and “SIMULATION: Transactions of The Society for Modeling and Simulation International (IF: 0.818)”. He now serves as a guest editor of “Neural Network World (IF: 0.479)”.

Prof. Dr. XJ Yang is affiliated with the Department of Mathematics and Mechanics, China University of Mining and Technology. He has been invited to serve as an editor of “Advances in Mathematical Physics (IF: 1.100)”, “Maejo International Journal of Science and Technology (IF: 0.367)”, “Thermal Science (IF: 1.222)”, and “Central European Journal of Physics (IF: 1.085)”. He was invited to serve as the guest editor-in-chief of “Central European Journal of Mathematics (IF: 0.578)” and “Advances in Mechanical Engineering (IF: 0.575)”.

Prof. Dr. ZC Dong is an Associate Professor in the Division of Translational Imaging, Columbia University, USA. He has published over 20 papers in “JAMA Psychiatry (IF: 12.008)”, “Progress in Nuclear Magnetic Resonance Spectroscopy (IF: 7.237)”, “Neuropsychopharmacology (IF: 7.048)”, “Neuroimage (IF: 6.357)”, “Human Brain Mapping (IF: 5.969)”, “Information Sciences (IF: 4.038)”, etc. He was invited to serve as the guest editor-in-chief of “BioMed Research International (IF: 1.579)”.

Prof. Dr. TF Yuan is affiliated with the Department of Psychology, Nanjing Normal University. He has published over 60 papers in “Neuron (IF: 15.05)”, “Brain (IF: 9.196)”, “Journal of Neuroscience (IF: 6.344)”, “Science Signaling (IF: 6.279)”, “Brain Research Reviews (IF: 5.930)”, “Brain Structure & Function (IF: 5.618)”, “Scientific Reports (IF: 5.578)”, “Molecular Neurobiology (IF: 5.137)”, “Brain Stimulation (IF: 4.399)”, “Frontiers in Cellular Neuroscience (IF: 4.289)”, etc. He has been invited to serve as an editor of “PeerJ (IF: 2.112)”, “Journal of Molecular Neuroscience (IF: 2.343)”, and “SpringerPlus”.

Competing interests

The authors declare that they have no competing interests.

Corresponding authors

Correspondence to Yu-Dong Zhang or Ti-Fei Yuan.

Additional information

Yu-Dong Zhang, Shui-Hua Wang and Xiao-Jun Yang contributed equally.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Zhang, YD., Wang, SH., Yang, XJ. et al. Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine. SpringerPlus 4, 716 (2015).

  • Pathological brain detection (PBD)
  • Tsallis entropy
  • Magnetic resonance imaging
  • Computer-aided diagnosis
  • Discrete wavelet packet transform
  • Fuzzy support vector machine
  • Pattern recognition