Comparison of various texture classification methods using multiresolution analysis and linear regression modelling

Textures play an important role in image classification. This paper proposes a high-performance texture classification method that combines a multiresolution analysis tool with linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as an effective texture characteristic. The method is motivated by the observation that, for image samples belonging to the same kind of texture, there exists a distinctive correlation between the different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. Linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method therefore considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the linear regression model to the classification of the obtained texture features. The paper also presents a comparative assessment of the classification results obtained from this method against those of two other wavelet transforms, namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.

Commonly used texture classification methods include the Gray Level Co-occurrence Matrix (GLCM) (Kashyap and Chellappa 1983; Unser 1986), second-order gray level statistics (Unser 1995), and the Gauss-Markov random field (Fernandez 2008). These methods perform best on micro-textures. The spectral histogram is another commonly used texture classification method (Liu and Wang 2003). It is based on local spatial/frequency information, which provides a unified texture feature.
Extensive research has demonstrated that classification based on multiresolution analysis methods, which resemble the human vision system, provides better performance. Hence these methods are widely used for the classification of textures. The most commonly employed multiresolution analysis techniques include the Gabor transform (Grigorescu et al. 2002) and the wavelet transform (Van de Wouwer et al. 1999; Huang and Aviyente 2008).
In these methods the texture image is transformed into a local spatial/frequency representation by convolving the image with appropriate band-pass filters tuned to specific parameters. In the case of the Gabor transform, features such as Gabor energy, complex moments, and the grating cell operator are used to characterize the texture, while in wavelet transform analysis (Van de Wouwer et al. 1999; Huang and Aviyente 2008; Ma and Manjunath 1995; Hackmack et al. 2012; Wang and Yong 2008) the wavelet coefficients themselves serve to characterize the texture features. In (Selesnick et al. 2005) the dual-tree complex wavelet transform is used to extract information on different spatial scales from structural MRI data and is shown to be relevant for disease classification.
In this paper we apply linear regression modelling to analyse the correlation between texture samples at various frequencies, facilitated through multiresolution analysis, for the efficient classification of textures. The basic algorithm is adopted from (Wang and Yong 2008). In this work we have experimented with different multiresolution analysis tools that are used for texture classification. In addition to the multiresolution analysis tools suggested in (Wang and Yong 2008), namely the discrete wavelet transform and the discrete wavelet packet transform, we have employed the dual tree complex wavelet packet transform, which is the novelty of this work, and the performance of the three methods is compared. Our experiments show that, in most cases, the discrete wavelet transform outperforms the discrete wavelet packet transform and the dual tree complex wavelet packet transform in terms of classification rate. This paper is organized as follows. "Texture classification using linear regression model" provides an overview of the application of multiresolution analysis tools (Rahman et al. 2011) and linear regression modelling (Kerns 2011) for analysing the correlation between frequency channels for the classification of textures. "Texture classification algorithm" explains the methodology adopted for the classification phase. The experimental results and the performance comparison of the different multiresolution analysis tools are presented in "Experimental results". Finally, the conclusions are summarized in "Conclusion".

Multiresolution analysis tools
The level of detail within an image varies from location to location. Finer resolution for analysis is required at regions where significant information is contained.
A multiresolution representation of an image provides complete detail about the extent of information present at different locations. The main concept of multiresolution analysis is that for each vector space there is another vector space of higher resolution, until the final image is obtained. The basis of each of these vector spaces is the scaling function. For textures, it provides a scale-invariant interpretation.
The multiresolution analysis tools used for texture analysis in this work are the following:
1. Discrete wavelet transform (DWT)
2. Discrete wavelet packet transform (DWPT)
3. Dual tree complex wavelet packet transform (DTCWPT)
The discrete wavelet transform (DWT) of an image provides both frequency and location information of the analyzed image. The 2-D wavelet transform is carried out by the tensor product of two 1-D wavelet base functions along the horizontal and vertical directions, and the corresponding filters can be expressed as $h_{LL}(k, l) = h(k)h(l)$, $h_{LH}(k, l) = h(k)g(l)$, $h_{HL}(k, l) = g(k)h(l)$ and $h_{HH}(k, l) = g(k)g(l)$. By convolving the given image with these four filters, we get four subimages and thus four channels. Further decomposition is performed only in the low-low frequency region.
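The tensor-product construction of the four 2-D filters can be illustrated with a short sketch. The Haar filters used here are an assumption chosen only for brevity (the text does not state which wavelet is used), and the function name `separable_filters` is hypothetical.

```python
import numpy as np

# Haar analysis filters, used purely as an illustrative assumption.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # 1-D low-pass filter h(k)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # 1-D high-pass filter g(k)


def separable_filters(h, g):
    """Build the four 2-D filters as outer (tensor) products of h and g."""
    return {
        "LL": np.outer(h, h),  # h_LL(k, l) = h(k) h(l)
        "LH": np.outer(h, g),  # h_LH(k, l) = h(k) g(l)
        "HL": np.outer(g, h),  # h_HL(k, l) = g(k) h(l)
        "HH": np.outer(g, g),  # h_HH(k, l) = g(k) g(l)
    }


filters = separable_filters(h, g)
```

Each entry of `filters` is a 2 × 2 kernel; convolving an image with the four kernels (followed by downsampling) would give the four channels of one decomposition level.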
For quasi-periodic signals such as speech signals and texture patterns, whose dominant frequency channels are located in the middle frequency region, DWT decomposition has been argued to be non-ideal (Chang and Kuo 1993). The Discrete Wavelet Packet Transform (DWPT), which allows further decomposition in all frequency regions to obtain a full decomposition, is ideally suited for mid-frequency regions (Chang and Kuo 1993). Thus the DWPT of an image can characterize the properties of the image in all frequency regions. Wavelet packets perform better in terms of fidelity of direction but not in terms of improved directionality. The high-pass coefficients will oscillate around singularities of the signal (Sovic and Sersic 2012).
It has been shown that texture features which effectively define the directional and spatial/frequency characteristics of the patterns lead to good texture analysis (Materka and Strzelecki 1998). In order to mitigate the limitations of DWT and DWPT, the complex wavelet transform can be used. To exploit both the better shift invariance and directionality of the complex transform and the better selectivity of decomposition of the packet transform, the combined transform, the dual tree complex wavelet packet transform (DTCWPT), is chosen in our work for further experiments. The dual-tree complex wavelet packet transform (DTCWPT) (Selesnick et al. 2005) involves two DWPTs whose filter banks (FB) are designed so that the impulse responses of the first FB are approximately the discrete Hilbert transforms of those of the second FB, as shown in Fig. 1. In this way it measures both the real (even) and the imaginary (odd) components of the input signal (hence the name complex wavelet transform).
For the three multiresolution analysis tools described above, the original image is first decomposed into four subimages by convolving the image with low pass and high pass filters both in the horizontal and vertical directions. The parent node and the four children nodes thus formed are named as O, A, B, C and D respectively. Each of the channels thus formed is given a channel number. This is to identify the frequency channels for further analysis of these channels in the experiments done. Table 1 shows the channel number assigned to the different channels formed as a result of decomposition using DWT and Table 2 shows the same for DWPT and DTCWPT.

Correlation between frequency channels
To explain the correlation between frequency channels, decomposition using the dual tree complex wavelet packet transform is considered here in view of its extensive directionality. In order to characterize the image texture, we could use the raw coefficients as such. Generally, however, some measure derived from these values is taken as the texture feature, as handling the raw coefficients is difficult. Typical examples of these measures are the mean, standard deviation, energy and so on. The energy distribution has important discriminatory properties for images, as it reflects the distribution of energy along the frequency axis over scale and orientation, and as such can be used as a feature for classification. In our experiments the energy values of the subimages are extracted using the mean of the magnitude of the subimage coefficients (Unser 1995). That is, if (M, N) represents the size of the subimage I, and I(i, j) represents the subimage coefficient corresponding to (i, j), then its energy is given by the equation

$$E = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| I(i, j) \right| \qquad (1)$$

Different samples of the same texture are taken to derive the inherent texture properties, and DTCWPT decomposition is applied to each sample. Considering a three-level decomposition, we get 64 real (even) and 64 imaginary (odd) components. If $a_h$ represents the real part and $a_g$ represents the imaginary part of the DTCWPT, the complex coefficients are given by $a_h + j a_g$. The magnitude of the complex coefficients is considered for further processing. Since the DTCWPT measures both the real and imaginary parts of the input signal, it offers both magnitude and phase information. Hence it also characterizes the information in all frequency regions.
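The energy measure described above (mean magnitude of the subimage coefficients) can be sketched as follows. The function names are hypothetical; the second helper assumes the real and imaginary coefficient arrays of a dual-tree channel are available as two arrays of equal shape.

```python
import numpy as np


def channel_energy(subimage):
    """Mean magnitude of the coefficients of one subimage (real or complex)."""
    coeffs = np.asarray(subimage)
    return float(np.mean(np.abs(coeffs)))


def complex_channel_energy(real_part, imag_part):
    """Energy of a dual-tree channel from its real and imaginary coefficients."""
    complex_coeffs = np.asarray(real_part) + 1j * np.asarray(imag_part)
    return channel_energy(complex_coeffs)
```

For a real-valued DWT or DWPT subimage only `channel_energy` is needed; for the DTCWPT the magnitude of the combined complex coefficient is taken before averaging.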
To arrive at the correlation between the derived channel pairs, and thereby identify the significant channel pairs, we make use of a preprocessing algorithm, which is detailed below.
1. For each sample of a given texture, obtain the n channels of wavelet coefficients each for the real and imaginary parts.
2. From these, find the magnitude of the complex coefficients.
3. From the energy of each channel [found using (1)], form the energy vector of size n for each of the texture samples.
4. Arrange the vectors to form the energy matrix E of size m × n.
5. Find the correlation coefficient matrix V from E and arrange the channel pairs in descending order of correlation coefficient.
6. Eliminate those channel pairs whose correlation coefficient is less than a predefined threshold value T.
Applying the 3-level DTCWPT decomposition, we get 64 frequency regions. Thus the energy matrix consists of the energy vectors corresponding to 64 frequency channels of m samples. This energy matrix can be viewed from a statistical perspective: each frequency channel can be considered a random variable, and the energy values corresponding to each frequency channel can be considered the values assigned to that random variable. From the energy matrix E of size m × 64, a correlation coefficient matrix V of size 64 × 64 is generated, where the (i, j)th element corresponds to the correlation coefficient between the ith and jth channels. Note that the correlation of a channel with itself is 1, so all the diagonal elements can be neglected. Because the correlation matrix is symmetric, the values below the diagonal (lower triangle) are the same as the values above the diagonal (upper triangle). Therefore only the channel pairs corresponding to either the upper or the lower triangle need to be considered for further analysis.
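The correlation and channel elimination steps can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `select_channel_pairs` is hypothetical, the default threshold of 0.45 is taken from the value reported in this section, and the signed Pearson coefficient (rather than its absolute value) is assumed.

```python
import numpy as np


def select_channel_pairs(E, threshold=0.45):
    """Return [(i, j, r)] for channel pairs with r >= threshold, highest r first.

    E is the energy matrix with one row per sample and one column per channel.
    """
    V = np.corrcoef(E, rowvar=False)              # n x n correlation matrix
    rows, cols = np.triu_indices(V.shape[0], k=1)  # upper triangle, no diagonal
    pairs = [
        (int(i), int(j), float(V[i, j]))
        for i, j in zip(rows, cols)
        if V[i, j] >= threshold                    # channel elimination
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)   # descending correlation
    return pairs
```

Using only the upper triangle with `k=1` drops both the diagonal (self-correlation of 1) and the duplicate entries of the symmetric matrix, as described above.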
For channel elimination and identification of the significant channel pairs, a threshold value T has to be selected. To establish a suitable value for T, we experimented with different threshold values and sample numbers to find an optimum value. We have used 20 Brodatz textures with 36 samples from each texture for the experiments (Brodatz 1966). The threshold values were varied from 0.2 to 0.8, along with sample sizes from 30 to 100. In this context we also experimented with varying threshold values at a constant sample size, and vice versa. The results obtained for one of the textures, namely D3, are illustrated for both cases in the tables shown below. The experiment is repeated for different pairs of sample number and threshold level. In our experiments it is observed that the combination of 81 samples with a threshold value of 0.45 provides a better classification rate. The classification rate obtained with these optimum values is tabulated in Table 3. Table 4 shows the classification rate obtained by setting the threshold value to 0.45 (as a middle value) and varying the sample size, while Table 5 shows the classification rate obtained with the number of samples fixed at 81 (found to give good results) and different threshold levels.
Thus, for channel elimination in the remainder of this experiment, we have chosen a threshold value of 0.45 with the selected 81 samples.

Linear regression analysis for texture classification
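Regression analysis is commonly used to find the relationship between two variables. The statistical models of the energies of channels are used to make predictions. When a correlation coefficient shows that data is likely to be able to predict future outcomes, and a scatter graph of the data appears to form a straight line, statisticians use linear regression as an option to find a predictive function. This can be modelled as

$$Y_i = a_0 + b_0 X_i + \epsilon_i, \qquad i = 1, \ldots, n$$

Here $\epsilon_i$ is the residue and the constant real parameters $(a_0, b_0)$ are unknown. The parameters $(a, b)$ are estimated by maximum likelihood, which in the case of independent and identically normally distributed errors reduces to least squares in terms of $X_i$ and $Y_i$, i.e. to minimizing

$$Q(a, b) = \sum_{i=1}^{n} \left( Y_i - a - b X_i \right)^2$$

This results in

$$\hat{b} = \frac{S_{XY}}{S_{XX}} \qquad (4)$$

$$\hat{a} = \bar{Y} - \hat{b} \bar{X} \qquad (5)$$

where

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \quad S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2, \quad S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

The predictors and residuals for the observed responses $Y_i$ are given respectively by

$$\hat{Y}_i = \hat{a} + \hat{b} X_i \qquad (11)$$

$$e_i = \hat{Y}_i - Y_i$$

The mean residual sum of squares (per degree of freedom) is given by

$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} e_i^2 \qquad (13)$$

and the mean is given by

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} e_i \qquad (14)$$

Confidence intervals for the estimated parameters are all based on the fact that the least squares estimates $\hat{a}$, $\hat{b}$ and the corresponding predictors of (the mean of) $Y_i$ are linear combinations of the independent normally distributed variables.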
In this paper, linear regression modelling is employed to analyse the correlation between different frequency channels and to extract the texture features. We calculate the energy values for each channel pair (two values, one for each channel) and model them as two random variables. It is found that the relation between these two random variables can be approximated by linear regression modelling. The algorithm used to extract the texture feature is summarized as follows (Rahman et al. 2011):
[Input]: channel pair list and energy matrix E for each texture
[Output]: texture feature
1. Each channel pair obtained from the pre-processing algorithm forms two random variables. The energy values of the corresponding channel pair, taken from the channel energy matrix, form the values assigned to the variables.
2. Compute the regression parameters $\hat{b}$ and $\hat{a}$ of the above values using (4) and (5).
3. Calculate the predictor $\hat{Y}_i$ using (11).
4. Estimate the variance and mean using Eqs. (13) and (14).
5. Calculate the residual between $Y_i$ and $\hat{Y}_i$ as $\hat{Y}_i - Y_i$.
6. Repeat steps 1-5 for each texture sample.
7. The channel pairs, together with the corresponding correlation coefficients, regression parameters, mean and variance, now characterize the texture features. Using these, create the database containing the feature list of each texture.
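The feature extraction steps for a single channel pair can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name and dictionary keys are hypothetical, the closed-form least squares estimates are the standard ones, the residual is taken as predictor minus observation as in step 5, and the variance divides by n − 2 on the assumption that "per degree of freedom" refers to the two fitted parameters.

```python
import numpy as np


def regression_features(x, y):
    """Fit y ~ a + b*x by least squares and summarise the residuals.

    x, y: energies of the two channels of one pair, one value per sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
    a = y_bar - b * x_bar                                             # intercept
    residuals = (a + b * x) - y          # predictor minus observation
    mean = residuals.mean()              # close to 0 for a least squares fit
    variance = np.sum(residuals ** 2) / (len(x) - 2)  # assumed n - 2 divisor
    return {"a": float(a), "b": float(b),
            "mean": float(mean), "variance": float(variance)}
```

Calling this once per retained channel pair, with the two corresponding columns of the energy matrix, would give the per-pair entries stored in the texture database.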

Texture classification algorithm
Classification refers to assigning a physical object or incident to one of a set of predefined categories. In texture classification the goal is to assign an unknown sample image to one of a set of known texture classes, and it is one of the four problem domains in the field of texture analysis. The texture classification process involves two phases: the learning phase and the recognition phase. In the learning phase, the target is to build a model for the texture content of each texture class present in the training data, which generally comprises images with known class labels. The texture content of the training images is captured with the chosen texture analysis method. This yields a set of textural features for each image. These features can be scalar numbers, discrete histograms or empirical distributions. Examples are spatial structure, contrast, roughness, orientation, etc. In the recognition phase the texture content of the unknown sample is first described with the same texture analysis method. Then the textural features of the sample are compared to those of the training images with a classification algorithm, and the sample is assigned to the category with the best match. Optionally, if the best match is not sufficiently good according to some predefined criteria, the unknown sample can be rejected.

Learning phase
The learning phase includes the pre-processing algorithm ("Correlation between frequency channels") and the feature extraction algorithm ("Linear regression analysis for texture classification"). The outcome of the pre-processing algorithm is the channel energy matrix and the top channel pair list, while that of the feature extraction algorithm is the feature list of the textures in the database, which includes the channel pair list, correlation coefficient, mean, variance and the linear regression parameters. The learning algorithm is summarised as follows.
[Input]: texture samples
[Output]: database containing texture features
1. Take different samples of the given texture, apply the pre-processing algorithm (as in "Correlation between frequency channels") and derive the energy matrix and channel pair list.
2. Select the channel pairs with correlation coefficient greater than the threshold value and arrange them in descending order of correlation coefficient.
3. Apply the feature extraction algorithm (as in "Linear regression analysis for texture classification") and form the database.
4. Repeat steps 1-3 for all textures.
According to the central limit theorem, under certain conditions the sum of a large number of random variables will have an approximately normal distribution. For example, if $(x_1, \ldots, x_n)$ is a sequence of independent and identically distributed random variables, each having mean $\mu$ and variance $\sigma^2$, then the central limit theorem states that

$$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} x_i - \mu \right) \xrightarrow{d} N(0, \sigma^2)$$

For a normal distribution the probability density function depends on the mean $\mu$ and the variance $\sigma^2$ and is given by

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$

Here $P(-3\sigma < x < +3\sigma) \approx 99.7\,\%$, since the mean $\mu \approx 0$. In this experiment the value of x represents the energy values of the channels corresponding to the channel pairs in the database.

Recognition phase
In the recognition phase an unknown texture serves as the input. The classification algorithm is summarized as follows.
[Input]: unknown texture
[Output]: unknown texture classified
1. Decompose the unknown texture using the multiresolution analysis tool and obtain its energy vector V.
2. Select one texture from the database.
3. Choose the topmost channel pair.
4. Compute the regression parameters, mean and variance of the corresponding channel pair.
5. From the energy vector V of the unknown texture, select the energies corresponding to the selected channel pair.
6. Take the energy of one of the two channels as $X_i$ and the other as $Y_i$ and, applying linear regression modelling, find the predictor $\hat{Y}_i$ using (11).
7. Compute the residue $\hat{Y}_i - Y_i$.
8. If the residue lies outside the interval $\mu \pm 3\sigma$, eliminate the texture from the candidate list, choose the next texture and repeat steps 3-7.
9. Otherwise select the next channel pair and repeat steps 4-7. If the residue lies within $\mu \pm 3\sigma$ for all the channel pairs, the unknown texture is assigned to the corresponding texture class.
10. If there is only one texture left in the database, the unknown texture is assigned to that texture class.
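The recognition loop above can be sketched in a few lines. This is a simplified illustration under assumptions: the database layout (a dictionary mapping each texture label to a list of per-pair feature dictionaries with keys `pair`, `a`, `b`, `mean` and `variance`) and the function name `classify` are hypothetical, the stored features are assumed to come from the learning phase, and the fallback of step 10 is omitted, so the function returns `None` when every candidate is eliminated.

```python
def classify(energy_vector, database, k=3.0):
    """Return the first label whose channel pairs all pass the mu +/- k*sigma test."""
    for label, features in database.items():
        accepted = True
        for f in features:                     # assumed sorted, topmost pair first
            i, j = f["pair"]
            x, y = energy_vector[i], energy_vector[j]
            residue = (f["a"] + f["b"] * x) - y    # predictor minus observation
            sigma = f["variance"] ** 0.5
            if abs(residue - f["mean"]) > k * sigma:
                accepted = False               # outside mu +/- 3 sigma: eliminate
                break
        if accepted:
            return label
    return None
```

With `k=3.0` the acceptance band corresponds to the three-sigma interval used in steps 8 and 9; a texture is rejected as soon as one of its channel pairs fails the test.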

Experimental results
The twenty textures from the Brodatz database (Brodatz 1966) shown in Fig. 2 are used in the experiments. For each texture, 81 samples of size 128 × 128 are formed with an overlap of 64 pixels in both the horizontal and vertical directions, giving a database of 1620 samples. Of these, half of the samples are used for training and the remainder for testing. The multiresolution analysis tools mentioned above, namely the discrete wavelet transform, the discrete wavelet packet transform and the dual tree complex wavelet packet transform, are used in the experiments. First we considered the one-level dual tree complex wavelet packet transform (DTCWPT) as the multiresolution analysis tool. However, it is found that for some textures channel elimination then leaves too few channel pairs for further processing. So the two-level DTCWPT is chosen, which provides 16 channels when decomposed. The energy of each channel is calculated using (1), and an energy vector of length 16 is formed. The energy vectors obtained for all the samples are arranged to form an energy matrix of size 40 × 16, as shown in Table 6. From the energy matrix, a correlation coefficient matrix of size 16 × 16 is formed. The (i, j)th element of the matrix represents the correlation between the ith and jth channels. The correlation coefficient matrix obtained for texture D3 using 2-level DTCWPT is shown in Table 7. From these, we sort the channel pairs in descending order of the correlation coefficients (neglecting the correlation values between identical channels).

Table 6 Energy matrix (excerpt; one row per sample, one column per channel)
631  131  105  57  31  70  19  29  27  21  44  34  11  15  13  20
807  123  110  55  29  59  19  31  26  19  50  30  10  14  13  17
783  109  111  51  27  55  18  29  31  24  53  33  11  15  13  19
853  104  117  52  26  51  16  30  32  22  57  33  10  15  13  19
773  121  110  55  27  60  18  30  28  18  50  31  10  13  12  19
758  114  112  52  …
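The sampling scheme can be checked with a short sketch. The 640 × 640 source image size is an assumption (it is not stated in the text, but it is the size for which 128 × 128 windows with a 64-pixel overlap give exactly 81 samples), and the function name `extract_samples` is hypothetical.

```python
import numpy as np


def extract_samples(image, size=128, step=64):
    """Cut size x size windows from a 2-D image, moving `step` pixels each time."""
    rows, cols = image.shape
    return [
        image[r:r + size, c:c + size]
        for r in range(0, rows - size + 1, step)
        for c in range(0, cols - size + 1, step)
    ]


image = np.zeros((640, 640))   # assumed source size, not given in the text
samples = extract_samples(image)
```

A step of 64 pixels with a 128-pixel window corresponds to the 64-pixel overlap described above, giving 9 window positions per direction and 9 × 9 = 81 samples per texture, or 1620 samples for the 20 textures.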
We experimented with various threshold values, from a lower boundary value of 0.2 to a higher boundary value of 0.8. For presentation we highlight the results obtained for texture D3. As explained in "Correlation between frequency channels", a threshold value of 0.45 is chosen, and those channel pairs whose correlation coefficient is below this threshold are neglected. It is observed that for texture D3 only 41 channel pairs have a correlation coefficient above the chosen threshold using the 2-level dual tree complex wavelet packet transform decomposition. All other channel pairs are therefore eliminated and not listed in the database. A database is created for each of the textures, which includes the top channel pairs with correlation coefficient greater than the predefined threshold and the corresponding linear regression parameters, mean and variance. The linear regression parameters for these channel pairs can be determined using (4) and (5). For the top channel pairs the distribution of energy values from the energy matrix approximates a straight line. Figure 3 shows the approximately linear relationship between the channel pair having the highest correlation in the case of texture D3, using the two-level DTCWPT as the multiresolution analysis tool. Using (13) and (14) the variance and mean are calculated and the database is created. Table 8 shows the top five channel pairs and the corresponding database values for texture D3 using the two-level DTCWPT.
In our work we have mainly focussed on the three-level decomposition to get better performance, and the performance of the three-level DTCWPT is compared with that of DWPT and DWT. The discrete wavelet packet transform and the dual tree complex wavelet packet transform produce 64 channels when a three-level decomposition is considered. Thus the correlation coefficient matrix is of size 64 × 64. In the case of the 3-level DWT, however, there are only 10 channels and thus a 10 × 10 correlation coefficient matrix is obtained. In the case of DTCWPT the number of channel pairs with correlation coefficient above the predefined threshold of 0.45 is 512 for texture D3, while for DWPT and DWT it is 377 and 10 respectively. Table 9 shows the database created for texture D3 using the 3-level DWT. The top 5 channel pairs for texture D3 and the corresponding database values using the 3-level DTCWPT and DWPT are listed in Tables 10 and 11 respectively. It is found that for Brodatz texture D101, using the 3-level DTCWPT, 254 channel pairs are obtained which have a correlation coefficient above 0.45.
In this experiment 20 Brodatz textures were taken, and each of the multiresolution analysis methods listed above is combined with the linear regression modelling method for texture classification. Figure 4 shows the classification rate for all 20 textures using these multiresolution analysis methods.
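The channel counts quoted above, and the number of distinct channel pairs they imply, follow from a short calculation. This is a worked check under the usual assumptions that a packet decomposition splits every channel into four at each level, and that the DWT adds three detail channels per level plus one final approximation channel.

```latex
\begin{align*}
\text{DWPT, DTCWPT (3 levels):}\quad & 4^{3} = 64 \text{ channels}, &
\binom{64}{2} &= \frac{64 \cdot 63}{2} = 2016 \text{ channel pairs} \\
\text{DWT (3 levels):}\quad & 3 \cdot 3 + 1 = 10 \text{ channels}, &
\binom{10}{2} &= \frac{10 \cdot 9}{2} = 45 \text{ channel pairs}
\end{align*}
```

Under these counts, the 512 and 377 retained pairs for texture D3 are drawn from 2016 candidate pairs each, while the 10 retained DWT pairs are drawn from 45.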
It is noted from Fig. 4 that the discrete wavelet transform provides the best texture classification rate for all the textures compared with the discrete wavelet packet transform and the dual tree complex wavelet packet transform. It is also observed that for some of the textures the dual tree complex wavelet packet transform and the discrete wavelet packet transform provide the same classification rate. In contrast, for Brodatz textures D6, D68, D72 and D89 the dual tree complex wavelet packet transform provides a better classification rate than the discrete wavelet packet transform. The time taken to create the database is found to be lowest for the discrete wavelet transform. For the dual tree complex wavelet packet transform the time needed to create the database for the 20 Brodatz textures is 12 min, and for the discrete wavelet packet transform it is 5 min and 49 s, while for the discrete wavelet transform it is only 1 min and 38 s. We infer that the computational complexity is also much reduced by adopting the discrete wavelet transform as the multiresolution analysis tool. This is because of the simplicity in the application of the filters. For the DWPT and DTCWPT, applying the appropriate filter also plays a major role in the accuracy of the extracted features. Designing a filter specific to the context is expected to give a better response.

Conclusion
In this paper, various multiresolution analysis tools, namely the discrete wavelet transform, the discrete wavelet packet transform and the dual tree complex wavelet packet transform, are combined with linear regression modelling for the classification of textures. The dual tree complex wavelet packet transform (DTCWPT), which is the most advanced of these wavelet transforms, was expected to produce the best classification rate. In our experiments, however, the discrete wavelet transform (DWT) outperforms both the dual tree complex wavelet packet transform (DTCWPT) and the discrete wavelet packet transform (DWPT) when applied to the classification of textures. This work has focused on texture classification. Application of this method to texture segmentation may be explored in future. The performance also depends on the filters used. Requirement-specific filters can be designed and tried in future to improve the accuracy of DWPT and DTCWPT.