Application of 1-D discrete wavelet transform based compressed sensing matrices for speech compression
SpringerPlus volume 5, Article number: 2048 (2016)
Abstract
Background
Compressed sensing is a signal compression technique in which the signal is compressed while it is sensed. The compressed signal is recovered from far fewer observations than conventional Shannon–Nyquist sampling requires, which reduces the storage requirements. In this study, we propose 1-D discrete wavelet transform (DWT) based sensing matrices for speech signal compression. The study analyzes the performance of DWT based sensing matrices built from the Daubechies, Coiflets, Symlets, Battle, Beylkin and Vaidyanathan wavelet families.
Results
First, we propose sensing matrices based on the Daubechies wavelet family. The experimental results indicate that the db10 wavelet based sensing matrix performs better than the other Daubechies wavelet based sensing matrices. Second, we propose sensing matrices based on the Coiflets wavelet family; the results show that the coif5 wavelet based sensing matrix performs best. Third, we propose sensing matrices based on the Symlets wavelet family. The results indicate that the sym9 wavelet based sensing matrix requires less reconstruction time and gives a lower relative error, and thus performs better than the other Symlets wavelet based sensing matrices. Next, we propose DWT based sensing matrices using the Battle, Beylkin and Vaidyanathan wavelet families. The Beylkin wavelet based sensing matrix requires less reconstruction time and gives a lower relative error, and thus performs better than the Battle and Vaidyanathan wavelet based sensing matrices. Further, the best of the proposed DWT based sensing matrices are compared with each other, and the results show that the sym9 wavelet based sensing matrix performs best among all the proposed matrices. Finally, the sym9 wavelet based sensing matrix is compared with state-of-the-art random and deterministic sensing matrices.
Conclusions
The results show that the proposed sym9 wavelet matrix performs better than state-of-the-art sensing matrices. Speech quality is also evaluated using the MOS, PESQ and information based measures. The tests confirm that the proposed sym9 wavelet based sensing matrix achieves better MOS and PESQ scores, indicating good speech quality.
Introduction
Conventional signal processing methods such as the Fourier transform and the short-time Fourier transform (STFT) are inadequate for analyzing non-stationary signals that have abrupt transitions superimposed on lower-frequency backgrounds, such as speech, music and bio-electric signals. The wavelet transform (WT) (Daubechies 1992) overcomes these drawbacks and provides both time resolution and frequency resolution of a signal. The basic idea of the wavelet transform is to represent the signal to be analyzed as a superposition of wavelets. The wavelet transform is a widely used signal analysis tool and has been applied successfully in areas such as speech, audio and image compression.
Given an input signal x of length N, the wavelet transform consists of \(\log_{2} N\) decomposition levels. The input signal is decomposed through a series of filtering and downsampling operations. The original signal is reconstructed by upsampling, filtering and adding all the sub-bands. Figure 1 shows the block diagram of the 1-D forward wavelet transform with 2-level decomposition (Mallat 2009; Meyer 1993). The input signal is filtered using the low-pass filter (u) and the high-pass filter (v). Filtering is performed by computing a linear convolution between the input signal and the filter coefficients. The two filters are chosen such that they are orthogonal to each other and provide perfect reconstruction of the original signal x. Quadrature mirror filters (QMF) are therefore commonly used to obtain perfect reconstruction in a two-channel filter bank.
Wavelet analysis provides approximation coefficients and detail coefficients. The approximation coefficients carry the low-frequency information about the signal, while the detail coefficients carry the high-frequency information. Since the low-frequency content is more important than the high-frequency content, the output of the low-pass filter is used as the input to the next decomposition stage, whereas the output of the high-pass filter is retained for signal reconstruction. The wavelet coefficients are computed through a series of filtering and downsampling operations. The wavelet coefficients (f) are given by:
$$f = Wx$$ (1)
where W is the N × N wavelet matrix, obtained by applying the wavelet transform to the N × N identity matrix I, i.e. W = WI.
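The filtering and downsampling step described above can be sketched in a few lines. This is a minimal illustration, assuming the orthonormal Haar pair and periodic (wrap-around) boundary handling; the function names `analysis_step` and `synthesis_step` are hypothetical and do not come from the paper.

```python
import math

# Orthonormal Haar filter pair (assumed here for illustration).
HAAR_U = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # low-pass u
HAAR_V = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # high-pass v


def analysis_step(x, u, v):
    """One decomposition level: periodic filtering, then downsampling by 2."""
    n = len(x)
    approx = [sum(u[j] * x[(2 * k + j) % n] for j in range(len(u)))
              for k in range(n // 2)]
    detail = [sum(v[j] * x[(2 * k + j) % n] for j in range(len(v)))
              for k in range(n // 2)]
    return approx, detail


def synthesis_step(approx, detail, u, v):
    """Inverse of analysis_step: upsample, filter and add the two sub-bands."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for j in range(len(u)):
            x[(2 * k + j) % n] += u[j] * approx[k] + v[j] * detail[k]
    return x


signal = [4.0, 6.0, 10.0, 12.0]
a, d = analysis_step(signal, HAAR_U, HAAR_V)
restored = synthesis_step(a, d, HAAR_U, HAAR_V)
print(a, d, restored)
```

Because the two filters are orthonormal, the synthesis step is simply the transpose of the analysis step, which is why the original samples are recovered up to rounding error.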
Thus, the classical approach to data compression is to apply discrete wavelet transform (DWT) based methods (Skodras and Ebrahimi 2001) prior to transmission. However, these methods involve complicated multiplications, an exhaustive coefficient search and sorting procedure, and arithmetic encoding of the significant coefficients together with their locations, which results in large storage requirements and power consumption. Furthermore, smooth oscillatory signals such as speech or music are compressed more efficiently in a wavelet packet basis than in the wavelet representation. Coifman and Wickerhauser (1992) proposed an algorithm for efficient data compression that uses the Shannon entropy for best-basis selection, with orthogonal wavelet packets and localized trigonometric functions as the bases. This allows efficient compression of voice and image signals, but at the cost of additional computation in searching for the best wavelet packet basis.
The research on compressed sensing (CS) by Donoho (2006), Baraniuk (2007), Candes and Wakin (2008), and Donoho and Tsaig (2006) has energized research in many application areas, such as medical image processing (Lustig et al. 2008), wireless sensor networks (Guan et al. 2011), analog-to-information converters (AIC) (Laska et al. 2007), communications and networks (Berger et al. 2010), and radar (Qu and Yang 2012).
Liu et al. (2014) implemented both CS based compression and wavelet based compression on a field programmable gate array (FPGA). Their results show that the CS based procedure outperforms wavelet compression in terms of power consumption and the number of computing resources required. Furthermore, the sparse binary sensing matrix achieves the desired signal compression, but at the price of a higher signal reconstruction time and a higher sensing matrix construction time.
Candes et al. (2006a, b) proposed i.i.d. (independent and identically distributed) Gaussian or Bernoulli random sensing matrices for compressed sensing. However, the practical implementation of these sensing matrices incurs a large computational cost and memory requirement, and they are therefore considered inappropriate for large-scale applications.
Rauhut (2009), Haupt et al. (2010), Xu et al. (2014), Yin et al. (2010), and Sebert et al. (2008) exploited Toeplitz and circulant sensing matrices, which effectively recover the original signal while reducing the computational cost and the memory requirement.
As an alternative to random sensing matrices, Arash and Farokh (2011) proposed deterministic constructions of sensing matrices such as binary, bipolar and ternary matrices. Several authors have proposed deterministic constructions based on codes, such as sparse binary matrices based on low density parity check (LDPC) codes (Lu and Kpalma 2012), chirp sensing codes (Applebauma et al. 2009), scrambled block Hadamard matrices (Gan et al. 2008), Reed–Muller sensing codes (Howard et al. 2008) and Vandermonde matrices (DeVore 2007).
The restricted isometry property (RIP) is only a sufficient condition for exact signal recovery. Even though deterministic sensing matrices may not satisfy the RIP condition, they are very useful in practice because of the deterministic nature of the sampler, and they may improve features such as the compression ratio and the computational complexity.
The successful implementation of the CS technique depends on the efficient design of the sensing matrices used to compress the given signal. Since the DWT shows very good energy compaction, it can be used for designing sensing matrices. In this study, we propose 1-D discrete wavelet transform (DWT) based sensing matrices for speech signal compression. The major contribution of the paper is the set of proposed 1-D DWT sensing matrices based on different wavelet families, namely the Daubechies, Coiflets, Symlets, Battle, Beylkin and Vaidyanathan wavelet families. Furthermore, the proposed DWT based sensing matrices are compared with state-of-the-art random and deterministic sensing matrices. In addition, the speech quality is evaluated using the mean opinion score (MOS) and the perceptual evaluation of speech quality (PESQ) measures.
The paper is organized as follows. Section two briefly introduces compressed sensing (CS) theory along with the signal acquisition and reconstruction models. Section three describes the proposed methodology for constructing the discrete wavelet transform (DWT) matrix. Experimental results and discussion are presented in section four. Finally, section five presents the conclusions.
Compressed sensing (CS) framework
Background
Compressed sensing is a signal compression technique in which the signal is acquired and compressed simultaneously. The signal is recovered from only a few observations compared with conventional Shannon–Nyquist sampling, which requires sampling at a rate of at least twice the signal bandwidth. Compressed sensing consists of two basic steps: signal acquisition and signal reconstruction.
CS signal acquisition model
The compressed sensing acquisition is modeled as:
$$y = \Phi f$$ (2)
where f is the input signal of size N × 1, y is the compressed output signal of size M × 1, and Φ is the M × N sensing matrix.
The input signal f is sparse in some sparsifying domain (Ψ) and is given as:
$$f = \Psi x$$ (3)
where x is the non-sparse input signal. Combining Eqs. (2) and (3) gives:
$$y = \Phi \Psi x$$ (4)
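The acquisition model can be illustrated numerically. The sketch below assumes the reading y = Φf with f = Ψx, takes Ψ to be an orthonormal DCT-II matrix and Φ a scaled Gaussian matrix, and uses hypothetical helper names (`dct_matrix`, `gaussian_matrix`, `acquire`); the sizes N = 8 and M = 4 are chosen only to keep the example small.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as the sparsifying transform Psi."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gaussian_matrix(m, n, seed=0):
    """M x N i.i.d. Gaussian sensing matrix Phi, scaled by 1/sqrt(M)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)]
            for _ in range(m)]


def acquire(x, phi, psi):
    """Return the transform coefficients f = Psi x and measurements y = Phi f."""
    f = matvec(psi, x)
    y = matvec(phi, f)
    return f, y


N, M = 8, 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
f, y = acquire(x, gaussian_matrix(M, N), dct_matrix(N))
print(len(f), len(y))
```

Only the M entries of y need to be stored or transmitted, which is where the compression comes from.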
Two basic conditions should be satisfied for the successful implementation of CS.
1. The sensing matrix (Φ) and the sparsity transform (Ψ) should be incoherent with each other.
2. Φ should satisfy the restricted isometry property (RIP) (Candes and Tao 2006), defined as follows:
$$(1 - \delta_{k}) \left\| x \right\|_{2}^{2} \le \left\| \Phi x \right\|_{2}^{2} \le (1 + \delta_{k}) \left\| x \right\|_{2}^{2}$$ (5)
where \(\delta_{k} \in (0, 1)\) is the restricted isometry constant of the matrix and k is the number of non-zero coefficients.
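The RIP inequality can be probed empirically. The following sketch draws random k-sparse vectors and records how far the ratio ‖Φx‖²/‖x‖² strays from 1. This Monte Carlo estimate is only a lower bound on the true constant δ_k, not a certificate that the RIP holds, and the helper names are hypothetical.

```python
import math
import random


def sparse_vector(n, k, rng):
    """Random vector of length n with exactly k non-zero Gaussian entries."""
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.gauss(0.0, 1.0)
    return x


def isometry_ratio(phi, x):
    """Ratio ||Phi x||^2 / ||x||^2 that the RIP bounds by 1 -/+ delta_k."""
    y = [sum(a * b for a, b in zip(row, x)) for row in phi]
    return sum(v * v for v in y) / sum(v * v for v in x)


def estimate_delta(phi, k, trials=200, seed=0):
    """Largest observed deviation of the ratio from 1 over random k-sparse x."""
    rng = random.Random(seed)
    n = len(phi[0])
    ratios = [isometry_ratio(phi, sparse_vector(n, k, rng))
              for _ in range(trials)]
    return max(1.0 - min(ratios), max(ratios) - 1.0)


# Example: a 32 x 64 Gaussian matrix scaled by 1/sqrt(M) (sizes are arbitrary).
rng = random.Random(1)
M, N = 32, 64
phi = [[rng.gauss(0.0, 1.0) / math.sqrt(M) for _ in range(N)] for _ in range(M)]
print(estimate_delta(phi, k=4))
```

For a matrix with orthonormal columns, such as the identity, the ratio is exactly 1 and the estimate is 0.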
CS signal reconstruction model
Since the compressed sensing technique uses only a small number of observations, the resulting system is underdetermined and admits a large number of solutions. Therefore, optimization based algorithms are used to find the exact sparse solution. The basic algorithms are based on norm minimization, such as the L0-norm, L1-norm and L2-norm. Of these three, the L1-norm is the most widely used because of its ability to recover the exact sparse solution with efficient reconstruction speed. Several recovery algorithms are currently available, such as basis pursuit (BP) (Chen et al. 2001) and orthogonal matching pursuit (OMP) (Tropp and Gilbert 2007).
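To make the recovery step concrete, here is a minimal sketch of orthogonal matching pursuit, one of the greedy algorithms named above. It is not the basis pursuit solver used in the experiments of this paper. The function names are hypothetical, the least-squares step is solved through the normal equations for brevity, and the small orthogonal Hadamard example (no compression) serves only as a sanity check.

```python
import math


def solve_linear(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def least_squares(phi, support, y):
    """Least-squares fit of y on the selected columns (normal equations)."""
    m = len(phi)
    gram = [[sum(phi[i][p] * phi[i][q] for i in range(m)) for q in support]
            for p in support]
    rhs = [sum(phi[i][p] * y[i] for i in range(m)) for p in support]
    return solve_linear(gram, rhs)


def omp(phi, y, k):
    """Greedy recovery of a k-sparse vector x from y = Phi x."""
    m, n = len(phi), len(phi[0])
    residual, support, coeffs = list(y), [], []
    for _ in range(k):
        best_j, best_c = -1, 0.0
        for j in range(n):
            if j in support:
                continue
            norm = math.sqrt(sum(phi[i][j] ** 2 for i in range(m)))
            if norm == 0.0:
                continue
            c = abs(sum(phi[i][j] * residual[i] for i in range(m))) / norm
            if c > best_c:
                best_j, best_c = j, c
        if best_j < 0:          # residual is already zero
            break
        support.append(best_j)
        coeffs = least_squares(phi, support, y)
        residual = [y[i] - sum(phi[i][j] * c for j, c in zip(support, coeffs))
                    for i in range(m)]
    x = [0.0] * n
    for j, c in zip(support, coeffs):
        x[j] = c
    return x


# Sanity check with an orthogonal 4 x 4 Hadamard matrix scaled by 1/2.
H = [[0.5 * s for s in row] for row in
     [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]]
x_true = [0.0, 3.0, 0.0, -1.0]
y = [sum(h * v for h, v in zip(row, x_true)) for row in H]
print(omp(H, y, k=2))
```

Each iteration selects the column most correlated with the current residual and then refits all selected coefficients, so the residual stays orthogonal to the chosen columns.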
The proposed 1-D discrete wavelet transform (DWT) matrix
1-D DWT matrix
For a signal x of length \(N = 2^{n}\), a low-pass filter (u) and a high-pass filter (v), the ith level wavelet decomposition (Vidakovic 1999; Wang and Vieira 2010) is given by Eqs. (6) and (7).
And
The reconstruction of \(f_{u}^{(i - 1)}\) from \(f_{u}^{(i)}\) and \(f_{v}^{(i)}\) can be obtained by
The 1-D DWT matrix forms are given as below:
and
where \(f_{u}^{(i)}\) is the \(2^{n-i}\)-dimensional low-pass vector at the ith level and \(f_{v}^{(i)}\) is the corresponding high-pass vector, while \(f_{u}^{(i - 1)}\) is the \(2^{n-i+1}\)-dimensional low-pass vector at the (i − 1)th level. The two \(2^{n-i} \times 2^{n-i+1}\) wavelet filter matrices are given below.
And
Thus, the ith scale wavelet transform can be represented as:
This gives the wavelet matrix for a 1-level decomposition. The wavelet matrix for multiple levels of decomposition is given below.
The above equation can be represented as,
Here, the number of signal decomposition levels is restricted by \(2^{n-i+1} \ge L\), where L is the length of the filter.
Thus, the final wavelet transform matrix is given by Eq. (16).
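For orientation, the relations described in this subsection can be written in one standard form for a periodic two-channel orthogonal filter bank. This is a sketch under assumed conventions rather than a transcription of Eqs. (6) to (16): the index convention (filter arguments taken modulo \(2^{n-i+1}\), with u and v zero outside their L taps) and the symbols \(U_{i}\), \(V_{i}\), \(W_{i}\) and J (the number of levels) are assumptions.

```latex
\begin{aligned}
f_{u}^{(i)}(k) &= \sum_{m} u(m - 2k)\, f_{u}^{(i-1)}(m), \qquad
f_{v}^{(i)}(k) = \sum_{m} v(m - 2k)\, f_{u}^{(i-1)}(m), \\
f_{u}^{(i-1)}(m) &= \sum_{k} \left[ u(m - 2k)\, f_{u}^{(i)}(k) + v(m - 2k)\, f_{v}^{(i)}(k) \right], \\
\begin{bmatrix} f_{u}^{(i)} \\ f_{v}^{(i)} \end{bmatrix}
 &= \begin{bmatrix} U_{i} \\ V_{i} \end{bmatrix} f_{u}^{(i-1)} = W_{i}\, f_{u}^{(i-1)}, \qquad
(U_{i})_{k,m} = u(m - 2k), \quad (V_{i})_{k,m} = v(m - 2k), \\
W &= \begin{bmatrix} W_{J} & 0 \\ 0 & I \end{bmatrix} \cdots
     \begin{bmatrix} W_{2} & 0 \\ 0 & I \end{bmatrix} W_{1}.
\end{aligned}
```

Under these conventions each \(W_{i}\) is orthogonal, so the reconstruction in the second line is simply \(f_{u}^{(i-1)} = U_{i}^{T} f_{u}^{(i)} + V_{i}^{T} f_{v}^{(i)}\), and the identity blocks pad every factor to size N × N so that the detail coefficients from earlier levels pass through unchanged.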
Design procedure for the proposed 1-D DWT based sensing matrices
The procedural steps to construct the 1-D DWT based sensing matrices are as follows.
1. Create the desired quadrature mirror filter (QMF), such as a Daubechies, Coiflets, Symlets, Beylkin, Vaidyanathan or Battle filter. For example, the db1 (Haar) filter is given as f = [1 1] (up to the normalization factor \(1/\sqrt{2}\)) and the db2 filter is formed as follows:
$$f = \left[ \begin{array}{cccc} 0.482962913145 & 0.836516303738 & 0.224143868042 & -0.129409522551 \end{array} \right]$$ (17)
2. Create the N × N identity matrix.
3. Perform the 1-D forward wavelet transform on the N × N identity matrix. This generates the N × N wavelet transform matrix.
4. Select the first m rows to form the m × N DWT sensing matrix, where m is the number of measurements.
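The four steps above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: it assumes periodic boundary handling, derives the high-pass filter from the low-pass one by the usual alternating-flip rule, and stores the coefficients with the coarsest approximation first. All function names are hypothetical.

```python
import math

# Step 1: low-pass QMF coefficients (db2 values as quoted in the text).
DB2 = [0.482962913145, 0.836516303738, 0.224143868042, -0.129409522551]
HAAR = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def highpass_from_lowpass(u):
    """Alternating-flip rule: v[j] = (-1)^j * u[L - 1 - j]."""
    length = len(u)
    return [(-1) ** j * u[length - 1 - j] for j in range(length)]


def dwt_level(x, u, v):
    """One level of periodic filtering followed by downsampling by 2."""
    n = len(x)
    lo = [sum(u[j] * x[(2 * k + j) % n] for j in range(len(u)))
          for k in range(n // 2)]
    hi = [sum(v[j] * x[(2 * k + j) % n] for j in range(len(v)))
          for k in range(n // 2)]
    return lo, hi


def forward_dwt(x, u, levels):
    """Multi-level DWT; output layout is [approx, coarsest detail, ..., finest]."""
    v = highpass_from_lowpass(u)
    coeffs = list(x)
    length = len(x)
    for _ in range(levels):
        lo, hi = dwt_level(coeffs[:length], u, v)
        coeffs[:length] = lo + hi
        length //= 2
    return coeffs


def dwt_sensing_matrix(n, m, u, levels):
    """Steps 2-4: transform the identity column by column, keep the first m rows."""
    columns = []
    for j in range(n):
        e = [0.0] * n          # j-th column of the N x N identity matrix
        e[j] = 1.0
        columns.append(forward_dwt(e, u, levels))
    w = [[columns[j][i] for j in range(n)] for i in range(n)]
    return w[:m]


# Small example: N = 8, m = 4, db2 filter, 2 levels (each level input >= L = 4).
phi = dwt_sensing_matrix(8, 4, DB2, levels=2)
print(len(phi), len(phi[0]))
```

The number of levels is kept small enough that the input to each level is at least as long as the filter, matching the restriction stated for the wavelet matrix.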
Experimental results and discussion
Methodology
The proposed work is evaluated on the CMU/CSTR KDT US English TIMIT database for speech synthesis by Carnegie Mellon University and Edinburgh University (Edinburgh 2002). The details of the file used are as follows: file name: Kdt_001.wav, channel: 1 (mono), bit rate: 256 kbps, audio sample rate: 16 kHz, total duration: 3 s. For the simulation, N = 2048 samples are selected, so the analyzed speech segment has a duration of 0.128 s. The experiments are performed using MATLAB 7.8.0 (R2009a) on a system with an Intel Core 2 Duo CPU and 3 GB RAM. The discrete cosine transform (DCT) is used as the sparsifying basis because the speech signal is highly sparse in the DCT domain. The speech compression is performed using sensing matrices based on the different DWT families (Donoho et al. 2007). Basis pursuit (BP) (Chen et al. 2001) is used as the signal recovery algorithm.
The quality of the reconstructed speech signal is evaluated using the compression ratio (CR), root mean square error (RMSE), relative error, signal to noise ratio (SNR), signal reconstruction time and sensing matrix construction time.
CR is obtained using the relation
$$CR = \frac{M}{N}$$ (18)
where N is the length of the speech signal and M is the number of measurements taken from the sensing matrix.
RMSE is given as below:
where x(n) is the original signal and \(\tilde{x}(n)\) is the reconstructed signal.
Relative error is defined as:
where x(n) is the original signal and \(\tilde{x}(n)\) is the reconstructed signal.
SNR is obtained as,
where x(n) is the original signal and \(\tilde{x}(n)\) is the reconstructed signal.
In addition, the signal reconstruction time is measured as the time required to recover the original signal using the reconstruction algorithm. The time required to construct the sensing matrix is also an important parameter and should be as small as possible.
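The error metrics listed above can be computed as in the sketch below. The formulas follow commonly used definitions and are assumptions where the text does not spell them out: CR = M/N (consistent with CR = 0.5 for N = 2048 and m = 1024), RMSE as the root of the mean squared sample error, relative error as the ratio of L2 norms, and SNR in decibels. The function names are hypothetical.

```python
import math


def compression_ratio(m, n):
    """CR = M / N: number of measurements over signal length."""
    return m / n


def rmse(x, x_rec):
    """Root mean square error between original and reconstructed samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x))


def relative_error(x, x_rec):
    """||x - x_rec||_2 / ||x||_2 (assumed definition)."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)))
    return err / math.sqrt(sum(a * a for a in x))


def snr_db(x, x_rec):
    """10 * log10(signal energy / error energy), in dB (assumed definition)."""
    signal = sum(a * a for a in x)
    noise = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return 10.0 * math.log10(signal / noise)


original = [3.0, 4.0]
reconstructed = [3.0, 3.0]
print(compression_ratio(1024, 2048),
      rmse(original, reconstructed),
      relative_error(original, reconstructed),
      snr_db(original, reconstructed))
```

Note that `snr_db` is undefined for a perfect reconstruction, where the error energy is zero; a caller would need to handle that case separately.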
Performance analysis of the Daubechies wavelet family based sensing matrices
This section presents the performance analysis of the DWT sensing matrices based on the Daubechies wavelet family: db1, db2, db3, db4, db5, db6, db7, db8, db9 and db10. The speech signal of length 2048 is taken at a 50% sparsity level, preserving only 1024 non-zero coefficients. For different numbers of measurements (m), the corresponding compression ratio (CR), signal reconstruction time (s), relative error, root mean square error (RMSE) and signal-to-noise ratio (SNR) are calculated (Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
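The 50% sparsity level can be imposed by keeping only the largest-magnitude transform coefficients. The sketch below assumes that this thresholding is applied to the coefficients in the sparsifying domain; the helper name `keep_largest` is hypothetical.

```python
def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# For N = 2048 and a 50% sparsity level, k would be 1024.
print(keep_largest([5.0, -1.0, 3.0, 0.5], 2))
```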
Figure 2 shows that the db1 (Haar) wavelet based sensing matrix requires less reconstruction time than all the other Daubechies wavelet based sensing matrices. The second best choice is db2 or db10, closely followed by the db9 wavelet based sensing matrix. Figure 3 shows that the db10 wavelet based sensing matrix gives the minimum relative error among all the matrices. Figure 4 shows that the db10 wavelet sensing matrix achieves a higher SNR (particularly from CR = 0.3 to CR = 1) than the other sensing matrices.
Thus, Figs. 2, 3 and 4 show that, overall, the db10 wavelet based sensing matrix offers a good balance between signal reconstruction error and signal reconstruction time. Moreover, db9 performs close to db10 and may be the second best choice.
Performance analysis of the Coiflets wavelet family based sensing matrices
This section presents the performance analysis of the DWT sensing matrices based on the Coiflets wavelet family: coif1, coif2, coif3, coif4 and coif5 (Tables 11, 12, 13, 14, 15).
Figure 5 shows that the coif5 and coif4 wavelet based sensing matrices perform similarly and require less reconstruction time than the other Coiflets wavelet based sensing matrices. Figure 6 shows that the coif5 wavelet based sensing matrix gives the minimum relative error among all the matrices. Figure 7 likewise shows that the coif5 wavelet based sensing matrix achieves a higher SNR than the other sensing matrices.
Thus, overall, the coif5 wavelet based sensing matrix performs well, since it gives a lower reconstruction time, the minimum relative error and a higher SNR than the other Coiflets wavelet based sensing matrices. In addition, coif4 may be the second choice of sensing matrix.
Performance analysis of the Symlets wavelet family based sensing matrices
This section presents the performance analysis of the DWT sensing matrices based on the Symlets wavelet family: sym4, sym5, sym6, sym7, sym8, sym9 and sym10 (Tables 16, 17, 18, 19, 20, 21, 22).
Figure 8 shows that the sym9 wavelet based sensing matrix requires less reconstruction time than all the other Symlets wavelet based sensing matrices, with sym5 performing very close to sym9. Figure 9 shows that the sym9 and sym10 wavelet based sensing matrices perform almost identically and give the minimum relative error among all the matrices. Figure 10 likewise shows that the sym9 and sym10 wavelet based sensing matrices perform nearly identically and achieve a higher SNR than the other sensing matrices.
Thus, Figs. 8, 9 and 10 show that, overall, the sym9 wavelet sensing matrix requires less reconstruction time and gives a lower relative error, and thus performs better than the other Symlets wavelet based sensing matrices. Moreover, sym10 may be the second choice of sensing matrix, followed by sym5.
Performance analysis of the Beylkin, Vaidyanathan and Battle wavelet family based sensing matrices
This section presents the performance analysis of the DWT sensing matrices based on the Beylkin, Vaidyanathan and Battle (Battle1, Battle3 and Battle5) wavelet families (Tables 23, 24, 25, 26, 27).
Figure 11 shows that the Beylkin wavelet based sensing matrix requires less reconstruction time than the other wavelet based sensing matrices in this group. Figure 12 shows that the Beylkin and Battle5 wavelet based sensing matrices perform very similarly and give the minimum relative error among all the matrices. Figure 13 likewise shows that the Beylkin and Battle5 wavelet based sensing matrices perform comparably and achieve a higher SNR than the other sensing matrices.
Thus, Figs. 11, 12 and 13 show that, overall, the Beylkin wavelet sensing matrix requires less reconstruction time and gives a lower relative error, and thus performs better than the other wavelet based sensing matrices in this group. However, Battle5 performs similarly and may be the second best choice of sensing matrix.
Performance analysis of the best proposed DWT based sensing matrices: Beylkin, db10, coif5 and sym9
This section compares the best proposed DWT sensing matrices, namely those based on the Beylkin, db10, coif5 and sym9 wavelets.
Figure 14 shows that the sym9 wavelet based sensing matrix clearly outperforms the Beylkin, db10 and coif5 wavelet based sensing matrices in terms of signal reconstruction time. Figure 15 shows that db10 performs well over CR = 0.3–0.5; overall, however, the sym9 wavelet based sensing matrix performs well (from CR = 0.5–1.0) and is comparable with db10. Figure 16 likewise shows that the db10 and sym9 wavelet based sensing matrices perform comparably and achieve a higher SNR than the other sensing matrices. In addition, the sym9 wavelet based sensing matrix has an edge over db10 from CR = 0.5–1.0.
Thus, Figs. 14, 15 and 16 show that, overall, the sym9 wavelet based sensing matrix outperforms the Beylkin, db10 and coif5 wavelet based sensing matrices in terms of signal reconstruction time and relative error. Furthermore, db10 may be the second best choice of sensing matrix.
Performance analysis of the best-proposed sym9 wavelet based sensing matrix with state-of-the-art random and deterministic sensing matrices
This section compares the proposed sym9 wavelet based sensing matrix with state-of-the-art random sensing matrices (Gaussian, Uniform, Toeplitz, Circulant and Hadamard) and with deterministic sensing matrices (the DCT and sparse binary sensing matrices) for speech signal compression (Tables 28, 29, 30, 31, 32, 33, 34).
Figure 17 shows that the proposed sym9 wavelet based sensing matrix clearly outperforms the state-of-the-art random sensing matrices (Gaussian, Uniform, Toeplitz, Circulant and Hadamard) as well as the deterministic DCT and sparse binary sensing matrices in terms of signal reconstruction time. Figures 18 and 19 show that the proposed sym9 wavelet based sensing matrix performs comparably to the state-of-the-art random and deterministic sensing matrices.
The overall remark
Thus, Figs. 17, 18 and 19 (Tables 28, 29, 30, 31, 32, 33, 34) show that the proposed sym9 wavelet based sensing matrix performs better than the state-of-the-art random and deterministic sensing matrices.
Subjective quality evaluation
Simple quality measures such as SNR do not provide an accurate measure of speech quality. Hence, speech quality is assessed using more reliable measures, namely the mean opinion score (MOS) and the perceptual evaluation of speech quality (PESQ), as recommended in the standards of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
In this section, the performance of the proposed sensing matrices is evaluated using the mean opinion score (MOS). The MOS is a subjective listening test of perceived speech quality and is one of the methods widely recommended by the ITU standard (ITU-T P.800) (ITU-T 1996).
Table 35 presents the subjective evaluation of the reconstructed speech quality using the MOS test. The MOS test is performed with a group of seven male and three female listeners. The listeners are trained and then asked to rate the quality of the reconstructed speech signal with respect to the original signal on a scale of 1–5. The MOS is computed as the average of the individual listeners' scores and ranges between 1 (bad speech quality) and 5 (excellent speech quality).
The following conclusions can be drawn from Table 35.
1. Overall, the Symlets wavelet family achieves good MOS scores compared with the other proposed sensing matrices as well as the state-of-the-art sensing matrices.
2. The highest MOS score of 4.4 is achieved by the sym9 wavelet, followed by sym6, sym8, sym10, Battle1 and Battle3 (MOS = 4.1), and then db2 and coif5 (MOS = 4.0). These MOS scores can be considered acceptable for speech quality.
3. Moreover, the state-of-the-art DCT sensing matrix (MOS = 4.2) and the random Hadamard sensing matrix (MOS = 4.0) achieve good MOS scores compared with the other state-of-the-art sensing matrices.
However, the MOS test typically requires a sizeable number of listeners to achieve stable results, and it is time-consuming and expensive. Nevertheless, subjective quality measures remain one of the most decisive ways to assess speech quality.
Objective quality evaluation
PESQ is an international ITU-T standard (P.862) (ITU-T 2005) for the automated prediction of speech quality, producing quality scores ranging from −0.5 to 4.5. In other words, it estimates the MOS (mean opinion score) from the clean signal and its distorted version. A higher score signifies better speech quality. Moreover, since human listeners are not required, PESQ is accurate, less expensive and less time-consuming.
Table 36 presents the objective speech quality metrics PESQ, log-likelihood ratio (LLR) and weighted spectral slope (WSS), along with three subjective rating scales: signal distortion, noise distortion and overall quality. The ratings are based on the five-point (1–5) MOS scale (Hu and Loizou 2008).
The following conclusions can be drawn from Table 36.
1. The Symlets wavelet family shows a higher signal distortion rating (between 3 and 4), indicating fairly natural speech quality, compared with the other proposed and state-of-the-art sensing matrices.
2. The db5, db9, db10, coif3, coif4, coif5 and Symlets wavelets show a good background distortion rating (between 2 and 3), indicating noticeable but not intrusive noise, and are closely comparable to the state-of-the-art sensing matrices.
3. The db5, db9, db10, coif3, coif4, coif5 and Symlets wavelets show a higher signal quality rating (between 3 and 4), indicating good/fair speech quality compared with the state-of-the-art sensing matrices.
4. Overall, the sym9 and sym10 wavelet based sensing matrices exhibit good/fair overall quality (for db9 and db10, the ratings are 3.1843 and 3.1985, respectively) compared with the other proposed and state-of-the-art sensing matrices.
5. In terms of objective measures, the sym9 and sym10 wavelet based sensing matrices exhibit lower values of the log-likelihood ratio (LLR) and weighted spectral slope (WSS) metrics, indicating good speech quality, and are closely comparable to the state-of-the-art sensing matrices.
6. Finally, in terms of the PESQ measure, the sym9 and sym10 wavelet based sensing matrices achieve higher PESQ scores, PESQ = 2.6003 (sym9) and PESQ = 2.6006 (sym10), signifying good/fair speech quality compared with the other proposed and state-of-the-art sensing matrices.
Information based evaluation
Entropy (H) is a measure of the average information content of a signal (x) and is widely used in signal processing applications. It is defined as:
$$H(X) = - \sum_{i = 1}^{N} P(x_{i}) \log_{2} P(x_{i})$$ (22)
where \(X = \{x_{1}, x_{2}, \ldots, x_{N}\}\) is a set of random variables, \(P(x_{i})\) is the probability of the random variable \(x_{i}\), and N is the length of the signal, i.e. the number of possible outcomes. A higher signal entropy reflects more information content, or equivalently more unpredictability.
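A minimal sketch of the entropy computation follows. It assumes the probabilities are estimated from the relative frequency of each distinct sample value and uses the base-2 logarithm; how the paper estimates P(x_i) is not stated, so this is one plausible reading, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(samples):
    """Entropy in bits of the empirical distribution of the sample values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# 2048 distinct values give the maximum entropy log2(2048) = 11 bits.
print(shannon_entropy(list(range(2048))))
# Two equally likely values give 1 bit.
print(shannon_entropy([1, -1] * 4))
```

Under this reading, a length-2048 signal whose samples are all distinct attains the maximum of 11 bits, and a sequence that takes two values equally often yields 1 bit.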
Table 37 presents the information based evaluation of speech quality. It also provides insight into the selection of the best basis sensing matrix.
The following observations are evident from Table 37.
1. The CS based sensing matrices, both proposed and state-of-the-art, yield a higher entropy (H = 11.0) than the classical wavelet compression technique (H = 9.7573).
2. For the proposed sensing matrices, the entropy of the reconstructed speech signal (H = 11.0) is very close to that of the original signal (H = 10.2888).
3. Furthermore, we computed the entropy of the sensing matrices themselves, which shows that state-of-the-art random matrices such as the Gaussian, Uniform, Toeplitz and Circulant matrices attain a higher entropy owing to their randomness, followed by the deterministic DCT matrix.
4. The proposed sensing matrices based on the Battle (for Battle5, H = 4.0745) and Symlets wavelet families (for sym9 and sym10, H = 1.7689 and H = 1.9047, respectively) show a higher entropy than the sparse binary (H = 0.0659) and random Hadamard (H = 1) sensing matrices.
Spectrographic analysis
Spectrograms are used to visually investigate the joint time–frequency properties of speech signals, with intensity or color representing the relative energy of the contributing frequencies, and they play an important role in decoding the underlying linguistic message. Figure 20 shows the spectrographic analysis of the original and the reconstructed speech signals for the proposed sym9 wavelet based sensing matrix (for CR = 0.5). Figure 20a shows the spectrogram of the original input speech signal and Fig. 20b shows the spectrogram of the reconstructed speech signal.
The spectrographic analysis in Fig. 20 shows that the time–frequency characteristics of the reconstructed spectrogram are very close to those of the original speech spectrogram, preserving most of the signal energy. In the spectrograms, red indicates the highest energy, followed by yellow and blue, and white areas indicate the absence of frequency components.
Furthermore, Fig. 21 shows the original and the reconstructed speech signals with the DCT basis for CR = 0.5 (N = 2048 and m = 1024). It can be observed that the original speech signal is successfully reconstructed using the proposed sym9 wavelet based sensing matrix.
Conclusions
In this study, we investigated DWT based sensing matrices for speech signal compression. The study compares the performance of DWT based sensing matrices built from the Daubechies, Coiflets, Symlets, Battle, Beylkin and Vaidyanathan wavelet families. It further compares the proposed DWT based sensing matrices with state-of-the-art random and deterministic sensing matrices. The speech quality is evaluated using subjective and objective measures. The subjective evaluation is performed using the mean opinion score (MOS). The objective evaluation uses the PESQ and other measures such as the log-likelihood ratio (LLR) and weighted spectral slope (WSS). In addition, the speech quality is evaluated using an information based measure, the Shannon entropy, which also offers insight into the selection of the best basis sensing matrix.
The following major conclusions are drawn based on the investigation:
- Among the Daubechies wavelet based sensing matrices, the db10 shows a good balance between signal reconstruction error and signal reconstruction time. The db9 performs close to the db10 and may be the second-best choice.
- The coif5 wavelet based sensing matrix performs well, since it requires less reconstruction time and yields the minimum relative error and a high SNR compared to the other Coiflets wavelet based sensing matrices. The coif4 may be the second choice of sensing matrix.
- The sym9 wavelet based sensing matrix demonstrates lower reconstruction time and lower relative error, and thus performs better, than the other Symlets wavelet based sensing matrices. The sym10 may be the second choice after the sym9.
- The Beylkin wavelet based sensing matrix demonstrates lower reconstruction time and relative error, and thus performs better, than the Battle and the Vaidyanathan wavelet based sensing matrices. However, the Battle5 shows close performance and may be the second-best choice of sensing matrix.
- When the best DWT based sensing matrices of each family are compared, the sym9 wavelet based sensing matrix outperforms the db10, coif5 and Beylkin wavelet based sensing matrices in terms of signal reconstruction time and relative error. The db10 may be the second-best choice of sensing matrix.
- The proposed sym9 wavelet based sensing matrix performs better than state-of-the-art random and deterministic sensing matrices in terms of signal reconstruction time and reconstruction error.
- Overall, the Symlets wavelet family achieves good MOS scores compared to the other proposed as well as state-of-the-art sensing matrices.
- The highest MOS score of 4.4 is achieved by the sym9 wavelet, followed by the sym6, sym8, sym10, Battle1 and Battle3 (MOS = 4.1), and then by the db2 and coif5 (MOS = 4.0). These MOS scores can be considered acceptable for speech quality.
- In terms of the PESQ measure, the sym9 and the sym10 wavelet based sensing matrices exhibit the higher PESQ scores, i.e. PESQ = 2.6003 (sym9) and PESQ = 2.6006 (sym10), signifying good/fair speech quality compared to the other proposed and state-of-the-art sensing matrices.
- The sym9 and the sym10 wavelet based sensing matrices exhibit lower values of the log-likelihood ratio (LLR) and weighted spectral slope (WSS) metrics, indicating good speech quality, and are closely comparable with state-of-the-art sensing matrices.
- In terms of the information based evaluation, CS based sensing matrices, including the proposed DWT based as well as state-of-the-art sensing matrices, have higher entropy (H = 11.0) than the classical wavelet compression technique (H = 9.7573).
- The proposed sensing matrices such as the Battle (for the Battle5, H = 4.0745) and the Symlets wavelet families (for the sym9 and the sym10, H = 1.7689 and H = 1.9047 respectively) show higher entropy than the sparse binary (H = 0.0659) and the random Hadamard (H = 1) sensing matrices.
- Finally, the DWT based sensing matrices show good promise for speech signal compression.
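To make the entropy figures in the conclusions above easier to interpret, the following Python sketch computes the Shannon entropy of the empirical distribution of values in an array. This is an assumption about how such an entropy could be computed, not a description of the procedure used in the study, and the function name is hypothetical. Under this assumption, a matrix with equally frequent +1 and -1 entries has an entropy of 1 bit, and a length-2048 vector whose samples are all distinct has an entropy of log2(2048) = 11 bits, which is at least consistent with the values H = 1 and H = 11.0 quoted above.

```python
import numpy as np


def shannon_entropy(values) -> float:
    """Shannon entropy in bits of the empirical distribution of distinct values.

    Assumption: each distinct value is treated as one symbol, and its
    probability is its relative frequency in the flattened array.
    """
    _, counts = np.unique(np.asarray(values).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Stand-in for a +/-1 valued matrix with balanced signs (not a real Hadamard matrix).
balanced_pm1 = np.tile([1, -1], 8).reshape(4, 4)
print("balanced +/-1 matrix:", shannon_entropy(balanced_pm1))

# Stand-in for a length-2048 signal in which every sample value is distinct.
all_distinct = np.arange(2048)
print("2048 distinct values:", shannon_entropy(all_distinct))

# Stand-in for a very sparse binary matrix: mostly zeros, a few ones.
sparse_binary = np.zeros(1024, dtype=int)
sparse_binary[:8] = 1
print("sparse binary:", shannon_entropy(sparse_binary))
```

Arrays dominated by a single value, such as a sparse binary matrix, give entropies close to zero under this measure, which matches the ordering reported for the sparse binary and random Hadamard matrices.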
Thus, this study shows the effectiveness of the DWT based sensing matrices for speech signal processing applications. The scope of the study can be expanded by investigating the use of DWT based sensing matrices in other application areas such as music signal processing, underwater acoustics and biomedical signal processing, for example ECG and EEG analysis.
Abbreviations
- DWT: discrete wavelet transform
- CS: compressed sensing
- CR: compression ratio
- RMSE: root mean square error
- SNR: signal to noise ratio
- MOS: mean opinion score
- PESQ: perceptual evaluation of speech quality
References
Applebaum L, Howard SD, Searle S, Calderbank R (2009) Chirp sensing codes: deterministic compressed sensing measurements for fast recovery. Appl Comput Harmonic Anal 26(2):283–290
Arash A, Farokh M (2011) Deterministic construction of binary, bipolar and ternary compressed sensing matrices. IEEE Trans Inf Theory 57:2360–2370. doi:10.1109/TIT.2011.2111670
Baraniuk RG (2007) Compressive sensing. IEEE Signal Process Mag. doi:10.1109/MSP.2007.4286571
Berger CR, Zhou S, Preisig JC, Willett P (2010) Sparse channel estimation for multicarrier underwater acoustic communication: from subspace methods to compressed sensing. IEEE Trans Signal Process 58(3):1708–1721. doi:10.1109/TSP.2009.2038424
Candes EJ, Tao T (2006) Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans Inf Theory 52(12):5406–5425
Candes EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Process Mag. doi:10.1109/MSP.2007.914731
Candes EJ, Romberg J, Tao T (2006a) Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math 59(8):1207–1223
Candes EJ, Romberg J, Tao T (2006b) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52(2):489–509
Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129–159
Coifman RR, Wickerhauser MV (1992) Entropy-based algorithms for best basis selection. IEEE Trans Inf Theory 38(2):713–718
Daubechies I (1992) Ten lectures on wavelets. In: CBMS-NSF conference series in applied mathematics. http://dx.doi.org/10.1137/1.9781611970104
DeVore RA (2007) Deterministic construction of compressed sensing matrices. J Complex 23:918–925. doi:10.1016/j.jco.2007.04.002
Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52(4):1289–1306
Donoho DL, Tsaig Y (2006) Extensions of compressed sensing. Signal Process 86:533–548. doi:10.1016/j.sigpro.2005.05.029
Donoho DL, Stodden V, Tsaig Y (2007) SparseLab 2.1 Toolbox. https://sparselab.stanford.edu/
Gan L, Do T, Tran TD (2008) Fast compressive imaging using scrambled block Hadamard ensemble. In: EUSIPCO. Lausanne, Switzerland
Guan X, Yulong G, Chang J, Zhang Z (2011) Advances in theory of compressive sensing and applications in communication. In: Proceedings of IEEE first international conference on instrumentation, measurement, computer, communication and control, pp 662–665. doi:10.1109/IMCCC.2011.169
Haupt J, Bajwa WU, Raz G, Nowak R (2010) Toeplitz compressed sensing matrices with applications to sparse channel estimation. IEEE Trans Inf Theory 56:5862–5875
Howard SD, Calderbank AR, Searle SJ (2008) A fast reconstruction algorithm for deterministic compressive sensing using second order reed-muller codes. In: IEEE conference on information sciences and systems (CISS2008)
Hu Y, Loizou P (2008) Evaluation of objective quality measures for speech enhancement. IEEE Trans Speech Audio Process 16(1):229–238
ITU-T (1996) ITU-T recommendation P.800: method for subjective determination of transmission quality. http://www.itu.int
ITU-T (2005) P.862: revised annex A—reference implementations and conformance testing for ITU-T Recs P.862, P.862.1 and P.862.2. http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
Laska JN, Kirolos S, Duarte MF, Ragheb TS, Baraniuk RG, Massoud Y (2007) Theory and implementation of an analog-to-information converter using random demodulation. In: Proceedings of IEEE ISCAS, pp 1959–1962. doi:10.1109/ISCAS.2007.378360
Liu B, Zhang Z, Xu G, Fan H, Fu Q (2014) Energy efficient telemonitoring of physiological signals via compressed sensing: a fast algorithm and power consumption evaluation. Biomed Signal Process Control 11:80–88. doi:10.1016/j.bspc.2014.02.010
Lu W, Kpalma K (2012) Sparse binary matrices of LDPC codes for compressed sensing. In: Storer JA, Marcellin MW (eds). DCC, p 405
Lustig M, Donoho DL, Santos JM, Pauly JM (2008) Compressed sensing MRI. IEEE Signal Process Mag 25(8):72–82. doi:10.1109/MSP.2007.914728
Mallat S (2009) A wavelet tour of signal processing-The sparse way, 3rd edn. Academic Press, London
Meyer Y (1993) Wavelets: algorithms and applications. Society for Industrial and Applied Mathematics, Philadelphia, pp 13–31, 101–105
Qu L, Yang T (2012) Investigation of air/ground reflection and antenna beamwidth for compressive sensing SFCW GPR migration imaging. IEEE Trans Geosci Remote Sens 50(8):3143–3149. doi:10.1109/TGRS.2011.2179049
Rauhut H (2009) Circulant and Toeplitz matrices in compressed sensing. http://arxiv.org/abs/0902.4394
Sebert F, Yi MZ, Leslie Y (2008) Toeplitz block matrices in compressed sensing and their applications in imaging. In: ITAB, Shenzhen, pp 47–50
Skodras CC, Ebrahimi T (2001) The jpeg2000 still image compression standard. Sig Process Mag IEEE 18(5):36–58
Tropp JA, Gilbert AC (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 53(12):4655–4666
University of Edinburgh (2002) Center for speech technology research, CSTR US KED TIMIT. http://festvox.org/dbs/dbs_kdt.html
Vidakovic B (1999) Statistical modeling by wavelets. Wiley, London
Wang H, Vieira J (2010) 2-D wavelet transforms in the form of matrices and application in compressed sensing. In: Proceedings of the 8th world congress on intelligent control and automation, Jinan, China, pp 35–39
Xu Y, Yin W, Osher S (2014) Learning circulant sensing kernels. Inverse Probl Imaging 8:901–923. doi:10.3934/ipi.2014.8.901
Yin W, Morgan S, Yang J, Zhang Y (2010) Practical compressive sensing with Toeplitz and Circulant matrices. In: Proceedings of visual communications and image processing (VCIP). SPIE, San Jose, CA
Authors’ contributions
YVP made substantial contributions to the design and development of the DWT based sensing matrices and their application to speech signal processing. YVP formulated the problem and its objectives, performed the experiments and wrote the paper. SLN was involved in the critical testing and analysis of the proposed DWT based sensing matrices, manuscript preparation and proofreading. Both authors read and approved the final manuscript.
Acknowledgements
The authors wish to acknowledge Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India for providing the infrastructure for this research work. The authors would like to thank the anonymous reviewers for their constructive comments and questions, which greatly improved the quality of the article.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
All datasets on which the conclusions of the manuscript rely, and the data supporting the findings, are presented in the main paper.
Funding
The authors declare that no funding was provided for the research reported in this paper.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Parkale, Y.V., Nalbalwar, S.L. Application of 1-D discrete wavelet transform based compressed sensing matrices for speech compression. SpringerPlus 5, 2048 (2016). https://doi.org/10.1186/s40064-016-3740-x
DOI: https://doi.org/10.1186/s40064-016-3740-x