Application of 1-D discrete wavelet transform based compressed sensing matrices for speech compression

Parkale, Yuvraj V.; Nalbalwar, Sanjay L.

doi:10.1186/s40064-016-3740-x

Research
Open access
Published: 30 November 2016

SpringerPlus volume 5, Article number: 2048 (2016)

Abstract

Background

Compressed sensing is a signal compression technique in which the signal is compressed while it is sensed. The signal is recovered from far fewer observations than conventional Shannon–Nyquist sampling requires, which reduces the storage requirements. In this study, we propose 1-D discrete wavelet transform (DWT) based sensing matrices for speech signal compression. The study analyzes the performance of DWT based sensing matrices built from the Daubechies, Coiflets, Symlets, Battle, Beylkin and Vaidyanathan wavelet families.

Results

First, we propose sensing matrices based on the Daubechies wavelet family. The experimental results indicate that the db10 wavelet based sensing matrix performs better than the other Daubechies wavelet based sensing matrices. Second, we propose sensing matrices based on the Coiflets wavelet family. The results show that the coif5 wavelet based sensing matrix performs best within this family. Third, we propose sensing matrices based on the Symlets wavelet family. The results indicate that the sym9 wavelet based sensing matrix yields a lower reconstruction time and a lower relative error, and thus performs better than the other Symlets wavelet based sensing matrices. Next, we propose DWT based sensing matrices using the Battle, Beylkin and Vaidyanathan wavelet families. The Beylkin wavelet based sensing matrix yields a lower reconstruction time and relative error, and thus performs better than the Battle and Vaidyanathan wavelet based sensing matrices. Further, the best of the proposed DWT based sensing matrices is identified, and the results reveal that the sym9 wavelet based sensing matrix performs best among all the proposed matrices. Subsequently, the sym9 wavelet based sensing matrix is compared with state-of-the-art random and deterministic sensing matrices.

Conclusions

The results reveal that the proposed sym9 wavelet matrix performs better than the state-of-the-art sensing matrices. Finally, speech quality is evaluated using the MOS, PESQ and information based measures. The test results confirm that the proposed sym9 wavelet based sensing matrix achieves better MOS and PESQ scores, indicating good speech quality.

Introduction

Conventional signal processing methods such as the Fourier transform and the short-time Fourier transform (STFT) are inadequate for analyzing non-stationary signals in which abrupt transitions are superimposed on a lower-frequency background, such as speech, music and bio-electric signals. The wavelet transform (WT) (Daubechie Ingrid 1992) overcomes this drawback by providing both time and frequency resolution of a signal. The basic idea of the wavelet transform is to represent the signal to be analyzed as a superposition of wavelets. The wavelet transform is a widely used signal analysis tool and has been applied successfully in areas such as speech, audio and image compression.

Given an input signal x of length N, the wavelet transform consists of up to log₂ N decomposition levels. The signal is decomposed through a series of filtering and downsampling operations, and the original signal is reconstructed by upsampling, filtering and adding all the sub-bands. Figure 1 shows the block diagram of the 1-D forward wavelet transform with 2-level decomposition (Mallat 2009; Meyer 1993). The input signal is filtered using the low-pass filter (u) and the high-pass filter (v). Filtering is achieved by computing a linear convolution between the input signal and the filter coefficients. The two filters are chosen such that they are orthogonal to each other and provide a perfect reconstruction of the original signal x. Therefore, quadrature mirror filters (QMF) are commonly used for the perfect reconstruction of a two-channel filter bank.
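The filter, downsample, upsample and add chain described above can be sketched for a single decomposition level. The snippet below is only an illustration that assumes the orthonormal Haar pair (low-pass [1, 1]/√2, high-pass [1, −1]/√2); the function names `haar_analysis` and `haar_synthesis` are hypothetical and do not come from the paper.

```python
import math

S = 1.0 / math.sqrt(2.0)  # normalization of the orthonormal Haar filters (assumed)

def haar_analysis(x):
    """One level: low-pass and high-pass filtering followed by downsampling by 2."""
    half = len(x) // 2
    approx = [S * (x[2 * j] + x[2 * j + 1]) for j in range(half)]
    detail = [S * (x[2 * j] - x[2 * j + 1]) for j in range(half)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse step: upsample, filter and add the two sub-bands."""
    x = []
    for a, d in zip(approx, detail):
        x.append(S * (a + d))
        x.append(S * (a - d))
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_analysis(signal)
restored = haar_synthesis(approx, detail)

# Perfect reconstruction: the round trip returns the input up to rounding error.
max_err = max(abs(a - b) for a, b in zip(signal, restored))
print(max_err)
```

Because the two filters are orthonormal, the sub-bands together also carry the same energy as the input signal.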

Wavelet analysis provides approximation coefficients and detail coefficients. The low-frequency information about the signal is given by the approximation coefficients, while the high-frequency information is given by the detail coefficients. Since the low-frequency content is more important than the high-frequency content, the output of the low-pass filter is used as the input to the next decomposition stage, whereas the output of the high-pass filter is retained for signal reconstruction. The wavelet coefficients are computed through a series of filtering and downsampling operations. The wavelet coefficients (f) are given by:

$$f = {\mathbf{W}}x$$

(1)

where W is the N × N wavelet matrix, obtained by applying the wavelet transform to the N × N identity matrix I, that is, W = WI.

Thus, the classical approach to data compression is to apply discrete wavelet transform (DWT) based methods (Skodras and Ebrahimi 2001) prior to transmission. However, these methods involve complicated multiplications, an exhaustive coefficient search and a sorting procedure, along with arithmetic encoding of the significant coefficients and their locations, which results in large storage requirements and power consumption. Furthermore, smooth oscillatory signals such as speech or music are compressed more efficiently in a wavelet packet basis than in the wavelet representation. Coifman and Wickerhauser (1992) proposed an algorithm for efficient data compression that uses the Shannon entropy for best basis selection, with orthogonal wavelet packets and localized trigonometric functions as the bases. This allows efficient compression of voice and image signals, but at the cost of additional computation in searching for the best wavelet packet basis.

The research on CS presented by Donoho (2006), Baraniuk (2007), Candes and Wakin (2008), and Donoho and Tsaig (2006) has energized research in many application areas, such as medical image processing (Lustig et al. 2008), wireless sensor networks (Guan et al. 2011), analog-to-information converters (AIC) (Laska et al. 2007), communications and networks (Berger et al. 2010), and radar (Qu and Yang 2012).

Liu et al. (2014) implemented both a CS based compression procedure and a wavelet based compression procedure on a field programmable gate array (FPGA). The results show that the CS based procedure performs better than wavelet compression in terms of power consumption and the number of computing resources required. Furthermore, the sparse binary sensing matrix achieves the desired signal compression, but at the price of a higher signal reconstruction time and a higher sensing matrix construction time.

Candes et al. (2006a, b) proposed i.i.d. (independent and identically distributed) Gaussian or Bernoulli random sensing matrices for compressed sensing. However, the practical implementation of these sensing matrices has a high computational cost and large memory storage requirements, and they are therefore considered inappropriate for large scale applications.

Rauhut (2009), Haupt et al. (2010), Xu et al. (2014), Yin et al. (2010), and Sebert et al. (2008) exploited Toeplitz and circulant sensing matrices, which effectively recover the original signal with a reduced computational cost and memory requirement.

As an alternative to random sensing matrices, Arash and Farokh (2011) proposed the deterministic construction of sensing matrices such as binary, bipolar and ternary matrices. Several authors have proposed deterministic constructions of sensing matrices using codes, such as sparse binary matrices based on the low density parity check (LDPC) code (Lu and Kpalma 2012), chirp sensing codes (Applebauma et al. 2009), scrambled block Hadamard matrices (Gan et al. 2008), Reed–Muller sensing codes (Howard et al. 2008) and Vandermonde matrices (DeVore 2007).

The restricted isometry property (RIP) is only a sufficient condition for exact signal recovery. Even though deterministic sensing matrices may fail to satisfy the RIP condition, they are very useful in practice because of the deterministic nature of the sampler, and they may improve features such as the compression ratio and the computational complexity.

The successful implementation of the CS technique depends on the efficient design of the sensing matrices used to compress the given signal. Since the DWT shows a very good energy compaction property, it can be used for designing the sensing matrices. In this study, we propose 1-D discrete wavelet transform (DWT) based sensing matrices for speech signal compression. The major contributions of this paper are the proposed 1-D DWT sensing matrices based on different wavelet families, namely the Daubechies, Coiflets, Symlets, Battle, Beylkin and Vaidyanathan wavelet families. Furthermore, the proposed DWT based sensing matrices are compared with state-of-the-art random and deterministic sensing matrices. In addition, the speech quality is evaluated using the mean opinion score (MOS) and the perceptual evaluation of speech quality (PESQ) measures.

The paper is organized as follows. Section two briefly introduces the compressed sensing (CS) theory with signal acquisition and reconstruction model. Section three describes the proposed methodology for the discrete wavelet transform (DWT) matrix. Experimental results and discussion are presented in section four. Finally, section five presents the conclusions.

Compressed sensing (CS) framework

Background

Compressed sensing is a signal compression technique in which the signal is acquired and compressed simultaneously. The signal is recovered from only a few observations, whereas conventional Shannon–Nyquist sampling requires a sampling rate of at least twice the signal bandwidth. Compressed sensing is performed in two basic steps: signal acquisition and signal reconstruction.

CS signal acquisition model

Compressed sensing technique is illustrated as follows:

$$y = {\varvec{\Phi}}f$$

(2)

where f is the input signal of size N × 1, y is the compressed output signal of size M × 1, and Φ is the M × N sensing matrix.

The input signal f is sparse in some sparsifying domain (Ψ) and is given as:

$$f = {\varvec{\Psi}}x$$

(3)

where x is the non-sparse input signal. Combining Eqs. (2) and (3), and writing Θ = ΦΨ for the combined M × N matrix, gives:

$$y = {\varvec{\Phi}}f = {\mathbf{\varPhi \varPsi }}x = {\varvec{\Theta}}x$$

(4)
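A small numerical illustration of the acquisition model may help here. It is only a sketch: the 4 × 4 normalized Hadamard matrix used as Ψ, the row-selection matrix used as Φ and the test signal are all assumptions chosen so that the arithmetic can be checked by hand, and Θ is taken to mean the product ΦΨ.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Psi: orthonormal 4 x 4 Hadamard matrix, standing in for the sparsifying transform
Psi = [[0.5,  0.5,  0.5,  0.5],
       [0.5, -0.5,  0.5, -0.5],
       [0.5,  0.5, -0.5, -0.5],
       [0.5, -0.5, -0.5,  0.5]]

# Phi: a 2 x 4 sensing matrix (M = 2, N = 4) that keeps the first two entries of f
Phi = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]

x = [3.0, 1.0, 3.0, 1.0]      # non-sparse input signal
f = matvec(Psi, x)            # sparse representation, Eq. (3)
y = matvec(Phi, f)            # compressed measurements, Eq. (2)

Theta = matmul(Phi, Psi)      # combined matrix Theta = Phi * Psi
y_direct = matvec(Theta, x)   # same measurements obtained in one step

print(f, y, y_direct)
```

Here f has only two non-zero entries, and both routes give the same two measurements.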

Two basic conditions should be satisfied for the successful implementation of CS.

1. The sensing matrix (Φ) and the sparsity transform (Ψ) should be incoherent with each other.
2. Φ should satisfy the restricted isometry property (RIP) (Candes and Tao 2006), defined for every k-sparse vector x as follows:
$$(1 - \delta_{k} )\left\| x \right\|_{2}^{2} \le \left\| {{\varvec{\Phi}}x} \right\|_{2}^{2} \le (1 + \delta_{k} )\left\| x \right\|_{2}^{2}$$
(5)
where δ_k ∈ (0, 1) is called the restricted isometry constant of the matrix and k is the number of non-zero coefficients.
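The incoherence condition can be made concrete with the mutual coherence measure μ(Φ, Ψ) = √N · max |⟨φ_k, ψ_j⟩| used in the CS literature, which lies between 1 (maximally incoherent) and √N (fully coherent) for unit-norm rows φ_k and columns ψ_j. The sketch below evaluates it for two toy pairs; the matrices and the function name `coherence` are illustrative assumptions, not matrices from this study.

```python
import math

def coherence(Phi, Psi):
    """sqrt(N) * max |<row of Phi, column of Psi>|; rows and columns assumed unit-norm."""
    n = len(Psi)
    cols = list(zip(*Psi))
    largest = max(abs(sum(a * b for a, b in zip(row, col)))
                  for row in Phi for col in cols)
    return math.sqrt(n) * largest

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
hadamard = [[0.5,  0.5,  0.5,  0.5],
            [0.5, -0.5,  0.5, -0.5],
            [0.5,  0.5, -0.5, -0.5],
            [0.5, -0.5, -0.5,  0.5]]

mu_low = coherence(identity, hadamard)   # spikes vs. Hadamard: lowest possible value
mu_high = coherence(identity, identity)  # a basis against itself: highest possible value
print(mu_low, mu_high)
```

A lower coherence generally means that fewer measurements are needed for a given sparsity level.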

CS signal reconstruction model

Since the compressed sensing technique uses only a few observations, the system in Eq. (2) is underdetermined and has a large number of solutions. Therefore, optimization based algorithms are used to find the exact sparse solution. The basic algorithms are based on norm minimization, using the L0-norm, L1-norm or L2-norm. Of these three, the L1-norm is widely used because of its ability to recover the exact sparse solution with an efficient reconstruction speed. Several recovery algorithms are available, such as basis pursuit (BP) (Chen et al. 2001) and orthogonal matching pursuit (OMP) (Tropp and Gilbert 2007).
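To illustrate how a greedy recovery algorithm proceeds, the following is a minimal sketch of OMP. It is shown because it is the simpler of the two cited algorithms; the experiments in this paper use BP. The 4 × 8 matrix (identity next to a normalized Hadamard block), the 2-sparse test vector and the helper names are assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, b):
    """Solve the small square system G z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def omp(A, y, k):
    """Greedy recovery of a k-sparse x from y = A x (A stored as a list of rows)."""
    cols = [list(c) for c in zip(*A)]
    support, coef, residual = [], [], y[:]
    for _ in range(k):
        # Select the unused column most correlated with the current residual.
        j = max((i for i in range(len(cols)) if i not in support),
                key=lambda i: abs(dot(cols[i], residual)))
        support.append(j)
        # Least-squares fit on the selected columns via the normal equations.
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(G, rhs)
        residual = [y[r] - sum(c * cols[s][r] for c, s in zip(coef, support))
                    for r in range(len(y))]
    x = [0.0] * len(cols)
    for c, s in zip(coef, support):
        x[s] = c
    return x

# 4 x 8 matrix with unit-norm columns: identity block next to a normalized Hadamard block
A = [[1.0, 0.0, 0.0, 0.0, 0.5,  0.5,  0.5,  0.5],
     [0.0, 1.0, 0.0, 0.0, 0.5, -0.5,  0.5, -0.5],
     [0.0, 0.0, 1.0, 0.0, 0.5,  0.5, -0.5, -0.5],
     [0.0, 0.0, 0.0, 1.0, 0.5, -0.5, -0.5,  0.5]]
x_true = [3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [dot(row, x_true) for row in A]
x_rec = omp(A, y, 2)
print(x_rec)
```

The normal equations keep the sketch dependency-free; a production implementation would normally use a QR-based least-squares update instead.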

The proposed 1-D discrete wavelet transform (DWT) matrix

1-D DWT matrix

For a signal x of length N = 2ⁿ, a low-pass filter u and a high-pass filter v, the ith level wavelet decomposition (Vidakovic 1999; Wang and Vieira 2010) is given by Eqs. (6) and (7), with $f_{u}^{(0)} = x$.

$$\mathop f\nolimits_{u}^{(i)} (j) = \sum\limits_{k = 1}^{{\mathop 2\nolimits^{n - i + 1} }} {u(k - 2j)\mathop f\nolimits_{u}^{(i - 1)} } (k)\quad {\text{where,}}\quad j = 1,2, \ldots ,\mathop 2\nolimits^{n - i}$$

(6)

And

$$\mathop f\nolimits_{v}^{(i)} (j) = \sum\limits_{k = 1}^{{\mathop 2\nolimits^{n - i + 1} }} {v(k - 2j)\mathop f\nolimits_{u}^{(i - 1)} } (k)\quad {\text{where}},\quad j = 1,2, \ldots ,\mathop 2\nolimits^{n - i}$$

(7)

The reconstruction of $f_{u}^{(i - 1)}$ from $f_{u}^{(i)}$ and $f_{v}^{(i)}$ can be obtained by

$$\mathop f\nolimits_{u}^{(i - 1)} (j) = \sum\limits_{k = 1}^{{\mathop 2\nolimits^{n - i} }} {u(j - 2k)} \mathop f\nolimits_{u}^{(i)} (k) + \sum\limits_{k = 1}^{{\mathop 2\nolimits^{n - i} }} {v(j - 2k)} \mathop f\nolimits_{v}^{(i)} (k)$$

(8)

The 1-D DWT matrix forms are given as below:

$$\mathop f\nolimits_{u}^{(i)} = \mathop U\nolimits^{(i)} \mathop f\nolimits_{u}^{(i - 1)}$$

(9)

and

$$\mathop f\nolimits_{v}^{(i)} = \mathop V\nolimits^{(i)} \mathop f\nolimits_{u}^{(i - 1)}$$

(10)

where $f_{u}^{(i)}$ and $f_{v}^{(i)}$ are the 2ⁿ⁻ⁱ dimensional low-pass and high-pass vectors at the ith level, and $f_{u}^{(i - 1)}$ is the 2ⁿ⁻ⁱ⁺¹ dimensional low-pass vector at the (i − 1)th level. The two 2ⁿ⁻ⁱ by 2ⁿ⁻ⁱ⁺¹ wavelet filter matrices are given below.

$$\mathop U\nolimits^{(i)} = \left[ {\begin{array}{*{20}c} {u( - 1)} & 0 & 0 & {\begin{array}{*{20}c} 0 & \cdots & {u( - 3)} & {u( - 2)} \\ \end{array} } \\ {u( - 3)} & {u( - 2)} & {u( - 1)} & {\begin{array}{*{20}c} 0 & \cdots & {u( - 5)} & {u( - 4)} \\ \end{array} } \\ \vdots & \vdots & \vdots & {\begin{array}{*{20}c} \vdots & \ddots & \vdots & \vdots \\ \end{array} } \\ 0 & 0 & 0 & {\begin{array}{*{20}c} 0 & \cdots & {u( - 1)} & 0 \\ \end{array} } \\ \end{array} } \right]$$

(11)

And

$$\mathop V\nolimits^{(i)} = \left[ {\begin{array}{*{20}c} {v( - 1)} & 0 & 0 & {\begin{array}{*{20}c} 0 & \cdots & {v( - 3)} & {v( - 2)} \\ \end{array} } \\ {v( - 3)} & {v( - 2)} & {v( - 1)} & {\begin{array}{*{20}c} 0 & \cdots & {v( - 5)} & {v( - 4)} \\ \end{array} } \\ \vdots & \vdots & \vdots & {\begin{array}{*{20}c} \vdots & \ddots & \vdots & \vdots \\ \end{array} } \\ 0 & 0 & 0 & {\begin{array}{*{20}c} 0 & \cdots & {v( - 1)} & 0 \\ \end{array} } \\ \end{array} } \right]$$

(12)

Thus, the ith scale wavelet transform can be represented as:

$$\left[ {\begin{array}{*{20}c} {\mathop f\nolimits_{u}^{(i)} } \\ {\mathop f\nolimits_{v}^{(i)} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathop U\nolimits^{(i)} } \\ {\mathop V\nolimits^{(i)} } \\ \end{array} } \right]\mathop f\nolimits_{u}^{(i - 1)}$$

(13)
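The structure of the filter matrices in Eqs. (11) to (13) can be checked numerically. The sketch below builds one-level matrices for the db2 filter with circular (periodic) extension and verifies that the stacked matrix is orthogonal. The indexing convention (row j holds the filter shifted by 2j columns), the high-pass construction g(k) = (−1)ᵏ h(L − 1 − k) and the function name `filter_matrices` are assumptions for illustration; they may differ from the paper's convention by a shift or a reversal of the filter.

```python
# db2 low-pass coefficients as quoted in Eq. (17) of this paper
DB2 = [0.482962913145, 0.836516303738, 0.224143868042, -0.129409522551]

def filter_matrices(h, n):
    """Return the (n/2 x n) low-pass matrix U and high-pass matrix V for one level."""
    L = len(h)
    g = [(-1) ** k * h[L - 1 - k] for k in range(L)]  # quadrature mirror high-pass filter
    U = [[0.0] * n for _ in range(n // 2)]
    V = [[0.0] * n for _ in range(n // 2)]
    for j in range(n // 2):
        for k in range(L):
            col = (2 * j + k) % n   # shift by two columns per row, wrapping around
            U[j][col] += h[k]
            V[j][col] += g[k]
    return U, V

U, V = filter_matrices(DB2, 8)
W = U + V   # stack [U; V] as in Eq. (13)

# Largest deviation of W * W^T from the identity matrix
max_dev = max(
    abs(sum(p * q for p, q in zip(W[a], W[b])) - (1.0 if a == b else 0.0))
    for a in range(8) for b in range(8)
)
print(max_dev)
```

The deviation is limited only by the 12-digit precision of the quoted coefficients, which is what makes the transform invertible by its transpose.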

This gives the wavelet matrix for a 1-level decomposition. The wavelet matrix for several levels of decomposition is obtained as follows.

$$\mathop f\nolimits_{u}^{(i - 1)} = \mathop U\nolimits^{(i - 1)} \mathop f\nolimits_{u}^{(i - 2)}$$

(14)

Applying Eqs. (9) and (10) recursively, starting from $f_{u}^{(0)} = x$, gives

$$\left[ {\begin{array}{*{20}c} {\mathop f\nolimits_{u}^{(i)} } \\ {\mathop f\nolimits_{v}^{(i)} } \\ {\mathop f\nolimits_{v}^{(i - 1)} } \\ {\begin{array}{*{20}c} \vdots \\ {\mathop f\nolimits_{v}^{(2)} } \\ {\mathop f\nolimits_{v}^{(1)} } \\ \end{array} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathop U\nolimits^{(i)} \mathop U\nolimits^{(i - 1)} \cdots \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(i)} \mathop U\nolimits^{(i - 1)} \cdots \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(i - 1)} \mathop U\nolimits^{(i - 2)} \cdots \mathop U\nolimits^{(1)} } \\ {\begin{array}{*{20}c} \vdots \\ {\mathop V\nolimits^{(2)} \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(1)} } \\ \end{array} } \\ \end{array} } \right]x$$

(15)

Here, the number of decomposition levels is limited by the condition 2ⁿ⁻ⁱ⁺¹ ≥ L, where L is the length of the filter; that is, the input to the ith level must be at least as long as the filter.

Thus, the final wavelet transform matrix is given by Eq. (16).

$${\mathbf{W}} = \left[ {\begin{array}{*{20}c} {\mathop U\nolimits^{(i)} \mathop U\nolimits^{(i - 1)} \cdots \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(i)} \mathop U\nolimits^{(i - 1)} \cdots \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(i - 1)} \mathop U\nolimits^{(i - 2)} \cdots \mathop U\nolimits^{(1)} } \\ {\begin{array}{*{20}c} \vdots \\ {\mathop V\nolimits^{(2)} \mathop U\nolimits^{(1)} } \\ {\mathop V\nolimits^{(1)} } \\ \end{array} } \\ \end{array} } \right]$$

(16)

Design procedure for the proposed 1-D DWT based sensing matrices

The procedural steps to construct the 1-D DWT based sensing matrices are as follows.

1. Create the desired quadrature mirror filter (QMF), such as a Daubechies, Coiflets, Symlets, Beylkin, Vaidyanathan or Battle filter. For example, the db1 (Haar) filter is given as f = [1 1] (up to the normalization factor 1/√2), and the four-tap db2 filter is:
$$f = \left[ {\begin{array}{*{20}c} {0.482962913145} & {0.836516303738} & {0.224143868042} & { - 0.129409522551} \\ \end{array} } \right]$$
(17)
2. Create the N × N identity matrix.
3. Perform the 1-D forward wavelet transform on the N × N identity matrix. This generates the N × N wavelet transform matrix.
4. Select the first m rows to form the m × N DWT sensing matrix, where m is the number of measurements.
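The four steps above can be sketched end to end. The example below is an illustration under stated assumptions rather than the implementation used in this study: it uses the orthonormal Haar filter in Step 1, transforms the identity matrix column by column in Step 3, and the names `haar_dwt` and `dwt_sensing_matrix` are hypothetical.

```python
import math

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT.
    Output order: [approximation, coarsest detail, ..., finest detail]."""
    s = 1.0 / math.sqrt(2.0)          # Step 1: Haar filter pair (assumed choice)
    out = list(x)
    n = len(x)
    for _ in range(levels):
        approx = [s * (out[2 * j] + out[2 * j + 1]) for j in range(n // 2)]
        detail = [s * (out[2 * j] - out[2 * j + 1]) for j in range(n // 2)]
        out[:n] = approx + detail     # the next level works on the approximation only
        n //= 2
    return out

def dwt_sensing_matrix(n, m, levels):
    """Return (Phi, W): the m x n sensing matrix and the full n x n wavelet matrix."""
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]  # Step 2: column j of identity
        column = haar_dwt(e_j, levels)                    # Step 3: transform that column
        for i in range(n):
            W[i][j] = column[i]
    return W[:m], W                                       # Step 4: keep the first m rows

Phi, W = dwt_sensing_matrix(8, 4, 3)
for row in Phi:
    print([round(v, 3) for v in row])
```

With this ordering, the retained rows are the coarsest-scale rows of the wavelet matrix, so the first m rows correspond to the approximation and the lowest-resolution detail coefficients.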

Experimental results and discussion

Methodology

The proposed work is evaluated on the CMU/CSTR KDT US English TIMIT database for speech synthesis from Carnegie Mellon University and Edinburgh University (Edinburgh 2002). The details of the file used are as follows: file name: Kdt_001.wav, channel: 1 (mono), bit rate: 256 kbps, audio sample rate: 16 kHz, total duration: 3 s. For the simulation, N = 2048 samples are selected, so the analyzed speech segment has a duration of 0.128 s. The experimental work is performed using MATLAB 7.8.0 (R2009a) on a system with an Intel(R) Core 2 Duo CPU and 3 GB of RAM. The discrete cosine transform (DCT) is used as the sparsifying basis for the speech signal because of the high sparsity it provides. The speech compression is performed using the sensing matrices based on the different DWT families (Donoho et al. 2007). Basis pursuit (BP) (Chen et al. 2001) is used as the signal recovery algorithm.
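As an illustration of the DCT as a sparsifying basis, the sketch below builds an orthonormal DCT matrix and applies it to a signal made of a single cosine basis function, which then has exactly one significant coefficient. The choice of the DCT-II variant with orthonormal scaling, the length N = 16 and the synthetic test signal are assumptions for the example; real speech frames are only approximately sparse.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis function."""
    C = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        C.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return C

N = 16
C = dct_matrix(N)

# Synthetic signal: the unscaled k = 3 cosine basis function
x = [math.cos(math.pi * (2 * i + 1) * 3 / (2 * N)) for i in range(N)]

coeffs = [sum(c * v for c, v in zip(row, x)) for row in C]
significant = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
print(significant, coeffs[3])
```

All of the signal energy falls into the single coefficient at index 3, which is the idealized form of the energy compaction that motivates the DCT here.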

The quality of the reconstructed speech signal is evaluated using the following metrics: compression ratio (CR), root mean square error (RMSE), relative error, signal-to-noise ratio (SNR), signal reconstruction time and sensing matrix construction time.

CR is obtained using relation,

$$CR = \frac{M}{N}$$

(18)

where N is the length of speech signal and M is the number of measurements taken from sensing matrix.

RMSE is given as below:

$${\text{RMSE}} = \sqrt {\frac{{\sum\nolimits_{n = 1}^{N} {\mathop {(x(n) - \tilde{x}(n))}\nolimits^{2} } }}{N}}$$

(19)

where x(n) is the original signal and $\tilde{x}(n)$ is the reconstructed signal.

Relative error is defined as:

$$Rel.Error = \frac{{\left\| {\tilde{x}(n) - x(n)} \right\|_{2} }}{{\left\| {x(n)} \right\|_{2} }}$$

(20)

where x(n) is the original signal and $\tilde{x}(n)$ is the reconstructed signal.

SNR is obtained as,

$$SNR\,({\text{dB}}) = 20\log_{10} \left( {\frac{{\left\| {x(n)} \right\|_{2} }}{{\left\| {x(n) - \tilde{x}(n)} \right\|_{2} }}} \right)$$

(21)

where x(n) is the original signal and $\tilde{x}(n)$ is the reconstructed signal.
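The metrics in Eqs. (18) to (21) translate directly into code. The following sketch evaluates them on a two-sample toy signal chosen so that each value can be verified by hand; the function names are illustrative, and the logarithm in the SNR is assumed to be base 10, as is usual for decibels.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def compression_ratio(m, n):
    """Eq. (18): number of measurements divided by signal length."""
    return m / n

def rmse(x, x_hat):
    """Eq. (19): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))

def relative_error(x, x_hat):
    """Eq. (20): norm of the error divided by the norm of the original."""
    err = [b - a for a, b in zip(x, x_hat)]
    return l2_norm(err) / l2_norm(x)

def snr_db(x, x_hat):
    """Eq. (21): signal-to-noise ratio in decibels (base-10 logarithm assumed)."""
    err = [a - b for a, b in zip(x, x_hat)]
    return 20.0 * math.log10(l2_norm(x) / l2_norm(err))

x = [3.0, 4.0]       # toy original signal, norm 5
x_hat = [3.0, 3.0]   # toy reconstruction, error norm 1

print(compression_ratio(1024, 2048))  # 0.5
print(rmse(x, x_hat))                 # sqrt(1/2)
print(relative_error(x, x_hat))       # 1/5
print(snr_db(x, x_hat))               # 20*log10(5)
```

Note that the SNR in Eq. (21) equals −20 log₁₀ of the relative error in Eq. (20), so the two metrics always rank reconstructions in the same order.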

In addition, the signal reconstruction time is measured as the time the reconstruction algorithm takes to recover the original signal. The time required to construct the sensing matrix is also an important parameter and should be as small as possible.

Performance analysis of the Daubechies wavelet family based sensing matrices

This section presents the performance analysis of the DWT sensing matrices based on the Daubechies wavelet family, namely db1, db2, db3, db4, db5, db6, db7, db8, db9 and db10. The speech signal of length 2048 is taken at a 50% sparsity level, that is, only 1024 non-zero coefficients are preserved. For different numbers of measurements (m), the corresponding compression ratio (CR), signal reconstruction time (s), relative error, root mean square error (RMSE) and signal-to-noise ratio (SNR) are calculated (Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
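The text does not state how the 1024 preserved coefficients are chosen. One common way to impose a 50% sparsity level, assumed here purely for illustration, is to keep the largest-magnitude half of the transform coefficients and set the rest to zero. The function name `keep_largest` and the toy coefficient vector are hypothetical.

```python
def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude entries of coeffs."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

coeffs = [0.1, -5.0, 0.3, 2.0, -0.2, 0.05, 1.5, -0.4]
sparse = keep_largest(coeffs, len(coeffs) // 2)   # 50% sparsity level
nonzeros = sum(1 for c in sparse if c != 0.0)
print(sparse, nonzeros)
```

For the signal length used in this section, the same call with k = 1024 on a 2048-point coefficient vector would preserve 1024 non-zero values.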

Table 1 Performance analysis of the proposed db1 (Haar) wavelet based sensing matrix
