Table 1 Algorithm for feature extraction

From: Existence detection and embedding rate estimation of blended speech in covert speech communications

Input: A speech signal X of length N

Output: A feature vector F that contains two features

Step 1: For a given speech signal X, invert its odd–even points to obtain the inverted version \(X_{w}\)

Step 2: Calculate the OED of X, which is denoted by \(D_{r}\), and the OED of \(X_{w}\), which is denoted by \(D_{w}\)

Step 3: Calculate the average ZCRs of \(D_{r}\) and \(D_{w}\), denoted by \(Z(D_{r})\) and \(Z(D_{w})\), respectively. Obtain the first feature \(v_{1} = \hbox{min} \{ Z(D_{r}), Z(D_{w}) \}\). Taking the smaller average ZCR corrects for the case in which the odd–even points of the blended speech have been inverted

Step 4: If \(Z(D_{r} ) \le Z(D_{w} )\), set \(D = D_{r}\); otherwise, \(D = D_{w}\). Divide D into N frames and calculate the average ZCR per frame, choosing the smallest value as the second feature, which is denoted by \(v_{2}\)

Step 5: Construct the feature vector \(F = \langle v_{1}, v_{2}, Type \rangle\) from these two features. The Type attribute specifies whether the signal is a blended (1) or pure (−1) speech object
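The steps above can be sketched in NumPy. Note the assumptions: the table does not define the OED here, so the sketch takes it to be the odd–even difference sequence \(d[k] = x[2k] - x[2k+1]\), and takes "inverting the odd–even points" to mean swapping each odd/even sample pair; the frame count (`n_frames`) and the `label` default are likewise illustrative, not values from the paper.

```python
import numpy as np

def swap_odd_even(x):
    # Assumed reading of Step 1: swap each (even, odd) sample pair
    # to obtain the inverted version x_w. Truncates to even length.
    n = len(x) - len(x) % 2
    y = x[:n].copy()
    y[0::2], y[1::2] = x[1:n:2], x[0:n:2]
    return y

def oed(x):
    # Assumed OED definition: difference of each even/odd sample pair.
    n = len(x) - len(x) % 2
    return x[:n:2] - x[1:n:2]

def zcr(d):
    # Average zero-crossing rate: fraction of adjacent pairs whose
    # signs differ.
    return float(np.mean(np.signbit(d[:-1]) != np.signbit(d[1:])))

def extract_features(x, n_frames=10, label=-1):
    xw = swap_odd_even(x)                 # Step 1
    dr, dw = oed(x), oed(xw)              # Step 2
    zr, zw = zcr(dr), zcr(dw)
    v1 = min(zr, zw)                      # Step 3
    d = dr if zr <= zw else dw            # Step 4: keep the OED with
    frames = np.array_split(d, n_frames)  # the smaller average ZCR,
    v2 = min(zcr(f) for f in frames)      # then take the per-frame min
    return (v1, v2, label)                # Step 5: <v1, v2, Type>
```

A pure speech candidate would be passed with `label=-1` and a known blended one with `label=1` when building training data; both features lie in \([0, 1]\) since they are zero-crossing rates.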