
Table 1 HTK parameter values used in experiments

From: Heterophonic speech recognition using composite phones

| Item | Value |
| --- | --- |
| Signal processing | Frame period: 10 ms; Hamming window size: 25 ms; first-order pre-emphasis coefficient: 0.97 |
| Feature vector | Cepstral liftering coefficient: 22; filterbank channels: 26; 12 MFCC coefficients plus 1 energy term, with delta and acceleration coefficients, for a total of 39 coefficients |
| HMM topology | 5-state, no-skip, left-to-right, with diagonal covariance |
| HMM variance | Variance floor: 0.01 × global covariance |
| HMM training and realignment | Initial pruning beam width of 250, increased in steps of 150 up to a maximum of 1000 |
| Triphone cluster | Minimum number of frames allocated to any cluster: 100 |
| Decision tree splitting | Split a cluster in two until the increase in log likelihood falls below 350 |
| Data-driven clustering | Greatest distance between any two states in a cluster: 100 |
| Decoding | Word insertion log probability: 0.0; language model (grammar) scale factor: 1.0 |
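
A few quantities follow directly from the values in Table 1, and the short Python sketch below works them out. It is an illustration only, not part of the original experiments. The sampling rate of 16 kHz is an assumption (the table does not state one), the function names are hypothetical, and the beam schedule assumes the beam is widened by one step each time alignment fails at the current width.

```python
# Values copied from Table 1 unless marked as an assumption.
SAMPLE_RATE_HZ = 16000   # ASSUMPTION: sampling rate is not given in Table 1
FRAME_PERIOD_MS = 10     # frame period
WINDOW_SIZE_MS = 25      # Hamming window size
NUM_CEPSTRA = 12         # MFCC coefficients per frame


def samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples spanned by a duration given in milliseconds."""
    return sample_rate_hz * duration_ms // 1000


def htk_time_units(duration_ms):
    """Convert milliseconds to the 100 ns units used in HTK configuration files."""
    return duration_ms * 10000


def feature_dimension(num_cepstra=NUM_CEPSTRA):
    """Static terms (cepstra + energy) plus their delta and acceleration terms."""
    static = num_cepstra + 1      # 12 MFCCs + 1 energy
    return static * 3             # static + delta + acceleration


def beam_schedule(start=250.0, step=150.0, limit=1000.0):
    """Beam widths tried in order, widening by `step` until `limit` is reached."""
    widths = []
    beam = start
    while beam <= limit:
        widths.append(beam)
        beam += step
    return widths


if __name__ == "__main__":
    print("frame shift (samples):", samples(FRAME_PERIOD_MS))
    print("window length (samples):", samples(WINDOW_SIZE_MS))
    print("frame period (HTK units):", htk_time_units(FRAME_PERIOD_MS))
    print("feature dimension:", feature_dimension())
    print("beam schedule:", beam_schedule())
```

Under the assumed 16 kHz rate, consecutive 25 ms windows overlap by 15 ms, and the feature dimension reproduces the 39 coefficients listed in the table.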