
Table 1 HTK parameter values used in experiments

From: Heterophonic speech recognition using composite phones

| Item | Value |
| --- | --- |
| Signal processing | Frame period: 10 ms; Hamming window size: 25 ms; first-order pre-emphasis coefficient: 0.97 |
| Feature vector | Cepstral liftering coefficient: 22; filterbank channels: 26; 12 MFCC coefficients plus 1 energy term, with delta and acceleration coefficients, for a total of 39 coefficients |
| HMM topology | 5-state, no-skip, left-to-right, with diagonal covariance |
| HMM variance | Floor on estimated variance: 0.01 × global covariance |
| HMM training and realignment | Pruning beam width starts at 250 and is incremented by 150 up to a maximum of 1000 |
| Triphone cluster | Minimum number of frames allocated to any cluster: 100 |
| Decision tree splitting | Split each cluster in two until the increase in log likelihood falls below 350 |
| Data-driven clustering | Greatest distance between any two states in a cluster: 100 |
| Decoding | Word insertion log probability: 0.0; language model grammar scale factor: 1.0 |
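
As a rough illustration of how the front-end and training values in Table 1 fit together, the following Python sketch converts the frame period and window size into HTK's 100 ns time units, checks the 39-coefficient feature dimension, and lists the pruning beam widths implied by the start, increment, and maximum values. The mapping to HTK configuration names (TARGETRATE, WINDOWSIZE, PREEMCOEF, NUMCHANS, CEPLIFTER, NUMCEPS) and the choice of MFCC_E_D_A as the target kind are assumptions made for this example; the paper's actual configuration files are not shown. The helper names are hypothetical.

```python
# Sketch under stated assumptions: the table does not give HTK parameter
# names, so the names and the MFCC_E_D_A target kind below are our guess.

HTK_UNITS_PER_MS = 10000.0  # HTK expresses durations in units of 100 ns


def ms_to_htk_units(ms: float) -> float:
    """Convert milliseconds to HTK's 100 ns time units."""
    return ms * HTK_UNITS_PER_MS


def feature_dimension(num_ceps: int, with_energy: bool, derivative_orders: int) -> int:
    """Static coefficients (cepstra plus optional energy), repeated once
    per derivative order (delta, acceleration) in addition to the statics."""
    static = num_ceps + (1 if with_energy else 0)
    return static * (1 + derivative_orders)


def beam_schedule(start: float, step: float, limit: float) -> list:
    """Beam widths tried in order: start, start + step, ... up to limit."""
    widths = []
    width = start
    while width <= limit:
        widths.append(width)
        width += step
    return widths


def render_config(config: dict) -> str:
    """Format a dict as 'NAME = value' lines, one per parameter."""
    return "\n".join(f"{name} = {value}" for name, value in config.items())


# Values taken from Table 1; parameter names are assumed (see lead-in).
config = {
    "TARGETKIND": "MFCC_E_D_A",           # assumed: energy + delta + acceleration
    "TARGETRATE": ms_to_htk_units(10),    # frame period: 10 ms
    "WINDOWSIZE": ms_to_htk_units(25),    # Hamming window: 25 ms
    "USEHAMMING": "T",
    "PREEMCOEF": 0.97,                    # first-order pre-emphasis
    "NUMCHANS": 26,                       # filterbank channels
    "CEPLIFTER": 22,                      # cepstral liftering coefficient
    "NUMCEPS": 12,                        # MFCC coefficients
}

print(render_config(config))
print("feature dimension:", feature_dimension(12, True, 2))
print("beam widths:", beam_schedule(250.0, 150.0, 1000.0))
```

The beam schedule corresponds to six widths between 250 and 1000; whether every width is actually used in a given training pass depends on when pruning errors occur, which the table does not describe.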