From: Heterophonic speech recognition using composite phones
Item | Value |
---|---|
Signal Processing | Frame period: 10 ms, Hamming window size: 25 ms, first-order pre-emphasis coefficient: 0.97 |
Feature vector | Cepstral liftering coefficient: 22, filterbank channels: 26, 12 MFCCs plus 1 energy term, each with delta and acceleration coefficients, for a total of 39 coefficients |
HMM topology | 5-state left-to-right with no skip transitions, diagonal covariance |
HMM Variance | Floor on estimated variances: 0.01 × global covariance |
HMM training and realignment | Start with a pruning beam width of 250 and increase it in steps of 150, up to a maximum of 1000 |
Triphone Cluster | Minimum number of frames allocated to any cluster: 100 |
Decision tree splitting | Split a cluster in two until the increase in log likelihood falls below 350 |
Data-driven clustering | Maximum distance between any two states in a cluster: 100 |
Decoding | Word insertion log probability: 0.0; Language model grammar scale factor: 1.0 |
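
Several values in the table imply derived quantities that are easy to check by hand. The following Python sketch works through three of them: the 39-dimensional feature vector, the sequence of pruning beam widths, and the number of frames produced for an utterance. The function names are hypothetical and not taken from the paper. Two assumptions are made: the beam is widened step by step until the maximum is reached, and only complete 25 ms windows yield a frame.

```python
def feature_dim(n_mfcc=12, n_energy=1, n_streams=3):
    """Static coefficients (MFCC + energy) counted once each for the
    static, delta, and acceleration streams."""
    return (n_mfcc + n_energy) * n_streams


def beam_schedule(start=250, step=150, limit=1000):
    """Beam widths in the order they would be tried, assuming the beam
    is widened by `step` each time until `limit` is reached."""
    widths = []
    beam = start
    while beam <= limit:
        widths.append(beam)
        beam += step
    return widths


def num_frames(duration_ms, window_ms=25, shift_ms=10):
    """Number of analysis frames, assuming only complete windows count
    (no padding at the end of the signal)."""
    if duration_ms < window_ms:
        return 0
    return 1 + (duration_ms - window_ms) // shift_ms


if __name__ == "__main__":
    print(feature_dim())        # (12 + 1) * 3
    print(beam_schedule())      # widths from 250 up to 1000
    print(num_frames(1000))     # frames in a 1-second utterance
```

Under these assumptions a one-second utterance yields just under 100 frames, because the final partial windows are dropped; a front end that pads the signal would produce slightly more.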