
Research On Cross-Channel Model For Speech Recognition

Posted on: 2021-04-12  Degree: Master  Type: Thesis
Country: China  Candidate: W Q Du
GTID: 2428330614970829  Subject: Engineering
Abstract/Summary:
In recent years, automatic speech recognition technology has developed rapidly, and acoustic models built on the DNN-HMM framework have gradually replaced the traditional GMM-HMM modeling approach. However, as practical applications have been deployed and systems put into operation, problems such as insufficient model generalization have gradually come to light. Specifically, system performance is not stable enough when the same system faces different device channels. This thesis focuses on reducing the impact of cross-device channels on speech recognition systems.

The research consists of two main parts. First, the discriminative training strategy of the Chain Model is used to improve the performance of the baseline system, and SpecAugment and mean normalization are applied to remove channel distortion noise effectively and improve model robustness. Second, transfer learning is used to guide the current task with the rich knowledge contained in a source-domain model, and two transfer learning methods, model pre-training and hidden-layer parameter sharing, are compared to assess their respective strengths and weaknesses on this task. Together, these steps substantially improve overall system performance.

Based on this research, discriminative training reduces the difference in recognition results caused by different device channels from more than 50% to 34%. After SpecAugment and mean normalization are introduced, the difference falls to about 11%, and combining them with transfer learning keeps the final difference within 10%. At the same time, the average word error rate of the best model on the complex multi-channel test set drops from 46.67% for the baseline system to 21.33%.
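The two feature-level measures named above can be illustrated with a short sketch. This is a minimal, hypothetical example in Python, not the thesis's implementation: it assumes that "mean normalization" means per-utterance cepstral mean normalization (CMN) applied to frame-level features such as MFCCs, and it implements only the frequency- and time-masking parts of SpecAugment (time warping is omitted). The function names, mask widths, and toy data are assumptions chosen for illustration.

```python
import random
from typing import List

# Features are frames x dimensions (e.g. MFCCs); a plain list of lists keeps
# the sketch dependency-free. All names and settings here are hypothetical.
Matrix = List[List[float]]


def cepstral_mean_normalize(feats: Matrix) -> Matrix:
    """Subtract the per-utterance mean of every feature dimension.

    Assumption: "mean normalization" is read as per-utterance CMN. A fixed
    channel response multiplies the spectrum, which becomes an additive
    offset in the log/cepstral domain, so removing the mean cancels it.
    """
    num_frames = len(feats)
    dims = len(feats[0])
    means = [sum(frame[d] for frame in feats) / num_frames for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in feats]


def spec_augment(feats: Matrix, max_freq_width: int = 2, max_time_width: int = 3,
                 num_freq_masks: int = 1, num_time_masks: int = 1,
                 seed: int = 0) -> Matrix:
    """SpecAugment-style frequency and time masking (time warping omitted).

    Masked cells are set to 0.0, i.e. the utterance mean once CMN has been
    applied. Widths and mask counts are illustrative, not the thesis settings.
    """
    rng = random.Random(seed)
    out = [row[:] for row in feats]  # copy so the caller's features are untouched
    num_frames, dims = len(out), len(out[0])
    for _ in range(num_freq_masks):
        # Mask a band of consecutive feature dimensions across all frames.
        width = rng.randint(0, min(max_freq_width, dims))
        start = rng.randint(0, dims - width)
        for row in out:
            for d in range(start, start + width):
                row[d] = 0.0
    for _ in range(num_time_masks):
        # Mask a run of consecutive frames across all dimensions.
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * dims
    return out


if __name__ == "__main__":
    # Toy utterance, plus the same utterance seen through a hypothetical
    # channel that adds a constant offset to each cepstral dimension.
    clean = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [3.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    offset = [0.5, -1.0, 2.0]
    shifted = [[x + o for x, o in zip(frame, offset)] for frame in clean]

    a = cepstral_mean_normalize(clean)
    b = cepstral_mean_normalize(shifted)
    print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))  # True
    print(spec_augment(a, seed=1))
```

In this toy case, the two inputs differ only by a constant per-dimension offset, so CMN maps them to identical features. That is the sense in which mean normalization can cancel a stationary channel effect. Masking, by contrast, does not remove distortion; it hides random bands and frame spans during training so the acoustic model cannot rely too heavily on any single region of the features.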
Keywords/Search Tags:Speech Recognition, Transfer learning, Cross-device channel, Deep neural network