Research And Application Of Speech Recognition Based On Syllable Modeling

Posted on:2022-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Zhuo

GTID:2518306749972059

Subject:Computer Software and Application of Computer

Abstract/Summary:

With the development of the Internet economy, applications such as audiobooks and podcasts have entered people's daily lives, and the demand for efficient recognition and understanding of online speech content continues to increase. To alleviate the dependence on training data, Hanyu Pinyin is adopted as an intermediate representation between the speech input and the text output, splitting the recognition process into two stages: first recognizing acoustic features as Hanyu Pinyin (the acoustic model stage), and then transforming the Hanyu Pinyin into the expected text (the language model stage). The main work of this thesis is as follows:

(1) In the acoustic model, the connectionist temporal classification (CTC) loss function produces a peaky distribution when predicting non-blank labels, and the predicted locations often deviate significantly from the true positions, which tends to impair performance. Since Mandarin Chinese is a syllabic language in which each syllable is pronounced with approximately equal duration, an equal interval prior can be introduced into acoustic modeling to greatly limit the CTC path search range and reduce the computational cost. To address the peaky distribution of CTC predictions, this equal interval prior is introduced into the CTC loss, limiting the path search range and improving the performance of the acoustic model. The performance of acoustic models based on DFSMN networks was compared on the speech-to-pinyin conversion task. The results verify that, compared with CTC, the equal interval prior-based Es CTC algorithm has a positive effect on the acoustic model. The character error rate (CER) on the AISHELL-1 dataset is reduced by 3.76% compared with the DFCNN baseline model.

(2) In the language model, the intermediate Hanyu Pinyin results need to be converted into the expected text. However, the acoustic model outputs various types of errors, which can reduce the accuracy of the conversion to text. Since the semantic information of Hanyu Pinyin lies in the combinatorial relationships between adjacent tokens, enhancing the modeling of local context can improve both the error correction ability for Hanyu Pinyin sequences and the quality of the text results. Enhanced semantic modeling of the local context of Hanyu Pinyin text is therefore introduced into the language model to improve its error correction ability. A local semantic enhancement method based on a Gaussian distribution is introduced into the self-attention network (SAN), and two submodels, one for pinyin error correction and one for pinyin-to-Chinese conversion, are designed and combined in cascade. The CER is reduced by 3.0% compared with the Transformer baseline model.

(3) Based on the above results, a speech recognition system for Uyghur and a downstream Uyghur-to-Chinese machine translation system were designed and built. In tests on the THUYG-20 test set and the CWMT 2017 test set, respectively, the WER for Uyghur speech recognition is reduced by 11.61% compared with the THUYG-20 baseline, and the BLEU score for Uyghur-to-Chinese translation is increased by 5.84 compared with the standard Transformer baseline, which demonstrates that the above methods can be extended to different languages.
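As an illustration of the Gaussian-based local semantic enhancement mentioned in (2), the following is a minimal sketch of one common way to bias self-attention toward nearby tokens: a Gaussian-shaped penalty is added to the attention scores before the softmax. The abstract does not give the exact formulation used in the thesis, so everything here is an assumption: the additive form of the bias, the fixed width `sigma`, the window centered on the query position, and the hypothetical function names `gaussian_bias` and `local_attention_weights`.

```python
import math


def gaussian_bias(n, sigma):
    """Return an n x n matrix of additive biases -(i - j)^2 / (2 * sigma^2).

    Assumption: the Gaussian window is centered on the query position i
    and has a fixed width sigma (not learned).
    """
    return [
        [-((i - j) ** 2) / (2.0 * sigma ** 2) for j in range(n)]
        for i in range(n)
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_weights(queries, keys, sigma):
    """Scaled dot-product attention weights with a Gaussian locality bias.

    queries, keys: lists of n vectors, each of dimension d.
    Returns an n x n list of attention weights (each row sums to 1).
    """
    n = len(queries)
    d = len(queries[0])
    bias = gaussian_bias(n, sigma)
    weights = []
    for i in range(n):
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + bias[i][j]  # nearby positions are penalized less than distant ones
            for j in range(n)
        ]
        weights.append(softmax(scores))
    return weights


if __name__ == "__main__":
    # Five pinyin tokens with identical (all-zero) embeddings: the content
    # scores are then uniform, so the weights show the Gaussian bias alone.
    tokens = [[0.0, 0.0] for _ in range(5)]
    for row in local_attention_weights(tokens, tokens, sigma=1.0):
        print([round(w, 3) for w in row])
```

With a small `sigma`, each pinyin token attends mostly to its immediate neighbors; as `sigma` grows, the bias flattens and the weights approach those of ordinary self-attention.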

Keywords/Search Tags:

Mandarin speech recognition, Hanyu Pinyin, Equal spacing, Local semantic enhancement

Related items

1	Research On Application Of Pitch Extraction Method Based On Speech Enhancement To Speech Recognition
2	Parallel Optimization Method In Language Model For Mandarin Speech Recognition
3	Emotion Recognition By Speech Signal In Mandarin
4	Research Of Mandarin Pronouncing Evaluation Based On Speech Recognition
5	Research On Noise Robust Methods In Mandarin Word Recognition
6	Research On Mandarin Digit Speech Recognition Technology And Implement Approach
7	Research On Key Issues Of Mandarin Speech Emotion Recognition
8	Research On The Using Of Tone Information In Mandarin Automatic Speech Recognition System
9	Applications Of Speech Recognition And Evaluation In Computer-Assisted Mandarin Learning
10	The Research Concerning The Features Of Mandarin Speech