Research On Vietnamese Speech Recognition Method Based On Multi-granularity Error Correction

Posted on: 2022-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: R F Liang
Full Text: PDF
GTID: 2518306524952419
Subject: Software engineering
Abstract/Summary:
Speech recognition technology is the basis of human-computer interaction applications and has important application value in systems such as machine translation devices, human-machine dialogue and question answering systems, and real-time captioning for intelligent conferences. At present there is relatively little research on Vietnamese speech recognition, and it mainly relies on the traditional hybrid deep neural network and hidden Markov model approach used for mainstream languages such as English and French. Recently, sequence-to-sequence methods have gradually become a research hotspot in academia.

Vietnamese poses three problems for these approaches. First, unlike mainstream languages, Vietnamese is a low-resource language with scarce speech training corpora, so current speech recognition models that require large-scale training data struggle to achieve good results. Second, Vietnamese is a monosyllabic tonal language whose smallest structural unit is the syllable. Speech recognition models generally use syllables and phonemes as the recognition units for Vietnamese, but because the boundaries of Vietnamese syllables are vaguely defined, these units are a poor fit. Finally, each Vietnamese syllable can carry one of six tones, and different tones represent different meanings, which makes the combinations of Vietnamese words and entities complex and diverse. Words and entities that differ only in tone are pronounced similarly. Current acoustic models lack an accurate understanding of the speech content and have difficulty distinguishing similar-sounding Vietnamese words and entities at multiple granularities, which leads to poor recognition results. In response to these problems, the thesis completes the following research work:

(1) Construction of a Vietnamese speech recognition training corpus based on multi-granularity error correction
To address the scarcity of Vietnamese speech training data, the thesis first analyzes strategies for acquiring Vietnamese speech and text data, then uses crawler technology to obtain a speech-text parallel corpus and a larger amount of monolingual Vietnamese text from the Internet. The speech-text parallel corpus is stored in a database after preprocessing such as deduplication, audio track extraction, and segmentation, which yields real speech recorded in noisy environments. The monolingual text is deduplicated and denoised to obtain clean text for the subsequent expansion of the corpus. Second, part of the Vietnamese speech is recorded manually with recording equipment in a quiet environment to obtain clean, real speech. Finally, speech synthesis is used to convert the clean Vietnamese text into corresponding speech, producing an additional speech-text parallel corpus that expands the training data. The experimental results show that a training corpus built by web crawling, manual recording, and speech synthesis can meet the basic training data needs of a speech recognition model in real application scenarios.

(2) Vietnamese speech recognition method based on sub-syllables
To address the unreasonable division of Vietnamese recognition units, the thesis analyzes the characteristics of Vietnamese and explores five modeling units of different granularity: phonemes, letters, syllables, sub-syllables, and words. It proposes a sub-syllable-based unit division method for Vietnamese, with the aim of finding the most suitable recognition unit for the later work on multi-granularity error detection and correction. First, an alphabetic dictionary of 72 entries is constructed from the constituent units of Vietnamese: vowels, consonants, and the six tones. Then the training text is split, based on this dictionary, into two smaller kinds of sub-syllable units, consonants and tones, for modeling. The experimental results show that the proposed method achieves better recognition results than the baseline model.

(3) Vietnamese speech recognition method based on multi-granularity error correction
To address the difficulty current models have in distinguishing similar pronunciation sequences in Vietnamese, the thesis proposes a Vietnamese speech recognition method based on multi-granularity error correction. The goal is to detect word-level and entity-level recognition errors in the Vietnamese speech recognition output and correct them, so that the final output matches the spoken content and the model's semantic expression ability improves. A speech recognition model is trained on the corpus constructed in (1), and a multi-granularity parallel corpus of word and entity recognition errors is labeled from its recognition results. The text of this corpus is then modeled with the sub-syllable units from (2), and multiple granularities are incorporated into decoding during training. The experimental results show that detecting multi-granularity errors in the recognition output and integrating multi-granularity error correction significantly improve the model's ability to express the semantics of sentences.

(4) Vietnamese speech recognition prototype system based on multi-granularity error correction
Based on the theoretical research above, a Vietnamese speech recognition prototype system based on multi-granularity error correction was built. The system targets Vietnamese speech recognition, and its functional modules include Vietnamese speech input, Vietnamese speech transcription and output, and the deployment and application of the trained speech recognition models.
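As an illustration of the sub-syllable idea in (2), the following is a minimal sketch of how a written Vietnamese syllable could be split into an initial consonant, a toneless remainder, and a tone label. It is not the thesis's method or its 72-entry dictionary: the function name `split_syllable`, the ASCII tone labels, and the list of initial consonants are all assumptions made for this example, and the consonant list is an orthographic simplification.

```python
import unicodedata

# Combining marks that encode the five marked tones.
# The ASCII tone labels are names chosen for this sketch (assumption).
TONE_MARKS = {
    "\u0300": "huyen",  # grave accent
    "\u0301": "sac",    # acute accent
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

# Orthographic initial consonants (a simplified list, assumption),
# sorted longest first so that "ngh" is tried before "ng" and "n".
ONSETS = sorted(
    ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "th", "tr", "qu",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "q", "r",
     "s", "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Return (onset, toneless_rest, tone) for one written syllable."""
    # NFD separates base letters from their combining marks.
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # the unmarked (level) tone
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps vowel-quality marks such as the circumflex
    # NFC recomposes letters like "ê" after the tone mark has been removed.
    toneless = unicodedata.normalize("NFC", "".join(kept))
    for onset in ONSETS:
        # Require a non-empty remainder so "gì" falls back from "gi" to "g".
        if toneless.startswith(onset) and len(toneless) > len(onset):
            return onset, toneless[len(onset):], tone
    return "", toneless, tone  # syllable starts with a vowel


if __name__ == "__main__":
    for s in ["tiếng", "Việt", "nghiêng", "ăn"]:
        print(s, split_syllable(s))
```

The tone is read from Unicode combining marks rather than from a lookup table of precomposed letters, which keeps the sketch short. A real system would more likely use a curated unit inventory like the dictionary described in the abstract.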
Keywords/Search Tags: Automatic speech recognition, End-to-end, Vietnamese, Error correction, Multi-granularity