Research on Amdo Tibetan Speech Recognition Technology Based on MRDCNN-CTC-Transformer

Posted on: 2022-11-28 | Degree: Master | Type: Thesis
Country: China | Candidate: B J Gong | Full Text: PDF
GTID: 2518306752493284 | Subject: Computer software and theory
Abstract/Summary:
Speech recognition technology takes speech as its research object. Through speech signal processing and pattern recognition, computers can automatically recognize and understand human speech and convert it into the corresponding text sequences. It has important application value in areas such as smart homes, autonomous driving, and voice command interaction for mobile robots. With the development of deep learning, speech recognition based on neural networks has gradually become a research hotspot at home and abroad. Because of limitations in corpora, natural language technology, and the existence of multiple dialects, Tibetan speech recognition has developed slowly, although the need for it is urgent. To promote the development of Tibetan speech recognition technology, this thesis takes Amdo Tibetan speech as its research object and adopts deep learning methods to study the key technologies of Amdo Tibetan speech recognition: construction of a Tibetan text corpus and an Amdo speech corpus, corpus preprocessing, and construction of the acoustic model and the language model.

(1) Corpus construction. By analyzing the features of Tibetan text and of Amdo Tibetan speech, this thesis collects a 284.2 MB text corpus of different types and a 170-hour Amdo Tibetan speech corpus.

(2) Corpus preprocessing. According to the practical needs of Amdo Tibetan speech recognition, this thesis preprocesses the corpus, including normalization and character segmentation/labeling. We formulate rules for Tibetan digital text classification and specification, design a speech-recognition-oriented Tibetan character segmentation/labeling algorithm, and compile statistics on the distribution of Tibetan characters. The average accuracy rates of Tibetan digital text classification and specification are 99.45% and 99.28%, respectively, and the accuracy of Tibetan character segmentation/labeling is 99.99%.

(3) Acoustic modeling and language modeling. Based on an analysis of the speech features of Amdo Tibetan, we design an Amdo Tibetan speech recognition model, MRDCNN-CTC-Transformer, with Tibetan characters as the modeling unit. The acoustic model uses MRDCNN-CTC, and the language model uses a Transformer.

(4) Design and implementation of the Amdo Tibetan speech recognition system. Building on the acoustic model and the language model, we design and implement an Amdo Tibetan speech recognition visualization system based on MRDCNN-CTC-Transformer, and we verify the performance of the acoustic model, the language model, and the recognition system experimentally. Experiments show that the error rate of the acoustic model is 18.67%, the error rate of the language model is 2.8%, and the error rate of speech recognition is 18.87%.
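The abstract does not say how the CTC output is decoded or how the reported error rates are computed. As a rough illustration only, the sketch below assumes greedy (best-path) CTC decoding and an edit-distance error rate over modeling units. The function names, the blank index `0`, and the integer labels are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not specified in the thesis)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse a per-frame best-path label sequence:
    merge consecutive repeats, then drop blank symbols."""
    out: List[int] = []
    prev = None
    for label in frame_ids:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # match or substitution
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length (assumed metric)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])` yields the unit sequence `[1, 1, 2]`: the blank between the two runs of `1` keeps them as separate units. Comparing a decoded sequence with its reference through `error_rate` gives a unit-level error rate of the kind quoted above, under the stated assumptions.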
Keywords/Search Tags: Amdo Tibetan, Speech Recognition, Modeling Unit, Acoustic Model, Language Model