Research On Data Augmentation Technology For Speech Recognition Application

Posted on: 2022-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2518306788956899
Subject: Computer Software and Application of Computer
Abstract/Summary:
In recent years, machine learning (ML) and automatic speech recognition (ASR) technologies have matured, making voice input a practical alternative to keyboard text entry. In specialized technical fields, however, domain terminology poses a major challenge for voice input. One of the most important problems is that the amount of labeled domain-specific speech data is too small to meet the training requirements of an ASR system, which leads to over-fitting of the speech recognition model and low recognition accuracy. Collecting technical and tactical data from national table tennis matches is highly specialized work. The general practice is to label match videos with predefined technical and tactical terms and, on that basis, to complete the related technical and tactical statistics and analysis. Replacing keyboard entry of these technical and tactical templates with voice input has become an urgent need for the national team's scientific research group. To address these issues and needs, this thesis first synthesizes speech with an online speech synthesis method and adds the generated speech to the existing dataset to form the basic training set for the speech recognition model. The training set is then expanded with either a data augmentation algorithm based on a genetic algorithm or a data augmentation algorithm based on random mean substitution, and the augmented dataset is used to train the acoustic model. Finally, a language model for the table tennis domain is trained with the SRILM toolkit. Through these steps, the problem of voice input containing table tennis terminology is effectively solved. Experimental and trial-run results show that the speech recognition rate reaches 93.77% in a quiet environment, which basically meets the technical and tactical data acquisition requirements of the national table tennis team.
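The abstract names a data augmentation algorithm based on random mean substitution but does not describe it. The sketch below is only one plausible reading, offered for illustration and not as the thesis's actual method: it assumes each utterance is a sequence of feature frames (for example MFCC vectors) and that augmentation replaces a few randomly chosen runs of frames with their per-coefficient mean. The function names `random_mean_substitution` and `augment_dataset` and the parameters `num_spans`, `max_span`, and `copies` are hypothetical.

```python
import random
from statistics import fmean


def random_mean_substitution(features, num_spans=2, max_span=3, rng=None):
    """Return a copy of `features` (frames x coefficients) in which a few
    randomly chosen runs of frames are replaced by their own mean frame.

    ASSUMPTION: this is an illustrative interpretation of "random mean
    substitution"; the abstract does not specify the algorithm.
    """
    if rng is None:
        rng = random.Random()
    augmented = [list(frame) for frame in features]  # copy; input stays untouched
    n_frames = len(augmented)
    if n_frames == 0:
        return augmented
    for _ in range(num_spans):
        width = rng.randint(1, min(max_span, n_frames))
        start = rng.randint(0, n_frames - width)
        span = augmented[start:start + width]
        # Mean of each coefficient over the selected frames.
        mean_frame = [fmean(column) for column in zip(*span)]
        for t in range(start, start + width):
            augmented[t] = list(mean_frame)
    return augmented


def augment_dataset(utterances, copies=2, seed=0):
    """Expand a list of (features, transcript) pairs with perturbed copies.

    The transcript label is reused unchanged for every augmented copy.
    """
    rng = random.Random(seed)
    expanded = list(utterances)
    for features, transcript in utterances:
        for _ in range(copies):
            expanded.append((random_mean_substitution(features, rng=rng), transcript))
    return expanded
```

Under this reading, each augmented copy keeps the original transcript and the same number of frames, so it can be added directly to the training set alongside the recorded and synthesized speech. Passing a seeded `random.Random` makes the expansion reproducible.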
Keywords/Search Tags: Data Augmentation, Speech Recognition, Speech Data, Genetic Algorithms