Research On Tibetan Speech Recognition Based On Bidirectional Recurrent Neural Network

Posted on: 2021-05-17  Degree: Master  Type: Thesis
Country: China  Candidate: X X Liu
GTID: 2438330620476058  Subject: Software engineering
Abstract/Summary:
With the emergence of smartphones, terminal applications have become a major trend in technology development. Apple's introduction of voice recognition on smartphones set off an upsurge of interest in human-computer interaction. Voice is the most direct and convenient way for people to communicate, and compared with the mouse, keyboard, and other devices it is the fastest and most popular input method. Because traditional speech recognition models cannot handle speaker-independent, complex, and variable speech well, this thesis adopts a bidirectional recurrent neural network (BRNN), which captures temporal dependencies in sequences well. There is currently little speech recognition research on Tibetan, so this thesis applies a BRNN, which is well suited to sequential data, to Tibetan speech recognition in order to improve the stability and accuracy of recognition.
The research contents of this thesis are as follows:
1) Speech acquisition: a segment of audio is taken from the corpus and fed into the speech system as input.
2) Pre-processing: the speech signal is pre-processed by pre-emphasis, framing, and windowing. First, the speech is passed through an anti-aliasing filter, because the speaker's vocal organs and the recording equipment introduce aliasing, high-order harmonic distortion, and other high-frequency effects; filtering minimizes the spurious frequency components caused by frequency folding. Second, speaking produces lip radiation, and pre-emphasis is applied to compensate for it and improve high-frequency resolution. Third, because speech is stationary only over short intervals, the signal is split into frames and windowed. Pre-processing makes the speech signal smoother and more uniform, so that the feature extraction stage yields better feature parameters and recognition performance improves.
3) Feature extraction: feature extraction is the key to the subsequent stages of speech recognition. Many feature extraction methods exist; this thesis extracts Mel-frequency cepstral coefficients (MFCC) computed from a fast Fourier transform (FFT) analysis.
4) Bidirectional recurrent neural network: the BRNN models temporal dependencies well and is suited to sequence data, while connectionist temporal classification (CTC) does not require the data to be labeled frame by frame, aligned, or post-processed in advance. CTC is therefore combined with the BRNN, which addresses the problem of aligning speech segments with their label text and gives better results for Tibetan speech recognition.
Experimental results: the BRNN trained in this experiment achieves a higher recognition rate than the traditional HMM and BP models, and its results fluctuate less and are more stable.
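The pre-processing steps and the CTC output stage described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the Hamming window, the pre-emphasis coefficient 0.97, and the frame and hop lengths (400 and 160 samples, roughly 25 ms and 10 ms at an assumed 16 kHz sampling rate) are all illustrative assumptions, since the abstract does not state them. The last function shows the usual greedy CTC collapse rule (merge repeated labels, then drop blanks), assuming label id 0 is the blank.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def split_into_frames(signal, frame_len, hop_len):
    """Cut the signal into overlapping frames; trailing samples that
    do not fill a whole frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop_len)]

def hamming_window(frame_len):
    """Hamming window (an assumed choice; the abstract only says 'windowing')."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasis, then framing, then windowing of every frame."""
    emphasized = pre_emphasize(signal, alpha)
    window = hamming_window(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in split_into_frames(emphasized, frame_len, hop_len)]

def ctc_greedy_collapse(frame_labels, blank=0):
    """Turn per-frame label ids into an output sequence:
    merge consecutive repeats, then remove blanks."""
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output

if __name__ == "__main__":
    frames = preprocess([1.0] * 1000)
    print(len(frames), len(frames[0]))
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

In a full system the windowed frames would go on to FFT-based MFCC extraction, and the per-frame labels would come from the BRNN's output layer; both are outside the scope of this sketch.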
Keywords/Search Tags: Bidirectional recurrent neural network, Tibetan speech recognition, Connectionist temporal classification, Feature extraction, Fast Fourier transform