
Application Research Of Deep Learning In Speech Recognition Of Sichuan Dialect

Posted on: 2021-01-23, Degree: Master, Type: Thesis
Country: China, Candidate: J Fu, Full Text: PDF
GTID: 2428330611987195, Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of deep learning has driven great progress in speech recognition technology. As the technology matures and different cultures increasingly influence one another, the use of dialects for human-computer speech interaction has become a new research direction. Sichuan dialect is active on social media thanks to its unique charm, and its speakers number in the hundreds of millions. Studying Sichuan dialect speech recognition also plays a positive role in understanding Bashu culture. Speech recognition is a popular research topic and has been studied extensively; however, although dialect speech recognition has received some attention, only a handful of studies address Sichuan dialect. This thesis uses a convolutional neural network, a gated recurrent network, a hidden Markov model, and a Transformer model to study Sichuan dialect speech recognition, constructs a Sichuan dialect corpus, and proposes speech recognition methods based on an improved convolutional neural network and an improved gated recurrent network. The specific research contents are as follows.

First, because no open standard corpus of Sichuan dialect exists, a Sichuan dialect corpus is designed. The data are extracted from local Sichuan dialect films and TV dramas. After format conversion, segmentation, labeling, and checking, the data are divided into three sets: two training sets and one test set. The two training sets contain about 201 minutes and 30 minutes of speech respectively, and the test set contains 20 minutes. The corpus establishes the correspondence between speech audio and Mandarin text annotations.

Second, to address the scarcity of research on Sichuan dialect speech recognition, a speech recognition method based on an improved convolutional neural network is proposed. A deep convolutional neural network extracts feature information from the spectrogram, which is then mapped to text by combining CTC decoding with a hidden Markov model. The method is trained and tested on the Sichuan dialect corpus and compared with recognition results for other dialects. The experimental results show that the proposed algorithm reduces the error rate of Sichuan dialect speech recognition and thus improves the recognition rate.

Third, an improved gated recurrent network speech recognition method is proposed, which uses a GRU as the acoustic model and a Transformer as the language model. Through preprocessing of the speech segments, feature extraction, CTC decoding, and the Transformer model, the mapping from audio sequence to text sequence is realized and tested on the Sichuan dialect corpus. The experimental results show that the proposed algorithm improves word accuracy and outperforms other related algorithms in the field.
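The abstract mentions CTC decoding and error rates without giving details, so the following is only a minimal sketch of the general idea, not the thesis's actual decoder. It assumes greedy (best-path) CTC decoding rather than the HMM or Transformer combination described above, and it scores the output with a character error rate. The names `ctc_greedy_decode`, `character_error_rate`, and `BLANK`, the toy vocabulary, and the frame scores are all hypothetical and chosen for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, vocab):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Collapse consecutive duplicate labels, then remove blank labels.
    collapsed = [idx for idx, _ in groupby(best_path)]
    return "".join(vocab[idx] for idx in collapsed if idx != BLANK)


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the two strings divided by reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(reference), 1)


# Toy vocabulary and per-frame scores, invented for illustration only.
vocab = ["<blank>", "你", "好", "吗"]
frame_scores = [
    [0.10, 0.80, 0.05, 0.05],  # 你
    [0.10, 0.70, 0.10, 0.10],  # 你 again (merged with the previous frame)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # 好
    [0.20, 0.10, 0.60, 0.10],  # 好 again (merged with the previous frame)
]

decoded = ctc_greedy_decode(frame_scores, vocab)
cer = character_error_rate("你好吗", decoded)
print(decoded, round(cer, 3))
```

In this toy case the five frames collapse to a two-character hypothesis, and the missing final character of the reference counts as one deletion. A real system would take frame scores from the acoustic model and would typically replace the greedy step with beam search rescored by a language model.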
Keywords/Search Tags:Sichuan dialect, speech recognition, corpus of Sichuan dialect, convolutional neural network, gated recurrent network