
Design And Implementation Of Intelligent Speech Interaction

Posted on: 2021-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zha
Full Text: PDF
GTID: 2518306461461054
Subject: Master of Engineering
Abstract/Summary:
With the continuous progress of deep learning technology, intelligent voice interaction has shown significant value across many products and fields. Within a complete speech interaction system, speech recognition is the most important and the most difficult component. In recent years, more and more scholars have applied deep learning to feature-based recognition of Chinese speech. However, Chinese speech is rich and complex: homophones (one pronunciation mapping to multiple characters), isolated words, personalized hot words, and similar phenomena all add to the difficulty of Chinese speech recognition. Most existing research focuses on isolated-word and standard continuous speech recognition, and recognition performance on unrestricted regional expressions remains poor. In response to these problems, this thesis studies a CNN-based speech recognition algorithm and, combined with the AIUI open platform of iFlytek Co., Ltd., designs and develops a speech interaction system.

In view of the efficiency and accuracy requirements of voice interaction, and exploiting the strong feature extraction ability and high efficiency of convolutional neural networks, this thesis proposes an acoustic model based on a CNN structure. By pairing multiple convolutional layers with a single pooling layer, the model structure is able to see enough historical information. First, the speech signal sequence is converted into a frequency-domain image, i.e., a spectrogram, through windowing, framing, and the Fourier transform. The spectrogram is fed into the designed deep convolutional neural network for further feature learning; the resulting feature sequence is then passed through a fully connected layer for feature integration, and the output sequence is finally optimized with the CTC loss function (see the first sketch after this abstract).

Traditional language models are mainly rule-based or statistical. They usually assume that a word depends only on the two words immediately preceding it, ignoring the influence of all earlier words. Because self-attention computes dependency relationships directly, regardless of the distance between words, this thesis proposes a language model based on the self-attention mechanism, which can learn the internal structure of a sentence and the relevance between the current word and the earlier part of the sentence. First, a mapping between phonemes and text is constructed; the corresponding sequence is then passed to a multi-head attention function, which produces a weighted sum as the attention output, and the result is integrated and output through a fully connected layer (see the second sketch below).

Based on the speech recognition model trained in this work, and combined with the semantic understanding and speech synthesis functions of iFlytek, a speech interaction system is designed and developed on the Windows platform in a VS2017 + QT environment. The overall program framework is designed first and is divided into an algorithm layer, a logic interaction layer, and a GUI layer (see the third sketch below). Functionally, the system realizes microphone voice pickup, speaker voice playback, invocation of the speech recognition model developed in this thesis, and use of the iFlytek voice SDK and AIUI platform.
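First sketch: a minimal PyTorch illustration of the acoustic-model pipeline described above (spectrogram extraction, several convolutional layers per pooling layer, a fully connected layer, and CTC optimization). The abstract does not give the thesis's actual layer sizes, so all dimensions, the vocabulary size, and the STFT parameters below are illustrative assumptions, not the author's configuration.

```python
import torch
import torch.nn as nn
import torchaudio

# Windowing/framing + Fourier transform: waveform -> spectrogram.
# n_fft=400 and hop_length=160 (25 ms / 10 ms at 16 kHz) are assumptions.
to_spec = torchaudio.transforms.Spectrogram(n_fft=400, hop_length=160)

class CNNAcousticModel(nn.Module):
    def __init__(self, vocab_size=1424):          # vocab size is illustrative
        super().__init__()
        # Two convolutional layers per pooling layer, so the receptive
        # field widens and each output frame sees enough history.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layer integrates the features of each frame.
        self.fc = nn.Linear(64 * 50, vocab_size)  # 50 = 201 freq bins // 4

    def forward(self, spec):                      # (batch, 1, 201, time)
        x = self.features(spec)                   # (batch, 64, 50, time//4)
        x = x.permute(0, 3, 1, 2).flatten(2)      # (batch, time//4, 64*50)
        return self.fc(x).log_softmax(-1)         # per-frame log-probs

# The output sequence is optimized with the CTC loss; note that
# nn.CTCLoss expects time-major input, i.e. (time, batch, classes).
ctc_loss = nn.CTCLoss(blank=0)
```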
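Second sketch: a minimal self-attention language model in the same spirit as the abstract's description (phoneme-to-text mapping, multi-head attention as a weighted sum, fully connected output). Vocabulary sizes and model dimensions are again assumptions; the causal mask reflects the stated goal of relating the current word to the preceding part of the sentence.

```python
import torch
import torch.nn as nn

class SelfAttentionLM(nn.Module):
    # n_phonemes, n_chars, d_model, and n_heads are illustrative.
    def __init__(self, n_phonemes=1424, n_chars=6000, d_model=256, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, d_model)   # phoneme -> vector
        # Multi-head attention forms a weighted sum over all positions,
        # capturing dependencies regardless of the distance between words.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fc = nn.Linear(d_model, n_chars)            # integrate and output

    def forward(self, phonemes):                 # (batch, seq_len) phoneme ids
        x = self.embed(phonemes)
        L = x.size(1)
        # Causal mask: each position attends only to itself and earlier
        # positions, i.e. the current word and the preceding sentence part.
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        x, _ = self.attn(x, x, x, attn_mask=mask)
        return self.fc(x)                        # (batch, seq_len, n_chars)
```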
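Third sketch: the three-layer program framework. The real system is a C++ application built with VS2017 + QT; the Python outline below is used only for consistency with the other sketches, and every class and method name in it is hypothetical.

```python
class AlgorithmLayer:
    """Wraps the trained recognition model and the iFlytek SDK calls."""
    def recognize(self, audio):
        ...  # acoustic-model + language-model inference
    def synthesize(self, text):
        ...  # delegate to the iFlytek voice SDK / AIUI platform

class LogicInteractionLayer:
    """Connects microphone pickup and speaker playback to the algorithms."""
    def __init__(self, algorithm: AlgorithmLayer):
        self.algorithm = algorithm
    def on_microphone_audio(self, audio):
        return self.algorithm.recognize(audio)

class GUILayer:
    """Displays results and forwards user actions to the logic layer."""
    def __init__(self, logic: LogicInteractionLayer):
        self.logic = logic
```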
Keywords/Search Tags:deep learning, speech recognition, acoustic model, language model, self-attention, speech interaction