
Research On Adaptive Recognition Of Different Accent Conversations Based On Convolutional Neural Network

Posted on: 2019-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
GTID: 2428330566478003
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, computers and the Internet have driven development across the world. In this era of big data and information explosion, countless audio files are produced every day in many fields, and speech recognition has become increasingly important in both academic research and industrial production. In practice, however, recognition systems inevitably encounter recordings that contain more than one speaker, as well as speakers with different accents. Both of these objective problems degrade recognition accuracy and can greatly reduce system performance. This thesis therefore focuses on these two issues.

First, this thesis proposes a combined feature, MFCC_SPECTROGRAM, built from Mel-Frequency Cepstral Coefficients (MFCC) and the spectrogram. Because the human auditory system is a special nonlinear system, MFCC represents sound from the listener's perspective and is well matched to the ear's auditory characteristics, capturing hidden properties of the speech signal from the perceptual side. The spectrogram, in contrast, reflects the essential characteristics of the signal from the perspective of the human speech-production system. Drawing on the complementary strengths of the two, the MFCC_SPECTROGRAM combined feature is used as the basic representation in the subsequent research.

Second, the thesis introduces the CALL-CENTER scenario and describes the large amount of speech data that exists in that environment. The recordings studied here generally contain two speakers and two accents: Mandarin and the Chongqing dialect. For two-party conversational speech, a speaker segmentation model based on the combined feature and a convolutional neural network (CNN) is proposed: the combined feature represents each speaker's signal characteristics and serves as the CNN's input, and the network is trained to produce a model that segments the conversation by speaker. Experimental comparison shows that, under the same features, the CNN-based speaker segmentation algorithm outperforms the traditional Bayesian-distance segmentation algorithm, and that using the MFCC_SPECTROGRAM combined feature yields better segmentation than using MFCC or the spectrogram alone.

Finally, to address the different accents present in CALL-CENTER recordings, an automatic accent classification method based on weighted multi-feature fusion is proposed, together with a multi-accent speech recognition model built on speaker segmentation of conversational speech. The recognition system is invoked automatically according to the obtained speaker segmentation marks and accent classification labels. To verify the feasibility and effectiveness of the proposed methods and models, the open speech recognition system of the well-known Chinese company iFLYTEK is employed. A series of experiments shows that, for conversations containing different accents, recognizing the speech after speaker segmentation gives better results than recognizing the original recording, and that classifying the accents and calling the corresponding recognition interfaces further improves recognition. These results indicate that the proposed methods and models have research significance and practical value.
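The MFCC_SPECTROGRAM combined feature described above can be sketched as a per-frame concatenation of MFCC coefficients with the log-spectrogram. The following is a minimal NumPy illustration; the frame length, hop size, filter counts, and function names are illustrative assumptions, not the thesis's exact configuration:

```python
import numpy as np

def stft_mag(signal, n_fft=512, hop=160):
    """Magnitude spectrogram via a short-time FFT with a Hann window."""
    win = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(np.abs(np.fft.rfft(signal[start:start + n_fft] * win)))
    return np.array(frames)                    # (n_frames, n_fft//2 + 1)

def mel_filterbank(n_mels=26, n_fft=512, sr=16000):
    """Triangular mel filters mapping FFT bins to mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                  # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                  # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(spec, fb, n_ceps=13):
    """MFCC: log mel-band energies followed by a DCT-II."""
    logmel = np.log(spec @ fb.T + 1e-10)       # (n_frames, n_mels)
    n_mels = fb.shape[0]
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return logmel @ dct.T                      # (n_frames, n_ceps)

def combined_feature(signal, sr=16000):
    """MFCC_SPECTROGRAM: stack MFCC and log-spectrogram per frame."""
    spec = stft_mag(signal)
    m = mfcc(spec, mel_filterbank(sr=sr))
    return np.hstack([m, np.log(spec + 1e-10)])  # (n_frames, 13 + 257)

sr = 16000
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s test tone
feat = combined_feature(sig, sr)
print(feat.shape)  # → (97, 270)
```

Each row of `feat` describes one 32 ms frame and could serve as one column of the 2-D input map fed to a CNN, as in the segmentation model described above.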
Keywords: MFCC_SPECTROGRAM combined feature, speaker speech segmentation, accent classification, speech recognition