Speech Data Classification Based On Hybrid Neural Network

Posted on: 2020-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Cao
GTID: 2428330575996881
Subject: Computer technology
Abstract/Summary:
With the rapid development of artificial intelligence, the recognition and analysis of speech, text, physiological signals, and facial expressions have attracted growing attention from researchers in China and abroad. Beyond work within each of these fields, finding cross-domain similarities has gradually become a research topic in its own right. In speech emotion recognition, much research focuses on feature extraction and dimensionality reduction, because speech segments are diverse and high-dimensional. It is nevertheless difficult to find a single generalized model that suits all speech emotion recognition tasks. In addition, speech emotion databases are harder to construct than databases for other signals, so data enhancement (augmentation) for speech emotion corpora is also a research focus. Current speech emotion analysis therefore depends on two things: a highly robust model and a large-scale data set. This thesis studies speech emotion classification with a hybrid neural network model, centered on speech emotion data enhancement. The main work is as follows:

(1) Speech segments are divided at the word level according to semantics. The resulting segments carry more distinct temporal information and are fed into a temporal network model for feature extraction, so that the original temporal features of the speech are retained. Taking the characteristics of speech emotion into account, existing image enhancement methods are adapted and improved, and several feasible speech data enhancement methods are proposed to avoid the data sparseness problem in speech emotion recognition.

(2) To verify the proposed data enhancement methods, several neural network models are compared, including the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and the Long Short-Term Memory network (LSTM). On this basis, a hybrid neural network model combining a CNN with a bidirectional LSTM is proposed, which preserves both the temporal features of the audio segments and the deep image features extracted by the convolution layers.

The experiments use three data sets: the CASIA Chinese Affective Corpus provided by the Institute of Automation, Chinese Academy of Sciences; the German speech data set EMO-DB; and a speech data set of TV interviews and movie clips collected by the laboratory. The hybrid neural network model is evaluated on these data sets to verify that it improves accuracy and robustness relative to the baseline system.
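The abstract does not say which speech data enhancement methods were derived from image enhancement. As a rough illustration of the general idea only, the sketch below applies three waveform-level transforms that loosely mirror common image operations: a circular time shift (like translation), amplitude scaling (like a brightness change), and additive noise. All function names, parameters, and default values are assumptions made for this example and are not taken from the thesis.

```python
import random


def time_shift(samples, shift):
    """Circularly rotate the waveform by `shift` samples (wraps around)."""
    if not samples:
        return []
    shift %= len(samples)
    if shift == 0:
        return list(samples)
    return samples[-shift:] + samples[:-shift]


def scale_amplitude(samples, gain):
    """Multiply every sample by `gain`, clipping to the range [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def add_noise(samples, noise_level, rng):
    """Add uniform noise in [-noise_level, noise_level], then clip."""
    return [
        max(-1.0, min(1.0, s + rng.uniform(-noise_level, noise_level)))
        for s in samples
    ]


def augment(samples, rng, max_shift=3, gain_range=(0.8, 1.2), noise_level=0.01):
    """Produce one randomly perturbed copy of a waveform.

    The defaults are illustrative placeholders, not tuned values.
    """
    shift = rng.randint(-max_shift, max_shift)
    gain = rng.uniform(*gain_range)
    shifted = time_shift(samples, shift)
    scaled = scale_amplitude(shifted, gain)
    return add_noise(scaled, noise_level, rng)


# Usage: generate three augmented variants of a short toy clip.
rng = random.Random(0)
clip = [0.0, 0.1, 0.3, -0.2, -0.4, 0.05]
variants = [augment(clip, rng) for _ in range(3)]
```

Each variant keeps the original length and emotion label, so a small corpus can be expanded without new recordings. Whether such perturbations preserve emotional cues depends on their strength, so the ranges would need to be validated on the target corpus.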
Keywords/Search Tags:Semantic Segmentation, Speech Emotion Classification, Speech Data Enhancement, Hybrid Neural Network Model