Speech Data Classification Based On Hybrid Neural Network

Posted on:2020-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Cao

GTID:2428330575996881

Subject:Computer technology

Abstract/Summary:

With the rapid development of artificial intelligence, the recognition and analysis of speech, text, physiological signals, and facial expressions have attracted growing attention from researchers in China and abroad. Beyond research within each individual field, finding cross-domain similarities has gradually become a topic of interest. In speech emotion recognition, much research focuses on innovations in feature extraction and data dimensionality reduction, because speech segments are diverse and high-dimensional. However, it is difficult to find a general model suited to all speech emotion recognition tasks. At the same time, because building a speech emotion database is less convenient than collecting other kinds of signals, data augmentation methods for speech emotion corpora are also a research focus. In current speech emotion analysis work, a highly robust model and a large-scale dataset are particularly important. This thesis studies speech emotion classification based on a hybrid neural network model, centered on data augmentation methods for speech emotion data. The main work is as follows.

(1) Speech segments are divided into word-level units according to semantics. The resulting segments carry more distinct temporal information; they are fed into a temporal network model for feature extraction, so the original temporal features of the speech are retained. Drawing on existing image augmentation methods and adapting them to the characteristics of speech emotion, several feasible speech data augmentation methods are proposed to alleviate the data sparsity problem in speech emotion recognition.

(2) To verify the effect of the proposed data augmentation methods, several neural network models are compared, including a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Long Short-Term Memory network (LSTM). On this basis, a hybrid neural network model combining a CNN with a bidirectional LSTM is proposed, which preserves both the temporal features of the audio segments and the deep image features extracted by the convolution layers. The experiments use the CASIA Chinese emotional speech corpus provided by the Institute of Automation, Chinese Academy of Sciences, the German speech dataset EMO-DB, and a speech dataset of TV interviews and movie clips collected by the laboratory. Comparisons on these datasets verify that the proposed hybrid model improves accuracy and robustness relative to the baseline system.
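Point (1) cuts each utterance into word-level segments before feature extraction. The following is a minimal sketch of that cutting step only. It assumes the word boundaries (start and end times in seconds) are already known, for example from a forced aligner or ASR word timestamps; the abstract does not say how the boundaries are obtained, and `split_by_words` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def split_by_words(
    samples: Sequence[float],
    sample_rate: int,
    boundaries: List[Tuple[float, float]],
) -> List[List[float]]:
    """Cut a waveform into word-level segments.

    `boundaries` holds (start_s, end_s) pairs in seconds. They are assumed
    to come from an external aligner (hypothetical input, not from the thesis).
    """
    segments = []
    n = len(samples)
    for start_s, end_s in boundaries:
        # Convert seconds to sample indices and clamp to the signal length.
        start = max(0, int(round(start_s * sample_rate)))
        end = min(n, int(round(end_s * sample_rate)))
        if end > start:  # skip empty or inverted intervals
            segments.append(list(samples[start:end]))
    return segments


# Usage: a 1-second toy signal at 8 Hz split into two "words".
signal = [float(i) for i in range(8)]
words = split_by_words(signal, sample_rate=8, boundaries=[(0.0, 0.25), (0.5, 1.0)])
print([len(w) for w in words])  # → [2, 4]
```

Each returned segment can then be converted to features and passed to a temporal model in order, so the word sequence keeps its original ordering in time.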
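Point (1) also adapts image augmentation methods to speech, but the abstract does not name the specific methods. As an illustration only, the sketch below applies two image-style operations to a spectrogram: a cutout-style time/frequency mask (in the spirit of SpecAugment) and a circular time shift (the analogue of translating an image). Both are substitutes chosen for illustration, not the thesis's own methods; the function names and parameters are hypothetical.

```python
import random
from typing import List

# Spectrogram indexed as spec[freq_bin][time_frame].
Spectrogram = List[List[float]]


def mask_spectrogram(spec: Spectrogram, max_f: int, max_t: int,
                     rng: random.Random, fill: float = 0.0) -> Spectrogram:
    """Cutout-style augmentation: overwrite one random band of frequency
    bins and one random band of time frames with `fill` (SpecAugment-like)."""
    n_f, n_t = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched
    # Frequency mask: up to max_f consecutive bins.
    f_width = rng.randint(0, min(max_f, n_f))
    f0 = rng.randint(0, n_f - f_width)
    for f in range(f0, f0 + f_width):
        out[f] = [fill] * n_t
    # Time mask: up to max_t consecutive frames.
    t_width = rng.randint(0, min(max_t, n_t))
    t0 = rng.randint(0, n_t - t_width)
    for row in out:
        for t in range(t0, t0 + t_width):
            row[t] = fill
    return out


def time_shift(spec: Spectrogram, shift: int) -> Spectrogram:
    """Circularly shift every frequency row by `shift` frames, the
    spectrogram analogue of translating an image with wrap-around."""
    n_t = len(spec[0])
    k = shift % n_t
    return [row[-k:] + row[:-k] if k else row[:] for row in spec]


rng = random.Random(0)
spec = [[1.0] * 6 for _ in range(4)]  # toy 4-bin x 6-frame spectrogram
augmented = mask_spectrogram(spec, max_f=2, max_t=2, rng=rng)
print(time_shift([[1.0, 2.0, 3.0, 4.0]], shift=1))  # → [[4.0, 1.0, 2.0, 3.0]]
```

Each call produces a new variant of an existing sample, which is how augmentation enlarges a small emotion corpus without recording new speech.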
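Point (2) combines convolution layers with a bidirectional LSTM so that deep convolutional features are still consumed as a time sequence. The abstract gives no layer configuration, so the sizes below are hypothetical: two 3x3 convolutions with padding 1, each followed by 2x2 max-pooling, 32 and 64 channels, BiLSTM hidden size 128, and a 40-bin by 100-frame input. The sketch trains nothing; it only tracks tensor shapes to show where the time axis goes.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def cnn_bilstm_shapes(n_freq: int, n_frames: int, channels: List[int],
                      hidden: int) -> List[Tuple[str, Tuple[int, ...]]]:
    """Trace (channels, freq, time) shapes through a hypothetical
    conv -> pool stack, then the sequence fed to a BiLSTM."""
    c, f, t = 1, n_freq, n_frames
    shapes = [("input", (c, f, t))]
    for out_c in channels:
        # 3x3 convolution, stride 1, padding 1: (freq, time) unchanged.
        c, f, t = out_c, conv_out(f, 3, 1, 1), conv_out(t, 3, 1, 1)
        shapes.append(("conv3x3", (c, f, t)))
        # 2x2 max-pooling, stride 2: both axes roughly halved.
        f, t = conv_out(f, 2, 2), conv_out(t, 2, 2)
        shapes.append(("maxpool2x2", (c, f, t)))
    # Keep time as the sequence axis; flatten channels x freq per step.
    shapes.append(("sequence", (t, c * f)))
    # Forward and backward hidden states are concatenated per step.
    shapes.append(("bilstm", (t, 2 * hidden)))
    return shapes


for name, shape in cnn_bilstm_shapes(40, 100, channels=[32, 64], hidden=128):
    print(name, shape)
```

The key design point is the reshape: pooling shortens the time axis but does not remove it, so each remaining frame becomes one BiLSTM step whose feature vector is the flattened channels-by-frequency slice (here 25 steps of 640 features), and the BiLSTM output per step is twice the hidden size because the forward and backward states are concatenated.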

Keywords/Search Tags:

Semantic Segmentation, Speech Emotion Classification, Speech Data Enhancement, Hybrid Neural Network Model

Related items

1	The Design And Implementation Of Old People Speech Emotion Recognition System
2	Mandarin Speech Emotion Recognition Based On HMM And Artificial Neural Network Hybrid Model
3	The Speech Emotion Recognition Research Based On Speech Spectrogram And Convolutional Neural Network
4	Research On Speech Emotion Recognition Model Based On Deep Neural Network
5	Neural Network-based Chinese Speech Emotion Recognition
6	Research On Key Technologies Of Speech Emotion Recognition
7	The Modeling Research For Speech Emotion Towards Expressive Speech Synthesis
8	Research On Speech Emotion Recognition Method Based On Hybrid Neural Network
9	A Hybrid HMM And RBF Method For Speech Emotion Recognition
10	The Research On Speech Emotion Recognition Based On Contextual Position Enhancement And Weighted Space