
Research And Implementation Of Speech Emotion Mining Based On Chinese Language Background

Posted on: 2019-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Chen
Full Text: PDF
GTID: 2348330569995553
Subject: Engineering
Abstract/Summary:
Speech emotion mining is one of the core applications of machine learning and pattern recognition. Its main research goal is to analyze speech signals, extract features, and build algorithmic models in order to classify the emotions in continuous speech produced by speakers. In today's information society, research on speech emotion mining has both theoretical significance and considerable engineering value. There is a large body of related work on speech emotion mining. However, because that work mainly focuses on a specific language or a specific data set, its results are difficult to generalize. As a result, existing research based on non-Chinese speech emotion data does not adapt well to mining tasks in a Chinese-language setting. In addition, because speech emotion data is difficult to collect and label, the data currently available for research is small in scale and limited in variety.

This thesis focuses on speech emotion mining for Chinese. First, it explores a new model structure by analyzing the limitations of current speech emotion mining models. Second, in view of the scarcity of speech emotion data, it takes data augmentation of speech data as its entry point for in-depth research and model design, aiming to improve the performance of the proposed model while also proposing a new data-category optimization method for future speech emotion mining work. Finally, based on these results and the problems likely to arise in a Chinese speech emotion mining system, it designs a concrete system architecture. The specific work of this thesis includes:

1. Existing machine learning models suffer from difficult feature extraction, insufficient expressive power, and strong dependence on specific speech data. To address these problems, this thesis proposes MSCGNN (Multi-Channel Spectrogram Conv-GRU Neural Network) as the recognition model: a neural network structure that takes a multi-layer, multi-channel feature map as input and processes it with a CNN (Convolutional Neural Network) stage and a GRU (Gated Recurrent Unit) recurrent stage. The model was trained on the CASIA Chinese speech emotion data set recorded by CASIA (Institute of Automation, Chinese Academy of Sciences), and comparative experiments were conducted against related speech emotion mining models. The experiments showed that MSCGNN performed well on Chinese speech emotion mining tasks.

2. Current speech emotion data sets are small, and collecting and labeling them is often difficult. To address this, the thesis proposes a new data augmentation strategy: a variational autoencoder is used to generate mixed data, and the reconstruction errors of sparse autoencoders trained for each emotion category are then computed to assign an emotion label to each generated sample. Experiments showed that this augmentation strategy had a positive effect on training the deep learning model.

3. In light of the particular nature of speech emotion mining tasks and the requirements of practical applications, the thesis presents an architecture for a Chinese speech emotion mining system based on online learning, together with a demonstration of the prototype system.
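The labeling step in the data augmentation strategy above (assign a generated sample to the emotion category whose model reconstructs it with the lowest error) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract gives no code, so every name below is hypothetical, and the trained per-emotion sparse autoencoders are replaced by a simple stand-in that projects the input onto one class prototype vector. Only the decision rule, minimum reconstruction error, is taken from the abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Reconstructor = Callable[[Vector], List[float]]


def mean_squared_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def make_prototype_reconstructor(prototype: Vector) -> Reconstructor:
    """Hypothetical stand-in for a trained per-emotion sparse autoencoder.

    It reconstructs an input as its projection onto a single class
    prototype, so inputs that resemble the class get a low error.
    A real system would call the trained autoencoder here instead.
    """
    norm_sq = sum(p * p for p in prototype)

    def reconstruct(x: Vector) -> List[float]:
        scale = sum(a * p for a, p in zip(x, prototype)) / norm_sq
        return [scale * p for p in prototype]

    return reconstruct


def label_by_reconstruction_error(
    x: Vector, reconstructors: Dict[str, Reconstructor]
) -> Tuple[str, Dict[str, float]]:
    """Return the emotion whose model reconstructs x best, plus all errors."""
    errors = {
        emotion: mean_squared_error(x, reconstruct(x))
        for emotion, reconstruct in reconstructors.items()
    }
    best_emotion = min(errors, key=lambda emotion: errors[emotion])
    return best_emotion, errors
```

In use, one reconstructor would be registered per emotion category, and each sample produced by the variational autoencoder would be passed through label_by_reconstruction_error to obtain its label. Returning the full error dictionary also makes it possible to discard samples whose best error is still high, although the abstract does not say whether the thesis applies such a filter.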
Keywords/Search Tags: speech emotion mining, deep learning, data augmentation, online learning