Research on Speech Emotion Recognition Based on Deep Learning

Posted on: 2022-05-09    Degree: Master    Type: Thesis
Country: China    Candidate: M Y Zhang    Full Text: PDF
GTID: 2518306491955169    Subject: Computer application technology
Abstract/Summary:
As a necessary means of interpersonal communication, speech not only conveys information but also expresses emotion. The same utterance often carries different meanings in different emotional contexts, and this holds across languages worldwide. Speech emotion recognition therefore has important research value. In emotion computing within artificial intelligence, the speech signal is one of the most basic and important modalities. Many researchers in China and abroad study speech emotion, and their work falls mainly into two approaches: processing and recognizing the speech signal directly, and converting it to a spectrogram before processing and recognition. Moreover, speech emotion recognition can be extended to many application areas, such as intelligent robots for the elderly and office-hall service robots, so it also has practical application value. This thesis studies speech emotion recognition in depth and proposes two recognition methods. Based on extensive experiments, the two methods are evaluated and compared with current advanced speech emotion recognition methods. Finally, conclusions and future work are given. The detailed contents of this thesis are as follows:

(1) A speech emotion recognition method based on the combination of GRU and CTC. This method combines a GRU network with the CTC loss function and feeds the raw speech signal directly into the network. After sequence-to-sequence processing by the GRU network and the CTC loss layer, good experimental results are obtained on the IEMOCAP dataset.

(2) A speech emotion recognition method based on FaceNet. This method transfers the FaceNet network, which performs excellently in face recognition, to speech emotion recognition. It takes the spectrogram and the waveform as input signals respectively, and achieves good recognition results on the challenging IEMOCAP and CASIA datasets. Motivated by FaceNet's 99% recognition rate in face recognition, this thesis first uses the spectrogram as input, but the recognition performance falls short of expectations. Next, the raw speech signal is used as input: FaceNet is pretrained on the CASIA dataset, and recognition is then performed on IEMOCAP.

Extensive comparative experiments are carried out for both methods. The experimental results show that the FaceNet-based method achieves the best single-modality recognition result on the IEMOCAP dataset.
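
As a rough illustration of the CTC component named in method (1): CTC relates a frame-level label sequence to a shorter target sequence by merging consecutive repeated labels and then dropping a special blank symbol. The sketch below shows only that standard collapse rule in plain Python. The emotion label names, the blank symbol `_`, and the majority-vote step in `utterance_label` are assumptions made for illustration; the abstract does not say how the thesis maps network output to an utterance-level emotion.

```python
from collections import Counter
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; the thesis does not specify one


def ctc_collapse(frames, blank=BLANK):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != blank]


def utterance_label(frames, blank=BLANK):
    """Hypothetical helper: majority vote over the collapsed labels.

    Returns None when every frame is blank.
    """
    collapsed = ctc_collapse(frames, blank)
    if not collapsed:
        return None
    return Counter(collapsed).most_common(1)[0][0]


# Made-up frame-level predictions ("ang" = angry, "neu" = neutral).
frames = ["_", "ang", "ang", "_", "_", "ang", "neu", "neu", "_"]
print(ctc_collapse(frames))     # ['ang', 'ang', 'neu']
print(utterance_label(frames))  # ang
```

Note that the blank between the two runs of `ang` keeps them as separate output labels, which is why the collapse merges repeats before removing blanks rather than after.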
Keywords/Search Tags:Deep Learning, Emotion Computing, Speech Emotion Recognition, FaceNet Network, CTC