
Speech Emotion Recognition Research Based On Information Understanding

Posted on: 2023-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: X M Pan
Full Text: PDF
GTID: 2558306914973369
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of intelligent devices, speech, as one of the most convenient human-computer interaction modalities, has attracted extensive attention. Emotion, an expression of intelligence that distinguishes humans from most other living beings, plays an important role in human-human interaction, and natural, fluent human-machine communication likewise requires emotional expression. Because recognizing and detecting emotion in speech is complex, the performance of speech emotion recognition remains unsatisfactory. This thesis proposes the relative invariance of speech emotion: emotional expression is relative and affected by multiple factors, so the influence of those other factors should be removed when extracting emotional information. Relative invariance manifests differently in different dimensions. In the time dimension, emotion is long-term and its expression is sparse, reflected in only a few frames of an utterance; in the frequency dimension, emotion is reflected in the relative values of the fundamental frequency and the formants. The main contributions of the thesis are as follows:

1. A speech emotion information extraction method based on multi-scale convolution and attention is studied. Starting from how relative invariance manifests in the time and frequency domains, the thesis uses single-scale and multi-scale CNNs to extract information, conducts systematic experiments on the parameters of the single-scale CNN, and experiments with serial skip connections in the multi-scale CNN. Confusion-matrix analysis is used to study the effect of the multi-scale design on individual emotions; the results show that introducing multiple scales improves the recognition accuracy of sadness and the neutral emotion to a certain extent. In addition, motivated by the relative invariance in different dimensions, time attention, frequency attention, and channel attention are introduced, and the insertion position, pooling mode, and scaling parameters of the frequency and channel attention are studied to improve system performance.

2. A difference-based method for extracting the relative-invariance information of speech emotion is proposed. Based on how the relative invariance of speech emotion manifests in the time dimension, the thesis proposes using a long-time difference to extract the invariant information contained in spectrogram features. The long-term nature of emotion is a property that distinguishes it from speaker identity and linguistic content, so a long-time difference can suppress the influence of speaker and content to a certain extent. Three factors affecting the long-time difference are identified: the differencing shift mode, the boundary-filling mode, and the form of subtraction between the shifted intermediate signal and the original signal. Parameter experiments on these factors, at both the input and network levels, identify the settings best suited to speech emotion. The agreement between the experimental results and the experimental conjectures further supports the long-term nature of emotion.

3. An attribute-modeling method for extracting the relative-invariance information of speech emotion is studied. Since emotion attributes correlate with the final emotion categories, the invariant information of emotion exists to some extent in the attribute information. Based on multi-task learning, four kinds of attribute information are introduced as auxiliary tasks for the main speech emotion recognition task. By analyzing the effects of individual and combined attribute tasks on different emotions, the relative invariance of emotion is explored in depth; attribute encoding is also introduced so that the network learns the invariant attribute information in emotional speech by itself. The experiments that introduce attribute information into the main task achieve better results.

Starting from the essence of speech emotion, this thesis searches for a deep learning network suited to speech emotion recognition by studying the relative invariance of speech emotion.
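The long-time difference described in the second contribution can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function name, the frame `shift`, the `pad_mode` boundary filling, and the plain-subtraction form are all assumptions standing in for the shift, filling, and subtraction modes the thesis studies experimentally.

```python
import numpy as np

def long_time_difference(spec, shift=20, pad_mode="edge"):
    """Hypothetical long-time difference of a spectrogram.

    spec     : (frames, bins) array, e.g. a log-mel spectrogram.
    shift    : frame offset; much larger than the 1-2 frame shift of
               classic delta features, targeting long-term emotion cues.
    pad_mode : how the delayed copy is filled at the boundary (one of
               the "filling modes" the thesis examines).
    """
    # Delay the spectrogram by `shift` frames, filling the start.
    padded = np.pad(spec, ((shift, 0), (0, 0)), mode=pad_mode)
    shifted = padded[: spec.shape[0], :]
    # Plain subtraction between original and delayed signal.
    return spec - shifted

# Toy usage: 100 frames x 40 mel bins of random features.
spec = np.random.randn(100, 40)
diff = long_time_difference(spec, shift=20)
assert diff.shape == spec.shape
```

For frames beyond the shift, each output frame is simply `spec[t] - spec[t - shift]`, which cancels slowly varying speaker and content components while preserving long-term change; the shift size, filling mode, and subtraction form are exactly the kinds of parameters the thesis tunes experimentally.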
Keywords/Search Tags:speech emotion recognition, relative invariance, multiscale, difference, attribute modeling