
Research On Speech Emotion Recognition For The Elderly

Posted on: 2023-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Jian
GTID: 2568307031988379
Subject: Control Science and Engineering
Abstract/Summary:
As population aging accelerates, the government has made clear that the construction of a smart pension system should pay more attention to the psychological health of the elderly. Speech emotion recognition for the elderly (ESER) has become a research focus in intelligent elderly care, since speech is the most direct mode of communication and carries a great deal of emotional information. ESER currently performs poorly because the voice of the elderly has a low fundamental frequency, unclear pronunciation, and changes in tone quality. In addition, both Bidirectional Long Short-Term Memory (BLSTM) and Convolutional Neural Networks (CNN) have deficiencies when applied to ESER. First, BLSTM treats every frame of speech as equally important, so it cannot highlight important emotional information. Second, CNN ignores global features while capturing the local features of elderly speech. In view of these problems, the main research contents of this thesis are as follows:

1. Based on an in-depth analysis of the research status of ESER, an overall framework for ESER is proposed, covering elderly emotional data sets, the characteristics of elderly speech, the extraction of emotion features, and recognition methods.
2. To address the issue that BLSTM gives every frame of elderly speech the same amount of attention, an attention mechanism is introduced to assign a weight to each frame. In this model, the speech features of the elderly are fed into BLSTM as a sequence of time-step vectors to learn deep temporal features, and the attention mechanism then assigns an appropriate weight to each deep temporal feature. Experiments on an elderly speech emotion library show that the attention mechanism improves ESER performance by making the BLSTM output more effective along the temporal dimension.
3. Because CNN does not adequately represent the global features of elderly speech, this thesis takes advantage of the Transformer's ability to capture long-distance feature dependencies and combines CNN with Transformer to improve the perception of both local and global features. CNN and Transformer are connected in parallel to learn the low-level features of the speech of the elderly, and the resulting local and global features are combined to obtain the spatial features of the speech. Tests on an elderly speech emotion library show that the model significantly improves ESER performance.
4. The temporal and spatial features of elderly speech express the emotional information in the speech from two different perspectives. In this thesis, the two kinds of features are combined to identify the emotion in the speech of the elderly. Experiments on an elderly speech emotion library show that this method further improves the performance of ESER.
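The frame-weighting idea in the second contribution can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not say how frames are scored, so the dot-product scoring against a `query` vector, the function names, and the toy numbers below are all assumptions chosen for illustration. The sketch only shows how softmax weights let some frames contribute more than others to the pooled utterance-level feature, in contrast to treating every frame equally.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each frame-level feature vector and sum the weighted frames.

    frames: list of T feature vectors of length D (stand-ins for BLSTM outputs).
    query:  hypothetical scoring vector of length D (assumed, not from the thesis).
    Returns (weights, pooled): T weights that sum to 1 and a vector of length D.
    """
    # One scalar relevance score per frame (dot product with the query).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time: frames with larger weights dominate the result.
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


# Toy input: three frames with two features each; the middle frame scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, pooled = attention_pool(frames, query)
```

In a trained model the scoring parameters would be learned jointly with the recurrent network; here they are fixed by hand so that the effect of the weighting is easy to inspect.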
Keywords/Search Tags: speech emotion recognition for the elderly, Bidirectional Long Short-Term Memory, Convolutional Neural Networks, Transformer, attention mechanism