
Uyghur Speech Emotion Features Analysis And Recognition

Posted on: 2022-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S F L T N Z M D N I Z A M
GTID: 1488306557994869
Subject: Signal and Information Processing
Abstract/Summary:
Machine translation, automatic speech recognition, and related fields have been studied for more than half a century. The goal of these research fields is to enable people and machines, and even machines themselves, to interact as naturally as people do. Traditional speech recognition only converts the speaker's content from speech to text, without considering the speaker's state, emotion, or other characteristics. Speech emotion recognition studies the speaker's emotional state. However, compared with traditional tasks such as speech recognition, speech emotion recognition lacks large-scale emotional speech databases, and many languages have no emotional speech database at all. As a result, conventional deep learning methods are not effective for speech emotion recognition. In view of these problems, this thesis studies emotional database construction, feature dimension transformation, and model construction:

1. To fill the research gap in speech emotion databases for domestic minority languages, this work designs the relevant experimental scenarios and establishes a Uyghur speech emotion database. A total of 1200 speech emotion samples were collected from 20 Uyghur-speaking performers (10 women and 10 men) simulating six emotions. During recording, emotional scene sentences were used to induce the performers so that their speech approximated genuine emotional speech. Based on this database, the thesis analyzes the acoustic characteristics of each emotional voice and examines how well the acoustic features distinguish emotional states.

2. Affective feature space learning is one of the most important research directions in speech emotion research. To obtain efficient and compact low-dimensional features for speech emotion recognition, a novel feature reduction method based on uncertain linear discriminant analysis is proposed. Following the same principles as conventional linear discriminant analysis (LDA), the method uses the uncertainties of noisy or distorted input data to estimate maximally discriminant directions. The effectiveness of the proposed uncertain LDA (ULDA) is demonstrated on the Uyghur speech emotion recognition task, where it achieves better performance than other dimensionality reduction techniques. The experimental results show that, given an appropriate uncertainty estimation algorithm, uncertain LDA outperforms conventional LDA on Uyghur speech emotion recognition. In addition, this thesis proposes a classification method based on an atomic representation model for Uyghur speech emotion recognition. In recent years, classification algorithms based on representation models, such as sparse representation, have attracted great interest in pattern recognition and achieved good results. Effective representation of emotional features has a strong influence on speech emotion recognition. Emotional features are extracted from Uyghur speech, and the atomic representation model is used to model the extracted emotional feature space. The closest emotional category is then selected from the constructed emotional space model to recognize the emotion. The experimental results show that the proposed method is superior to the traditional method, with a recognition rate of 64.17% on the Uyghur emotional speech database.

3. As a highly active topic in human-computer interaction, speech emotion recognition (SER) aims to classify the emotional tendency of speakers' utterances. Current deep learning methods achieve high performance only when a large amount of training data is available, but speech emotion corpora offer too few training examples. A Siamese neural network, by contrast, can work with a limited amount of training data: pairwise training mitigates the impact of sample deficiency and provides enough training iterations. To train SER models sufficiently, this study proposes a novel method using Siamese attention-based long short-term memory (LSTM) networks. In this framework, two attention-based LSTM networks share the same weights and are fed frame-level acoustic emotional features rather than utterance-level emotional features. The proposed solution shows a significant improvement in SER results compared with conventional deep learning methods. In addition, to use information efficiently and address the degradation problem, an attention-based dense LSTM is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed with attention-based dense connections. That is, weight coefficients are added to the skip connections of each layer to distinguish the emotional information carried by different layers and to prevent redundant information from the lower layers from interfering with the useful information in the upper layers. The experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.

4. Existing affective computing algorithms suffer from emotion tracking delay and do not take the continuity of the emotional state into account. In response, this study presents a technique for detecting emotion trends in continuous speech, based on a data-field emotion space and the shuffled frog-leaping algorithm. First, the data-field emotion space is constructed: emotional features play the role of data-field particles, and the interaction between particles is described by a potential energy function. The shuffled frog-leaping algorithm is then applied, with individual frogs simulating how the emotional features change between emotional states, in order to find the trend of emotional change. Experiments show that the algorithm outperforms existing algorithms.
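The pairwise training mentioned for the Siamese network can be illustrated with a small counting sketch. It is not the thesis's training code. It only shows, under stated assumptions, how many training pairs a corpus of the reported size could yield. The even split of 200 utterances per emotion, the placeholder emotion names, and the helper iter_pairs are all assumptions made for illustration; the abstract gives only the totals (1200 samples, six emotions).

```python
from itertools import combinations

# Hypothetical balanced corpus: 6 emotions x 200 utterances = 1200 samples.
# The per-emotion split and the emotion names are assumptions, not from the thesis.
EMOTIONS = [f"emotion_{k}" for k in range(6)]
labels = [emotion for emotion in EMOTIONS for _ in range(200)]


def iter_pairs(sample_labels):
    """Yield (i, j, same) for every unordered pair of sample indices.

    same is 1 when both samples carry the same emotion label, else 0.
    A Siamese network would receive the two samples of a pair as its two inputs
    and use `same` as the pair-level training target.
    """
    for i, j in combinations(range(len(sample_labels)), 2):
        yield i, j, int(sample_labels[i] == sample_labels[j])


n_total = 0
n_same = 0
for _, _, same in iter_pairs(labels):
    n_total += 1
    n_same += same
n_diff = n_total - n_same

print(n_total, n_same, n_diff)  # prints: 719400 119400 600000
```

Under these assumptions, 1200 utterances give 719,400 candidate pairs, which is the sense in which pairwise training compensates for a small corpus. In practice the pairs would usually be subsampled and balanced between same-emotion and different-emotion cases.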
Keywords/Search Tags: speech emotion recognition, Uyghur language, Siamese network, long short-term memory, uncertain linear discriminant analysis, atomic representation, data field, shuffled frog-leaping algorithm