Research on Feature Selection and Classification Methods for Speech Emotion Recognition

Posted on: 2024-08-12 | Degree: Master | Type: Thesis | Country: China | Candidate: Z Luo | Full Text: PDF | GTID: 2568307127954959 | Subject: Electronic information

Abstract/Summary:

Speech emotion recognition is a challenging technology that identifies the emotional states conveyed in speech by analyzing the human voice. With the popularization of speech interaction technology, the development of speech emotion recognition gives humans more opportunities for emotional communication and brings substantial value and potential applications to industry. Current speech emotion recognition approaches are generally divided into two categories: traditional machine-learning-based methods and deep-learning-based methods. The former manually extracts multiple acoustic features and reduces their dimensionality through feature engineering to improve recognition accuracy. The latter focuses on building network structures that can extract high-level features conducive to emotion classification. To address the problems of high-dimensional features and sample imbalance, this thesis proposes two speech emotion recognition methods: a method based on feature fusion selection, and a CNN-BiLSTM method based on an attention mechanism. The aim is to improve the classification efficiency and accuracy of speech emotion recognition technology and to contribute to the development of intelligent speech interaction systems. The main research content of this thesis includes the following parts:

(1) A feature fusion selection method is proposed to address the feature redundancy of complex high-dimensional feature sets in traditional machine-learning speech emotion recognition tasks. The optimal feature subset is selected through filter and wrapper methods based on different evaluation criteria. In the ReliefF filter stage, the average distance between samples is calculated to evaluate inter-sample differences, and irrelevant features with negative weights are effectively removed. The reduced feature vector then serves as the input to the second stage of feature fusion selection, improving the accuracy and reliability of the model. The wrapper stage uses a two-stage-mutation grey wolf optimization algorithm, which considers the interaction between features and the classification algorithm and introduces two-stage mutation operators to improve the efficiency of the algorithm. A random-forest weight initialization method is also proposed to retain the most significant features in the initial population, accelerating convergence. With a support vector machine classifier, the proposed method achieves classification accuracies of 90.66% and 78.96% on the EMODB and SAVEE datasets, respectively, while reducing the feature dimensionality from 1582 to 366 and 255. The results demonstrate that this method can effectively reduce feature dimensionality while improving classification performance.

(2) A CNN-BiLSTM speech emotion recognition method based on an attention mechanism is proposed to address the limitations of traditional acoustic features and classifiers and the problem of imbalanced datasets. Mel-spectrogram features and frame-level acoustic features are combined into a dual-channel feature to enrich the input representation. The SMOTE oversampling technique is used to balance the number of samples across emotion categories. A convolutional neural network and a bidirectional long short-term memory network are used to extract local spatial features and time-series context-dependent information from the speech data, respectively. The attention mechanism mimics human selective attention to better emphasize emotion-related features in the speech signal. Experimental results show that this method effectively improves the recognition rate of minority-class categories, with classification accuracy reaching 92.14% on the EMODB dataset.

(3) A speech emotion recognition system based on the attention-based CNN-BiLSTM network model is built with the Python language and the PyQt5 library. The system provides waveform analysis, spectrogram analysis, feature extraction, and emotion recognition functions through its button interface.

Keywords/Search Tags: Speech Emotion Recognition, Acoustic Features, Feature Selection, Deep Learning, Data Balance
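The ReliefF filter stage described in part (1) can be sketched as follows. This is a minimal, simplified variant (nearest hit and nearest miss only, rather than k neighbours), assuming NumPy arrays of features and integer class labels; it scores each feature by how well it separates a sample from its nearest other-class neighbour versus its nearest same-class neighbour, then drops features with negative weights:

```python
import numpy as np

def relieff_weights(X, y, n_iter=100, rng=None):
    """Estimate a relevance weight per feature (simplified, 1-neighbour variant)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        xi, yi = X[i], y[i]
        dists = np.abs(X - xi).sum(axis=1)              # L1 distance to every sample
        dists[i] = np.inf                               # exclude the sample itself
        same = (y == yi)
        hit = np.argmin(np.where(same, dists, np.inf))   # nearest same-class sample
        miss = np.argmin(np.where(~same, dists, np.inf)) # nearest other-class sample
        # features that separate the classes gain weight; noisy ones lose it
        w += np.abs(xi - X[miss]) - np.abs(xi - X[hit])
    return w / n_iter

def filter_negative(X, y, rng=None):
    """Keep only features whose ReliefF weight is positive, as in the first stage."""
    w = relieff_weights(X, y, rng=rng)
    keep = w > 0
    return X[:, keep], keep
```

The reduced matrix returned by `filter_negative` would then feed the wrapper stage.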
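The wrapper stage can be illustrated with a compact binary grey wolf optimizer. This is a sketch only: the thesis's two-stage mutation operators are abbreviated here to a single bit-flip mutation, the fitness function (SVM cross-validation accuracy with a small size penalty) and all hyperparameters are illustrative, and the random-forest weight initialization is approximated by biasing the initial population toward features a random forest ranks as important:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(mask, X, y):
    if not mask.any():
        return 0.0
    # wrapper criterion: SVM accuracy minus a small penalty on subset size
    acc = cross_val_score(SVC(), X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()

def bgwo_select(X, y, n_wolves=8, n_iter=15, rng=None):
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # random-forest-guided initialization: important features start "on" more often
    imp = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y).feature_importances_
    p_on = 0.3 + 0.6 * imp / (imp.max() + 1e-12)
    wolves = rng.random((n_wolves, d)) < p_on
    for t in range(n_iter):
        scores = np.array([fitness(w, X, y) for w in wolves])
        order = np.argsort(scores)[::-1]
        alpha, beta, delta = wolves[order[:3]]           # three leading wolves
        a = 2 * (1 - t / n_iter)                         # exploration shrinks over time
        for i in range(n_wolves):
            # move each wolf toward the consensus of the three leaders
            prob = (alpha.astype(float) + beta + delta) / 3.0
            step = (rng.random(d) - 0.5) * rng.random(d) * a
            wolves[i] = rng.random(d) < np.clip(prob + step, 0, 1)
            # bit-flip mutation keeps diversity (stand-in for the two-stage mutation)
            wolves[i] ^= rng.random(d) < 0.02
    scores = np.array([fitness(w, X, y) for w in wolves])
    return wolves[np.argmax(scores)]
```

The returned boolean mask plays the role of the selected feature subset that is finally evaluated with the SVM classifier.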
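The dual-channel feature of part (2) can be sketched with librosa. The abstract does not specify which frame-level acoustic features are used, so zero-crossing rate and RMS energy stand in here as illustrative examples; both are computed with the same hop length so they align frame-for-frame with the mel-spectrogram channel:

```python
import numpy as np
import librosa

def dual_channel_features(y, sr=16000, n_mels=64, hop=256, n_fft=1024):
    """Channel 1: log-mel-spectrogram. Channel 2: frame-level acoustic features."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop, n_fft=n_fft)
    mel_db = librosa.power_to_db(mel)                        # (n_mels, frames)
    # frame-level features aligned to the same hop length as the spectrogram
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)
    frame_feats = np.vstack([zcr, rms])                      # (2, frames)
    return mel_db, frame_feats
```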
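The SMOTE balancing step can be sketched without external libraries: each synthetic minority sample is interpolated between a minority sample and one of its k nearest minority-class neighbours, and every class is oversampled up to the size of the largest class. The neighbour count and interpolation scheme follow the standard SMOTE recipe, not any thesis-specific settings:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from the minority-class matrix X_min."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]                 # k nearest minority neighbours
        j = rng.choice(nbrs)
        lam = rng.random()                            # interpolation coefficient in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

def balance(X, y):
    """Oversample every emotion class up to the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, cnt in zip(classes, counts):
        if cnt < target:
            new = smote(X[y == c], target - cnt)
            Xs.append(new)
            ys.append(np.full(len(new), c))
    return np.vstack(Xs), np.concatenate(ys)
```

In practice the `imbalanced-learn` package provides a production implementation (`imblearn.over_sampling.SMOTE`).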
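Finally, the attention-based CNN-BiLSTM classifier of part (2) can be sketched as follows. The thesis does not specify a framework or layer sizes, so this is an illustrative PyTorch layout: the CNN extracts local spatial features from the mel-spectrogram, the BiLSTM models temporal context in both directions, and a soft attention layer pools the frame-level outputs into one utterance-level vector for classification:

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    def __init__(self, n_mels=64, n_classes=7, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 4), hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # one attention score per frame
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_mels, time)
        f = self.cnn(x)                            # (batch, 32, n_mels/4, time/4)
        b, c, m, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * m)   # -> (batch, time, features)
        h, _ = self.lstm(f)                        # (batch, time, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)     # attention weights over time steps
        ctx = (a * h).sum(dim=1)                   # weighted utterance-level vector
        return self.fc(ctx)                        # class logits
```

The attention weights `a` are what lets the model emphasize emotion-relevant frames rather than pooling all frames equally.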