With the development of human-computer interaction and intelligent driving, the car of the future will be more than a means of transport: it will also need to interact with the driver and provide driver-assistance functions to ensure safety. In autonomous driving, the driver may not even need to operate the steering wheel manually and may control the vehicle directly by voice. Human speech carries not only explicit textual content but also implicit emotional information, and enabling machines to understand human emotion is key to realizing artificial intelligence. More importantly, negative emotions in drivers can easily lead to traffic accidents. The purpose of this dissertation is therefore to improve road safety by detecting drivers' emotional changes through speech emotion recognition in the in-vehicle driving environment. The dissertation studies the relevant theory of speech emotion recognition and analyzes the characteristics of the in-car noise environment. The main research contents are as follows.

First, six discrete emotion categories common among drivers are selected, and prosodic, spectral, and voice quality features are studied for feature extraction. In a real environment, white noise, engine noise, tire noise, and music noise are unavoidable during driving and seriously degrade the recognition rate of speech emotion recognition. The signal-to-noise ratios of these four noise types are analyzed, and the experimental dataset is constructed by adding noise.

Then, front-end speech enhancement is applied to suppress the noise interference. Based on the in-car mixing model, the weighted prediction error, spectral subtraction, and FastICA redundancy-removal and denoising algorithms are compared. FastICA is found to require no prior knowledge and to have low complexity. Even at low signal-to-noise ratios, this method improves speech enhancement performance and is
suitable for this environment. To better handle the multiple interference sources and low signal-to-noise ratio inside the vehicle, automatic selection of the optimal wavelet basis is added to this algorithm to improve its denoising performance. The mixed signals are then separated in the frequency domain with the complex fast independent component analysis algorithm, and a support vector machine is trained on the enhanced speech. The results show that the improved FastICA significantly improves the accuracy of speech emotion recognition.

Finally, because vehicles will carry more and more sensors and the number and variety of training parameters keep growing, the capacity of traditional machine learning is limited. A speech emotion recognition model based on a convolutional neural network is therefore adopted, average pooling is used to reduce the impact of noise, and the effects of mixed data augmentation and general data augmentation are explored. In addition, the bidirectional long short-term memory network (BiLSTM) exploits context information and captures emotional information in both directions. Experimental comparison shows that the CNN+BiLSTM model achieves an average recognition rate of 75.7% under noise and 78.8% at a signal-to-noise ratio of 10 dB, which indicates that the method significantly improves recognition rate and robustness in noisy environments.
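
Constructing a noisy dataset at a chosen signal-to-noise ratio, as described above, can be illustrated with a minimal sketch. It is not the dissertation's actual pipeline: the function names `mix_at_snr` and `measure_snr_db`, the 16 kHz sampling rate, the sine-wave stand-in for speech, and the Gaussian stand-in for white noise are all assumptions made for illustration. The noise is scaled by a gain g chosen so that 10·log10(P_speech / (g²·P_noise)) equals the target SNR in dB.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR (in dB).

    Returns (mixture, scaled_noise). Hypothetical helper for illustration.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10*log10(P_speech / (g^2 * P_noise)), solved for the gain g
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measure_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(speech) / power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins: a 220 Hz tone for speech, Gaussian samples for white noise
    # (assumed 16 kHz sampling rate, one second of audio).
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for target in (0, 5, 10):
        mixture, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measure_snr_db(speech, scaled), 2))
```

In practice the same scaling would be applied to recorded engine, tire, or music noise rather than synthetic samples, and the SNR would usually be computed over speech-active frames only, which this sketch does not attempt.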