
Road Rage Recognition System Based On Speech Features

Posted on: 2022-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Wang
GTID: 2492306743951689
Subject: Electronics and Communications Engineering
Abstract/Summary:
As living standards in China have risen, the number of drivers has grown, and so has the number of traffic accidents. Road rage is one of the most important causes of these accidents. Road rage recognition differs from traditional emotion recognition in two ways: the speech signal to be recognized contains complex traffic noise, and the recognition target is anger specifically. The work and contributions of this thesis fall into three parts.

(1) Extract features that are robust and express anger effectively. The Mel-frequency cepstral coefficient (MFCC) is used first. Because MFCC does not represent the high-frequency content of angry speech well, the inverse Mel-frequency cepstral coefficient (IMFCC) is introduced. For feature fusion, parts of the MFCC and IMFCC vectors are spliced to obtain a spliced MFCC, and a combined filter bank composed of the Mel filter bank and the inverse Mel filter bank is used to obtain a hybrid MFCC. To improve the robustness of the features, the Gammatone frequency cepstral coefficient (GFCC) is introduced. To reduce redundancy in the fused features, the Fisher ratio is used to calculate the contribution of each feature dimension to anger recognition, and the 12th-order F-MFCC and the 18th-order F-MGCC are constructed. In experiments on noiseless speech, F-MGCC improves anger recognition accuracy by 7.53% compared with the other single and fused features. To address anger recognition when the signal-to-noise ratio (SNR) of the speech is unknown, as in real environments, the Fisher ratio distributions of MFCC, IMFCC and GFCC are studied at 0 dB, -10 dB and -20 dB SNR and for noiseless speech, and a 20th-order generalized F-MGCC is reconstructed. When the generalized F-MGCC is used to train the model on speech at a single SNR and inference is run on speech at other SNRs, the average anger recognition accuracy is 87.25%.

(2) Propose a model for anger recognition based on a convolutional neural network (CNN) and a bidirectional long short-term memory network with multi-head self-attention (Multi-Head Self-Attention Bi-LSTM). The CNN extracts high-level feature vectors along the spatial dimension of the speech feature parameters. The Bi-LSTM with multi-head self-attention extracts high-level feature vectors along the time dimension. The outputs of the two branches are concatenated and fed to the fully connected layer, and the Softmax function completes the anger recognition and classification task. The model reaches accuracies of 96.27% and 97.87% on the RAVDESS and CASIA data sets respectively.

(3) Design and implement a speech-based road rage diagnosis system. The system is built on the features and model proposed in this thesis, a MySQL database, the SSM stack (Spring + Spring MVC + MyBatis) and the WeChat Mini Program framework. Its functions include data collection, anger recognition, and visual analysis of both individual road rage speech events and road rage frequency, so as to assist drivers while driving.
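
Part (1) ranks feature dimensions by Fisher ratio, and the sketch below illustrates that kind of ranking on toy data. The abstract does not state the exact formula, so the two-class form used here (squared gap between class means divided by the sum of class variances) is an assumption. The function names `fisher_ratio` and `select_top_k` and the toy values are hypothetical and are not taken from the thesis.

```python
from statistics import fmean, pvariance


def fisher_ratio(angry, other):
    """Per-dimension two-class Fisher ratio (assumed form):
    (mean_angry - mean_other)^2 / (var_angry + var_other)."""
    n_dims = len(angry[0])
    ratios = []
    for d in range(n_dims):
        a = [row[d] for row in angry]
        b = [row[d] for row in other]
        between = (fmean(a) - fmean(b)) ** 2
        within = pvariance(a) + pvariance(b)
        if within == 0:
            # Both classes constant: no spread to normalise by.
            ratios.append(0.0 if between == 0 else float("inf"))
        else:
            ratios.append(between / within)
    return ratios


def select_top_k(ratios, k):
    """Indices of the k dimensions with the largest ratio, in index order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])


# Toy feature vectors (rows = utterances, columns = feature dimensions).
angry = [[1.0, 5.0, 0.2], [1.2, 5.5, 0.1], [0.8, 4.5, 0.3]]
other = [[1.1, 1.0, 0.2], [0.9, 1.5, 0.3], [1.0, 0.5, 0.1]]

ratios = fisher_ratio(angry, other)
selected = select_top_k(ratios, k=1)
print(selected)
```

On this toy input only dimension 1 separates the two classes, so it is the one kept. A real pipeline would apply the same ranking to the concatenated MFCC, IMFCC and GFCC dimensions and keep the top 12, 18 or 20.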
Keywords/Search Tags: anger recognition, road rage, speech signal processing, deep learning, noise environment, feature fusion, Fisher ratio criterion