Speech, one of the most convenient and natural ways for humans to exchange information, carries not only semantic content but also emotional information. Emotion is expressed through loudness, pitch, speaking rate, pauses, and so on: when people are happy, they speak at a brisk pace with a clear voice; in sorrow, the voice becomes slow and dull; in anger, speech becomes fast and high-pitched. Because of this emotional layer, utterances with the same semantic content can convey different or even completely opposite messages, which makes speech emotion recognition worth studying. Drawing on techniques from the field of transfer learning, this thesis studies both single-database and cross-database speech emotion recognition. The main contributions are as follows:

1) A spectrogram-based data processing and data augmentation method is proposed. Most speech emotion recognition methods rely on either the time-domain or the frequency-domain information of the speech signal; their joint time-frequency characteristics are rarely exploited. As a time-frequency representation of the emotional speech signal, the spectrogram reflects both the time-domain and frequency-domain information as well as the relationship between them. The original speech signals are therefore converted into spectrograms in this thesis. At the same time, to compensate for the scarcity of emotional speech data, four data augmentation methods are applied to expand the dataset: geometric transformation, pixel-level processing, Gaussian noise addition, and unsharp-mask filtering.

2) A parameter-transfer recognition method based on an improved VGG-16 (Visual Geometry Group 16) network is proposed. The parameters of a VGG-16 model pre-trained on ImageNet are transferred, so that the knowledge learned from ImageNet can be applied to recognizing speech emotion data. This thesis studies
the effect of transferring different layers on speech emotion recognition. The proposed method achieves accuracies of 83.02%, 66.25%, and 51.12% on the Emo-DB, CASIA, and SAVEE databases respectively, improvements of 18.87%, 5.04%, and 2.19% over the corresponding models trained without parameter transfer. At the same time, both time complexity and space complexity are reduced, which verifies the effectiveness of the transferred network.

3) A domain-adaptation recognition method based on ResNet (Residual Network) is proposed. On top of the ResNet backbone, the model introduces the multi-kernel maximum mean discrepancy (MK-MMD) to measure the discrepancy between the source domain and the target domain. By weighting the multiple kernels to find the optimal kernel and minimizing the resulting discrepancy, the target domain is matched to the source domain, after which the speech emotion of the target domain can be recognized. The proposed method minimizes the distance between emotion features from different databases in a reproducing kernel Hilbert space, achieves good results in cross-database recognition on the Emo-DB, CASIA, and SAVEE databases, and thereby verifies its effectiveness.
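The spectrogram pipeline and the four augmentation methods of contribution 1) could be sketched as follows. This is a minimal illustration using SciPy; the sampling rate, window size, hop length, and augmentation magnitudes are placeholder choices for demonstration, not the settings used in the thesis:

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import gaussian_filter

def to_log_spectrogram(signal, sr=16000, n_fft=512, hop=160):
    # STFT magnitude spectrogram on a log (dB) scale
    _, _, sxx = spectrogram(signal, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    return 10.0 * np.log10(sxx + 1e-10)

def augment(spec, rng):
    """Return the four kinds of augmented copies of a spectrogram."""
    out = {}
    # 1) geometric transformation: circular shift along the time axis
    out["geometric"] = np.roll(spec, rng.integers(1, spec.shape[1]), axis=1)
    # 2) pixel-level processing: random brightness/contrast perturbation
    out["pixel"] = spec * rng.uniform(0.8, 1.2) + rng.uniform(-1.0, 1.0)
    # 3) additive Gaussian noise
    out["noise"] = spec + rng.normal(0.0, 0.5, size=spec.shape)
    # 4) unsharp masking: original + amount * (original - blurred)
    blurred = gaussian_filter(spec, sigma=2.0)
    out["unsharp"] = spec + 1.0 * (spec - blurred)
    return out
```

Each augmented copy keeps the shape of the original spectrogram, so the expanded set can be fed to the same image-style network without further resizing.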
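The parameter-transfer mechanism of contribution 2) can be illustrated with PyTorch. The sketch below uses a scaled-down VGG-style network (`TinyVGG`) as a stand-in for the full ImageNet-pretrained VGG-16, since the mechanism is the same: copy the pretrained feature weights into the target model, freeze the first few blocks, and train only the remaining layers on the emotion spectrograms. The block sizes and the number of frozen blocks are illustrative choices, not those of the thesis:

```python
import torch
import torch.nn as nn

def make_vgg_block(in_ch, out_ch):
    # conv -> ReLU -> 2x2 max-pool, the basic VGG building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2))

class TinyVGG(nn.Module):
    """A scaled-down VGG-style network for 1-channel 32x32 spectrogram patches."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            make_vgg_block(1, 8), make_vgg_block(8, 16), make_vgg_block(16, 32))
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

def transfer_parameters(source, target, n_frozen_blocks):
    # copy all feature-extractor weights from the (pretrained) source,
    # then freeze the first n_frozen_blocks so only later layers are trained
    target.features.load_state_dict(source.features.state_dict())
    for i, block in enumerate(target.features):
        if i < n_frozen_blocks:
            for p in block.parameters():
                p.requires_grad = False
    return target
```

Varying `n_frozen_blocks` corresponds to the thesis's study of how many transferred layers to keep fixed; the classifier head is always re-initialized for the emotion classes.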
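The multi-kernel maximum mean discrepancy at the heart of contribution 3) admits a compact sketch. This version assumes a fixed set of Gaussian kernel bandwidths combined with uniform weights (the thesis weights the kernels to find an optimal combination; here the weights are fixed purely for illustration), and uses the biased empirical estimator of MMD^2:

```python
import torch

def mk_mmd(x, y, sigmas=(1.0, 2.0, 4.0, 8.0), weights=None):
    """Multi-kernel MMD^2 between samples x (n, d) and y (m, d).

    Uses a weighted sum of Gaussian kernels and the biased estimator
    MMD^2 = mean K(x,x) + mean K(y,y) - 2 mean K(x,y).
    """
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    z = torch.cat([x, y], dim=0)
    d2 = torch.cdist(z, z).pow(2)  # pairwise squared Euclidean distances
    k = sum(w * torch.exp(-d2 / (2 * s * s)) for w, s in zip(weights, sigmas))
    n = x.shape[0]
    kxx = k[:n, :n].mean()
    kyy = k[n:, n:].mean()
    kxy = k[:n, n:].mean()
    return kxx + kyy - 2 * kxy
```

During cross-database training, a term like `mk_mmd(source_feats, target_feats)` would be added to the classification loss so that minimizing it pulls the two domains' emotion features together in the kernel-induced Hilbert space.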