With the wide adoption of social media, text alone can no longer meet the needs of complex sentiment analysis of online discourse. The heavy use of audio, video, images, and other media has drawn close attention from government departments, researchers, and society at large. For complex emotions, the emergence of multimodal data offers a new direction for fine-grained sentiment research: fusing the features of multiple modalities addresses the limited expressiveness of emotional features in text-only data, and provides an applied research basis for accurately characterizing complex emotions in online discourse. Such multimodal research offers a new analysis method for public opinion monitoring and risk-control prediction by relevant departments, and a new research direction for data analysis in e-commerce, short video, and other industries.

This paper performs complex sentiment analysis on the speech and text modalities, studying feature extraction, multimodal feature fusion, and the sample imbalance problem. The main research contents are as follows.

(1) A speech emotion feature extraction method, MA2PE, is proposed to address the feature loss and redundancy caused by overly long or overly short speech samples during speech emotion feature extraction. The speech data are recombined via the spectrogram, autocorrelation, silence ratio, and pitch, and an 8-dimensional audio emotion feature vector is constructed. The resulting speech emotion recognition accuracy reaches 89.08%.

(2) To address sample imbalance, an SOM over-sampling method and a fully iterative under-sampling method are proposed, and a self-supervised text pre-training model based on text2vec is built. Experimental comparison of the overall results of the different processing methods shows that SOM over-sampling yields the largest improvement on imbalanced samples.

(3) Building on the above results, multimodal feature fusion schemes are analyzed, and a multimodal fine-grained sentiment analysis model based on feature-layer fusion is proposed. Comparative experiments are conducted on the MELD and IEMOCAP datasets. The results show that on MELD the proposed model improves accuracy by 33.88% over the SOTA model, reaching a classification accuracy of 94.13%.

The proposed feature-layer-fusion model extracts richer multimodal sentiment features and performs well on sentiment analysis of imbalanced data, further improving the capability of multimodal sentiment analysis.
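To make the audio feature construction in (1) concrete, the sketch below assembles an illustrative 8-dimensional feature vector from the four cues named above (spectrogram statistics, autocorrelation, silence ratio, and pitch) using librosa. This is a minimal sketch, not the paper's MA2PE implementation; the exact feature definitions and how the eight dimensions are split across the four cues are assumptions made here for illustration.

```python
import numpy as np
import librosa

def audio_emotion_features(path, sr=16000):
    """Illustrative 8-dim audio emotion features (not the paper's MA2PE)."""
    y, sr = librosa.load(path, sr=sr)

    # Spectrogram cue: mean and std of log-mel energy (2 dims, assumed split).
    logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    spec_mean, spec_std = logmel.mean(), logmel.std()

    # Autocorrelation cue: peak of the normalized autocorrelation past lag 0
    # and its lag in seconds (2 dims), a rough periodicity measure.
    ac = librosa.autocorrelate(y, max_size=sr // 2)
    ac = ac / (ac[0] + 1e-8)
    peak_lag = np.argmax(ac[1:]) + 1
    ac_peak, ac_lag = ac[peak_lag], peak_lag / sr

    # Silence ratio cue: fraction of the clip outside non-silent intervals (1 dim).
    intervals = librosa.effects.split(y, top_db=30)
    voiced = sum(end - start for start, end in intervals)
    silence_ratio = 1.0 - voiced / len(y)

    # Pitch cue: mean, std, and range of the YIN f0 track (3 dims).
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    pitch_mean, pitch_std, pitch_range = f0.mean(), f0.std(), f0.max() - f0.min()

    return np.array([spec_mean, spec_std, ac_peak, ac_lag,
                     silence_ratio, pitch_mean, pitch_std, pitch_range])
```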
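The abstract does not detail the SOM over-sampling procedure in (2), so the following is only one plausible reading, sketched with the MiniSom library: fit a small self-organizing map on the minority-class feature vectors, then synthesize new samples by interpolating each minority sample toward its best-matching-unit codebook vector. The map size, interpolation factor, and synthesis rule are all assumptions.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

def som_oversample(X_min, n_new, som_shape=(5, 5), alpha=0.5, seed=0):
    """Synthesize n_new minority samples by pulling random minority
    samples toward their SOM best-matching-unit codebook vectors.
    (One plausible SOM over-sampling scheme; not the paper's exact method.)"""
    rng = np.random.default_rng(seed)
    som = MiniSom(som_shape[0], som_shape[1], X_min.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=seed)
    som.random_weights_init(X_min)
    som.train_random(X_min, 1000)          # fit codebook on minority class only
    weights = som.get_weights()            # shape: (rows, cols, n_features)

    synthetic = []
    for _ in range(n_new):
        x = X_min[rng.integers(len(X_min))]
        i, j = som.winner(x)               # best-matching unit for this sample
        lam = rng.uniform(0, alpha)        # interpolate sample -> BMU codebook
        synthetic.append(x + lam * (weights[i, j] - x))
    return np.vstack(synthetic)
```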
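Feature-layer fusion, as used in (3), combines the per-modality features into one joint representation before any classification, in contrast to decision-layer fusion, which merges per-modality predictions. Below is a minimal PyTorch sketch of this idea; the text feature dimension, hidden width, and classifier depth are placeholders, and the paper's actual fusion architecture may differ. The 8-dimensional audio input matches the feature vector from (1), and the 7 output classes match MELD's emotion label set.

```python
import torch
import torch.nn as nn

class FeatureLayerFusion(nn.Module):
    """Minimal feature-layer fusion: concatenate per-modality feature
    vectors, then classify jointly (dimensions are placeholders)."""
    def __init__(self, audio_dim=8, text_dim=768, hidden=256, n_classes=7):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(audio_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, audio_feat, text_feat):
        # Fusion happens at the feature layer: one joint representation
        # is built before any class decision is made.
        fused = torch.cat([audio_feat, text_feat], dim=-1)
        return self.fuse(fused)

# Example: a batch of 4 utterances with 8-dim audio and 768-dim text features.
logits = FeatureLayerFusion()(torch.randn(4, 8), torch.randn(4, 768))
```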