| As society moves rapidly into the era of intelligence, everyday human-computer interaction has shifted from rigid interfaces to intelligent ones based on speech, text, actions, and physiological signals. People therefore hope that computers can have independent consciousness and thought and can accurately identify human emotional states. Emotion recognition research should not be limited to adults: children are society's future, and parents and teachers who accurately understand children's emotional states can help them relieve unnecessary emotional pressure and grow up better. Traditional single-modal emotion recognition carries limited emotional information, and the recognition accuracy of single-modal models is not high. This paper therefore recognizes children's emotions through feature fusion of speech and text. The specific work is as follows. First, children's emotion recognition is performed through bimodal fusion of speech and text, and the results are compared with those of single-modal emotion recognition models. Neural networks are used to extract emotional features from the speech and text modalities. In addition, spectrogram features are integrated into the single-modal children's speech emotion recognition model, where frequency-domain features are combined with other speech features to obtain the emotional features of children's speech. Comparative experiments verify that the frequency-domain features in the spectrogram improve the performance of the children's emotion recognition model, which confirms the validity of the proposed model. Second, children's speech features and text features often contain distinct emotional and non-emotional segments. Features extracted directly with a bidirectional long short-term memory (BiLSTM) network may therefore be mixed with emotion-irrelevant features, which prevents accurate recognition of children's emotions. To address this, an attention mechanism is integrated into the children's bimodal emotion recognition model, and experiments are conducted on the IEMOCAP and FAU-AIBO datasets. Compared with the bimodal model on these datasets, the attention-based emotion recognition model proposed in this paper improves recognition accuracy by about 2%, which verifies the superiority of the proposed model. The experimental results show that the proposed bimodal model with the attention mechanism can effectively recognize children's emotions, and that the model fusing speech and text features outperforms children's emotion recognition models that use a single modality. |
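
The idea of weighting emotional segments more heavily and then fusing the two modalities can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the attention form or the fusion operator, so the additive-style scoring, the concatenation fusion, the feature dimensions, and the names `attention_pool`, `fuse`, `w_speech`, and `w_text` are all assumptions made here for illustration. Random arrays stand in for BiLSTM outputs.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(hidden, w):
    """Collapse a (T, D) sequence of hidden states into one D-vector.

    hidden: per-frame or per-token states (stand-in for BiLSTM outputs).
    w: hypothetical scoring vector of shape (D,); learned in a real model.
    """
    scores = np.tanh(hidden) @ w       # one relevance score per time step, shape (T,)
    weights = softmax(scores)          # attention weights, non-negative, sum to 1
    return weights @ hidden, weights   # weighted sum of states, shape (D,)

def fuse(speech_states, text_states, w_speech, w_text):
    # Assumed fusion: concatenate the attention-pooled vector of each modality.
    speech_vec, _ = attention_pool(speech_states, w_speech)
    text_vec, _ = attention_pool(text_states, w_text)
    return np.concatenate([speech_vec, text_vec])

rng = np.random.default_rng(0)
speech = rng.normal(size=(50, 8))   # 50 speech frames, 8-dim states (assumed sizes)
text = rng.normal(size=(12, 6))     # 12 tokens, 6-dim states (assumed sizes)
fused = fuse(speech, text, rng.normal(size=8), rng.normal(size=6))
print(fused.shape)  # (14,)
```

In a trained model, time steps with low attention weight contribute little to the pooled vector, which is how emotion-irrelevant segments would be suppressed before the fused vector is passed to a classifier.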