
Research On Multimodal Music Emotion Recognition Based On Deep Learning

Posted on: 2024-06-15    Degree: Master    Type: Thesis
Country: China    Candidate: L Q Yin    Full Text: PDF
GTID: 2555307076491364    Subject: Electronic information
Abstract/Summary:
With the rapid development of digital media, audio technology and artificial intelligence, the amount of music data continues to grow, and so does the research devoted to it. Music is by nature a vehicle for expressing human emotion, so achieving more accurate recognition of musical emotion has become a key concern. This paper combines the continuous and discrete emotion features of music through deep-learning-based fusion of multimodal music features, i.e. combining different types of music information, to improve the model's emotion recognition performance. The innovations and main work of the paper fall into three parts:

(1) The WLDNN_GAN model is proposed. The original music signal is first pre-emphasised, windowed and framed, and MFCC (Mel-Frequency Cepstral Coefficient) and PLP (Perceptual Linear Prediction) features are then extracted from the processed samples and used as the input to the constructed WLDNN. The two feature sets are fused in the high-dimensional space of the WLDNN so that the original musical characteristics are retained as far as possible, while the pre-processed data becomes more representative and paves the way for the recognition stage (a minimal sketch of this preprocessing chain follows the abstract). Finally, a GAN network performs the emotion recognition, and the results are compared with the mainstream emotion recognition models MLR (Multivariable Linear Regression), DBLSTM (Deep Bidirectional Long Short-Term Memory) and CNN_GAN. The results show that the final valence-arousal (VA) regression values are better than those of the mainstream models, and that valence is predicted more accurately than arousal.

(2) Building on the previous chapter, the WLDNN_SAGAN model is constructed: the GAN module of WLDNN_GAN is optimised and a self-attention module is added to re-weight the input music signal, so that emotion recognition becomes more efficient and accurate (a generic self-attention sketch is given after the abstract). This chapter uses Schenkerian analysis to extract the most emotionally representative section of the movement as the main melody vector input, and refines the MFCC features of Chapter 3 by weighting them together with RP features to represent musical emotion more completely and comprehensively. The fused music information is fed into the WLDNN_SAGAN network and compared against the WGAN (Wasserstein GAN), MCCLSTM and MCCBL models. The experiments show that when the continuous emotion-space features (MFCC and PLP) and the discrete emotion-space feature (the main melody) are input at a ratio of 1:1, WLDNN_SAGAN achieves the highest accuracy among the compared models, which demonstrates that multimodal feature input has a positive effect on music emotion recognition.

(3) An interface for the intelligent music system is built with PyQt5, and the WLDNN_SAGAN model is embedded in the system's emotion recognition module, so that once a user imports music the system automatically extracts the emotion features and performs emotion recognition (a minimal PyQt5 skeleton is sketched below).
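As an illustration of the signal chain in part (1), the sketch below shows conventional pre-emphasis, framing, windowing and MFCC extraction with librosa and NumPy. It is a minimal sketch under assumed parameters: the file name, frame sizes and pre-emphasis coefficient are placeholders, and the thesis's WLDNN front end and PLP extraction (which would need a dedicated library) are not reproduced.

    import numpy as np
    import librosa

    # Load a music clip (the file name is a placeholder).
    y, sr = librosa.load("clip.wav", sr=22050)

    # Pre-emphasis: boost high frequencies, y[t] - 0.97 * y[t-1].
    y_pre = librosa.effects.preemphasis(y, coef=0.97)

    # Framing: split the signal into overlapping frames.
    frame_length, hop_length = 2048, 512
    frames = librosa.util.frame(y_pre, frame_length=frame_length,
                                hop_length=hop_length)

    # Windowing: apply a Hann window to each frame to reduce spectral leakage.
    windowed = frames * np.hanning(frame_length)[:, np.newaxis]

    # MFCC features from the pre-emphasised signal (13 coefficients per frame).
    mfcc = librosa.feature.mfcc(y=y_pre, sr=sr, n_mfcc=13,
                                n_fft=frame_length, hop_length=hop_length)
    print(mfcc.shape)  # (13, number_of_frames)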
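Part (2) hinges on re-weighting the input with self-attention. The abstract does not spell out the WLDNN_SAGAN architecture, so the block below is only a generic PyTorch self-attention layer over a sequence of fused feature frames; the feature dimension, head count and residual connection are assumptions for illustration.

    import torch
    import torch.nn as nn

    class FrameSelfAttention(nn.Module):
        """Re-weights a sequence of feature frames with self-attention."""

        def __init__(self, feature_dim: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=feature_dim,
                                              num_heads=num_heads,
                                              batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, frames, feature_dim); every frame attends to all
            # others, so emotionally salient frames can receive larger weights.
            attended, _ = self.attn(x, x, x)
            return attended + x  # residual keeps the original features

    # Example: a batch of 8 clips, 100 frames, 64-dimensional fused features.
    x = torch.randn(8, 100, 64)
    out = FrameSelfAttention(feature_dim=64)(x)
    print(out.shape)  # torch.Size([8, 100, 64])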
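For part (3), the sketch below is a minimal PyQt5 skeleton of the described workflow: an import button opens a file dialog and hands the chosen file to a recognition function. The recognise_emotion stub stands in for the thesis's WLDNN_SAGAN pipeline and is purely hypothetical.

    import sys
    from PyQt5.QtWidgets import (QApplication, QFileDialog, QLabel,
                                 QPushButton, QVBoxLayout, QWidget)

    def recognise_emotion(path):
        # Hypothetical stub: the real system would extract features from
        # the imported file and run the WLDNN_SAGAN model on them.
        return "valence/arousal prediction for " + path

    class EmotionWindow(QWidget):
        def __init__(self):
            super().__init__()
            self.setWindowTitle("Intelligent music system (sketch)")
            self.label = QLabel("Import a music file to begin.")
            button = QPushButton("Import music")
            button.clicked.connect(self.open_file)
            layout = QVBoxLayout(self)
            layout.addWidget(button)
            layout.addWidget(self.label)

        def open_file(self):
            path, _ = QFileDialog.getOpenFileName(
                self, "Open music", "", "Audio files (*.wav *.mp3)")
            if path:
                self.label.setText(recognise_emotion(path))

    if __name__ == "__main__":
        app = QApplication(sys.argv)
        window = EmotionWindow()
        window.show()
        sys.exit(app.exec_())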
Keywords/Search Tags:Musical emotion recognition, Multimodal information fusion, Main melody vector, WLDNN_GAN, Intelligent music system