
The Research On Music Mood Classification Methods Based On Multi-Modal Fusion

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Xue
Full Text: PDF
GTID: 2308330485471114
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, vast amounts of music data have emerged on the Internet. How to efficiently organize and retrieve relevant music information from such volumes of data has attracted ever-growing attention from various research fields. As an important facet of music information retrieval, categorizing music by its emotional attributes can effectively enhance the accuracy and efficiency of music retrieval, yet it also faces many technical challenges. Typically, music data comprises audio and lyrics modalities, whereas traditional music mood classification methods mainly analyze a single modality and therefore cannot make full use of the emotional information embedded in the music, owing to the limited semantics of any single modality. Effectively mining and exploiting the complementarity and correlation between the audio and lyrics modalities is therefore important for improving the performance of current music mood classification methods.

This thesis addresses automatic music mood classification based on the fusion of multiple modalities of music data, focusing on effectively capturing and exploiting the emotional information conveyed in multi-modal music data to improve classification performance. We propose a fine-grained, sentence-level music representation that captures the emotional characteristics of multi-modal music data more precisely than traditional document-level representations. We further propose a lyrics pre-filtering mechanism based on vocabulary reduction by word-discriminability ranking and synonym-based lyrics expansion, which increases the mood discriminability of lyrics data. In addition, we extend the Locality Preserving Projection (LPP) algorithm to the multi-modal scenario, learning a common latent space for the audio and lyric modalities that eliminates their heterogeneity for better fusion. On top of this, we propose two novel multi-modal classification models that effectively capture the temporal and structural correlations between sentence-level lyrics and audio descriptions of music. The first is a hierarchical voting scheme for music mood classification based on the Hough forest, which exploits the temporal alignment and correlation across modalities for higher prediction performance. The second is a k-nearest-neighbour graph learning method that propagates similarity among cross-modal sentence-level music descriptions, which enhances mood classification by exploiting the correlation and complementarity between music features of different modalities. The effectiveness of the proposed music mood classification methods is demonstrated in the experiments.
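The lyrics pre-filtering idea can be illustrated with a short sketch: rank words by how strongly they discriminate between mood classes and keep only the top-ranked vocabulary. The chi-square criterion, the function name filter_lyrics_vocabulary, and the cut-off of 500 words below are illustrative assumptions rather than the thesis's exact procedure, and the synonym-based expansion step is omitted.

# A minimal sketch of lyrics pre-filtering via word-discriminability ranking.
# The chi-square ranking and all names/parameters here are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

def filter_lyrics_vocabulary(lyrics, moods, keep=500):
    """lyrics: list of lyric strings; moods: list of mood labels.
    Returns the reduced bag-of-words matrix and the kept vocabulary."""
    vec = CountVectorizer()
    X = vec.fit_transform(lyrics)                 # bag-of-words counts
    selector = SelectKBest(chi2, k=min(keep, X.shape[1]))
    X_reduced = selector.fit_transform(X, moods)  # keep the most mood-discriminative words
    kept_words = vec.get_feature_names_out()[selector.get_support()]
    return X_reduced, kept_words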
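One plausible reading of the multi-modal LPP extension is a graph embedding over both modalities at once: within-modality k-nearest-neighbour affinities are combined with links between paired audio and lyric items, and the standard LPP generalized eigenproblem is solved over a block-diagonal data matrix. The pairing-based affinity, the function multimodal_lpp, and all parameter values below are assumptions used only for illustration, not the thesis's exact formulation.

# A minimal sketch of learning a shared latent space for paired audio and
# lyric features with a cross-modal Locality Preserving Projection.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def multimodal_lpp(X_audio, X_lyric, dim=10, k=5, reg=1e-6):
    """X_audio: (n, da), X_lyric: (n, dl); row i of each describes the same
    song segment/sentence. Returns projections P_a (da, dim), P_l (dl, dim)."""
    n = X_audio.shape[0]
    da, dl = X_audio.shape[1], X_lyric.shape[1]
    # Block-diagonal data matrix stacking both modalities as 2n samples.
    Z = np.zeros((da + dl, 2 * n))
    Z[:da, :n] = X_audio.T
    Z[da:, n:] = X_lyric.T
    # Affinity: within-modality kNN graphs plus links between paired samples.
    W = np.zeros((2 * n, 2 * n))
    for view, offset in ((X_audio, 0), (X_lyric, n)):
        A = kneighbors_graph(view, k, mode="connectivity").toarray()
        A = np.maximum(A, A.T)                     # symmetrize the kNN graph
        W[offset:offset + n, offset:offset + n] = A
    idx = np.arange(n)
    W[idx, idx + n] = 1.0                          # audio_i <-> lyric_i links
    W[idx + n, idx] = 1.0
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    # LPP generalized eigenproblem: Z L Z^T p = lambda Z D Z^T p.
    A_mat = Z @ L @ Z.T
    B_mat = Z @ D @ Z.T + reg * np.eye(da + dl)    # regularize for stability
    vals, vecs = eigh(A_mat, B_mat)
    P = vecs[:, :dim]                              # smallest eigenvalues
    return P[:da], P[da:]

New audio or lyric features are then mapped into the shared space via X_audio @ P_a and X_lyric @ P_l, where they can be compared or fused directly.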
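The k-nearest-neighbour graph learning idea can likewise be approximated with off-the-shelf label propagation over fused sentence-level features. The thesis's cross-modal graph construction is more elaborate, so the use of scikit-learn's LabelPropagation and of concatenated latent-space features here is an assumption, shown only to make the propagation step concrete.

# A minimal sketch of kNN-graph-based mood propagation as a stand-in for the
# thesis's graph learning method; inputs and parameters are assumptions.
from sklearn.semi_supervised import LabelPropagation

def propagate_moods(fused_features, labels, n_neighbors=7):
    """fused_features: (n, d) array, e.g. audio and lyric embeddings
    concatenated in the shared latent space; labels: (n,) array with -1
    marking unlabelled items. Returns predicted moods for all items."""
    model = LabelPropagation(kernel="knn", n_neighbors=n_neighbors)
    model.fit(fused_features, labels)
    return model.transduction_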
Keywords/Search Tags: music mood classification, multi-modal, graph learning, Hough forest, latent space