
The Research On Music Mood Classification Methods Based On Multi-Modal Fusion

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Xue
Full Text: PDF
GTID: 2308330485471114
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, vast amounts of music data have emerged on the Internet. How to efficiently organize and retrieve relevant music information from such volumes of data has attracted ever-growing attention from various research fields. As an important facet of music information retrieval, categorizing music by its emotional attributes can effectively enhance the accuracy and efficiency of music retrieval, yet it also faces many technical challenges. Typically, music data comprises audio and lyrics modalities, whereas traditional music mood classification methods mainly analyze a single modality and therefore cannot make full use of the emotional information embedded in the music, owing to the limited semantics of any single modality. Effectively mining and exploiting the complementarity and correlation between the audio and lyrics modalities is therefore important for improving the performance of current music mood classification methods.

This thesis addresses automatic music mood classification based on the fusion of multiple modalities of music data, focusing on effectively capturing and exploiting the emotional information conveyed in multi-modal music data to improve classification performance. We propose a fine-grained, sentence-level music representation that captures the emotional characteristics of multi-modal music data more precisely than traditional document-level representations. We further propose a lyrics pre-filtering mechanism based on vocabulary reduction by word-discriminability ranking and synonym-based lyrics expansion, which increases the mood discriminability of lyrics data. In addition, we extend the Locality Preserving Projection (LPP) algorithm to the multi-modal scenario, learning a common latent space for the audio and lyric modalities that eliminates their heterogeneity for better fusion. On top of this, we propose two novel multi-modal classification models that effectively capture the temporal and structural correlations between sentence-level lyrics and audio descriptions of music. The first is a hierarchical voting scheme for music mood classification based on the Hough forest, which exploits the temporal alignment and correlation across modalities for higher prediction performance. The second is a k-nearest-neighbour graph learning method that propagates similarity among cross-modal sentence-level music descriptions, which enhances mood classification by exploiting the correlation and complementarity between music features of different modalities. The effectiveness of the proposed music mood classification methods is demonstrated in the experiments.
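The lyrics pre-filtering idea can be illustrated with a short sketch: rank words by how strongly they discriminate between mood classes and keep only the top-ranked vocabulary. The chi-square criterion, the function name filter_lyrics_vocabulary, and the cut-off of 500 words below are illustrative assumptions rather than the thesis's exact procedure, and the synonym-based expansion step is omitted.

# A minimal sketch of lyrics pre-filtering via word-discriminability ranking.
# The chi-square ranking and all names/parameters here are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

def filter_lyrics_vocabulary(lyrics, moods, keep=500):
    """lyrics: list of lyric strings; moods: list of mood labels.
    Returns the reduced bag-of-words matrix and the kept vocabulary."""
    vec = CountVectorizer()
    X = vec.fit_transform(lyrics)                 # bag-of-words counts
    selector = SelectKBest(chi2, k=min(keep, X.shape[1]))
    X_reduced = selector.fit_transform(X, moods)  # keep the most mood-discriminative words
    kept_words = vec.get_feature_names_out()[selector.get_support()]
    return X_reduced, kept_words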
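One plausible reading of the multi-modal LPP extension is a graph embedding over both modalities at once: within-modality k-nearest-neighbour affinities are combined with links between paired audio and lyric items, and the standard LPP generalized eigenproblem is solved over a block-diagonal data matrix. The pairing-based affinity, the function multimodal_lpp, and all parameter values below are assumptions used only for illustration, not the thesis's exact formulation.

# A minimal sketch of learning a shared latent space for paired audio and
# lyric features with a cross-modal Locality Preserving Projection.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def multimodal_lpp(X_audio, X_lyric, dim=10, k=5, reg=1e-6):
    """X_audio: (n, da), X_lyric: (n, dl); row i of each describes the same
    song segment/sentence. Returns projections P_a (da, dim), P_l (dl, dim)."""
    n = X_audio.shape[0]
    da, dl = X_audio.shape[1], X_lyric.shape[1]
    # Block-diagonal data matrix stacking both modalities as 2n samples.
    Z = np.zeros((da + dl, 2 * n))
    Z[:da, :n] = X_audio.T
    Z[da:, n:] = X_lyric.T
    # Affinity: within-modality kNN graphs plus links between paired samples.
    W = np.zeros((2 * n, 2 * n))
    for view, offset in ((X_audio, 0), (X_lyric, n)):
        A = kneighbors_graph(view, k, mode="connectivity").toarray()
        A = np.maximum(A, A.T)                     # symmetrize the kNN graph
        W[offset:offset + n, offset:offset + n] = A
    idx = np.arange(n)
    W[idx, idx + n] = 1.0                          # audio_i <-> lyric_i links
    W[idx + n, idx] = 1.0
    D = np.diag(W.sum(axis=1))
    L = D - W                                      # graph Laplacian
    # LPP generalized eigenproblem: Z L Z^T p = lambda Z D Z^T p.
    A_mat = Z @ L @ Z.T
    B_mat = Z @ D @ Z.T + reg * np.eye(da + dl)    # regularize for stability
    vals, vecs = eigh(A_mat, B_mat)
    P = vecs[:, :dim]                              # smallest eigenvalues
    return P[:da], P[da:]

New audio or lyric features are then mapped into the shared space via X_audio @ P_a and X_lyric @ P_l, where they can be compared or fused directly.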
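The k-nearest-neighbour graph learning idea can likewise be approximated with off-the-shelf label propagation over fused sentence-level features. The thesis's cross-modal graph construction is more elaborate, so the use of scikit-learn's LabelPropagation and of concatenated latent-space features here is an assumption, shown only to make the propagation step concrete.

# A minimal sketch of kNN-graph-based mood propagation as a stand-in for the
# thesis's graph learning method; inputs and parameters are assumptions.
from sklearn.semi_supervised import LabelPropagation

def propagate_moods(fused_features, labels, n_neighbors=7):
    """fused_features: (n, d) array, e.g. audio and lyric embeddings
    concatenated in the shared latent space; labels: (n,) array with -1
    marking unlabelled items. Returns predicted moods for all items."""
    model = LabelPropagation(kernel="knn", n_neighbors=n_neighbors)
    model.fit(fused_features, labels)
    return model.transduction_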
Keywords/Search Tags: music mood classification, multi-modal, graph learning, Hough forest, latent space