
Research On Music Similarity Calculation Method Based On Deep Learning

Posted on: 2022-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2518306494471294
Subject: Computer Science and Technology
Abstract/Summary:
Music similarity calculation is an important branch of music information retrieval. It supports the detection of music plagiarism and other forms of content-based music retrieval, which makes music similarity a meaningful research problem. Similarity between pieces of music can take several forms: emotional similarity, similar music-theoretic characteristics, similar genre, and so on. In application scenarios such as cover-song identification and plagiarism detection, it is appropriate to focus on comparing music content and music-theoretic features. Traditional methods for music similarity comparison suffer from poor accuracy and inflexible feature extraction. To address these problems, we propose a music similarity calculation method based on deep learning. We study how different low-level features and deep learning model structures affect the extraction of the main melody of music, and how deep learning can be applied to music similarity calculation. The main contributions of this paper are as follows:

1. To reduce interference from other music information, we first extract the main melody. Exploiting the strengths of convolutional neural networks in image processing, we use a semantic segmentation model with an encoder-decoder convolutional architecture to extract the main melody. For the input, the audio is converted into two two-dimensional features: the Generalized Cepstrum (GC) and the Generalized Cepstrum of Spectrum (GCOS). In addition, Mel-frequency cepstral coefficients (MFCC) and chroma features are manually extracted and stacked into the input in a multi-channel manner, so that the input data contain tonal and vocal information. We also add a channel-based attention mechanism to the model. Experiments show that adding the hand-crafted features accelerates the convergence of model training, and that the multi-feature fusion model with the attention mechanism improves overall accuracy over the baseline while also reducing the false-alarm rate.

2. Because music is temporal and its content is contextually connected, this paper uses a bi-directional long short-term memory (BiLSTM) network combined with an attention mechanism to encode the input data. For the input, we mainly select the main-melody pitch as the primary feature, supplemented by two important music content features: tonality and rhythm. The data are grouped into tonal clusters and the cluster labels are encoded as vectors, so that data in the same cluster lie closer together. The experiments are divided into three parts, comparing the effects of the attention mechanism, the distance formula, and the choice of music features on the results; we then demonstrate the end-to-end performance of the complete pipeline. Experiments show that the BiLSTM with attention achieves higher accuracy, that using cosine distance in the loss function better separates the clusters, and that combining main-melody and rhythm features as input yields the best performance.
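The channel-based attention mechanism described in contribution 1 can be illustrated with a minimal numpy sketch in the squeeze-and-excitation style: each input channel (e.g. the stacked GC, GCOS, MFCC, and chroma planes) is reweighted by a learned scalar gate. The weight shapes, the reduction factor, and the toy data below are illustrative assumptions, not the thesis model's actual parameters:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    x  : feature maps, shape (C, H, W), e.g. stacked GC/GCOS/MFCC/chroma planes
    w1 : squeeze-layer weights, shape (C, C // r)
    w2 : excitation-layer weights, shape (C // r, C)
    Returns x rescaled per channel by a gate in (0, 1).
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: small bottleneck MLP followed by a sigmoid gate.
    s = np.maximum(z @ w1, 0.0)                  # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))          # sigmoid, shape (C,)
    # Scale: reweight each channel before the encoder-decoder backbone.
    return x * s[:, None, None]

# Toy usage: 4 feature channels, reduction factor r = 2 (both made up).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((4, 2))
w2 = rng.standard_normal((2, 4))
y = channel_attention(x, w1, w2)
print(y.shape)  # (4, 8, 8)
```

The gate lets the network learn, per excerpt, how much each hand-crafted feature channel should contribute before the segmentation backbone sees it.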
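The cosine distance used in the loss of contribution 2 can be sketched directly: two encoded melody vectors that point in the same direction have distance 0, while unrelated (orthogonal) encodings have distance 1. The vectors below are made-up toy encodings, not outputs of the thesis model:

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two encoded melody vectors: 1 - cos(u, v)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy encodings (hypothetical values for illustration).
a = np.array([0.2, 0.9, 0.4])
b = np.array([0.4, 1.8, 0.8])   # same direction as a (a scaled by 2)
c = np.array([0.9, -0.2, 0.0])  # orthogonal to a

print(round(cosine_distance(a, b), 6))  # 0.0
print(round(cosine_distance(a, c), 6))  # 1.0
```

Because cosine distance ignores vector magnitude, training against it pushes encodings in the same tonal cluster toward a common direction, which is what increases the discrimination between clusters.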
Keywords/Search Tags:music similarity, deep learning, multi-feature fusion, music feature extraction