Application Of Deep Learning In Music Automatic Tagging

Posted on: 2018-04-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Gong | Full Text: PDF
GTID: 2348330512492056 | Subject: Communication and Information System
Abstract/Summary:
Traditional automatic music tagging follows a fixed routine: starting from a labeled dataset, audio features are extracted from each song, and a separate model is then trained for each label. This approach is redundant. Meanwhile, the emergence of larger music tagging datasets, together with the rise of deep learning in recent years, has gradually changed how models are designed. In this thesis, we use deep learning and large datasets to build more concise and accurate tagging models. Specifically, we designed three model structures corresponding to different feature inputs (mel-spectrogram, spectrogram, MFCC and raw audio) and evaluated them on the MagnaTagATune dataset, chosen because much previous work is based on it, which makes performance comparison convenient. The results show that the raw audio and mel-spectrogram models perform much better than the spectrogram and MFCC models. We then visualized the strongest response of the pre-trained mel-spectrogram model by gradient descent, starting from random noise as input. Next, to compare models of different depths, we used the Last.fm dataset, a subset of the MSD (Million Song Dataset). The results show that the deeper model clearly outperformed the shallow model, which agrees with recent work in computer vision. This result also implies the importance of dataset size for the performance of deep learning models.

Our main contributions are as follows:
(1) We designed several deep learning models for automatic music tagging, using several musical features as model input. The results show that raw audio and mel-spectrogram inputs give much better performance than spectrogram and MFCC inputs. Our raw audio model also achieved a better AUC than previous work.
(2) We compared models of different depths on a larger dataset; the deeper model significantly outperformed the shallow model, which agrees with recent work in computer vision. Comparing the results for different depths across the datasets shows that dataset size can strongly affect model performance: a larger dataset is more likely to bring out a model's potential.
(3) We visualized the input that most strongly activates each filter in each layer of the mel-spectrogram model, and found that the frequency response aligns with the human perceptual scale.
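
The abstract compares models by AUC but does not say how the score is aggregated over tags. As a minimal sketch, assuming AUC is computed separately for each tag and then averaged (a common convention for multi-label tagging, not confirmed by the abstract), the metric could look like this. The function names `roc_auc` and `mean_tag_auc` are hypothetical and used only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for a single tag, using the rank-sum (Mann-Whitney U) form."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Sort clips by predicted score, lowest first.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])

    # Sum the 1-based ranks of the positive clips, giving tied scores
    # the average rank of their group.
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def mean_tag_auc(y_true: Sequence[Sequence[int]],
                 y_score: Sequence[Sequence[float]]) -> float:
    """Average per-tag AUC; rows are clips, columns are tags (assumed layout)."""
    n_tags = len(y_true[0])
    aucs = []
    for t in range(n_tags):
        tag_labels = [row[t] for row in y_true]
        tag_scores = [row[t] for row in y_score]
        aucs.append(roc_auc(tag_labels, tag_scores))
    return sum(aucs) / n_tags
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of tagged clips above untagged ones, so the metric does not depend on choosing a decision threshold per tag.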
Keywords/Search Tags: deep learning, music, automatic tagging