
Research On Music Classification Algorithms Based On Deep Neural Networks And Residual Learning

Posted on: 2022-02-06 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Mohsin Ashraf | Full Text: PDF
GTID: 1488306521463934 | Subject: Computer application technology
Abstract/Summary:
The rapid growth in the volume of available music has increased the demand for automatic music classification. Music Genre Classification (MGC) is a content-based music analysis task that plays an important role in music retrieval. Although music classification algorithms have achieved impressive results, challenges remain in improving accuracy, training, and hyperparameter settings. Convolutional Neural Networks (CNNs) can extract low-level features but cannot maintain long-term dependencies. Recurrent Neural Networks (RNNs) can maintain long-term dependencies but suffer from the vanishing gradient problem. Intelligent and efficient techniques for handling music databases are therefore urgently needed. This dissertation studies a deep neural architecture with proper normalization, regularization, and balanced hyperparameter settings for music classification and model training. The main research progress is as follows:

(1) To solve the model complexity problem caused by too many layers, an improved method combining CNN and residual learning is proposed for music genre classification. This method takes the Mel spectrogram as input and uses CNN layers with different pooling techniques to provide richer classification information. The convolutional network used in residual learning skips unimportant learning steps, thus avoiding excessive network complexity. Experiments on the GTZAN and FMA datasets show classification accuracies of 87.80% and 68.50%, respectively.

(2) To address network training complexity and classification accuracy, a hybrid model of CNN and RNN based on global layer normalization is proposed. The CNN automatically extracts low-level features from spectrograms, eliminating the need for manual intervention, while the RNN performs temporal aggregation and maintains long-term dependencies. Layer normalization effectively replaces traditional batch normalization. Computing the normalization statistics across the feature dimensions improves the hidden-state dynamics for music data. The experimental results show that the joint neural architecture with global normalization improves model training. The model achieves average accuracies of 89.79% and 68.78% on the GTZAN and FMA datasets, respectively, an improvement in classification accuracy.

(3) To solve the RNN vanishing gradient problem in music classification, a hybrid model combining CNN and an improved RNN is proposed. The improved RNN variants include long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), gated recurrent unit (GRU), and bidirectional gated recurrent unit (Bi-GRU). MFCC and Mel spectrogram features are used to compare different network structures and to evaluate the performance of the proposed hybrid model. Experiments on the GTZAN dataset show that the hybrid CNN and LSTM model reaches a classification accuracy of 76.40% with MFCC features, while the combination of CNN and Bi-GRU achieves an accuracy of 89.40%.
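
The difference between layer normalization and batch normalization described in (2) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the toy data, and the `eps` value are assumptions made for illustration, and the learnable gain and bias that a real layer would include are omitted.

```python
import math


def _standardize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(features, eps=1e-5):
    """Layer normalization: statistics come from ONE example's feature vector,
    so the result does not depend on the other examples in the batch."""
    return _standardize(features, eps)


def batch_norm(batch, eps=1e-5):
    """Batch normalization: statistics for each feature come from ALL examples
    in the batch, so the result changes with batch composition."""
    columns = [list(col) for col in zip(*batch)]
    normalized = [_standardize(col, eps) for col in columns]
    return [list(row) for row in zip(*normalized)]


# Hypothetical toy batch: 2 examples x 3 features (e.g. 3 hidden units).
toy_batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
per_example = [layer_norm(row) for row in toy_batch]
per_feature = batch_norm(toy_batch)
```

Because `layer_norm` uses only the features of the current example, it behaves the same for any batch size and at every time step of a recurrent network, which is the usual reason it is preferred over batch normalization in RNN-based models.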
Keywords/Search Tags: Music Information Retrieval, Music Genre Classification, Convolutional Neural Network, Recurrent Neural Network, Residual Learning