Research on Audio Classification Algorithms Based on Convolutional Neural Networks and Their Applications

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhu
Full Text: PDF
GTID: 2518306614454534
Subject: Automation Technology
Abstract/Summary:
As an important data processing method in machine learning, classification has been widely used in many fields. Among them, audio classification has long been both a research hot spot and a difficult problem, and the convolutional neural network is an effective way to address it. However, most high-performing convolutional neural networks were designed for image tasks, and transferring them to audio classification does not deliver the expected performance. Designing the model around the characteristics of audio data is therefore the primary way to improve audio classification performance. For audio classification it is reasonable to start from the more mature models of the image field, but those models were designed for image tasks or trained on image data sets rather than for the characteristics of audio data, which causes two problems:

1. Models designed for image pattern recognition cannot capture the features of audio data correctly, because audio data differs from image data.
2. When distinguishing voice from other audio data, too few peak trajectory features are obtained, which leads to low classification performance.

To solve these problems, this thesis focuses on audio classification algorithms based on convolutional neural networks and makes several improvements to them. The main research work and results are as follows:

(1) To address the problem that image-task models cannot effectively extract features from the audio Mel spectrogram, a time-frequency bidirectional audio classification algorithm based on a convolutional neural network is proposed. Earlier audio classification algorithms reused models from image tasks. Because those models were designed for image data, and audio data differs from image data, they could not extract audio features effectively, which created a performance bottleneck. To
overcome this problem, the algorithm introduces a supervised timbre module and a supervised time module. The two modules focus on the frequency axis and the time axis of the audio Mel spectrogram respectively, so that audio features are extracted fully. In addition, an attention module is added to attend to the channel information of the feature matrix. Finally, the network loss is computed and fed back to the supervised timbre module, the supervised time module, and the attention module to update them. We apply the algorithm to the music data sets GTZAN and Dortmund, the dance music data set Ballroom, the extended dance music data set ExtendBallroom, and the environmental sound data set UrbanSound8K. Experimental results show that the algorithm effectively extracts the features of the audio spectrogram and achieves high classification accuracy.

(2) To address the problem that the maximum peak trajectory alone cannot represent the spectrogram effectively, an audio classification algorithm based on the maximum peak and minimum valley trajectories of the spectrogram is proposed. Although the audio classification algorithm based on the maximum peak trajectory of the spectrogram focuses effectively on the peak moments of the audio, attending only to the peak trajectory does not make full use of the audio. We therefore add the minimum valley trajectory of the Mel spectrogram, extending the maximum peak feature to a combined maximum peak and minimum valley feature. After the peak trajectory algorithm computes the maximum peak trajectory, the minimum valley trajectory is computed as well, and the two are concatenated into the final feature matrix, which is then fed into the model to obtain the classification result. We apply the algorithm to the GTZAN Music/Speech Collection. Experimental results show that the proposed algorithm achieves high classification accuracy on mixed data sets drawn from the Scheirer-Slaney Music-Speech Corpus and MUSAN.

(3) On the basis of the
previous two works, we design and implement an audio classification system based on convolutional neural networks, aimed at audio recognition and music classification. The system uses the best model parameters of the two algorithms above. Users upload local audio files to the system memory, and the system extracts features from those files and normalizes them to obtain the feature matrix. The user then calls the system's audio classification module to classify the audio. First, the system calls the audio classification algorithm based on the maximum peak and minimum valley trajectories of the spectrogram to identify the music files among the uploaded audio. Then it calls the time-frequency bidirectional audio classification algorithm based on a convolutional neural network to classify those music files. By combining the two proposed algorithms, the system realizes the audio data classification function.
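
The peak and valley trajectory feature described in (2) can be illustrated with a minimal sketch. The abstract does not define the trajectory precisely, so this sketch assumes that, for each time frame, the peak trajectory is the index of the frequency bin with the highest energy and the valley trajectory is the index of the bin with the lowest energy. The function name `peak_valley_trajectory` and the toy spectrogram are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def peak_valley_trajectory(spec: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build a 2 x n_frames feature matrix from a spectrogram.

    spec[f][t] is the energy of frequency bin f at time frame t.
    Row 0 of the result is the assumed maximum peak trajectory,
    row 1 is the assumed minimum valley trajectory.
    """
    if not spec or not spec[0]:
        raise ValueError("spectrogram must be non-empty")
    n_bins = len(spec)
    n_frames = len(spec[0])
    peak_track: List[int] = []
    valley_track: List[int] = []
    for t in range(n_frames):
        # Energies of all frequency bins at frame t.
        column = [spec[f][t] for f in range(n_bins)]
        # Assumption: a trajectory point is the bin index of the extremum.
        peak_track.append(max(range(n_bins), key=column.__getitem__))
        valley_track.append(min(range(n_bins), key=column.__getitem__))
    # Concatenate the two trajectories into one feature matrix.
    return [peak_track, valley_track]


# Hypothetical 3-bin x 3-frame spectrogram, for illustration only.
toy = [
    [0.1, 0.9, 0.2],
    [0.7, 0.3, 0.1],
    [0.2, 0.5, 0.8],
]
features = peak_valley_trajectory(toy)
print(features)
```

In a real pipeline the input would be a Mel spectrogram computed from audio, and the resulting matrix would be normalized before being passed to the classifier; both steps are omitted here.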
Keywords/Search Tags: Audio classification, Mel-Spectrogram, Convolutional neural network, Deep learning, Supervise, Time-frequency domain, Peak track, Attention