Research Of Audio Classification Algorithms Based On Convolutional Neural Network And Its Applications

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhu

Full Text:PDF

GTID:2518306614454534

Subject:Automation Technology

Abstract/Summary:

Classification is an important data processing method in machine learning and is widely used in many fields. Among its applications, audio classification has long been both a research focus and a challenge, and convolutional neural networks (CNNs) are an effective way to address it. However, most high-performing CNNs were developed for image tasks, and transferring them to audio classification does not deliver the expected performance. Designing models around the characteristics of audio data is therefore the primary task in improving audio classification performance. For audio type classification it is reasonable to start from the more mature models of the image field, but those models were designed for image tasks or trained on image data sets rather than for the characteristics of audio data, which leads to two problems:

1. Models designed for image pattern recognition cannot capture the features of audio data correctly, because audio data differs from image data.
2. When separating voice data from other audio data, too few peak trajectory features are obtained, which leads to low classification performance.

To solve these problems, this thesis studies audio classification algorithms based on convolutional neural networks and proposes several improvements. The main research work and results are as follows:

(1) To address the problem that image-task models cannot effectively extract the features of the audio Mel spectrogram, a time-frequency bidirectional audio classification algorithm based on a convolutional neural network is proposed. Earlier audio classification algorithms reused models from image tasks. Those models were designed for image data, which differs from audio data, so they could not extract audio features effectively, and this became the performance bottleneck of the algorithms. To overcome this problem, the proposed algorithm includes a supervised timbre module and a supervised time module. The two modules focus on the frequency axis and the time axis of the audio Mel spectrogram, respectively, so that audio features are extracted fully. An attention module is also added to attend to the channel information of the feature matrix. Finally, the network loss is computed and propagated back to the supervised timbre module, the supervised time module, and the attention module to update them. The algorithm is applied to the music data sets GTZAN and Dortmund, the dance music data set Ballroom, the extended dance music data set ExtendBallroom, and the environmental sound data set UrbanSound8K. Experimental results show that the algorithm extracts the features of the audio spectrogram effectively and achieves high classification accuracy.

(2) To address the problem that maximum peak trajectory features alone cannot represent the spectrogram effectively, an audio classification algorithm based on the maximum peak and minimum valley trajectories of the spectrogram is proposed. An audio classification algorithm based on the maximum peak trajectory of the spectrogram focuses effectively on the audio at its peak moments, but attending only to the peak trajectory does not make full use of the audio. We therefore add the minimum valley trajectory of the Mel spectrogram, extending the maximum peak feature to a combined maximum-peak and minimum-valley feature. After the peak trajectory algorithm computes the maximum peak trajectory, the minimum valley trajectory is computed as well, and the two are concatenated to form the final feature matrix, which is then fed into the model to obtain the classification results. We apply the algorithm to the GTZAN Music/Speech Collection. Experimental results show that the proposed algorithm achieves high classification accuracy on the mixed data sets of the Scheirer-Slaney Music-Speech Corpus and MUSAN.

(3) Building on the two algorithms above, we design and implement an audio classification system based on convolutional neural networks for audio recognition and music classification. The system uses the best model parameters obtained for the two algorithms. Users upload local audio files into the system's memory, and the system extracts features from the audio in memory and normalizes them to obtain the feature matrix. The user then calls the system's audio classification module to classify the audio. First, the system calls the audio classification algorithm based on the maximum peak and minimum valley trajectories of the spectrogram to identify which of the audio files are music. Then, the system calls the time-frequency bidirectional audio classification algorithm based on a convolutional neural network to classify the music files. By combining the two proposed algorithms, the system implements the audio data classification function.
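
As a rough illustration of the feature construction described in (2), the minimal Python sketch below builds a peak-and-valley feature matrix from a Mel spectrogram. The abstract does not specify the peak trajectory algorithm, so this sketch assumes the simplest reading: for each time frame, the peak is the index of the Mel band with the highest energy and the valley is the index of the band with the lowest energy. The function names `peak_valley_trajectories` and `normalize`, the toy matrix `mel`, and the index-based normalization are all hypothetical and are not taken from the thesis.

```python
def peak_valley_trajectories(mel):
    """Return [peaks, valleys] for a Mel spectrogram.

    mel is a list of n_mels rows, each holding n_frames energy values.
    Assumption (not from the thesis): the peak/valley of a frame is the
    index of its highest/lowest-energy Mel band.
    """
    n_mels = len(mel)
    n_frames = len(mel[0])
    peaks, valleys = [], []
    for t in range(n_frames):
        # Energies of all Mel bands at time frame t.
        column = [mel[f][t] for f in range(n_mels)]
        peaks.append(max(range(n_mels), key=column.__getitem__))
        valleys.append(min(range(n_mels), key=column.__getitem__))
    # Stack the two trajectories into one 2 x n_frames feature matrix.
    return [peaks, valleys]


def normalize(features, n_mels):
    """Scale band indices to the range [0, 1] (hypothetical normalization)."""
    scale = max(n_mels - 1, 1)
    return [[value / scale for value in row] for row in features]


# Toy Mel spectrogram: 3 Mel bands (rows) x 4 time frames (columns).
mel = [
    [0.1, 0.9, 0.2, 0.3],
    [0.8, 0.2, 0.1, 0.7],
    [0.3, 0.4, 0.9, 0.1],
]

features = peak_valley_trajectories(mel)
print(features)                       # [[1, 0, 2, 1], [0, 1, 1, 2]]
print(normalize(features, len(mel)))  # [[0.5, 0.0, 1.0, 0.5], [0.0, 0.5, 0.5, 1.0]]
```

In a setup like this, the normalized two-row matrix would be the input handed to the classifier. A real implementation would compute the Mel spectrogram from audio first and might track several peaks per frame rather than one.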

Keywords/Search Tags:

Audio classification, Mel-Spectrogram, Convolutional neural network, Deep learning, Supervise, Time-frequency domain, Peak track, Attention

Related items

1	Audio Scene Classification Based On Deep Learning
2	Research And Implementation Of Audio Alignment Based On Deep Learning
3	Research On Attention Based Image Classification With Deep Learning
4	Abnormal Audio Detection Based On Deep Learning
5	Research On Time Series Data Classification Methods Based On Deep Learning
6	Research Of Domain Adaptation Methods Based On Deep Convolutional Neural Networks
7	Research On Fine-grained Image Classification Based On Deep Convolutional Neural Network And Dual-domain Attention Mechanism
8	An Effective Audio Classification Method Based On Data Augmentation Strategy
9	Research On Affective Classification Of Commodity Comments Based On Deep Learning Of Attention
10	Research On Text Classification Based On Deep Learning