Acoustic Scene Classification Under Adverse Conditions Of Multi-channel And Unbalanced Data

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Wu
Full Text: PDF
GTID: 2428330611998185
Subject: Computer technology
Abstract/Summary:
Sound signals are ubiquitous in daily life, and people have long tried to make use of them in artificial intelligence. With the development of deep learning, the ability of machines to process sound signals is steadily improving. Acoustic scene classification, an emerging research field in signal processing, has received increasing attention in recent years and has been used effectively in applications such as situational awareness. In daily life there are many types of recording devices. Because the channels of these devices differ, the audio they record may differ even when they are used in the same place at the same time. In addition, the amount of sound data obtained from different devices may differ for a variety of reasons, which introduces a data imbalance problem. Channel differences and data imbalance make acoustic scene classification considerably more difficult. This thesis focuses on robust acoustic scene classification under the adverse conditions of multi-channel and unbalanced data.

Firstly, a convolutional network method for acoustic scene classification based on FBank features is proposed. We extract FBank acoustic features from all sound data regardless of device type and classify the data with a convolutional neural network. Cross-entropy is used as the loss function of the model. This method serves as the baseline system for this thesis.

Secondly, an acoustic scene classification method based on channel-independent embedding features is proposed. This method uses parallel data pairs recorded simultaneously by different devices. The two recordings in a pair contain the same semantic information but come from different channels, so the FBank features extracted from the original audio contain channel information specific to each device. This information is not only unrelated to the acoustic scene but also reduces the classification accuracy of the model. Therefore, this method computes the mean square error between the embedding features of the two recordings in each parallel data pair, where the embedding feature is the output of the last convolutional layer of the baseline model. The loss function of the model is the weighted sum of this mean square error and the cross-entropy. The proposed method improves performance on all devices compared with the baseline system.

Finally, a multi-channel acoustic scene classification method based on transfer learning is proposed. After the model has been trained on the large dataset, we obtain a well-performing network for extracting embedding features and a network for classifying those embedding features. The feature extraction network is then copied and adapted by transfer learning to the dataset with a small amount of data, for the same purpose. When training on the small dataset, we fine-tune only the embedding feature extraction network, while the remaining network parameters stay fixed. In this stage, we use the mean square error to measure the difference between the embedding features of the input parallel data pairs, and the KL divergence to measure the similarity between the probability distributions predicted from the embedding features of the two recordings in a pair. The final loss function is the weighted sum of the mean square error, the KL divergence, and the cross-entropy. Compared with the baseline system, the scene classification accuracy on the dataset with less data is significantly improved.
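The two combined loss functions described above can be illustrated with a minimal sketch in plain Python. This is not the thesis implementation: the function names, the weights `w_mse`, `w_ce` and `w_kl`, the direction of the KL divergence, and the choice to sum the cross-entropy over both recordings of a pair are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one prediction against an integer class label."""
    return -math.log(softmax(logits)[label])


def mse(a, b):
    """Mean square error between two embedding vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (direction is an assumption)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def embedding_loss(emb_a, emb_b, logits_a, logits_b, label,
                   w_mse=1.0, w_ce=1.0):
    """Weighted sum of embedding MSE and cross-entropy for one parallel pair.

    emb_a / emb_b stand for the last-conv-layer embeddings of the two devices;
    the weights are hypothetical hyperparameters.
    """
    ce = cross_entropy(logits_a, label) + cross_entropy(logits_b, label)
    return w_mse * mse(emb_a, emb_b) + w_ce * ce


def transfer_loss(emb_a, emb_b, logits_a, logits_b, label,
                  w_mse=1.0, w_kl=1.0, w_ce=1.0):
    """Adds a KL term between the two predicted class distributions."""
    kl = kl_divergence(softmax(logits_a), softmax(logits_b))
    base = embedding_loss(emb_a, emb_b, logits_a, logits_b, label, w_mse, w_ce)
    return base + w_kl * kl
```

When both recordings of a pair yield identical embeddings and predictions, the MSE and KL terms vanish and only the cross-entropy remains, which matches the stated goal of making the embedding independent of the recording channel.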
Keywords/Search Tags: Acoustic Scene Classification, multi-channel unbalanced data, parallel data pairs, embedding features, transfer learning