Research On Sound Classification Model In Few-shot Scene

Posted on:2021-11-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Fan

Full Text:PDF

GTID:2518306467971779

Subject:Master of Engineering

Abstract/Summary:

Sound signal classification is a basic technology in signal processing and is widely used in natural language processing, multimedia technology and other fields. Since the introduction of the EU's General Data Protection Regulation (GDPR) and similar regulations, sound signal samples have become difficult to collect. Meanwhile, the development of the Internet, big data and 5G communications has brought sound data into a new period of rapid iteration. Against this background, few-shot scenes have gradually become a research focus of sound signal classification. The current mainstream approach to few-shot sound signal classification builds the classification model with a convolutional neural network (CNN) or with a combination of techniques from mathematics, image processing, acoustics and related fields, but the classification accuracy of existing methods is still not high. To address these shortcomings, this thesis makes the following contributions:

(1) Discusses the physiological basis of the Mel filter and points out the shortcomings of the naive Mel filtering process; analyzes the cause of attenuation in deep neural networks under few-shot conditions, and verifies that the problem exists on the ESC dataset with 10-layer and 24-layer convolutional neural networks.

(2) Building on Mel filtering, proposes the Adaptive Mel Filter (AMF) algorithm, which optimizes the variable parameters of the naive Mel filtering process by back propagation and extracts a more discriminative Mel spectrum.

(3) Taking the Deep Residual Network (DRN) as the framework, uses transfer and fine-tuning to adjust the input and output layers to the needs of sound signal classification, adds pooling and fully connected layers where appropriate, and initializes the network with DRN weights trained on ImageNet, establishing the Transfer Depth Residual Network (TDRN) for spectrum classification.

(4) Uses the AMF algorithm as the front-end spectrum extraction module and TDRN as the back-end spectrum classification module, and combines the two into the Adaptive Mel Filter-Transfer Depth Residual Network (AMF-TDRN) model for few-shot sound signal classification.

The ESC-10 and music speech datasets are used to simulate equal-time multi-class and equal-time two-class few-shot scenes, and the two are mixed into a music speech & ESC-10 dataset to simulate unequal-time multi-class few-shot scenes. With MF-TDRN, AMF-Inception v3, a 10-layer CNN, MVGG16, m-MobileNet, PEFBEs and CRBM as reference models, verification and control experiments are performed in each few-shot scenario. The experimental results show that the classification accuracy of the AMF-TDRN model in the three scenarios is 91.14%, 96.00% and 95.24%, an improvement of varying degree over the other models, which indicates a certain application value.
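
For readers unfamiliar with the naive Mel filtering process that the abstract refers to, the following is a minimal sketch of a standard triangular Mel filter bank. It is not the thesis's AMF algorithm, whose formulation the abstract does not give. The function names, the 2595·log10 Mel formula and the example sizes are assumptions chosen for illustration. The band edges computed here are fixed in advance; an adaptive variant could plausibly treat such parameters as trainable.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to Mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_filters triangular filters over n_fft // 2 + 1 FFT bins."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge frequencies, evenly spaced on the Mel scale.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    edges_hz = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    # Center frequency of each FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: rises from left to center, falls to right, zero outside.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def apply_filter_bank(bank, power_spectrum):
    """Weight one frame's power spectrum by each filter and sum per band."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]

# Hypothetical sizes: 10 Mel bands, 512-point FFT, 16 kHz audio.
bank = mel_filter_bank(n_filters=10, n_fft=512, sample_rate=16000)
mel_energies = apply_filter_bank(bank, [1.0] * 257)
```

Applying the bank to each frame of a short-time power spectrum yields a Mel spectrum of the kind a downstream classifier would take as input.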

Keywords/Search Tags:

Sound signal classification, Adaptive Mel Filter, Deep residual network, Mel Spectrum, Few-shot scene, Transfer and fine tuning

Related items

1	Research On Sound Scene Classification Based On Deep Learning
2	Residual Network And Its Variant Network For Acoustic Scene Classification
3	Microfossils Image Few Shot Recognition Based On Deep Residual Network And Transfer Learning
4	Deep Learning Based Sound Recognition Classification System
5	Audio Scene Classification Based On Deep Learning
6	Target Tracking Based On ResNet Motion Scene Classification
7	Research On Imbalanced Fine-grained And Few-shot Image Classification Based On Deep Learning
8	Research On Fine-grained Image Classification Based On Deep Residual Network
9	A Method Of Environmental Sound Classification Based On Residual Networks And Data Augmentation
10	Research On Audio Recognition Method Based On Residual Network And Random Forest