
Research On Sound Classification Model In Few-shot Scene

Posted on: 2021-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Fan
Full Text: PDF
GTID: 2518306467971779
Subject: Master of Engineering
Abstract/Summary:
Sound signal classification is one of the basic technologies in signal processing. It is widely used in natural language processing, multimedia technology and other fields, where it occupies an important position. Since the introduction of the EU's "General Data Protection Regulation" and similar regulations, sound signal samples have become difficult to collect. Meanwhile, the development of the Internet, big data and 5G communications has brought sound data into a new period of rapid iteration. Against this background, few-shot scenes have gradually become a research focus of sound signal classification. Building classification models with convolutional neural networks (CNNs), or by combining techniques from mathematics, image processing, acoustics and related fields, is the current mainstream approach to few-shot sound signal classification, but the classification accuracy of existing methods is still not high. To address these shortcomings, this thesis makes the following contributions:

(1) It discusses the physiological basis of the Mel filter and points out the shortcomings of the naive Mel filtering process, analyzes the cause of attenuation in deep neural networks under few-shot conditions, and verifies the existence of this problem on the ESC dataset with 10-layer and 24-layer convolutional neural networks.

(2) Building on Mel filtering, it proposes the Adaptive Mel Filter (AMF) algorithm, which optimizes the variable parameters of the naive Mel filtering process by back propagation and extracts a Mel spectrum with higher discrimination.

(3) Using the Deep Residual Network (DRN) as the framework, it applies fine-tuning and transfer learning to adjust the input and output layers to the needs of sound signal classification, adds pooling and fully connected layers where appropriate, and uses the DRN weights trained on ImageNet as initial weights, establishing the Transfer Depth Residual Network (TDRN) for spectrum classification.

(4) With the AMF algorithm as the front-end spectrum extraction module and TDRN as the back-end spectrum classification module, it combines the two to construct the Adaptive Mel Filter-Transfer Depth Residual Network (AMF-TDRN) model for few-shot sound signal classification.

The ESC-10 and music speech datasets are used to simulate few-shot scenes with equal-duration multi-class classification and equal-duration two-class classification, respectively, and the two datasets are mixed to generate the music speech&ESC-10 dataset, which simulates few-shot scenes with unequal-duration multi-class classification. Verification and control experiments are performed in these few-shot scenarios with MF-TDRN, AMF-inception v3, a 10-layer CNN, MVGG16, m-mobile net, PEFBEs and CRBM as reference models. The experimental results show that the AMF-TDRN model reaches classification accuracies of 91.14%, 96.00% and 95.24% across the three scenarios, an improvement of varying degree over the other models, which indicates that the model has practical application value.
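As a point of reference for the "naive" Mel filtering that AMF starts from, the following is a minimal sketch of a fixed triangular Mel filter bank. It is an illustration under stated assumptions, not the thesis's implementation: the abstract gives no formula, so the sketch uses the common 2595·log10(1 + f/700) Mel mapping, and all function and parameter names (`hz_to_mel`, `mel_filterbank`, `n_filters` and so on) are hypothetical. The abstract also does not say which filter parameters AMF makes trainable, so none are learned here.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using the common 2595*log10(1 + f/700) convention (assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Build fixed triangular filters whose edges are equally spaced on the Mel scale.

    Returns a list of n_filters rows, each holding n_fft // 2 + 1 weights,
    one weight per FFT frequency bin.
    """
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist frequency
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    # n_filters + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

    bank = []
    for m in range(n_filters):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)      # slope up to the peak
            falling = (right - f) / (right - center)   # slope down from the peak
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def apply_filterbank(bank, power_spectrum):
    """Weight one frame's power spectrum by each filter to get Mel-band energies."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

In this fixed form the edge frequencies and triangular shapes are set once and never change. These are the kind of quantities a trainable variant could expose to gradient-based optimization.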
Keywords/Search Tags: Sound signal classification, Adaptive Mel Filter, Deep residual network, Mel Spectrum, Few-shot scene, Transfer and fine tuning