Application Of Multi-scale Spectral Image Feature And Multi-task Learning In Audio Authentication

Posted on: 2023-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: L F Zhong
GTID: 2558306914952729
Subject: Applied Statistics
Abstract/Summary:
With the development of multimedia technology, malicious audio forgery has become an increasingly serious problem. As audio recognition scenarios grow more complex, a model trained on a single task can no longer meet practical needs, which has motivated multi-task learning. Alongside reducing model complexity, model transferability is also becoming more important. Accurately and efficiently handling multiple tasks, such as audio authenticity detection and speaker identification, is therefore of considerable significance and practical value. On this basis, this thesis uses audio data provided by Ali Tianchi (referred to as dataset 1 and dataset 2) for empirical research and builds an EfficientNet model that combines multi-scale spectral image features with multi-task learning for audio recognition in multi-task scenarios. The specific contents are as follows:

Chapter 1 introduces the research background and significance of the thesis and summarizes the state of research at home and abroad from two aspects: acoustic features and audio recognition and classification models.

Chapter 2 introduces the deep neural network models and acoustic features used in the thesis, followed by the principles behind the evaluation metrics.

Chapter 3 presents a descriptive statistical analysis of the audio data provided by Ali Tianchi. Based on the results of that analysis, a system is designed comprising audio segmentation, noise reduction and silence removal, audio feature extraction and spectral image conversion, EfficientNet model construction, and multi-scale audio fusion and integration.

Chapter 4 introduces the simulation details of the experiments. Experiments were conducted to evaluate how well different spectral image features, and the integration of segmentations at different scales, detect audio with different models. The experimental results show that:

(1) Compared with the multi-scale fusion ResNet-152, Inception-V3 and Inception-ResNet-V2 models, the multi-scale fusion EfficientNet-L2 performs best in multi-task scenarios. On dataset 1 (dataset 2), the accuracy of the binary classification task improved by 8.26%, 9.64% and 2.46% (12.80%, 11.60% and 2.44%), respectively, and the accuracy of the multi-class classification task improved by 9.61%, 8.56% and 1.97% (12.80%, 11.60% and 2.44%), respectively.

(2) Multi-scale segmentation and fusion can effectively extract audio features within a period of time, keep the distribution of audio features consistent between the training set and the test set, and effectively improve the accuracy of audio recognition.

Finally, the thesis summarizes the research and outlines directions for future work.
Keywords/Search Tags:Audio authentication, Multi-task learning, Multi-scale segmentation, Spectral image feature, EfficientNet model
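
The abstract does not say how the multi-scale segmentation and fusion step is implemented. The Python sketch below is one plausible reading, not the thesis's method: it cuts a signal into segments at several lengths, scores each segment, and averages the class probabilities first within each scale and then across scales. The segment lengths, the averaging rule, and the `toy_score` function (a hypothetical stand-in for a trained model such as EfficientNet applied to a spectral image) are all assumptions made for illustration.

```python
import numpy as np


def split_into_segments(signal, segment_len):
    """Cut a 1-D signal into equal, non-overlapping segments; drop the remainder."""
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


def fuse_multi_scale(signal, segment_lens, score_fn):
    """Average class probabilities within each scale, then across scales.

    Averaging is an assumed fusion rule; the thesis may fuse differently.
    """
    per_scale = []
    for segment_len in segment_lens:
        segments = split_into_segments(signal, segment_len)
        if len(segments) == 0:
            continue  # the signal is shorter than this scale
        probs = np.stack([score_fn(segment) for segment in segments])
        per_scale.append(probs.mean(axis=0))
    if not per_scale:
        raise ValueError("signal is shorter than every segment length")
    return np.mean(per_scale, axis=0)


def toy_score(segment):
    """Hypothetical scorer returning [P(genuine), P(forged)] from segment energy."""
    energy = float(np.mean(segment ** 2))
    p_forged = energy / (1.0 + energy)
    return np.array([1.0 - p_forged, p_forged])


# Synthetic one-second clip at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
fused = fuse_multi_scale(audio, segment_lens=[4000, 8000, 16000], score_fn=toy_score)
```

In a real pipeline, `score_fn` would convert each segment to a spectral image and return the model's softmax output, with one output head per task in the multi-task setting.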