Application Of Multi-scale Spectral Image Feature And Multi-task Learning In Audio Authentication

Posted on:2023-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:L F Zhong

Full Text:PDF

GTID:2558306914952729

Subject:Applied Statistics

Abstract/Summary:

With the development of multimedia technology, malicious audio forgery is becoming an increasingly serious problem. As audio recognition scenarios grow more complex, a model trained on a single task can no longer meet the needs of practical applications, which has motivated multi-task learning. Model transferability also becomes more important as model complexity is reduced. It is therefore of great significance and practical value to handle multiple tasks accurately and efficiently, such as determining whether audio is genuine or forged and identifying the speaker. On this basis, this thesis uses the audio data provided by Ali Tianchi (referred to as dataset 1 and dataset 2) for empirical research, and builds an EfficientNet model with multi-scale spectral image features and multi-task learning for audio recognition in multi-task scenarios. The specific contents are as follows.

Chapter 1 introduces the research background and significance of the thesis, and summarizes the state of research in China and abroad from two aspects: acoustic features and audio recognition and classification models.

Chapter 2 introduces the deep neural network models and acoustic features used in the thesis, followed by the principles behind the evaluation metrics.

Chapter 3 presents a descriptive statistical analysis of the audio data provided by Ali Tianchi. Based on the results of that analysis, a system was designed that covers audio segmentation, noise reduction and silence removal, audio feature extraction and conversion to spectral images, construction of the EfficientNet model, and multi-scale audio fusion and integration.

Chapter 4 describes the details of the experimental setup. Experiments evaluate how different spectral image features and different segmentation scales with integration affect audio detection across models. The experimental results show that:

(1) Compared with the multi-scale fusion ResNet-152, Inception-V3 and Inception-ResNet-V2 models, the multi-scale fusion EfficientNet-L2 performs best in multi-task scenarios. For dataset 1 (dataset 2), the accuracy of the binary classification task improved by 8.26%, 9.64% and 2.46% (12.80%, 11.60% and 2.44%), respectively. The accuracy of the multi-class classification task improved by 9.61%, 8.56% and 1.97% (12.80%, 11.60% and 2.44%), respectively.

(2) Multi-scale segmentation and fusion can effectively extract audio features over a period of time, keep the distribution of audio features consistent between the training set and the test set, and effectively improve the accuracy of audio recognition.

Finally, the thesis summarizes the research and outlines directions for future work.
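The multi-scale segmentation and fusion idea in result (2) can be illustrated with a minimal sketch. The abstract does not state how the thesis fuses predictions across scales, so this sketch assumes the simplest option: average the class probabilities over the segments of each scale, then average over scales. The function names, the segment lengths and the toy classifier are all hypothetical; in the thesis the classifier role would be played by an EfficientNet applied to spectral images.

```python
import numpy as np

def split_into_segments(signal, segment_len):
    """Cut a 1-D signal into non-overlapping segments, zero-padding the last one."""
    n_segments = max(1, int(np.ceil(len(signal) / segment_len)))
    padded = np.zeros(n_segments * segment_len, dtype=float)
    padded[:len(signal)] = signal
    return padded.reshape(n_segments, segment_len)

def fuse_multi_scale(signal, segment_lens, predict_proba):
    """Average class probabilities within each scale, then across scales.

    Averaging is an assumed fusion rule, not necessarily the one used in the thesis.
    """
    per_scale = []
    for seg_len in segment_lens:
        segments = split_into_segments(signal, seg_len)
        probs = np.stack([predict_proba(s) for s in segments])
        per_scale.append(probs.mean(axis=0))
    return np.mean(per_scale, axis=0)

def toy_predict_proba(segment):
    """Hypothetical stand-in classifier: maps mean energy to [P(genuine), P(forged)]."""
    p_forged = 1.0 / (1.0 + np.exp(-(np.mean(segment ** 2) - 0.5)))
    return np.array([1.0 - p_forged, p_forged])

# Synthetic one-second clip at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.normal(size=16000)

# Three assumed scales: quarter-second, half-second and full-clip segments.
fused = fuse_multi_scale(audio, [4000, 8000, 16000], toy_predict_proba)
label = int(np.argmax(fused))  # 0 = genuine, 1 = forged under this toy labeling
```

Because every clip is scored at several segment lengths, short and long clips are reduced to the same set of scales before classification, which is one plausible reading of how the approach keeps feature distributions consistent between training and test data.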

Keywords/Search Tags:

Audio authentication, Multi-task learning, Multi-scale segmentation, Spectral image feature, EfficientNet model

Related items

1	High Resolution Remote Sensing Image Multi-scale Segmentation Support By Spectral Graph Theory
2	Multi-task Semantic Segmentation Method Based On Attention And Feature Fusion
3	Research On Image Pixel-level Multi-visual Task Learning
4	Research On Image Segmentation Based On Multi-task Learning Deep Neural Networks
5	3D Model Classification Based On EfficientNet And Multi-Feature Fusion
6	Study On The Multi-scale And Multi-feature Image Segmentation Algorithms
7	Study On Hierarchical Multi-task Learning Algorithms For Large-scale Image Classification
8	Application Of Multi-Task Based Audio Feature Extraction In Audio Captioning System
9	Study Of Image Segmentation Methods Based On Multiscale Fast Spectral Clustering
10	Research On Semantic Segmentation Based Occluded Person Re-identification Methods