
Application Research Of Environmental Sound Classification And Voiceprint Identification Based On Deep Learning

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: W P Liao
GTID: 2518306470963269
Subject: Computer Science and Technology
Abstract/Summary:
Environmental sound classification and voiceprint identification are both audio information processing tasks. Environmental sound classification applies signal processing, computing and artificial intelligence techniques to analyze specific environmental sound signals and to classify and recognize them automatically. It is widely used in smart homes, scene analysis and intelligent monitoring. Voiceprint identification finds, within a known set of voiceprints, the person whose voiceprint matches a given audio sample. It is widely used in criminal investigation, intelligent monitoring, financial security and other fields. With the development of artificial intelligence, both tasks have attracted growing attention from industry and researchers.

Early environmental sound classification relied on signal processing and machine learning methods. Since the introduction of the convolutional neural network (CNN), most environmental sound classification work has used CNN models, but the model structures and network depths vary from one study to another and there are no unified guiding principles, which causes great confusion for learners. Compared with environmental sound data, voiceprint data is less compact within each class and differs less between classes. Voiceprint identification is therefore more demanding, and softmax, the classification function built into existing CNNs, classifies such data poorly. In response to these problems, the main research work of this thesis is as follows:

1. The CNN models that Piczak and Zhang proposed for environmental sound classification differ in network structure, and a network that is too shallow or too deep leads to under-fitting or over-fitting. This thesis uses comparative experiments to derive principles for choosing the CNN structure and the related parameter settings, and to identify a comparatively better CNN model as a reference for environmental sound classification applications. It first discusses the number and depth of the convolutional and pooling layers in the Piczak and Zhang network structures, then adjusts the size and number of the convolutional kernels in those structures, and finally uses comparative experiments to explore how well different structures and network depths can be optimized, yielding the optimized network model, Changed CNN, and its parameter settings. Comparative experiments on the public UrbanSound8K dataset show that the optimized Changed CNN structure yields some improvement: when the network depth and structure are set moderately, accuracy is higher than with the Piczak and Zhang network models.

2. The softmax classifier built into a single CNN voiceprint identification model handles voiceprint data poorly, because such data has low intra-class compactness and small inter-class gaps. To address this, a combined CNN + LightGBM voiceprint identification model is proposed. The combined model uses the CNN to extract high-level features from the audio data, compares several commonly used classification algorithms on those features, and on that basis replaces the CNN's built-in softmax classifier with a LightGBM classifier. Comparative experiments on the public VoxCeleb2 dataset show that the CNN + LightGBM model is more accurate than both a single CNN model and the combinations of the CNN with the other classification algorithms, which supports the rationale for the proposed CNN + LightGBM model.
Keywords/Search Tags: Environmental Sound Classification, Voiceprint Identification, Convolutional Neural Network, LightGBM Model, Mel Frequency Cepstrum Coefficient