Application Research Of Environmental Sound Classification And Voiceprint Identification Based On Deep Learning

Posted on:2021-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:W P Liao

Full Text:PDF

GTID:2518306470963269

Subject:Computer Science and Technology

Abstract/Summary:

Environmental sound classification and voiceprint identification both belong to the field of audio information processing. Environmental sound classification applies signal processing, computing, and artificial intelligence techniques to analyze specific environmental sound signals and to classify and recognize them automatically; it is widely used in smart homes, scene analysis, and intelligent monitoring. Voiceprint identification finds, within a known set of voiceprints, the person who matches a given audio voiceprint; it is widely used in criminal investigation, intelligent monitoring, financial security, and other fields. With the development of artificial intelligence, both tasks have attracted growing attention from industry researchers. Early environmental sound classification relied on signal processing and machine learning methods. Since the introduction of convolutional neural networks (CNNs), most environmental sound classification work uses CNN models, but the model structures and network depths differ from one work to another, and there are no unified guiding principles, which causes considerable confusion for learners. Compared with environmental sound data, voiceprint data is less compact within each class and differs less between classes, so voiceprint identification is more demanding, and softmax, the classification function built into existing CNNs, classifies such data poorly. In response to these problems, the main research work of this thesis is as follows:

1. The CNN models proposed by Piczak and by Zhang for environmental sound classification have different network structures, and a network that is too shallow or too deep leads to under-fitting or over-fitting. Through comparative experiments, this thesis identifies principles for setting the CNN model structure and its related parameters, and finds a comparatively better CNN model to serve as a reference for environmental sound classification applications. It first examines the number and depth of the convolutional and pooling layers in the Piczak and Zhang network structures, then adjusts the size and number of the convolution kernels in those structures, and finally explores, through comparative experiments, how the optimization behaves across different structures and network depths, so as to obtain the optimal network structure, named Changed CNN, and its related parameter settings. Comparative experiments on the public UrbanSound8K dataset show that the optimized Changed CNN structure brings some improvement: when the network depth and structure settings are moderate, accuracy is higher than that of the Piczak and Zhang network models.

2. The softmax classifier built into a single-CNN voiceprint identification model handles poorly voiceprint data that has low intra-class compactness and small inter-class differences. To address this, a combined CNN + LightGBM voiceprint identification model is proposed. The combined model uses the CNN to extract high-level features from the audio data, compares several commonly used classification algorithms, and replaces the CNN's built-in softmax classifier with a LightGBM classifier to arrive at the final classification algorithm. Comparative experiments on the public VoxCeleb2 dataset show that the classification accuracy of the CNN + LightGBM model is higher than that of both a single CNN model and CNN models combined with other classification algorithms, which supports the soundness of the CNN + LightGBM model proposed in this thesis.
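The property of voiceprint data described above (weak compactness within a class, small differences between classes) can be made concrete with a simple distance comparison. The sketch below is only an illustration under stated assumptions, not the measure used in the thesis: the function names `centroid` and `separability`, the Euclidean distance, and the two-dimensional toy "embeddings" are all hypothetical. It computes the mean distance of each sample to its own class centroid and the mean distance between class centroids; data is hard to separate when the first value is not much smaller than the second.

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(component) for component in zip(*vectors)]


def separability(embeddings_by_class):
    """Return (mean intra-class distance, mean inter-class centroid distance).

    `embeddings_by_class` maps a class label (for example a speaker id)
    to a list of feature vectors. Both the mapping layout and the use of
    Euclidean distance are assumptions made for this illustration.
    """
    centroids = {
        label: centroid(vectors)
        for label, vectors in embeddings_by_class.items()
    }
    # Average distance from every sample to the centroid of its own class.
    intra = fmean(
        dist(vector, centroids[label])
        for label, vectors in embeddings_by_class.items()
        for vector in vectors
    )
    # Average distance between the centroids of every pair of classes.
    labels = list(centroids)
    inter = fmean(
        dist(centroids[a], centroids[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )
    return intra, inter


# Toy data (hypothetical): tight, well separated classes...
well_separated = {
    "a": [[0.0, 0.0], [0.0, 2.0]],
    "b": [[10.0, 0.0], [10.0, 2.0]],
}
# ...versus loose classes whose centroids lie close together.
poorly_separated = {
    "a": [[0.0, 0.0], [0.0, 4.0]],
    "b": [[1.0, 0.0], [1.0, 4.0]],
}

print(separability(well_separated))    # (1.0, 10.0)
print(separability(poorly_separated))  # (2.0, 1.0)
```

In the second toy set the spread inside each class exceeds the gap between classes, which is the situation the abstract attributes to voiceprint data and the motivation it gives for replacing the softmax classifier with a separate classifier trained on the CNN features.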

Keywords/Search Tags:

Environmental Sound Classification, Voiceprint Identification, Convolutional Neural Network, LightGBM Model, Mel Frequency Cepstrum Coefficient

Related items

1	A Research Based On The Technique Of Unit Identification Of The Animal Sound
2	Deep Learning Based Sound Recognition Classification System
3	Research On Systems For Voiceprint Recognition Based On Vector Quantization And Neural Network
4	Research On The Voiceprint Recognition System With Background Noise
5	Study Of Recognition Technology For Abnormal Sound Based On MFCC
6	Environmental Noise Classification System Based On Convolutional Neural Network
7	Research On Environmental Sound Recognition Method Based On Deep Learning
8	Research On Environmental Sound Classification Method Based On Deep Learning
9	The Identification Method Of Cough Signals Based On Mel-Frequency Cepstrum Coefficient
10	Based On The Mel Frequency Cepstrum Coefficient Palmprint Recognition Research