Research On Speaker Recognition Algorithm Based On Deep Neural Network

Posted on: 2020-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2428330596995035
Subject: Control Science and Engineering
Abstract/Summary:
Speaker recognition, also known as voiceprint authentication, is a biometric identity authentication technology. It distinguishes speakers by the individual characteristics carried in their speech. Because speaker recognition scales well, needs only simple equipment, costs little, and is readily accepted by users, it can be widely applied in banking, criminal investigation, national defense, military affairs, and other fields. After more than half a century of development, speaker recognition has matured considerably and many products have reached the market. However, existing speaker recognition systems still have problems that need to be solved, such as sensitivity to environmental noise and low security.

The Gaussian mixture model (GMM) uses a large number of Gaussian probability density functions to classify speaker speech features well. It achieves good recognition performance and is a classical model in the speaker recognition field. However, experiments show that the GMM depends on the amount of speech available for each speaker: higher recognition performance often requires a large amount of training speech. It is also sensitive to environmental noise and has poor robustness.

Deep learning is a pattern recognition technology developed in recent years. It has made great breakthroughs in image classification and recognition, and it can learn autonomously, continuously optimizing the extracted features according to the target. It can therefore extract deep speaker speech features that are insensitive to environmental noise, and its strong pattern classification ability allows the extracted feature parameters to be classified and recognized well. To improve the performance of speaker recognition systems, this thesis introduces deep learning into both feature parameter extraction and feature parameter modeling and recognition. The main work is as follows:

(1) The basic technology of speech recognition is introduced. Speech preprocessing in speaker recognition includes speech denoising, endpoint detection, windowing, and framing. Two speech enhancement techniques are introduced and compared experimentally. The importance of endpoint detection and the endpoint detection technique adopted in this thesis are described, as are the importance and necessity of windowing and framing. The common speaker speech feature parameters are introduced, and the classical MFCC feature parameters are derived in detail. The mainstream speaker recognition models are introduced.

(2) Bottleneck feature extraction based on a deep neural network is studied and its recognition performance is verified. Deep learning is introduced into the extraction of speaker speech parameters, and the basic principle and extraction process of the bottleneck feature are described in detail. The bottleneck feature is combined with the Gaussian mixture model and applied to speaker recognition. Detailed experiments verify the performance improvement of the bottleneck feature over MFCC feature parameters.

(3) To address the weak robustness of the Gaussian mixture model, the performance of composite features built from the bottleneck feature and MFCC feature parameters is studied with deep neural networks as the recognizer. Two different methods of fusing speaker feature parameters are studied and combined with different deep neural networks. The good performance of the composite feature in back-end classification and recognition by the deep neural network is verified. The experimental results show that this recognition method greatly improves noise robustness and security.
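As an illustration of the framing and windowing step mentioned in item (1), the following is a minimal sketch only. The abstract does not state the frame length, hop size, or window type, so the Hamming window and the function names `frame_signal`, `hamming_window`, and `window_frames` are assumptions chosen for the example, not the thesis's actual implementation.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split a 1-D sample sequence into overlapping frames.

    A trailing partial frame is dropped. frame_len and hop_len are
    given in samples; their values here are illustrative assumptions.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(list(samples[start:start + frame_len]))
    return frames


def hamming_window(n):
    """Return the n-point symmetric Hamming window coefficients."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_frames(frames):
    """Multiply every frame element-wise by a Hamming window (assumed window type)."""
    if not frames:
        return []
    window = hamming_window(len(frames[0]))
    return [[s * w for s, w in zip(frame, window)] for frame in frames]


if __name__ == "__main__":
    # Toy signal of 10 samples, frames of 4 samples with a hop of 2.
    signal = [float(i) for i in range(10)]
    frames = frame_signal(signal, frame_len=4, hop_len=2)
    windowed = window_frames(frames)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

In a typical pipeline the windowed frames would then be passed to the feature extraction stage (for example MFCC computation), which is outside the scope of this sketch.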
Keywords/Search Tags: Speaker recognition, Gauss mixture model, Deep learning, Bottleneck feature