Research On Speaker Recognition Based On Deep Learning

Posted on: 2017-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Lv
Full Text: PDF
GTID: 2348330491462761
Subject: Information and Communication Engineering
Abstract/Summary:
As human society develops and the ways people interact become more varied, acoustic information, one of the biological features unique to each person, plays an important role in recognizing or verifying human identity; some researchers call it the most natural human feature. Speaker recognition, also known as voiceprint recognition, is based on a person's acoustic features and has found significant applications in many domains, such as the Internet, military security, remote control, communication systems, and access control systems.

Neural networks have developed rapidly since the 1980s. In recent years, the theory of deep learning proposed by Hinton, a professor at the University of Toronto, has achieved good results in image recognition; its recognition rate on the MNIST handwritten digits has been shown to be near 99 percent. Deep learning is an approach based on deep, multi-layer neural networks. Compared with traditional neural networks, it overcomes the shortcoming of easily falling into local minima of non-convex functions and has the advantage of learning features from the bottom up. This thesis presents a comprehensive comparison and analysis of speaker recognition using feed-forward networks, auto-encoders, and Deep Belief Networks, respectively. The main work is as follows:

First, the history and current status of speaker recognition research are discussed. The advantages and shortcomings of different technologies are then analyzed, which indicates that neural networks and deep learning are new frontiers in speaker recognition.

Second, the pre-processing stage of the speaker recognition model is studied, including framing, Mel-Frequency Cepstral Coefficients (MFCC), models, and algorithms. The computation of MFCC features is described in detail.

Third, Feed-Forward Neural Networks (FFNN) are applied to speaker recognition, and the effect of the number of layers and neurons on the recognition rate is tested. In addition, a combination of the Gaussian Mixture Model (GMM) and FFNN is proposed, which improves both the recognition rate and the robustness of the model. The neural network here operates in the GMM probability space, which helps to capture information shared between different speakers.

Fourth, the use of deep learning for speaker recognition is studied. Two deep models are mainly analyzed: the auto-encoder and the Deep Belief Network. Deep learning is shown to outperform a general FFNN in speaker recognition. A hybrid model of the Denoising Auto-encoder and the Restricted Boltzmann Machine is proposed for the first time. Its performance under different combinations is studied, which shows that using the auto-encoder in the lower layers together with the Restricted Boltzmann Machine combines the advantages of both and leads to a higher recognition rate. Its performance also improves as the network becomes deeper.

Fifth, the Rectified Linear Unit (ReLU) is used to replace the sigmoid function in the Deep Neural Network, which is shown to improve the system. The system's performance is tested both with and without pre-training. The experiments show that training of the deep model becomes much faster with ReLU. Also, in terms of sparseness, a deep model without pre-training that uses ReLU can achieve the same sparseness as a deep model with pre-training, which explains why its recognition rate is higher than with the sigmoid function and even close to that of the deep model with pre-training. However, combining ReLU with a pre-trained deep model does not perform very well, which deserves further research.
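
The sparseness argument above can be illustrated with a minimal sketch. It is not the thesis's experiment: it assumes zero-mean Gaussian pre-activations generated at random instead of a trained speaker model, and the helper names `zero_fraction` and `compare_sparsity` are hypothetical. It only shows why a rectified unit yields exactly-zero activations while a sigmoid unit does not.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid; for moderate inputs the output is strictly positive."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes the rest."""
    return x if x > 0.0 else 0.0


def zero_fraction(values):
    """Fraction of activations that are exactly zero (a crude sparsity measure)."""
    return sum(1 for v in values if v == 0.0) / len(values)


def compare_sparsity(n=10000, seed=0):
    """Apply both activations to the same synthetic pre-activations.

    Assumption: pre-activations are drawn from a standard Gaussian,
    which stands in for the inputs to one hidden layer.
    """
    rng = random.Random(seed)
    pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
    relu_out = [relu(x) for x in pre]
    sigmoid_out = [sigmoid(x) for x in pre]
    return zero_fraction(relu_out), zero_fraction(sigmoid_out)


relu_sparsity, sigmoid_sparsity = compare_sparsity()
print(f"ReLU zero fraction:    {relu_sparsity:.3f}")
print(f"sigmoid zero fraction: {sigmoid_sparsity:.3f}")
```

Under this assumption roughly half of the ReLU outputs are exactly zero, while no sigmoid output is. How sparse a trained network actually becomes depends on its learned weights and biases, which this sketch does not model.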
Keywords/Search Tags: Speaker Recognition, Neural Network, Auto-encoder, Deep Belief Network, Deep Learning