Research Of Speaker Recognition Technology Based On Fusion Features

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: B Zou
GTID: 2428330614963962
Subject: Signal and Information Processing
Abstract/Summary:
Speech is one of the most convenient media for obtaining and conveying information in daily life, and it carries rich information. Because the structure of each person's vocal tract is determined at birth and is unique to that person, speech can be used as a biometric for identifying people. In biometrics, two important factors affect the recognition result: the classification model and the feature parameters. Once the model is selected, recognition performance depends mainly on the choice of feature parameters. High-quality feature parameters not only reduce the probability of misjudgment but also shorten model training and classification time. Extracting features that are highly related to the speaker's identity from the speech signal is therefore an important research subject. To address this problem, this thesis proposes two speaker recognition systems: one based on deep and shallow fusion features, and one based on fusion features with optimized weight coefficients. The main work is as follows:

(1) The research background and significance of speaker recognition are reviewed, and the basic knowledge of speaker recognition is then presented. This mainly includes speech preprocessing, the extraction process for the MFCC acoustic feature parameters, and the principle of the SVM classifier. In addition, two feature selection strategies are introduced to remove redundant information that the features may contain; these provide the technical basis for the subsequent research.

(2) To find a more robust feature that fully represents the speaker's identity, a speaker recognition system based on deep and shallow fusion features is proposed after an in-depth study of GMM and DNN. The traditional feature represents physical information about the vocal tract structure and is a shallow representation, whereas the DNN mines deeper features that give a more abstract description. Fusing the two types of features lets them complement each other. In this method, the MFCC parameters are extracted first and then processed in two branches. In one branch, the MFCC parameters are fed into the DNN to extract deep features, from which a deep Gaussian supervector is obtained through the GMM. In the other branch, the traditional Gaussian supervector is obtained directly through the GMM. Finally, the two features are concatenated horizontally to form a new feature, which is used to train the SVM and identify the speaker. The experimental results show that the proposed fusion feature effectively improves the recognition rate.

(3) As the number of speakers increases, the recognition rate of the system decreases. In addition, in a speaker recognition system based on fused features, different features contribute differently to the final recognition result. To measure these contributions more accurately, a speaker recognition system based on fusion features with optimized weight coefficients is proposed after a study of two optimization algorithms. Before the three types of features are fused, the GA or SA algorithm is used to optimize the weight coefficients, and each feature is then multiplied by its own coefficient to construct a new speaker recognition system. The experimental results show that recognition performance with the weighted fusion feature is better than with features that are fused directly.
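The horizontal fusion in (2) and the weighted fusion in (3) can be illustrated with a minimal sketch. The function name `fuse_features`, the toy vectors, and the weight values below are assumptions made for illustration only; the abstract does not give the thesis's feature dimensions, the coefficients found by GA or SA, or any implementation details.

```python
from typing import List, Optional, Sequence


def fuse_features(
    features: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Concatenate feature vectors end to end, scaling each by one weight.

    With weights=None every feature is scaled by 1.0, which corresponds to
    direct (unweighted) horizontal concatenation.
    """
    if weights is None:
        weights = [1.0] * len(features)
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature vector")

    fused: List[float] = []
    for vector, weight in zip(features, weights):
        # Each feature is multiplied by its own coefficient before being
        # appended to the fused vector.
        fused.extend(weight * value for value in vector)
    return fused


# Toy stand-ins for a traditional and a deep Gaussian supervector
# (hypothetical values, far shorter than real supervectors).
shallow = [0.2, -1.0]
deep = [0.5, 0.1, 0.3]

direct = fuse_features([shallow, deep])
# Hypothetical weights; in the thesis they would come from GA or SA.
weighted = fuse_features([shallow, deep], weights=[0.5, 2.0])
```

The fused vector would then serve as the input to the SVM classifier. Keeping the weights as a separate argument means an optimizer only has to search over one coefficient per feature type rather than over the features themselves.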
Keywords/Search Tags: speaker recognition, Deep Neural Network, Gaussian Mixture Model, feature fusion, Genetic Algorithm, Simulated Annealing