
Research On Voice Activity Detection And Low-Dimensional Vector Extraction For Speaker Recognition

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
Full Text: PDF
GTID: 2428330620965633
Subject: Computer technology
Abstract/Summary:
In recent years, as speaker recognition technology has matured, voiceprint features have been widely used in information security as a means of identity authentication. An important front-end task for speaker recognition is speaker diarization: in a multi-speaker recording, the original speech is first separated and denoised, and is then segmented, with the segments belonging to the same speaker clustered together. By comparing a speech segment to be recognized with the information from known speech segments, the system can determine which speaker the segment belongs to. Many neural-network-based speaker recognition methods already exist, but their recognition accuracy still needs to be improved. The main goal of this thesis is to analyze the key factors that affect neural-network-based speaker recognition and to develop methods for improving it. To this end, the thesis first explores a speaker segmentation system that uses a time-delay neural network (TDNN) to extract low-dimensional vectors. The experimental results show that improved voice activity detection and adjusted boundary re-segmentation reduce the segmentation error rate, while augmenting the training data set and tuning the neural network parameters improve the robustness of the system. In the subsequent speaker recognition system, the thesis focuses on neural-network-based voice activity detection and on reducing neural network overfitting, as follows:

(1) In the data preprocessing stage, a network combining convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and deep neural networks (DNNs) is proposed to improve voice activity detection. This architecture models speech in both frequency and time simultaneously. The experimental results show that effectively segmenting speech at its discontinuities significantly improves the speaker recognition rate and reduces the diarization error rate.

(2) To address overfitting in the speaker diarization system, the neural-network method for extracting low-dimensional vectors is improved. A dimensionality-reduction layer is added to reduce the number of parameters, and layer-by-layer connections are added to the network to strengthen gradient flow. The improvements were verified experimentally in both systems: compared with the original neural network and the traditional Gaussian model, both diarization performance and recognition performance are significantly improved.
Keywords/Search Tags: speaker recognition, speaker diarization, voice activity detection, low-dimensional vector extraction
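The abstract credits part of the gain to segmenting speech at its discontinuities after voice activity detection, but it does not describe how frame-level decisions become segments. The sketch below assumes the CNN-LSTM-DNN detector emits one speech probability per frame and shows a common generic post-processing step: threshold the frames, bridge short pauses, and drop very short bursts. The function name and all parameter values (threshold 0.5, 10 ms frame shift, 0.1 s minimum gap, 0.2 s minimum speech) are hypothetical choices for illustration, not the thesis's settings.

```python
from typing import List, Tuple


def frames_to_segments(
    probs: List[float],
    threshold: float = 0.5,    # hypothetical decision threshold
    frame_shift: float = 0.01, # hypothetical 10 ms frame shift
    min_speech: float = 0.2,   # hypothetical minimum segment length (s)
    min_gap: float = 0.1,      # hypothetical pause length to bridge (s)
) -> List[Tuple[float, float]]:
    """Convert per-frame speech probabilities into (start, end) segments in seconds."""
    # Express the duration limits in frames to avoid float comparisons.
    min_speech_frames = round(min_speech / frame_shift)
    min_gap_frames = round(min_gap / frame_shift)

    # 1. Threshold each frame and collect runs of speech frames as [start, end).
    runs: List[List[int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(probs)])

    # 2. Bridge short pauses: merge runs separated by fewer than min_gap_frames.
    merged: List[List[int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. Discard runs shorter than min_speech_frames and convert to seconds.
    return [
        (s * frame_shift, e * frame_shift)
        for s, e in merged
        if e - s >= min_speech_frames
    ]
```

For example, with `probs = [0.1]*10 + [0.9]*30 + [0.2]*5 + [0.8]*30 + [0.1]*20 + [0.9]*5 + [0.1]*10`, the 5-frame pause is bridged, the isolated 5-frame burst is discarded, and a single segment from about 0.10 s to 0.75 s remains. A boundary re-segmentation step, as mentioned in the abstract, would refine such boundaries further.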
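Contribution (2) adds a dimensionality-reduction layer to cut the parameter count and adds layer-by-layer connections to improve gradient flow. The abstract gives no layer sizes, so the arithmetic below uses assumed dimensions: a 3000-dimensional pooled statistics vector (mean and standard deviation of 1500-dimensional TDNN frame outputs, a common configuration), a 512-dimensional embedding, and a hypothetical 128-unit reduction layer. The layer-by-layer connections are modelled here as DenseNet-style concatenation, which is an assumption about what the thesis means.

```python
from typing import List


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def embedding_params(pooled_dim: int, embed_dim: int, bottleneck: int = 0) -> int:
    """Parameters of the pooled-statistics -> embedding mapping,
    optionally routed through a smaller dimensionality-reduction layer."""
    if bottleneck <= 0:
        return dense_params(pooled_dim, embed_dim)
    return dense_params(pooled_dim, bottleneck) + dense_params(bottleneck, embed_dim)


def dense_block_input_dims(in_dim: int, growth: int, n_layers: int) -> List[int]:
    """Input width of each layer when every layer receives the concatenated
    outputs of all earlier layers (assumed DenseNet-style connectivity)."""
    return [in_dim + k * growth for k in range(n_layers)]
```

Under these assumed sizes, `embedding_params(3000, 512)` is 1,536,512, while `embedding_params(3000, 512, 128)` is 384,128 + 66,048 = 450,176, a reduction of roughly 71%. `dense_block_input_dims(512, 128, 4)` gives `[512, 640, 768, 896]`: each layer sees all earlier outputs directly, which gives gradients short paths back to early layers.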
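The abstract states that a speech segment is attributed to a speaker by comparing it with known speech segments, without naming the scoring method. The sketch below assumes cosine similarity between low-dimensional embeddings, one common back-end (PLDA is another); the speaker labels and toy 3-dimensional vectors are hypothetical stand-ins for embeddings extracted by the network.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def identify(test: List[float], enrolled: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the enrolled speaker most similar to the test embedding, with its score."""
    scores = {spk: cosine(test, emb) for spk, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With `enrolled = {"spk_a": [1.0, 0.0, 0.0], "spk_b": [0.0, 1.0, 0.0]}`, the test embedding `[0.9, 0.1, 0.0]` is assigned to `spk_a` with a score of about 0.99. A verification system would instead compare the score against a tuned threshold.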