Research On Voice Activity Detection And Low-Dimensional Vector Extraction For Speaker Recognition

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Zhang

Full Text:PDF

GTID:2428330620965633

Subject:Computer technology

Abstract/Summary:


In recent years, as speaker recognition technology has matured, voiceprint features have been widely used for identity authentication in the field of information security. An important front-end task for speaker recognition is speaker diarization: the original speech is separated and denoised, then segmented, and the segments belonging to the same speaker are clustered in a multi-speaker context. By comparing the speech segment to be recognized with known speech segments, the system can determine which speaker the segment belongs to. Many neural-network-based speaker recognition methods already exist, but their recognition accuracy still needs to be improved. The main research goal of this thesis is to analyze the key factors that influence neural-network-based speaker recognition and the methods for improving it.

To this end, this thesis first explores a speaker segmentation system that uses a time-delay neural network (TDNN) to extract low-dimensional vectors. Experiments show that improved voice activity detection (VAD) and adjusted boundary re-segmentation reduce the segmentation error rate, while augmenting the training data set and tuning the neural network parameters improve system robustness. In the subsequent speaker recognition system, the thesis focuses on neural-network-based voice activity detection and on reducing neural network overfitting, including:

(1) In the data preprocessing stage, a network combining convolutional neural networks (CNN), long short-term memory (LSTM) networks, and deep neural networks (DNN) is proposed to improve voice activity detection. This method allows speech to be modeled simultaneously in frequency and time. The experimental results show that effective segmentation at speech discontinuities significantly improves the speaker recognition rate and reduces the diarization error rate.

(2) To address overfitting in the speaker diarization system, the neural-network method for extracting low-dimensional vectors is improved. A dimension-reduction layer is added to reduce the number of parameters, and layer-by-layer connections within the network strengthen the gradient flow.

The improvements are verified experimentally in both systems. Compared with the original neural network and the traditional Gaussian model, both diarization performance and recognition performance improve significantly.
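As a rough illustration of how voice activity detection feeds segmentation, the sketch below turns per-frame speech probabilities into speech segments. It is not the thesis's CNN/LSTM/DNN model: the function name `frames_to_segments`, the threshold of 0.5, and the gap-merging rule are all assumptions made for this example, and the probabilities are hand-written rather than produced by a trained network.

```python
from typing import List, Optional, Tuple


def frames_to_segments(
    speech_probs: List[float],
    threshold: float = 0.5,      # assumed decision threshold
    min_gap_frames: int = 2,     # assumed: gaps shorter than this are bridged
) -> List[Tuple[int, int]]:
    """Convert per-frame speech probabilities into (start, end) frame ranges.

    `end` is exclusive. Hypothetical post-processing step, for illustration only.
    """
    # Pass 1: collect runs of consecutive frames at or above the threshold.
    runs: List[List[int]] = []
    start: Optional[int] = None
    for i, p in enumerate(speech_probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(speech_probs)])

    # Pass 2: merge runs separated by a non-speech gap shorter than min_gap_frames.
    merged: List[List[int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(run)
    return [(s, e) for s, e in merged]
```

With a frame shift of, say, 10 ms (another assumption), a returned pair `(s, e)` would correspond to the interval from `s * 0.01` to `e * 0.01` seconds; those intervals are what a downstream diarization stage would embed and cluster.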

Keywords/Search Tags:

speaker recognition, speaker diarization, voice activity detection, low-dimensional vector extraction


Related items

1	Design And Implementation Of Speaker Diarization System
2	Research On A New Method Of Speaker Verification
3	Research On Feature Extraction And Model Algorithm For Speaker Recognition
4	Study Of Extraction And Optimization Characteristic Parameters In Speaker Recognition
5	The Research And Application Of Speaker Detection And Tracking Technology
6	The Research On Technology Of Building A Speaker Recognition Database
7	Research On Speaker Log System Based On Bayesian Method
8	Research Of Speaker Recognition System Based On Mixed Feature Parameters And GMM-UBM
9	Research And Implementation On Speaker Recognition
10	Research On Single-Channel End-to-End Target Speech Extraction Models