In recent years, speaker recognition has been widely used as an important biometric authentication technology. Since the introduction of the Gaussian mixture model (GMM), speaker recognition in quiet environments has improved to a certain extent, but in practical applications background noise, limited speech duration and other factors limit the performance of the GMM. Short speech speaker recognition therefore remains one of the difficulties in the field of speech signal processing. To address this difficulty, this thesis studies short speech speaker recognition from the two aspects of feature selection and model construction, and applies the resulting method to the separation of mixed speech in which two people speak at the same time. The main contents of the research are as follows:

(1) Multi-feature fusion for noisy, short-duration speech. To address the two problems of background noise interference and limited speech duration, multi-feature fusion is carried out with regard to the contribution and distinguishability of each dimension of the feature parameters. First, using the F-ratio and D-ratio feature evaluation criteria together with the classifier, a discrimination coefficient is calculated for each dimension of the feature parameters, and the dimensions of the fusion feature are selected according to this coefficient. Then, experiments are conducted on four single feature parameters to analyze the influence of dimension, difference terms and fusion mechanism on the recognition results, and to determine the best feature parameter scheme. Finally, comparative experiments verify that the fusion feature parameters have good noise immunity and robustness in both stationary and non-stationary noise environments, and can still accurately represent the individual characteristics of a speaker in short speech speaker recognition tasks.

(2) Construction and comparison of short speech speaker recognition models based on deep learning. Speech is a kind of time series data, so the recurrent neural network is selected as the main structure of the speaker recognition model. To account for inter-character voicing effects in continuous speech, a bidirectional parameter transfer method is added to model training. This thesis combines the advantages of the bi-directional gated recurrent unit (Bi-GRU), the attention mechanism (Attention) and the block-level feature equalization mechanism (BFE) to build a deep learning speaker recognition model, Bi-GRU+Attention+BFE. Experiments analyze the impact of the input sequence time step and of the model structure on the recognition results. Comparison with the traditional Gaussian mixture model and with other deep network models shows that the proposed model performs well in short speech speaker recognition.

(3) Application of the short speech speaker recognition method to speech separation. The current research difficulties in monaural (single-channel) two-speaker speech separation are analyzed, and the application of short speech speaker recognition to speech separation is studied. When separating mixed speech in which two people speak at the same time, the short speech speaker recognition method is applied to the reconstructed voiced segments, and according to the recognition results the voiced segments belonging to the same speaker are combined into a complete auditory stream, yielding the complete voiced speech of the target speaker. Matching experiments on the separated speech show that the deep learning speaker recognition model recognizes separated speech well and effectively improves the accuracy of speech separation.
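The dimension selection described in item (1) can be illustrated with a minimal sketch. It covers only the standard F-ratio (variance of the per-speaker means divided by the average within-speaker variance, computed per feature dimension). The D-ratio and the classifier-based part of the thesis's discrimination coefficient are not reproduced here. The function names `f_ratio` and `select_dimensions`, the small `eps` stabilizer and the choice of keeping the top `k` dimensions are assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np


def f_ratio(features, labels, eps=1e-12):
    """Per-dimension F-ratio: between-speaker variance / mean within-speaker variance.

    features: array of shape (n_frames, n_dims)
    labels:   array of shape (n_frames,) holding a speaker id for each frame
    eps:      hypothetical stabilizer that avoids division by zero
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    speakers = np.unique(labels)

    # Mean and variance of every dimension, computed separately for each speaker.
    means = np.stack([features[labels == s].mean(axis=0) for s in speakers])
    within = np.stack([features[labels == s].var(axis=0) for s in speakers])

    between = means.var(axis=0)          # spread of the speaker means
    return between / (within.mean(axis=0) + eps)


def select_dimensions(features, labels, k):
    """Return the indices (ascending) of the k dimensions with the highest F-ratio."""
    scores = f_ratio(features, labels)
    top_k = np.argsort(scores)[::-1][:k]
    return np.sort(top_k)
```

A dimension with a high F-ratio separates speakers well relative to its variation within each speaker, so under this assumption it would be a natural candidate to keep in the fused feature vector.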