Research On Text-Independent Speaker Recognition

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
GTID: 2428330623968110
Subject: Navigation, guidance and control
Abstract/Summary:
Speaker recognition, also known as voiceprint recognition, identifies people by extracting identity information from their voices. Compared with ideal laboratory conditions, speaker recognition in real application scenarios suffers from poor cross-channel recognition results, and, because audio collection has to be kept convenient, the sample size available for training the voiceprint model is small. Therefore, to apply speaker recognition to multi-device audio collection, training, and testing in a smart home scenario, it is necessary to focus on speaker recognition with a small sample size.

First, this thesis establishes a database suited to speaker recognition with small samples and multiple devices. The database contains 31 speakers, with 10 minutes of recording per person. The reading part, about 8 minutes, is used as the training set, and the free speech part, about 2 minutes, constitutes the test set. Multiple devices are used for data collection, and appropriate preprocessing parameters are selected for each device. For each device, the audio collected by that device is used to build the speaker models and to perform recognition. When the training device and the test device differ, recognition accuracy drops severely. Therefore, the audio under test first goes through device recognition, and the speaker model trained on the corresponding device is then used for recognition, which improves the accuracy of a speaker recognition platform built from multiple recording devices.

Second, Mel-frequency cepstral coefficients (MFCC) and the Gaussian mixture model-universal background model (GMM-UBM) are used as the baseline speaker recognition system, and experiments are conducted on the self-built database. This thesis designs and implements three types of improvement schemes for commonly used speaker recognition models, which improve on the baseline by 2%, 4.94%, and 9.14%, respectively. The first scheme evaluates combinations of commonly used audio features and speaker recognition models to find the optimal feature and model combination for each type of device, which increases the recognition rate by 2%. The second scheme improves the GMM-UBM baseline with eight data augmentation methods and selects the optimal augmentation method for each device. Compared with the baseline system, the recognition rate improves by 4.94%. The third scheme uses the enhanced empirical mode decomposition algorithm to decompose the original audio signal, extracts and combines multiple types of features, designs a multi-channel residual network for multi-class speaker recognition, and selects the optimal feature combination for each device. Compared with the baseline system, the recognition rate improves by 9.14%.

In summary, this thesis improves small-sample speaker recognition for smart home systems in several ways, and the recognition performance improves significantly.
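The device-matched recognition flow described in the abstract (recognize the recording device first, then score only the speaker models trained on that device) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: a single diagonal Gaussian per model stands in for the GMM-UBM, the feature frames are assumed to be MFCC-like vectors that have already been extracted, and all function names (`fit_gaussian`, `avg_log_likelihood`, `train`, `identify`) are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames.

    Assumption: a single Gaussian is used here only to keep the sketch short;
    the abstract's baseline is a GMM-UBM.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def train(recordings):
    """Build one model per device and one model per (device, speaker) pair.

    recordings: iterable of (device, speaker, frames) tuples.
    Returns (device_models, speaker_models), where speaker_models[device][speaker]
    is the model trained only on that device's audio.
    """
    by_device = defaultdict(list)
    by_pair = defaultdict(list)
    for device, speaker, frames in recordings:
        by_device[device].extend(frames)
        by_pair[(device, speaker)].extend(frames)

    device_models = {d: fit_gaussian(fr) for d, fr in by_device.items()}
    speaker_models = defaultdict(dict)
    for (device, speaker), fr in by_pair.items():
        speaker_models[device][speaker] = fit_gaussian(fr)
    return device_models, speaker_models


def identify(frames, device_models, speaker_models):
    """Step 1: pick the most likely device. Step 2: pick the best speaker
    among the models trained on that device only."""
    device = max(device_models,
                 key=lambda d: avg_log_likelihood(frames, device_models[d]))
    candidates = speaker_models[device]
    speaker = max(candidates,
                  key=lambda s: avg_log_likelihood(frames, candidates[s]))
    return device, speaker
```

Routing by device before speaker scoring keeps the test audio from being compared against models trained on a mismatched channel, which is the condition the abstract reports as causing a severe drop in accuracy.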
Keywords/Search Tags:Text-independent, Speaker Recognition, Feature Extraction, Empirical Mode Decomposition