
Research Of Key Problems In Voice Password Recognition

Posted on: 2012-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2178330338491954
Subject: Signal and Information Processing
Abstract/Summary:
Voice password recognition is a practical application of text-dependent speaker recognition. It is an effective means of authentication and is widely used in security systems. Voice password recognition considers not only the characteristics of the speaker's vocal tract but also the content of the speech. Recognition accuracy is easily affected by factors such as environmental noise, password leakage, and data sparseness. Resolving these difficulties is important for improving the performance of voice password recognition.

If impostors do not know the content of the password, classical text-dependent recognition algorithms can easily distinguish the target speaker from the impostors. When the voice password is completely disclosed (the content of the impostor's speech is the same as that of the registered speaker's), recognition accuracy decreases greatly. This thesis focuses on that problem and proposes several algorithms to improve the performance of voice password recognition.

A robust front end is one of the key parts of voice password recognition. First, a new voice activity detection (VAD) method is proposed that combines an energy-based VAD method with a model-based VAD method. Detecting the speech endpoints accurately yields more effective acoustic features for recognition. With the new VAD method, the equal error rate (EER) of the voice password system is reduced by 4.4% compared with the baseline. Second, a frequency selection (FS) method based on acoustic characteristics is proposed. The FS method increases speaker discrimination, and the EER is reduced by 27.9% compared with the baseline.

The temporal characteristics of speech are also considered. A new N-gram nearest neighbor method is proposed that exploits the temporal correlation between frames. This algorithm improves the recognition rate, reducing the EER by 7.7% compared with the baseline. The experiment also demonstrates the importance of phoneme information in voice password recognition.

Speech in voice password recognition is always very short, and this data sparseness leads to model overfitting. A new HMM (hidden Markov model)-UBM (universal background model) algorithm is proposed to resolve this problem. A monophone HMM-UBM is first trained on a speaker-independent corpus, and the hypothesized speaker model is then obtained by adapting the parameters of the UBM to the speaker's training speech using MAP (maximum a posteriori) estimation. This algorithm handles the data sparseness problem well and achieves an EER of 6.57%. When the FS method described in Chapter 3 is also applied, the EER of the proposed system is further reduced by 31.3%.
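The MAP adaptation step mentioned above can be illustrated with a minimal sketch. The abstract does not say which parameters are adapted or how the prior is weighted, so the code below rests on assumptions: it adapts only the Gaussian means of a single HMM state (a diagonal-covariance mixture), uses the common relevance-factor form of MAP with a hypothetical value of 16, and takes the frames aligned to that state as given. The function names are illustrative and do not come from the thesis.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return MAP-adapted means; weights and variances keep their UBM values.

    relevance is an assumed prior weight, not a value taken from the thesis.
    """
    n_mix, dim = len(means), len(means[0])
    occ = [0.0] * n_mix                           # soft frame counts per mixture
    first = [[0.0] * dim for _ in range(n_mix)]   # posterior-weighted frame sums
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                           # subtract max for stability
        post = [math.exp(lp - top) for lp in logp]
        norm = sum(post)
        for k in range(n_mix):
            gamma = post[k] / norm                # posterior of mixture k
            occ[k] += gamma
            for d in range(dim):
                first[k][d] += gamma * x[d]
    adapted = []
    for k in range(n_mix):
        alpha = occ[k] / (occ[k] + relevance)     # more data, less prior
        data_mean = [s / occ[k] for s in first[k]] if occ[k] > 0.0 else means[k]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[k])])
    return adapted

# Toy example: a two-mixture, one-dimensional "UBM" state and 16 enrollment frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [10.0]]
ubm_vars = [[1.0], [1.0]]
enroll_frames = [[1.0]] * 16
speaker_means = map_adapt_means(enroll_frames, ubm_weights, ubm_means, ubm_vars)
```

In this toy case the first mixture receives all of the enrollment data and moves halfway toward it, while the second mixture sees almost none and stays at its UBM value. This is why such adaptation can help with very short passwords: parameters that the enrollment speech barely touches fall back to the speaker-independent model instead of being overfitted.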
Keywords/Search Tags: voice password, voice activity detection, frequency selection, temporal correlation of the frames, hidden Markov model, universal background model