Research And Implementation Of Statistical Model-based Speaker Recognition

Posted on: 2011-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: L J Li
Full Text: PDF
GTID: 2208360308466651
Subject: Computer software and theory
Abstract/Summary:
Speaker recognition is a branch of speech signal processing. It identifies a speaker by mining the personalized features in the speech signal that reflect the speaker's physiological and physical characteristics. The key technologies of speaker recognition are feature extraction and speaker model building. This thesis studies text-independent speaker recognition from these two aspects.

With the boom in multimedia data, effective management of music databases has become increasingly critical, and research that combines speech signal processing technology with the characteristics of music data has become a valuable and active topic in recent years. This thesis applies speaker recognition technology to music signal processing.

Based on an extensive study of the feature coefficients and modeling methods widely used in speaker recognition, this thesis mainly uses Mel-Frequency Cepstral Coefficients (MFCC) as the speech feature and Gaussian mixture models (GMM) as the speech model. A CMFCC feature, which effectively improves system performance, is proposed on the basis of MFCC: it is obtained by applying mean subtraction to the MFCC. In addition, building on the speaker recognition research, a linear combination model (LGMM) is proposed to separate the pvoc and svoc in music effectively. The LGMM is created as follows. First, create a GMM for each of pvoc and svoc from hand-labeled data. Then create another GMM for each of pvoc and svoc from pure singing data and pure accompaniment data, respectively. Finally, obtain the final probability model for pvoc and for svoc as the linear combination of the two GMMs in each class.

The main work in this thesis is as follows:
1. Create a text-independent speaker recognition system, for speech without background noise, using MFCC, CMFCC, GMM and UBM-GMM.
2. Apply MFCC and GMM to the separation of singing data (pvoc) and accompaniment data (svoc) in music, develop the specific method and process for creating the LGMM, and use it to separate the pvoc and svoc data in music.
3. Using the LGMM, first separate the singing data (pvoc) and accompaniment data (svoc) in music, then create an MFCC- and GMM-based singer recognition system from the pvoc and svoc data.
4. Analyze the effects of the length of the training data and the number of GMM mixture components on system performance, and compare experimentally the performance of systems built with the MFCC and CMFCC features and the GMM, UBM-GMM and LGMM models. The results demonstrate that the CMFCC feature and the UBM-GMM and LGMM models effectively improve the correct rate of the system.

Extensive experiments were carried out, and the results show that MFCC and GMM can be used in both speaker recognition and music signal processing. They also show that the CMFCC feature and the LGMM model effectively improve system performance.
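
The two proposals in the abstract, mean subtraction on MFCC (CMFCC) and a per-class linear combination of two GMMs (LGMM), can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, the GMMs are assumed to have diagonal covariances with parameters already trained elsewhere, and the mixing weight `alpha` is an assumption, since the abstract does not say how the two GMMs are weighted.

```python
import numpy as np


def mean_subtract(mfcc):
    """CMFCC-style feature: subtract each coefficient's mean over all frames.

    mfcc has shape (n_frames, n_coeffs).
    """
    mfcc = np.asarray(mfcc, dtype=float)
    return mfcc - mfcc.mean(axis=0, keepdims=True)


def diag_gmm_logpdf(x, weights, means, variances):
    """Per-frame log density under a diagonal-covariance GMM (assumed form).

    x: (n_frames, dim); weights: (k,); means, variances: (k, dim).
    """
    x = np.asarray(x, dtype=float)[:, None, :]            # (n_frames, 1, dim)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (k,)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)      # (n_frames, k)
    # Sum the weighted component densities in the log domain for stability.
    return np.logaddexp.reduce(np.log(weights) + log_norm + log_exp, axis=1)


def combined_logpdf(x, gmm_a, gmm_b, alpha):
    """Log of alpha * p_a(x) + (1 - alpha) * p_b(x) for one class.

    gmm_a, gmm_b: (weights, means, variances) tuples, e.g. the model trained
    on hand-labeled data and the model trained on pure data for that class.
    alpha: hypothetical mixing weight, 0 < alpha < 1.
    """
    log_a = np.log(alpha) + diag_gmm_logpdf(x, *gmm_a)
    log_b = np.log1p(-alpha) + diag_gmm_logpdf(x, *gmm_b)
    return np.logaddexp(log_a, log_b)
```

Under these assumptions, a frame would be assigned to pvoc or svoc by evaluating `combined_logpdf` with each class's pair of GMMs and choosing the class with the larger value.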
Keywords/Search Tags: Speaker Recognition, Singer Recognition, Separation of pvoc and svoc, GMM, MFCC