Speaker Recognition Based On Support Vector Machine

Posted on: 2008-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Du
Full Text: PDF
GTID: 2178360212996716
Subject: Control theory and control engineering
Abstract/Summary:
Speaker recognition is a biometric technology. It extracts features from collected speech utterances, trains a model for each speaker, and then determines the speaker's identity. It has broad practical applications in fields such as electronic commerce and information security. Research on speaker recognition has produced substantial results as science and technology have advanced. However, how to build an effective classification model from limited information so as to improve performance, especially for text-independent systems, remains an open problem of wide concern in the field.

The Support Vector Machine (SVM) is a new and very promising classification technique that is well motivated by statistical learning theory. Compared with other methods, SVM has many advantages on pattern recognition problems with limited samples, nonlinearity, and high dimensionality. It has already been applied with good results in character and script identification, face detection, image processing, and gene analysis. This thesis applies SVM to speaker recognition. Compared with other speaker recognition methods, SVM emphasizes the differences between samples of different classes and therefore has stronger classification ability.

This thesis studies speaker recognition using SVM, focusing on voice activity detection (VAD), on reducing the long computation time and heavy computational load of SVM, and on extending the method to other speech applications. The main work is as follows:

(1) Accurate detection of the beginning and end of speech is very important for achieving a high recognition rate. This thesis detects voice activity with an improvement on the classical double-threshold method: a Variable Frame Rate (VFR) method based on Teager energy. Teager energy is used as the feature for setting the thresholds because it describes the "energy" of a vibrating signal more accurately. Applying different frame rates to different frequency segments locates speech endpoints accurately and also makes speaker recognition faster and more accurate. Experiments verify that this VAD method is robust to Gaussian white noise.

(2) Some two-dimensional data are constructed at random and trained with a common SVM training algorithm, which gives a good classification result. However, speech feature parameters are high-dimensional, large-scale vectors, so this thesis adopts the Sequential Minimal Optimization (SMO) algorithm. I propose an improvement to SMO concerning the selection of the two multipliers αi to optimize: a sorted array is used instead of repeatedly looping over all examples. Experiments confirm that SMO uses little memory and computes quickly, and that the improved SMO saves approximately 50% of the time compared with the unimproved version.

(3) Because different parameters affect the speaker recognition experiments differently, I select the most suitable parameters through a series of experiments. In the end, 16 Mel-frequency cepstral coefficients are used as the feature vector, and a linear kernel function with penalty coefficient C=0.5 is used when training the speaker models. A suitable test sample size is chosen by simulating the human ear. Experimental results show that this approach not only makes good use of limited speech information but also reduces the amount of computation and increases speed.

(4) A binary decision tree algorithm is used for speaker recognition. Because the characteristics of male and female speakers differ distinctly, all speakers are first divided into two sets (male and female), and the SVM decision tree algorithm is then used for further decisions after gender classification. The SVM decision tree algorithm is a series of elimination rounds: once any SVM sub-classifier excludes a speaker, all sub-classifiers involving that speaker are discarded and take no part in later decision steps. An N-class problem therefore needs only N-1 sub-classifiers to reach a decision. Because the number of classifiers that need to be trained and evaluated drops sharply, the amount of computation decreases. If the initial gender classification is wrong, the sample is fed back to the other gender set and decided again by the same method. This avoids some errors caused by speakers whose tonal characteristics are easily confused.

The speech samples used for training and testing were recorded from 18 different speakers, each of whom spoke 50 Chinese phrases twice. The first 20 samples are used to train the speaker classes, and the same 20 samples from the second recording are used for testing in the text-dependent system. The remaining samples are used for testing in the text-independent system. The highest correct rates of the text-dependent and text-independent systems are 98.61% and 96.29%, respectively. The experiments show that the above method achieves higher accuracy while simplifying computation and increasing speed.

(5) This thesis extends the application of the SVM algorithm to Mandarin digit speech recognition. The speech samples used for training and testing were recorded from 18 different speakers, each of whom spoke the 10 Mandarin digits twice. The correct rates of the "speaker-dependent" and "speaker-independent" systems are 89% and 81.88%, respectively, which shows that the above method can distinguish Mandarin digits but still needs further improvement.
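
As an illustration of the Teager energy feature mentioned in item (1), the following minimal Python sketch uses the commonly cited discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The abstract does not state the exact definition, the thresholds, or the variable frame rate scheme used in the thesis, so the function names, the fixed frame length, and the hop size here are assumptions made only for this example.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output has len(x) - 2 values because the first and last
    samples lack a neighbour on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teager_energy(x, frame_len, hop):
    """Mean absolute Teager energy per frame.

    frame_len and hop are in samples. A fixed frame rate is assumed
    here; the thesis varies the frame rate, which is not reproduced.
    """
    psi = teager_energy(x)
    frames = []
    for start in range(0, len(psi) - frame_len + 1, hop):
        frame = psi[start:start + frame_len]
        frames.append(sum(abs(v) for v in frame) / frame_len)
    return frames


# A pure tone A*cos(w*n) has constant Teager energy A^2 * sin(w)^2,
# so the operator reflects both amplitude and frequency.
tone = [2.0 * math.cos(0.3 * n) for n in range(200)]
silence = [0.0] * 200
energies = frame_teager_energy(silence + tone, frame_len=50, hop=50)
```

In this toy signal the frames that lie entirely in silence have zero energy and the frames that lie entirely in the tone share one constant value, which is the kind of contrast a double-threshold endpoint detector could compare against its thresholds.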
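
The elimination procedure described in item (4) can be sketched as follows. This is only an illustration of why N candidates need N-1 pairwise decisions, not the thesis's implementation: the function `eliminate`, the toy one-dimensional `templates`, and `toy_pairwise_winner` are hypothetical stand-ins, and in a real system each pairwise decision would come from a trained two-speaker SVM.

```python
def eliminate(candidates, pairwise_winner, sample):
    """Return (winner, number_of_pairwise_decisions).

    pairwise_winner(a, b, sample) must return either a or b.
    """
    remaining = list(candidates)
    if not remaining:
        raise ValueError("at least one candidate is required")
    decisions = 0
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise_winner(a, b, sample)
        decisions += 1
        # The loser is removed, so every classifier that involves it
        # is never consulted in later rounds.
        remaining.remove(b if winner == a else a)
    return remaining[0], decisions


# Hypothetical stand-in for trained pairwise SVMs: each speaker is
# represented by a single number and the closer one wins.
templates = {"spk_a": 1.0, "spk_b": 4.0, "spk_c": 6.5, "spk_d": 9.0}


def toy_pairwise_winner(a, b, sample):
    return a if abs(templates[a] - sample) <= abs(templates[b] - sample) else b


winner, decisions = eliminate(list(templates), toy_pairwise_winner, sample=6.0)
```

Each round removes exactly one candidate, so the loop always finishes after N-1 decisions regardless of which speaker wins. The gender pre-classification and the fallback to the other gender set would sit in front of this step and are not modelled here.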
Keywords/Search Tags: Speaker recognition, Teager Energy Operator, Variable Frame Rate, Mel-frequency cepstral coefficients, Support vector machine, Sequential minimal optimization, Binary decision tree algorithm