Speaker Recognition Based On Continuous Hidden Markov Model

Posted on:2007-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:H R Wang

Full Text:PDF

GTID:2178360182496900

Subject:Control theory and control engineering

Abstract/Summary:

In the field of automatic biometric identification, speaker recognition has attracted considerable attention from research organizations and companies. An automatic speaker verification (ASV) system first extracts biometric features from speech data and then judges the speaker's identity by pattern matching. Depending on the constraints placed on the text content, ASV is divided into text-independent ASV and text-dependent ASV. This thesis concentrates on text-dependent ASV.

A continuous hidden Markov model (CHMM) is used to build each speaker's reference model. Speaker verification proceeds in several steps. The input voice is first processed by a front-end module, which performs filtering, digitizing, pre-emphasis and short-time framing. The endpoints of the speech segments are then located, and feature vectors are extracted from those segments. Finally, either a CHMM is built or an identity judgment is made. Endpoint detection, initial CHMM parameter estimation and CHMM training are the main subjects of this research.

Conventional endpoint detection depends so heavily on empirical thresholds that it adapts poorly to changes in the environment. To address this, a criterion of speech beginning, a criterion of speech ending and an intelligent endpoint detection algorithm (IEDA) are proposed. By simulating the way a human locates speech segment endpoints when reading a plot of the voice signal, the two criteria reduce the dependence on empirical thresholds as far as possible and increase the reliability of endpoint detection across environments.

Endpoint detection experiments on isolated words indicate that IEDA performs much better than the two-threshold endpoint detection algorithm (TTEDA). In the endpoint detection experiment on the original voice, the total frame-error rate of IEDA was 3.17%, while that of TTEDA was 18.05%.
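For orientation, the front-end steps and the two-threshold baseline mentioned above can be sketched as follows. This is only a minimal, energy-only illustration: the abstract does not give the thesis's filter coefficient, frame sizes, thresholds, or the exact TTEDA formulation, so `alpha`, `frame_len`, `hop`, `low` and `high` are assumed values, and the function names are hypothetical.

```python
def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a commonly used value, assumed here.
    """
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))
    ]


def frame_energies(samples, frame_len=200, hop=100):
    """Short-time energy of every full frame (frame sizes are assumptions)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies


def two_threshold_endpoints(energies, low, high):
    """Return (first_frame, last_frame) of the first speech segment, or None.

    Speech is confirmed at the first frame whose energy exceeds `high`;
    the segment is then extended in both directions while the energy
    stays above `low`.
    """
    peak = next((i for i, e in enumerate(energies) if e > high), None)
    if peak is None:
        return None
    start = peak
    while start > 0 and energies[start - 1] > low:
        start -= 1
    end = peak
    while end + 1 < len(energies) and energies[end + 1] > low:
        end += 1
    return start, end
```

Both thresholds must be chosen by hand for a given recording condition, which is the dependence on empirical thresholds that the proposed IEDA is meant to reduce.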
The TTEDA total frame-error rate was thus about 5.7 times the IEDA rate. After 5 dB white noise was added to the voice, with the endpoint detection parameters left unchanged, the IEDA total frame-error rate rose to 20.42%, while the TTEDA total frame-error rate was 72.12%. Endpoint detection experiments on isolated words and continuous digit strings indicate that IEDA adapts well to changes in the environment. IEDA removes the heavy dependence on empirical thresholds by simulating the way a human locates speech endpoints when reading a plot of the voice signal. The experimental results confirm the soundness of the criterion of speech beginning and the criterion of speech ending: the two criteria capture the characteristics of speech onset and offset. IEDA parameters can be adjusted conveniently and quickly, which makes it easy to retune them as the environment changes.

An equilibrium K-means clustering (EMKC) method, which distributes clusters approximately evenly over the training vector space, is introduced to obtain a more reliable initial CHMM. When the standard K-means clustering (SKC) method was used for initial CHMM estimation, some clusters contained only one vector. The covariance matrices of the corresponding Gaussian probability density functions then could not be computed, and the initial CHMM could not be estimated. The cause is that SKC partitions the training vector space unevenly, and a very uneven partition leads to imprecise estimates of the mean vectors and covariance matrices. A penalty variable is introduced to prevent too many vectors from gathering in one or a few clusters, which makes the partition of the training vector space more even. Initial CHMM estimation experiments show that the cluster imbalance rate of SKC is 2.3 times that of EMKC, and the quantization distortion of SKC is 1.4 times that of EMKC.
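The idea of penalizing crowded clusters can be illustrated with a small sketch. The abstract does not state the form of the penalty variable, so the additive, count-based term below is an assumption, and `penalized_kmeans` and its parameters are hypothetical names rather than the thesis's EMKC method. With `penalty=0` the function reduces to standard K-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans(vectors, centroids, penalty=0.0, iterations=10):
    """K-means whose assignment cost grows with the size of the target cluster.

    Assumed cost: ||v - c_k||^2 + penalty * n_k, where n_k is the number
    of vectors already assigned to cluster k in the current pass.
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        counts = [0] * len(centroids)
        # Assignment step: distance plus a penalty for crowded clusters.
        for i, v in enumerate(vectors):
            costs = [
                sq_dist(v, c) + penalty * counts[k]
                for k, c in enumerate(centroids)
            ]
            labels[i] = min(range(len(centroids)), key=costs.__getitem__)
            counts[labels[i]] += 1
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On the toy data `[[0.0], [0.1], [0.2], [0.3], [10.0]]` with initial centroids `[[0.0], [10.0]]`, the unpenalized run leaves one cluster with a single vector (sizes 4 and 1), which is the situation in which a covariance matrix cannot be estimated; with `penalty=100.0` the sizes become 2 and 3. The trade-off is that a larger penalty evens out cluster sizes at the cost of assigning some vectors to a more distant centroid.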
EMKC is therefore better suited than SKC to obtaining the initial CHMM.

Because the training speech data are inadequate and the assumed CHMM cannot describe real speech completely, a discriminative training method is used to estimate the CHMM. Discriminative training is a kind of mutual information training. When the assumed statistical model cannot accurately describe the true distribution of the training samples, mutual information estimation is better than maximum likelihood estimation. Discriminative training aims to minimize the recognition error rate, rather than to maximize the probability of the training samples as maximum likelihood estimation does. Maximum likelihood estimation requires adequate training data, but in realistic tasks the sparse training data do not meet this requirement. Furthermore, the human voice changes all the time under the influence of health, mood, environment and so on, so an HMM built from past speech data cannot completely describe the distribution of future speech data. In a recognition experiment on ten isolated Mandarin digits, the CHMM trained with the discriminative method reached a recognition rate of 87.78%, while the CHMM trained with the Baum-Welch iterative method reached 76.11%: a gain of 11.67 percentage points, or 15.33% in relative terms. When 15th-order Mel-frequency cepstral coefficients are used as the feature vector, discriminative training raises the speaker identification rate from 96.88% (CHMM trained with Baum-Welch) to 99.22%.
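The difference between the two training criteria can be made concrete with a short sketch that evaluates both objectives from per-model log-likelihoods. It uses the standard maximum mutual information (MMI) form with equal class priors, which is an assumption; the abstract does not give the thesis's exact objective or optimization procedure, and the function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ml_objective(log_likes, labels):
    """Maximum likelihood: sum of log p(O_r | model of the correct class)."""
    return sum(row[c] for row, c in zip(log_likes, labels))


def mmi_objective(log_likes, labels):
    """MMI: sum of log posteriors of the correct class.

    log_likes[r][c] is log p(O_r | model c); equal priors are assumed,
    so the posterior is the correct-class likelihood divided by the sum
    of the likelihoods under all competing models.
    """
    return sum(row[c] - log_sum_exp(row) for row, c in zip(log_likes, labels))
```

The maximum likelihood objective looks only at the correct model's score, whereas the MMI objective also falls when competing models score highly, so raising it pushes the models apart and relates more directly to recognition errors.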

Keywords/Search Tags:

speaker identification, endpoint detection, speech activity detection, K-means clustering, hidden Markov model, discriminative training method, maximum mutual information

Related items

1	Research On Speaker-Independent Speech Recognition System Based On HMM
2	Speaker Recognition In Noisy Environment
3	The Research Of Small Vocabulary Speaker-independent Isolated Word Speech Recognition System
4	The Research Of Small Vocabulary Speaker-Independent Isolated Word Speech Recognition System
5	Research On Improved Speaker Segmentation And Clustering Algorithm
6	Research On Speaker Segmentation And Clustering
7	Research And Implementation On Speaker-independent Chinese Continuous Digit Speech Recognition System
8	Studied The Visual Behavior Of The Hidden Markov Model-based Analysis And Anomaly Detection
9	Study On Isolated Mandarin Speech Recognition Technology
10	Research On The Grouping Speech Recognition System Based On HMM