Speaker Recognition Research Based On Chinese Vowel Mapping Methods

Posted on: 2008-11-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Qian
Full Text: PDF
GTID: 1118360245979130
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speech is the most convenient, fastest and most natural way to communicate with other people. Over the past thirty years, as science and technology have developed, speaker recognition research has produced many results that make daily life more convenient. However, different applications impose ever higher standards and requirements, and a system is susceptible to many influences. On the one hand, the speech signal is non-stationary, which demands high adaptability from a real system; on the other hand, a speaker recognition system is affected by many factors, such as noise, the amount of training time and the distortion of communication channels. The central problems in speaker recognition are how to extract appropriate features, which reflect only the speaker's identity and avoid semantic interference, and how to establish an effective model, which makes good use of the available data and is robust to different real environments. In this dissertation, we study speaker recognition for Mandarin Chinese from two aspects, accurately separating identity information and improving system robustness, and propose novel algorithms and models.
Firstly, we present a novel speaker recognition framework based on a Chinese vowel mapping technique. The basis of this framework is the decomposition of Chinese multi-vowels into single-vowel phonemes. By comparing spectra, features, the statistical distribution of single-vowel phoneme glides and vowel classification performance, we confirmed that Chinese vowels can be separated into several single-vowel phonemes on the basis of their short-time characteristics. Through extensive experiments and theoretical analysis, we then built a new mapping table from multi-vowels to single-vowel phonemes to support the subsequent research.
The new framework adds a dedicated module that performs this separation and organizes several single-vowel classifiers to replace the traditional classification module. This not only avoids interference from semantic information and achieves higher performance, but also makes each classifier more specialized than a traditional classifier. The framework adopts the short-time frame as its basic identification unit, which makes it better suited to real-time systems.
Under the new framework, we improved the vector quantization (VQ) method on the basis of Chinese vowel classification. Because each VQ classifier deals with only one kind of phoneme, it avoids the influence of semantic information and achieves higher accuracy with a smaller codebook than the traditional VQ method. However, ensuring codebook quality requires a large amount of data in both the training and testing phases, so we proposed a new Chinese speaker identification system that combines biomimetic pattern recognition with the new framework. We improved the nearest neighbor algorithm to find, for every speaker, the cover of each phoneme in the feature space. In the identification phase, the final decision is made according to the relationship between the cover and the extracted features. Experimental results demonstrate that the system effectively reduces the amount of data required. During this research, we found that the new system inevitably introduces some classification error and slows recognition, because the new framework adds a dedicated vowel classification module. To address this, we proposed a novel neural network ensemble system based on the Chinese vowel mapping technique, using ensemble learning theory.
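As a rough illustration of the per-vowel vector quantization idea described above, the following Python sketch keeps one small codebook per single-vowel class for each speaker and scores labelled test frames against the matching codebook only. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the plain k-means trainer, the codebook size, and the toy two-dimensional features are all hypothetical, and the vowel label of each frame is assumed to come from a separate vowel classifier.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iterations=10, seed=0):
    """Build a small codebook with plain k-means (a stand-in for any VQ trainer)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        # Move each codeword to its cluster mean; keep it if the cluster is empty.
        codebook = [
            [sum(col) / len(c) for col in zip(*c)] if c else codebook[i]
            for i, c in enumerate(clusters)
        ]
    return codebook


def train_speaker(frames_by_vowel, size=2):
    """frames_by_vowel maps a single-vowel label to that speaker's training frames."""
    return {vowel: train_codebook(vecs, size) for vowel, vecs in frames_by_vowel.items()}


def identify(models, labelled_frames):
    """Return the speaker whose per-vowel codebooks give the lowest mean distortion.

    models: {speaker: {vowel: codebook}}
    labelled_frames: [(vowel, feature_vector), ...], labels assumed to be given.
    """
    def mean_distortion(speaker_model):
        total = 0.0
        for vowel, vec in labelled_frames:
            # Each frame is compared only with the codebook of its own vowel class.
            total += min(sq_dist(vec, code) for code in speaker_model[vowel])
        return total / len(labelled_frames)

    return min(models, key=lambda speaker: mean_distortion(models[speaker]))


def toy_frames(cx, cy):
    """Hypothetical 2-D feature vectors clustered around (cx, cy)."""
    offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]
    return [[cx + dx, cy + dy] for dx, dy in offsets]


models = {
    "speaker_A": train_speaker({"a": toy_frames(1, 1), "i": toy_frames(5, 1)}),
    "speaker_B": train_speaker({"a": toy_frames(1, 3), "i": toy_frames(5, 3)}),
}
test_frames = [("a", [1.05, 1.1]), ("i", [4.9, 0.95])]
print(identify(models, test_frames))  # speaker_A
```

Because every codebook covers a single vowel class, it only has to capture speaker variation within that class, which is one way to read the claim that a smaller codebook suffices.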
During the recognition phase, the ensemble system needs no separate vowel classification step, so it avoids part of this error and speeds up the whole system.
Furthermore, we studied the pre-processing module and reduced the effect of noise on the new framework. A self-adaptive vowel-frame detection algorithm based on analysis of the energy distribution in the frequency domain is presented to extract vowel frames more accurately. We also proposed a new method that models the background noise in order to statistically estimate a Gaussian mixture model (GMM) of the clean speaker information. Finally, a robust speaker verification method based on a weighted feature compensation transformation is presented, which operates during feature processing and model compensation.
Theoretical analysis and experimental results demonstrate that the presented models and algorithms, built on the novel framework, achieve higher accuracy and speed and improved robustness under different conditions compared with many traditional methods. In particular, we succeeded in separating speaker identity information from semantic information by classifying Chinese vowels, which offers a new way to transform a text-independent system into a text-dependent speaker recognition system.
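The abstract does not spell out the vowel-frame detection algorithm, so the following Python sketch only illustrates the general idea of using the frequency-domain energy distribution with a threshold that adapts to the utterance. Everything specific here is an assumption made for illustration: the 200-3000 Hz band, the 0.7 in-band ratio, the use of the utterance's mean frame energy as the adaptive threshold, and all function names.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one short frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def frame_features(frame, sample_rate, band=(200.0, 3000.0)):
    """Return (total spectral energy, share of it inside the assumed vowel band)."""
    spec = power_spectrum(frame)
    n = len(frame)
    total = sum(spec)
    in_band = sum(p for k, p in enumerate(spec)
                  if band[0] <= k * sample_rate / n <= band[1])
    ratio = in_band / total if total > 0 else 0.0
    return total, ratio


def detect_vowel_frames(frames, sample_rate, ratio_min=0.7):
    """Flag frames that are both energetic and dominated by in-band energy."""
    feats = [frame_features(f, sample_rate) for f in frames]
    # Adaptive part (assumed): the energy threshold is the utterance's own mean
    # frame energy, so it follows the recording level instead of being fixed.
    energy_threshold = sum(e for e, _ in feats) / len(feats)
    return [e >= energy_threshold and r >= ratio_min for e, r in feats]


def tone(freq, amplitude, sample_rate=8000, length=64):
    """Synthetic test frame: a pure sinusoid."""
    return [amplitude * math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(length)]


# A loud low-frequency frame (vowel-like), a loud high-frequency frame
# (fricative-like) and a quiet frame (background-like).
frames = [tone(500, 1.0), tone(3500, 1.0), tone(500, 0.05)]
print(detect_vowel_frames(frames, 8000))  # [True, False, False]
```

A real system would use an FFT and overlapping windowed frames; the naive DFT is used here only to keep the sketch free of third-party dependencies.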
Keywords/Search Tags: Speaker Recognition, Vowel Classification, Chinese Vowel Mapping Technique, Vector Quantization, Biomimetic Pattern Recognition, BP Neural Network, Neural Network Ensemble, Vowel Frame Detection, Pitch Detection, Gaussian Mixture Model