
Language Identification Research Based On SVM

Posted on: 2008-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Di
Full Text: PDF
GTID: 2178360215959541
Subject: Computer application technology
Abstract/Summary:
With the development of speech recognition technology, language identification has attracted growing attention as one of its important branches. Language identification is the automatic determination, by computer, of the language in which an utterance is spoken, and it builds on speech recognition techniques. Although only a few decades have passed since the 1970s, many language identification methods, each with its own characteristics, have emerged; most of them are not yet mature. Research on language identification in China is still at an early stage and limited in scope.

Language identification must be text-independent and speaker-independent, so information specific to the individual speaker should be removed from the speech signal as far as possible in order to improve recognition.

First, the speech characteristics of the languages are analyzed to find the differences among them. Characteristic coefficients are extracted from the speech and represented as vectors.

Then this thesis proposes trimming outliers from the training vectors with a weighted K-nearest-neighbor (KNN) rule. For each pair of languages and each training vector in the pair, the K vectors with the smallest Euclidean distance to it are found, and the class of the vector is checked against that of the majority of these K nearest neighbors. The neighbors do not contribute equally to the decision: the nearest neighbor contributes the most and the K-th nearest the least, so each of the K neighbors can be given a different weight. The weighted class labels of the neighbors are summed and the result is compared with the class label of the vector itself; the vector is kept if they agree and deleted if they do not.

Finally, one-against-one support vector machines (SVMs) are trained on the trimmed training vectors. Each test vector is classified by voting among the one-against-one SVMs, and the language that receives the most votes is taken as the language of the unknown speech.

The experimental results show that, with a small number of training vectors, the average recognition rate of KNN-SVM is 78.66%, compared with 76.15% for SVM. With the same number of training vectors, KNN-SVM has fewer support vectors than SVM, so its classification time is shorter. Overall, KNN-SVM performs better than SVM.
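The trimming and voting steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the weight values, so linear rank weights K, K-1, ..., 1 are assumed; a weighted sum of exactly zero is assumed to keep the vector; and the function names `trim_pair` and `vote`, the toy 2-D data, and the language names "A", "B", "C" are hypothetical. The pairwise classifiers passed to `vote` stand in for trained binary SVMs.

```python
import math
from collections import Counter


def trim_pair(vectors, labels, k=3):
    """Weighted-KNN trimming for one language pair (labels are +1 or -1).

    Returns the indices of the training vectors that are kept.
    Assumption: rank weights k, k-1, ..., 1 (not specified in the abstract).
    """
    kept = []
    for i, (x, y) in enumerate(zip(vectors, labels)):
        # Euclidean distance from x to every other training vector.
        dists = sorted(
            (math.dist(x, other), labels[j])
            for j, other in enumerate(vectors)
            if j != i
        )
        # Nearest neighbor gets weight k, the k-th nearest gets weight 1.
        score = sum((k - rank) * lab for rank, (_, lab) in enumerate(dists[:k]))
        # Keep the vector if the weighted vote agrees with its own label.
        # Assumption: a zero score (tie) also keeps the vector.
        if score * y >= 0:
            kept.append(i)
    return kept


def vote(x, pair_classifiers):
    """One-against-one voting.

    pair_classifiers maps (lang_a, lang_b) to a binary decision function
    that returns a positive value for lang_a and a negative one for lang_b.
    """
    votes = Counter()
    for (a, b), decide in pair_classifiers.items():
        votes[a if decide(x) > 0 else b] += 1
    # The language with the most votes wins.
    return votes.most_common(1)[0][0]


# Toy data: two clusters plus one mislabeled outlier at (8, 8).
vectors = [(0, 0), (0, 1), (1, 0), (1, 1),
           (5, 5), (5, 6), (6, 5), (6, 6), (8, 8)]
labels = [1, 1, 1, 1, -1, -1, -1, -1, 1]
kept = trim_pair(vectors, labels, k=3)  # the outlier (index 8) is dropped
```

In a full pipeline, `trim_pair` would be run once per language pair before training that pair's SVM, and `vote` would combine the decisions of all pairwise SVMs for a test vector.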
Keywords/Search Tags: Language identification, Acoustic characteristics, Support vector machine, K nearest neighbor