
Study And Realization Of Text Categorization In Chinese Speech Recognition Results

Posted on: 2009-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhao
Full Text: PDF
GTID: 2178360272970842
Subject: Software engineering
Abstract/Summary:
Research on speech recognition technology began in the 1950s, and after some 50 years of development the underlying theory has matured. By the 1990s, with the arrival of the multimedia era, the technology had matured further and speech recognition systems moved from the laboratory into practical use. Speech recognition can be applied in speech communication systems, voice-activated telephone exchanges, data query, booking systems, hotel services, health care, banking, computer control, industrial control and other areas. To make such applications possible, it is essential to classify the text recognized from voice signals. In this thesis, based on the framework of a traditional Chinese text categorization system, a categorization system for Chinese speech recognition text is implemented using improved SVM, KNN and Naive Bayes algorithms. For the experimental analysis of the system, a corpus of more than 1,200 speech recognition result texts divided into 10 categories is used. Because speech recognition text is already divided into terms, the system simplifies the preprocessing stage of text categorization and removes the word segmentation step of a traditional Chinese text categorization system. Among the classifier construction algorithms, this thesis focuses on support vector machines.
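As an illustration of the simplified preprocessing described above, the following is a minimal sketch of turning recognizer output into a bag-of-terms vector without a word segmentation step. It assumes the recognizer emits terms separated by whitespace; that format, the function names `split_terms` and `bag_of_terms`, and the sample sentence are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import List


def split_terms(recognized: str) -> List[str]:
    """Split recognizer output that is already divided into terms.

    Assumption: terms are whitespace-delimited, so no Chinese word
    segmenter is needed at this stage.
    """
    return recognized.split()


def bag_of_terms(recognized: str) -> Counter:
    """Count how often each term occurs in one recognized utterance."""
    return Counter(split_terms(recognized))


# Made-up sample utterance, already divided into terms.
sample = "我 想 预订 酒店 房间 酒店"
vector = bag_of_terms(sample)
print(vector.most_common(1))
```

The resulting term counts could then be passed to feature selection and to any of the three classifiers; how the thesis system represents its vectors is not stated in the abstract.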
A comparison of different SVM kernels shows that the radial basis kernel performs best. Categorization performance is lowered because the samples in the corpus overlap between classes and are unevenly distributed across the 10 categories, but it can be improved by using a biased hyperplane, optimizing parameters, and weighting positive and negative samples to balance their error rates. Different feature selection algorithms and the three categorization algorithms are analyzed and compared experimentally. The system obtains its highest recall and precision with the IG (information gain) method, whereas the results with the MI (mutual information) method are very unsatisfactory. Furthermore, the improved SVM shows a clear advantage in this classification system and reaches very high accuracy, making the system well suited for use as a research platform.
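To make the two feature selection criteria concrete, here is a small sketch of the textbook definitions of information gain and pointwise mutual information over term presence. The toy documents, labels and function names are invented for illustration, and the thesis may use different variants of either score.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Reduction in category entropy from knowing whether `term` occurs."""
    n = len(docs)
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    gain = entropy(Counter(labels).values())
    for subset in (with_term, without_term):
        if subset:
            gain -= len(subset) / n * entropy(Counter(subset).values())
    return gain


def mutual_information(docs, labels, term, category):
    """Pointwise MI: log2 of P(term, category) / (P(term) * P(category))."""
    n = len(docs)
    both = sum(1 for d, lab in zip(docs, labels) if term in d and lab == category)
    term_count = sum(1 for d in docs if term in d)
    cat_count = sum(1 for lab in labels if lab == category)
    if both == 0:
        return float("-inf")
    return math.log2(both * n / (term_count * cat_count))


# Toy corpus (made up): each document is a set of terms.
docs = [{"酒店", "预订"}, {"酒店", "房间"}, {"银行", "转账"}, {"银行", "余额"}]
labels = ["hotel", "hotel", "bank", "bank"]

ig_frequent = information_gain(docs, labels, "酒店")
ig_rare = information_gain(docs, labels, "预订")
mi_frequent = mutual_information(docs, labels, "酒店", "hotel")
mi_rare = mutual_information(docs, labels, "预订", "hotel")
print(ig_frequent, ig_rare, mi_frequent, mi_rare)
```

On this toy corpus, information gain ranks the term that appears in every "hotel" document above the term that appears in only one, while pointwise mutual information scores them equally. This tendency of MI to favor rare terms is a generally known property and one possible reason for weak MI results; the abstract itself does not state the cause.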
Keywords/Search Tags: Text Categorization, SVM, KNN, Naive Bayes