Research And Implementation Of Key Technologies In Chinese Web Text Classification

Posted on: 2012-08-13  Degree: Master  Type: Thesis
Country: China  Candidate: X B Zhang  Full Text: PDF
GTID: 2248330395955583  Subject: Computer application technology
Abstract/Summary:
With the spread of information processing techniques and computer networks, the number of Web pages on the Internet is growing exponentially. Web text classification technology emerged to handle this information quickly and easily, and it has become an active research topic in text data mining and information retrieval.

Building on a survey of automatic classification techniques in China and abroad, this dissertation discusses several key techniques that affect classification results in a Chinese Web text classification (CWTC) system, from the automatic acquisition of text classification knowledge to classifier design. It then studies in depth how to improve the precision and speed of the CWTC system while keeping the system stable. Finally, the CWTC system is implemented. First, the CWTC system applies the MapReduce principle to preprocess the large volume of Web text and then uses an improved vector space model for feature representation. Based on a study of text feature selection methods, a new combined feature selection method is proposed. For classifier design, an improved Naïve Bayes classifier that uses an improved independent component analysis algorithm enhances classification performance. To further improve system performance, the dissertation builds a multiple-classifier structure that integrates SVM with the improved Bayes classifier and achieves better classification performance than any single classifier.

Numerous experiments and a statistical analysis of their results show that the Chinese Web text classification method presented in this thesis achieves the goals listed above. Based on these research results, we describe the design and implementation details of the prototype system.
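As a rough illustration of the baseline that the abstract builds on, the sketch below trains a plain multinomial Naive Bayes classifier over bag-of-words term counts, which is a simple form of the vector space model. It is not the improved classifier, the combined feature selection method, or the multiple-classifier structure described in the thesis, none of which are specified here. The function names `train_naive_bayes` and `predict`, the toy documents, and the two class labels are all hypothetical, and the Chinese text is assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and term counts.

    docs is a list of token lists (text assumed already word-segmented);
    labels is a parallel list of class names. Both are illustrative inputs.
    """
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for term in tokens:
            if term not in vocab:
                continue  # ignore terms never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = term_counts[label][term]
            score += math.log((count + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: two documents per class.
docs = [
    ["足球", "比赛", "进球"],
    ["篮球", "比赛", "球员"],
    ["股票", "市场", "上涨"],
    ["银行", "利率", "市场"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train_naive_bayes(docs, labels)
```

Calling `predict(model, ["比赛", "球员"])` on this toy data picks the sports class, because both terms occur only in the sports documents. A real system would add word segmentation, feature selection, and a much larger corpus before this step.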
Keywords/Search Tags: Chinese Web Text Classification, Vector Space Model, Bayes Theory, Multiple Classifiers Combination