Research Of Chinese Web Text Classification

Posted on: 2008-12-22    Degree: Master    Type: Thesis
Country: China    Candidate: Y. Cao
GTID: 2178360242478842    Subject: Systems Engineering
Abstract/Summary:
Web text classification assigns each text in a Web text collection to one of a set of predefined classes. It is an important technique in Web text data mining and an emerging research direction in intelligent information retrieval and processing. Because research on Chinese started later, and because of the particular characteristics of the Chinese language, Chinese text classification technology has developed relatively slowly.

This thesis analyzes the significance of Web text classification and surveys the state of research, especially on Chinese Web text classification, both in China and abroad. It describes the classification process and its key techniques in detail: first, preprocessing of Web texts; then text representation, index generation, and feature selection, including several feature selection methods. It next introduces several text classification algorithms, including SVM, KNN, and Naive Bayes, as well as the criteria commonly used to evaluate text classification algorithms.

The main focus of this thesis is improving Chinese Web text classification. We apply the SVM-KNN algorithm, which combines SVM and KNN, to Web text classification in order to correct some disadvantages of the traditional SVM algorithm and obtain better classification results. We propose a method that adjusts the training set based on text density, which reduces the computational complexity and increases the accuracy of the KNN algorithm. We also propose a method that uses unsupervised text clustering (UTC) algorithms to guide text classification, so that texts can be classified when no training set is available. For each proposed algorithm, the thesis gives experimental data demonstrating its effectiveness.

Finally, we design and implement a Chinese Web text classification system. The thesis describes each module of the system and how the training and test sets were chosen. All experiments reported in the thesis were performed on this system.
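The abstract does not say which evaluation criteria the thesis uses. Precision, recall, and F1, computed per class and then macro- or micro-averaged, are the standard ones in text classification, so the following minimal sketch assumes those. The function names are illustrative and do not come from the thesis.

```python
def confusion_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Mean of the per-class F1 scores: every class weighs the same."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(prf(*confusion_counts(y_true, y_pred, c))[2] for c in labels) / len(labels)


def micro_f1(y_true, y_pred):
    """F1 of counts pooled over all classes: every document weighs the same."""
    labels = sorted(set(y_true) | set(y_pred))
    # zip(*...) regroups the per-class (tp, fp, fn) tuples into totals
    tp, fp, fn = (sum(col) for col in zip(*(confusion_counts(y_true, y_pred, c) for c in labels)))
    return prf(tp, fp, fn)[2]


y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
print(macro_f1(y_true, y_pred), micro_f1(y_true, y_pred))
```

Macro-averaging exposes poor performance on small classes, which matters for unevenly sized Web categories. For single-label classification, micro-F1 equals plain accuracy, because every error is one false positive for one class and one false negative for another.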
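The abstract does not spell out how SVM and KNN are combined. One formulation found in the literature, assumed here, keeps the SVM decision for texts that lie far from the separating hyperplane and falls back to a KNN vote over the support vectors for texts inside an uncertain band around it, where SVM errors concentrate. The sketch below covers only that decision step for a binary, linear-kernel SVM that is assumed to be already trained; the threshold `eps`, the neighbour count `k`, and all function names are hypothetical.

```python
import math
from collections import Counter


def linear_decision(w, b, x):
    """Signed SVM decision value f(x) = w.x + b (binary labels +1 / -1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_knn_predict(x, w, b, support_vectors, sv_labels, eps=1.0, k=3):
    """Hybrid SVM-KNN decision rule (sketch; eps and k are hypothetical defaults).

    If |f(x)| > eps, x is far from the hyperplane and the SVM sign is used.
    Otherwise x is classified by a k-nearest-neighbour majority vote over the
    support vectors, which act as representatives of their classes.
    """
    f = linear_decision(w, b, x)
    if abs(f) > eps:
        return 1 if f > 0 else -1
    # KNN restricted to the support vectors, Euclidean distance
    ranked = sorted(
        (math.dist(x, sv), label) for sv, label in zip(support_vectors, sv_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hand-picked toy model: hyperplane x1 + x2 = 0, support vectors on the margins.
w, b = [1.0, 1.0], 0.0
svs = [[0.0, -1.0], [0.5, -1.5], [1.0, -2.0], [0.5, 0.5], [-1.0, 2.0], [-2.0, 3.0]]
labels = [-1, -1, -1, 1, 1, 1]
print(svm_knn_predict([3.0, 3.0], w, b, svs, labels))   # far from the plane: SVM decides
print(svm_knn_predict([0.8, -0.6], w, b, svs, labels))  # inside the band: KNN decides
```

In the second call the SVM alone would return +1 (f = 0.2), but two of the three nearest support vectors are negative, so the hybrid returns -1. A multi-class text classifier would apply this rule per class (for example one-vs-rest) and, in practice, use a nonlinear kernel distance instead of raw Euclidean distance.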
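The abstract does not describe how text density is measured or how the training set is adjusted, so the following is only a hypothetical illustration of the general idea, not the thesis's method. It assumes density is the number of same-class samples within a radius `r`, and it greedily deletes the sample in the most crowded spot until no sample has more than `max_neighbours` such neighbours. A smaller training set makes every KNN query cheaper, and evening out density keeps an over-represented class from dominating the neighbour vote. All names and parameters are assumptions.

```python
import math


def neighbour_counts(points, r):
    """For each point, count the other points within Euclidean radius r."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                counts[i] += 1
                counts[j] += 1
    return counts


def prune_class(points, r, max_neighbours):
    """Repeatedly drop the sample with the most neighbours until none exceeds the cap."""
    keep = list(range(len(points)))
    while keep:
        counts = neighbour_counts([points[i] for i in keep], r)
        densest = max(range(len(keep)), key=counts.__getitem__)
        if counts[densest] <= max_neighbours:
            break
        del keep[densest]
    return [points[i] for i in keep]


def prune_training_set(points, labels, r=1.0, max_neighbours=2):
    """Density-based reduction applied to each class separately (hypothetical)."""
    kept_points, kept_labels = [], []
    for label in sorted(set(labels)):
        same_class = [p for p, l in zip(points, labels) if l == label]
        for p in prune_class(same_class, r, max_neighbours):
            kept_points.append(p)
            kept_labels.append(label)
    return kept_points, kept_labels


points = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05], [5, 5], [8, 8], [11, 11]]
labels = ["sports"] * 5 + ["finance"] * 3
kept_points, kept_labels = prune_training_set(points, labels)
print(kept_labels)
```

Here the five tightly clustered "sports" samples are thinned to three, while the sparse "finance" samples are untouched. The pairwise counting is quadratic per pass, which is acceptable for a sketch; a real system working on high-dimensional text vectors would more likely use cosine similarity and an index structure.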
Keywords/Search Tags: Text Classification, SVM, KNN