The Study Of Chinese Text Classification Based On Web

Posted on: 2009-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xing
Full Text: PDF
GTID: 2178360248956577
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the volume of electronic text has grown enormously. How to organize and manage this information, and how to find what is needed quickly, accurately, and completely, has become an urgent problem in information science and technology. As a key technology of Web text mining, Web text classification can largely alleviate information disorder and the information "explosion". As the technological foundation of information filtering, search engines, and digital libraries, it has broad application prospects.

This thesis analyzes the current state of research on Web text mining and classification and identifies the problems in Web Chinese text classification that still need to be studied and solved. On this basis, Web-based Chinese text classification is studied in depth. The main research content is as follows:

(1) The key technologies of Web Chinese text classification, including automatic word segmentation, text representation, feature weight computation, and dimensionality reduction, are analyzed and discussed from several angles. The shortcomings of existing feature weight computation methods are pointed out, together with the advantages and disadvantages of commonly used feature selection methods.

(2) Several general text classification methods are introduced, with the main focus on the support vector machine (SVM), which is based on statistical learning theory. The advantages and disadvantages of applying SVM to Web text classification are analyzed in detail. Building on an in-depth study of rough sets, reduction theory and the variable precision rough set model are discussed, and the feasibility of applying rough sets to Web text classification is analyzed.

(3) To address the shortcomings of existing weight computation methods, the role of HTML tags in marking up page content is studied. After analyzing the characteristics of Web text, a weighting strategy for HTML tags is designed, and a weight computation method for Web text based on the variable precision rough set is proposed.

(4) Building on the above, a complementary hybrid algorithm for Web Chinese text classification is proposed. In this algorithm, the rough set acts as a front-end processor: the reduction theory and the proposed weight computation method are used to improve the efficiency and precision of the traditional SVM. The SVM acts as the back-end classifier: after the data has been reduced and weighted, it is classified by the SVM in order to further safeguard classification performance. Finally, the main implementation process of the hybrid algorithm is analyzed in detail, and its effectiveness is validated experimentally. A model for applying the algorithm to support macro-level technology decision-making is also proposed.
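
The idea in item (3), that a term's weight should reflect the HTML tags that enclose it, can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract gives neither the tag weights nor the variable precision rough set formula, so the values in `TAG_WEIGHTS` and the names `TagWeightedCounter` and `tag_weighted_frequencies` are hypothetical, and a simple regular-expression tokenizer stands in for a real Chinese word segmenter.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical tag weights for illustration only; the thesis's values are not given.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}                      # text inside these is ignored
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags that never close


class TagWeightedCounter(HTMLParser):
    """Accumulates a weighted count for each term, based on its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []        # stack of currently open tags
        self.weights = Counter()   # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # A term takes the largest weight among all tags that enclose it.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        # Stand-in tokenizer; Chinese text would need a word segmenter here.
        for token in re.findall(r"\w+", data.lower()):
            self.weights[token] += weight


def tag_weighted_frequencies(html):
    """Return a dict mapping each term to its tag-weighted frequency."""
    parser = TagWeightedCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)
```

In a pipeline like the one outlined in item (4), weighted frequencies of this kind would be computed before attribute reduction and passed on to the classifier as feature values; how the thesis combines them with the rough set model is not specified in the abstract.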
Keywords/Search Tags:Web text classification, SVM, rough set, reduction, weight computation, decision-making