The Study Of Chinese Text Classification Based On Web

Posted on:2009-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:L L Xing

Full Text:PDF

GTID:2178360248956577

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the volume of electronic text has grown enormously. How to organize and manage this information, and how to find what is needed quickly, accurately and completely, has become an urgent problem in information science and technology. As a key technology of Web text mining, Web text classification can to a great extent address information disorder and the information "explosion". As the technological foundation of information filtering, search engines and digital libraries, it has broad application prospects.

This thesis analyzes the current state of research on Web text mining and classification and identifies the problems in Web Chinese text classification that still need to be studied and solved. On this basis, Web-based Chinese text classification is studied in depth. The main research content is as follows:

(1) The key technologies of Web Chinese text classification, including automatic word segmentation, text representation, feature weight computation and dimensionality reduction, are analyzed and discussed. The shortcomings of current feature weight computation methods and the advantages and disadvantages of commonly used feature selection methods are pointed out.

(2) Several general text classification methods are introduced, with the focus on the support vector machine (SVM), which is based on statistical learning theory. The advantages and disadvantages of applying SVM to Web text classification are analyzed. Building on an in-depth study of rough set theory, the reduction theory and the variable precision rough set model are discussed, and the feasibility of applying rough sets to Web text classification is analyzed.

(3) To address the shortcomings of existing weight computation methods, the role that HTML tags play in marking up page content is studied. After analyzing the characteristics of Web text, a weighting strategy for HTML tags is designed, and a weight computation method for Web text based on the variable precision rough set is proposed.

(4) Building on the above, a hybrid Web Chinese text classification algorithm is proposed in which the two methods complement each other. The rough set acts as a front-end processor: the reduction theory and the proposed weight computation are used to improve the efficiency and precision of the traditional SVM. SVM acts as the back-end classifier: the data are classified by SVM after being reduced and weighted, which further safeguards classification performance. Finally, the main implementation process of the hybrid algorithm is analyzed in detail, and its effectiveness is verified experimentally. A model for applying the algorithm to assist macro-level technology decision-making is also proposed.
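
The abstract does not give the details of the HTML tag weighting strategy, so the following is only a minimal sketch of the general idea in item (3): terms that appear inside more prominent tags contribute more to a page's term weights. The tag weights in `TAG_WEIGHTS`, the rule of taking the largest weight among the enclosing tags, and the names `TagWeightedCounter` and `weighted_term_frequencies` are all assumptions made for illustration, not the thesis's method. The sketch also assumes the text is already segmented into space-separated terms; real Chinese text would first need a word segmenter.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights chosen for illustration only;
# the abstract does not state the values used in the thesis.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


class TagWeightedCounter(HTMLParser):
    """Accumulate term frequencies, scaled by the enclosing tags' weight."""

    def __init__(self):
        super().__init__()
        self.stack = []            # tags that are currently open
        self.weights = Counter()   # term -> weighted frequency

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Assumption: a term takes the largest weight among its open tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        # Assumption: terms are already separated by whitespace.
        for term in data.split():
            self.weights[term] += weight


def weighted_term_frequencies(html):
    """Return a dict mapping each term to its tag-weighted frequency."""
    parser = TagWeightedCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


page = (
    "<html><head><title>web mining</title></head>"
    "<body><h1>text mining</h1><p>web text <b>svm</b></p></body></html>"
)
weights = weighted_term_frequencies(page)
```

In a pipeline like the one described in item (4), weights of this kind would feed the feature vectors that are then reduced and passed to the classifier; how the thesis combines them with the variable precision rough set is not specified in the abstract.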

Keywords/Search Tags:

Web text classification, SVM, rough set, reduction, weight computation, decision-making

Related items

1	Research On Text Emotion Classification Based On Rough Set
2	Research On Text Classification Based On Rough Set
3	The Research On Text Classification Technology Based On The Rough Set Theory
4	Research On Relative-Attribute Reduction Algorithm And Decision-Making Method Based On Rough Set
5	Research On Multi-objective Attribute Reduction Based On Decision Rough Set Model
6	The Attribute Reduction Model Based On Decision-making Capacity
7	Multi-class Cost-sensitive Learning Based On Decision-making Rough Set Model
8	Research On Optimization Of Text Classification Based On Improved Rough Set Model
9	Study On Chinese Text Classification Algorithm Based On Rough Set And Its Application
10	Data Mining Method Research Based On Rough Set Theory In Incomplete Decision-making System