
Research On Web Chinese Text Automatic Categorization Based On RS-SVM

Posted on: 2011-12-08  Degree: Master  Type: Thesis
Country: China  Candidate: J W Rong  Full Text: PDF
GTID: 2178330332982013  Subject: E-commerce
Abstract/Summary:
With the spread of information technology, and in particular the rapid development of the Internet, the volume of information is growing explosively and reaching every aspect of our lives. People constantly need to obtain, analyze, and use information. How to effectively mine the useful and interesting information from this mass of data has become a problem in the field of computer applications. Text categorization is an important data mining technique, so this thesis studies it further. Firstly, following the procedure of Web Chinese text classification, this thesis introduces the related technologies and analyzes the key ones in detail, including text preprocessing, text representation, preliminary dimension reduction, and text classification methods. Secondly, this thesis systematically presents the basic theory of rough sets and support vector machines (SVM). In order to improve classification performance and reduce running time, it puts forward a new text classification algorithm based on the combination of rough sets and SVM. The algorithm uses the knowledge reduction algorithm of rough sets to reduce the dimension of the preprocessed data by deleting redundant attributes, and then uses the generalization and classification ability of the SVM to train on and classify the reduced data, so that the two methods complement each other. In presenting rough set theory, the thesis focuses on its core, the knowledge reduction algorithm, and proposes an improved heuristic attribute reduction algorithm in order to strengthen the dimension reduction ability of rough sets and greatly reduce the dimension of the text.
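As an illustration of what rough set attribute reduction does, the following is a minimal sketch of a generic greedy heuristic driven by the dependency degree (the share of objects in the positive region). It is not the improved algorithm proposed in the thesis, whose details the abstract does not give. The function names `dependency` and `greedy_reduct`, the binary term-presence features, and the toy class labels are all assumptions made for the example.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Dependency degree of the decision on `attrs`: the fraction of objects
    whose equivalence class under `attrs` carries a single decision label."""
    label_sets = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        label_sets[key].add(label)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, labs in label_sets.items() if len(labs) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Add, one at a time, the attribute that raises the dependency degree
    the most, until it matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = (a for a in all_attrs if a not in reduct)
        best = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Hypothetical toy data: each row marks the presence (1) or absence (0)
# of four terms in a document; labels are made-up categories.
rows = [
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 1, 1),
]
labels = ["sports", "sports", "finance", "finance"]

selected = greedy_reduct(rows, labels)
print(selected)
```

In this toy table the first term alone separates the two categories, so the remaining three attributes are redundant with respect to the decision. The reduced feature set would then be passed to the SVM for training.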
In introducing the basic concepts of support vector machines, the thesis focuses on binary classification and multi-class classification algorithms. For binary classification, building on the results of earlier researchers, it proposes a modified SVM binary classification algorithm that uses a combination of kernel functions. For multi-class classification, after comparing the "one-vs-rest", "one-vs-one", decision directed acyclic graph, and binary decision tree algorithms, it proposes an improved hierarchical binary decision tree SVM multi-class algorithm based on the distance between cluster centers. Finally, the thesis designs and implements a Web Chinese text classification system based on the improved RS-SVM algorithm and uses it to classify Web Chinese texts retrieved from the Internet. The results verify the superiority of the improved algorithm for automatic categorization of Web Chinese text.
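To make the idea of a binary decision tree ordered by cluster center distance more concrete, here is a minimal sketch of one common way to build such a class hierarchy. It is an assumption for illustration only, since the abstract does not describe the thesis's actual construction. At each node the two classes whose centers lie farthest apart become seeds, every other class joins the nearer seed, and the procedure recurses. In a full system each internal node would hold one binary SVM separating its two groups. The function name `build_tree` and the toy centers are hypothetical.

```python
import math


def build_tree(centers):
    """centers: dict mapping class name -> center vector (list of floats).
    Returns a nested tuple; each leaf is a class name and each internal
    node is a (left_subtree, right_subtree) pair."""
    names = sorted(centers)
    if len(names) == 1:
        return names[0]

    # Seeds: the pair of classes whose centers are farthest apart.
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    left_seed, right_seed = max(
        pairs, key=lambda p: math.dist(centers[p[0]], centers[p[1]])
    )

    # Each seed starts its own group, so both groups are never empty.
    left = {left_seed: centers[left_seed]}
    right = {right_seed: centers[right_seed]}
    for name in names:
        if name in (left_seed, right_seed):
            continue
        d_left = math.dist(centers[name], centers[left_seed])
        d_right = math.dist(centers[name], centers[right_seed])
        (left if d_left <= d_right else right)[name] = centers[name]

    return (build_tree(left), build_tree(right))


# Hypothetical class centers in a 2-dimensional feature space.
centers = {
    "sports": [0.0, 0.0],
    "health": [1.0, 0.0],
    "finance": [10.0, 0.0],
    "stocks": [11.0, 0.0],
}

tree = build_tree(centers)
print(tree)
```

With K classes this layout needs K - 1 binary classifiers, and a test document passes through at most as many of them as the tree is deep. Separating the most distant groups near the root is intended to keep the easiest decisions early, where an error would affect the most classes.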
Keywords/Search Tags: Text Categorization, Rough Set, Attribute Reduction, SVM, Binary Decision Tree