
Study On Text Classification Based On Rough Set And Support Vector Machine

Posted on: 2009-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Y Xia
GTID: 2178360278471004
Subject: Computer application technology
Abstract/Summary:
The current state of text classification research and its open problems were first reviewed systematically, and the related techniques were introduced and examined in the order of the classification pipeline. The key technologies, including text representation, feature selection and text classification algorithms, were analyzed and investigated selectively.

The principles of Rough Set theory and the Support Vector Machine (SVM) were then summarized, and attribute reduction algorithms as well as SVM training and classification algorithms were discussed in turn.

After text representation, the feature space is high-dimensional and sparse. To raise the accuracy of text classification and reduce the running time of the SVM classification algorithm under these conditions, the weaknesses of several existing attribute reduction algorithms were analyzed, and an improved attribute reduction algorithm based on an attribute significance function was proposed. A comparative analysis between the improved algorithm and the existing ones showed theoretically that the improved algorithm is feasible and has lower time complexity.

A text classification method combining the strengths of Rough Set theory and SVM was then proposed. After feature selection, the improved attribute reduction algorithm was used to further reduce the dimension of the text feature items, which lessened the influence of redundant attributes and shortened the training time of the SVM algorithm. On this basis, a text classification system combining Rough Set and SVM was designed and implemented. The classification results obtained with and without the improved attribute reduction algorithm were compared, and the effect of the choice of the penalty parameter C on the classification result was examined.
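The abstract does not spell out the attribute significance function, so the following is only a minimal sketch of the classic dependency-based definition from rough set theory, not the improved algorithm proposed in the thesis. It assumes the text features have already been discretized (here, term present = 1, absent = 0) into a decision table whose decision attribute D is the document category. For a subset B of condition attributes, the dependency degree is gamma_B(D) = |POS_B(D)| / |U|, and the significance of an attribute a relative to B is sig(a, B, D) = gamma_{B+a}(D) - gamma_B(D). The helper names (`partition`, `dependency`, `significance`, `greedy_reduct`) are hypothetical, and the selection loop is the standard greedy forward (QuickReduct-style) strategy.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Split object indices into indiscernibility classes: objects that
    share the same values on every attribute in `attrs` end up together."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """gamma_B(D) = |POS_B(D)| / |U|. An object lies in the positive region
    if every object in its B-class carries the same decision label."""
    if not rows:
        return 0.0
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({labels[i] for i in block}) == 1)
    return pos / len(rows)


def significance(rows, labels, attrs, a):
    """sig(a, B, D): gain in dependency from adding attribute a to B."""
    return dependency(rows, labels, list(attrs) + [a]) - dependency(rows, labels, attrs)


def greedy_reduct(rows, labels):
    """Forward selection: repeatedly add the most significant attribute
    until the subset reaches the dependency of the full attribute set."""
    if not rows:
        return []
    n_attrs = len(rows[0])
    full = dependency(rows, labels, range(n_attrs))
    reduct, remaining = [], set(range(n_attrs))
    current = dependency(rows, labels, reduct)
    while current < full and remaining:
        # max() keeps the first maximal element, so ties go to the lowest index.
        best = max(sorted(remaining),
                   key=lambda a: significance(rows, labels, reduct, a))
        reduct.append(best)
        remaining.remove(best)
        current = dependency(rows, labels, reduct)
    return reduct


# Toy decision table (hypothetical data): rows are documents,
# columns are term-presence flags for four feature terms.
rows = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
labels = ["sports", "sports", "finance", "finance", "sports"]
print(greedy_reduct(rows, labels))  # → [0]
```

Here term 0 alone separates the two categories, so it is the only attribute kept. Greedy forward selection does not guarantee a minimal reduct; a backward pass that drops any attribute whose removal leaves gamma unchanged is a common refinement. Each `dependency` call partitions all objects, which is why reducing the cost of significance evaluation matters when the feature space has thousands of dimensions.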
The experimental results showed that the combined classification method achieved better classification results when the dimension of the text feature space exceeded 2500, indicating that the improved attribute reduction algorithm is of practical value in high-dimensional settings. Finally, the achievements and shortcomings of the work were summarized, and directions for future research were outlined.
Keywords: Text Classification, Feature Selection, Rough Set, Support Vector Machine, Attribute Reduction