Research On The Feature Selection Technique For Text Categorization

Posted on: 2009-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Zheng
GTID: 2178360245986675
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer technology and the World Wide Web, the volume of electronic documents on the Internet has increased sharply. Faced with this much information, people urgently need a way to obtain the information they need quickly and accurately. Text categorization is a foundational technology for information filtering, information retrieval, search engines, text databases, digital libraries, and similar fields. Its broad application prospects have made it an active research topic. This thesis systematically studies automatic text categorization from three aspects: vector space model representation, feature selection, and classifier training. It also applies rough set theory to text classification.

1. The concept of text classification, the vector space model, the evaluation measures for classification, and rough set theory are introduced.
2. Addressing the key technical problems in classification, the whole process of text representation is discussed: text preprocessing, feature selection, weight computation, and vector space generation. A text preprocessing method based on part-of-speech selection and a feature selection method based on extended mutual information are proposed, and an improved weighting formula, MTF-IDF, is introduced. Three well-performing text categorization methods in current use, Naive Bayes, KNN, and SVM, are studied and compared. Comparative experiments are carried out on existing feature selection methods and weighting formulas.
3. Drawing on the advantages of rough sets, a feature selection method based on rough set attribute reduction is proposed, which performs text feature selection by means of rough set reduction. Comparative experiments indicate that the rough-set-based method achieves better results than the other feature selection methods.
4. An experimental text categorization system is implemented. With this system, feature selection and weight computation methods can be studied, trained, and tested directly on different corpora.
5. The work is summarized and directions for future work are discussed.
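
As a rough illustration of the kind of feature selection the abstract refers to, the sketch below ranks terms by standard pointwise mutual information with the class labels and keeps the top k. It is not the thesis's extended mutual information method, whose definition is not given in this abstract. The function names, the maximum-over-classes scoring rule, and the tokenized input format are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information_scores(docs, labels):
    """Score each term by its maximum pointwise mutual information over classes.

    docs   : list of token lists (assumed already preprocessed)
    labels : list of class labels, one per document
    This is the textbook PMI criterion, not the thesis's extended variant.
    """
    n_docs = len(docs)
    class_count = Counter(labels)   # number of documents in each class
    term_count = Counter()          # number of documents containing each term
    joint_count = Counter()         # number of documents of class c containing term t
    for tokens, label in zip(docs, labels):
        for term in set(tokens):    # count document frequency, not raw frequency
            term_count[term] += 1
            joint_count[(term, label)] += 1

    scores = {}
    for term, n_t in term_count.items():
        best = float("-inf")
        for c, n_c in class_count.items():
            n_tc = joint_count[(term, c)]
            if n_tc == 0:
                continue            # term never occurs in this class
            # PMI(t, c) = log( P(t, c) / (P(t) * P(c)) )
            pmi = math.log((n_tc * n_docs) / (n_t * n_c))
            best = max(best, pmi)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = mutual_information_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

On a toy corpus, a term that appears equally in every class (such as a stop word) scores zero and is ranked last, while class-specific terms score higher. Plain pointwise mutual information is known to favor rare terms, which is one common motivation for modified variants.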
Keywords/Search Tags: feature selection, text categorization, support vector machine, rough sets