Research And Application Of Text Feature Reduction And Classification Rule Extraction

Posted on: 2008-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: W J Ma
Full Text: PDF
GTID: 2178360242967579
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of information on the Internet, it is hard for people to find the information they need quickly and effectively. If computers can identify and process information and give users appropriate support and assistance, they can greatly ease this difficulty and make the use of information more efficient. Text classification has therefore become a focus of recent research.

In this paper, we examine each specific step of the text classification process. First, for feature dimension reduction: because traditional methods of computing feature weights have shortcomings, we improve the traditional CHI value approach on the basis of the actual correlation between features and texts, so that the new method takes both positive and negative correlations between features and texts into account. Second, whereas traditional dimension reduction uses only a single method, either feature selection or feature extraction, the dimension reduction method in this paper integrates feature extraction with two feature selection steps. Features are first extracted using the pattern aggregation theoretical model, which merges features that make similar contributions to text classification and so forms a new mapped feature space. On this basis, the text model is transformed into a rough set decision table, and a rough set attribute reduction algorithm is used for feature selection, yielding the final feature set for the documents. Finally, a rough set value reduction algorithm is used to extract text classification rules, giving the final classification rules.
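
The abstract does not give the improved CHI formula, so the following is only a minimal sketch of the general idea, not the author's method. It computes the standard chi-square statistic for one term and one class from a 2x2 document-count table, and also returns the sign of `a*d - b*c`, which is one common way to tell positive correlation from negative correlation. The function name `chi_square` and the parameter names `a`, `b`, `c`, `d` are assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Return (chi2, sign) for one term and one class.

    Hypothetical parameter names, not taken from the thesis:
      a: documents in the class that contain the term
      b: documents outside the class that contain the term
      c: documents in the class that lack the term
      d: documents outside the class that lack the term
    sign is +1 for positive correlation, -1 for negative, 0 for none.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A row or column of the table is empty, so the statistic is undefined;
        # treat the term as uninformative for this class.
        return 0.0, 0
    diff = a * d - b * c
    chi2 = n * diff * diff / denom
    sign = (diff > 0) - (diff < 0)
    return chi2, sign


# A term that appears mostly inside the class, and one that appears mostly outside it.
print(chi_square(40, 10, 10, 40))
print(chi_square(10, 40, 40, 10))
```

The plain statistic is identical for both example terms because the difference is squared; only the sign separates a term that indicates the class from one that indicates its absence.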
Experiments were run on commonly used standard data sets, and the proposed classification rule extraction method was evaluated on the final number of features, the length of the rules, classification accuracy, and recall. The experimental results show that the method reduces dimensionality very effectively and achieves high accuracy and recall, which indicates that it is effective.
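
As a rough illustration of two of the evaluation indicators, the sketch below computes per-class precision and recall from true and predicted labels. It assumes that the "accuracy rate" in the abstract refers to precision, which the source does not confirm. The function name `precision_recall` and the sample labels are hypothetical and are not data from the thesis.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for one target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    # Guard against classes that were never predicted or never occur.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
truth = ["sport", "sport", "news", "news"]
predicted = ["sport", "news", "news", "news"]
print(precision_recall(truth, predicted, "sport"))
print(precision_recall(truth, predicted, "news"))
```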
Keywords/Search Tags: Text Classification, Feature Dimension Reduction, Pattern Aggregation, Rough Set