Research Of Feature Selection Method For Chinese Text Classification

Posted on:2013-08-06

Degree:Master

Type:Thesis

Country:China

Candidate:J H Chen

Full Text:PDF

GTID:2248330392951372

Subject:Computer application technology

Abstract/Summary:

With the development of technology and the spread of networks, more and more data is available to people, and most of it is in the form of text. This unstructured form leads to a situation in which the volume of data is large but the useful information is relatively scarce. Text mining technology provides an effective way to solve this problem. Text classification is a branch of text mining and a key technology for managing and organizing complex text data effectively. Text mining can help people organize and streamline information effectively. Two important research directions in text classification are feature selection methods and text classification algorithms.

Feature selection means selecting, from a high-dimensional feature term space, the feature terms that best represent the characteristics of the text. A good feature selection method reduces the dimension of the text feature space, which improves the efficiency of text classification, and it also improves classification accuracy by removing invalid feature terms. A good text classification algorithm improves the classification result directly.

The feature selection algorithms frequently used in text categorization take into account only the correlation between features and classes and pay little attention to the correlation among features. In view of this, this thesis proposes a combined feature selection algorithm based on category discriminating power and correlation analysis. The algorithm first uses discriminating power to extract the features that reveal larger differences among categories, which reduces the sparsity of the feature space. It then applies correlation analysis to measure the relevance between features and categories and the redundancy among features, so as to obtain a feature subset whose members are more representative and not redundant with one another. Experiments demonstrate that the proposed algorithm effectively improves the performance of the classifier.
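
The two stages described above can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything concrete here is an assumption: the discriminating-power score (the spread of per-category document frequency) is a hypothetical stand-in, and the C-correlation and F-correlation are measured with symmetric uncertainty in the style of the FCBF filter, which may differ from the measures used in the thesis. The names `discriminating_power`, `select_features`, `top_k`, and `su_threshold` are illustrative only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def discriminating_power(column, labels):
    """Assumed score: spread of the term's document frequency across categories."""
    totals = Counter(labels)
    hits = Counter(label for present, label in zip(column, labels) if present)
    rates = [hits[c] / totals[c] for c in totals]
    return max(rates) - min(rates)


def select_features(docs, labels, vocabulary, top_k, su_threshold=0.0):
    """Two-stage filter: discriminating power, then correlation analysis."""
    # Represent each term as a binary presence column over the documents.
    columns = {t: [int(t in doc) for doc in docs] for t in vocabulary}

    # Stage 1: keep the top_k terms that differ most between categories.
    ranked = sorted(
        vocabulary,
        key=lambda t: discriminating_power(columns[t], labels),
        reverse=True,
    )
    candidates = ranked[:top_k]

    # Stage 2a: C-correlation, i.e. relevance of each term to the class.
    c_corr = {t: symmetric_uncertainty(columns[t], labels) for t in candidates}
    ordered = sorted(
        (t for t in candidates if c_corr[t] > su_threshold),
        key=lambda t: c_corr[t],
        reverse=True,
    )

    # Stage 2b: F-correlation. Drop a term when an already selected term
    # is at least as correlated with it as the term is with the class.
    selected = []
    for t in ordered:
        if all(
            symmetric_uncertainty(columns[s], columns[t]) < c_corr[t]
            for s in selected
        ):
            selected.append(t)
    return selected


# Toy corpus: each document is a set of already segmented terms.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"stock", "market", "the"},
    {"stock", "shares", "the"},
]
labels = ["sport", "sport", "finance", "finance"]
vocabulary = ["goal", "match", "team", "stock", "market", "shares", "the"]

selected = select_features(docs, labels, vocabulary, top_k=4)
print(selected)
```

On this toy corpus the term "the" has zero discriminating power and never reaches the second stage, and "stock" is discarded because it is perfectly correlated with the already selected "goal". For real Chinese text, the documents would first need word segmentation, which this sketch leaves out.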

Keywords/Search Tags:

text categorization, feature selection, category discriminating power, C-correlation, F-correlation, relevant independency

Related items

1	An Improved Approach To Feature Selection Of Chinese Text Categorization Based On Correlation Grouping Principle
2	Research Of Feature Selection Based On Comprehensive Measure In Text Categorization
3	Research On Text Categorization Algorithm Based On Category Homogenization
4	Feature Selection Methods For Text Categorization
5	Research Of Text Feature Selection Algorithm Based On Hadoop
6	Behavior Recognition Feature Selection Method Based On Category Interactivity And Feature Correlation And Redundancy
7	The Method Of Text Categorization Scheme Selection And Development Of A Prototype System
8	Research And Implementation Of Feature Selection In Chinese Text Classification
9	Chinese Text Categorization Based On Correlation Rules Mining
10	Video Underlying Feature Selection And Evaluation Of Correlation Analysis With The Audience