
Research And Implementation Of Feature Selection In Chinese Text Classification

Posted on: 2012-11-27  Degree: Master  Type: Thesis
Country: China  Candidate: S B Lin  Full Text: PDF
GTID: 2218330362954377  Subject: Computer software and theory
Abstract/Summary:
With the development of society, and especially the rapid development of network technology, the volume of information of all kinds has grown exponentially. Text classification (TC) can manage large, heterogeneous collections of data effectively. Information retrieval and filtering, which build on text classification, help people find the information they need in large volumes of data and work more efficiently. Text classification has therefore become a popular and significant research topic. This thesis first studies and analyzes the key techniques of text classification in detail, then focuses on feature selection and proposes a new feature selection method. Finally, we design and implement a TC system based on the new method.

① We analyze the process and key techniques of TC and study text feature selection methods. By comparing several common filter-based methods, we find that negative features and weakly correlated features degrade the quality of the selected features. To address this, the thesis proposes a new feature selection approach for TC, named SP, which is based on strong class correlation and positive class correlation. By selecting positive and strong features, SP effectively eliminates the influence of negative and weakly correlated features and thus yields high-quality features.

② SP is applied in the design and implementation of a Chinese text classification system (CTCS). We present the overall design of CTCS and the detailed design of its modules. The thesis studies the Chinese lexical analysis toolkit ICTCLAS and the full-text search library Lucene, combines the two into a solution for building CTCS, and finally implements the system.

③ We run a number of comparison experiments between the new feature selection method SP and common methods such as DF and CHI. The classification results are evaluated with several classification performance measures.
The experimental results indicate that SP selects high-quality features, constructs low-dimensional feature vectors, and reduces the dimensionality of the feature space. SP performs well for feature selection in Chinese text classification and reflects the degree of difference among classes.
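
As an illustration of the filter-style scoring discussed in this abstract, the following is a minimal Python sketch of document frequency (DF) and the chi-square statistic (CHI) for one class, with the sign of the term-class association kept alongside the score. It is not the thesis's SP method, whose exact formula is not given in this abstract; the function names `chi_square_scores` and `select_positive`, the toy corpus, and the rule of keeping only positively associated terms are assumptions made for illustration. Documents are assumed to be already segmented into tokens.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Return {term: (df, chi2, sign)} for one target class.

    docs   : list of token lists (assumed already segmented into words)
    labels : list of class labels, same length as docs
    target : the class that terms are scored against
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all = Counter()  # documents containing the term (this is DF)
    df_pos = Counter()  # documents of the target class containing the term
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if y == target:
                df_pos[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]   # in class, contains term
        b = df - a         # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n - n_pos - b  # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        diff = a * d - b * c
        chi2 = n * diff * diff / denom if denom else 0.0
        # CHI squares the difference, so the direction of the association
        # is lost; it is kept separately here as +1, 0, or -1.
        sign = (diff > 0) - (diff < 0)
        scores[term] = (df, chi2, sign)
    return scores


def select_positive(scores, k):
    """Illustrative rule (an assumption): top-k terms by CHI among
    terms positively associated with the class."""
    positive = [(t, s[1]) for t, s in scores.items() if s[2] > 0]
    positive.sort(key=lambda p: (-p[1], p[0]))
    return [t for t, _ in positive[:k]]


# Toy corpus, made up for illustration only.
docs = [
    ["football", "match"],
    ["football", "goal"],
    ["stock", "market"],
    ["stock", "match"],
]
labels = ["sports", "sports", "finance", "finance"]

scores = chi_square_scores(docs, labels, "sports")
selected = select_positive(scores, 2)
```

In this toy corpus, `football` and `stock` receive the same CHI score for the class `sports`, but only `football` is positively associated with it, which is the kind of distinction a plain CHI ranking does not make.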
Keywords/Search Tags: Text Classification, Feature Dimensionality Reduction, Feature Selection, Class Positive Correlation, Class Strong Correlation