
Research On Feature Selection Method In Text Classification

Posted on: 2013-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2248330377458549
Subject: Signal and Information Processing
Abstract/Summary:
Text classification has long been a key research topic in data mining; its purpose is to assign large numbers of unclassified documents to their proper categories. It is also an important foundation for text information processing technologies, including information retrieval and filtering. Improving text classification technology therefore supports progress in information technology and its applications. The focus of text classification research has shifted from basic theory to simple, fast algorithms, so algorithmic efficiency is one of the goals of this research.

This paper outlines the basic concepts and algorithms of text classification, then reviews each stage of the text classification process, with detailed attention to the most important stages, such as feature selection and the classification algorithm. It then proposes an improvement to the bi-normal separation feature selection method for text classification. Analysis of the original algorithm found that it does not take word frequency statistics into account, which is a shortcoming. With the concept of dispersion added to the formula, the improved bi-normal separation method performed better than both the original and the other feature selection methods in a Chinese text classification experiment.

The article also presents an experimental analysis of the bi-normal separation feature selection method and its improved form. The results show that the bi-normal separation method outperforms the other algorithms in classification performance, and that the improved version is better than the original. The experiments also show that the method has an advantage in the optimal number of selected features.
The article also analyzes comparative experiments on SVM kernel functions used with the bi-normal separation feature selection method, together with optimization experiments on the penalty factor and the parameter of the radial basis kernel function.
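
For readers unfamiliar with bi-normal separation, the original score of a term is commonly written as |F^-1(tpr) - F^-1(fpr)|, where F^-1 is the inverse standard normal cumulative distribution function, tpr is the fraction of positive-class documents containing the term, and fpr is the fraction of negative-class documents containing it. The following is a minimal sketch of that original score only. The function name `bns_score`, its parameter names, the clipping constant `eps`, and the toy counts are assumptions made for illustration; the dispersion-based improvement described in the abstract is not shown, because the abstract does not give its formula.

```python
from statistics import NormalDist


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Original bi-normal separation score for one term (illustrative sketch).

    tp  -- number of positive-class documents containing the term
    fp  -- number of negative-class documents containing the term
    pos -- total number of positive-class documents
    neg -- total number of negative-class documents
    eps -- assumed clipping bound that keeps the rates away from 0 and 1,
           where the inverse normal CDF is undefined
    """
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv_cdf(tpr) - inv_cdf(fpr))


# Hypothetical document counts: term -> (tp, fp), with 100 documents per class.
counts = {"football": (90, 10), "the": (50, 50), "goal": (70, 20)}
ranked = sorted(counts, key=lambda t: bns_score(*counts[t], 100, 100), reverse=True)
```

In this toy ranking, a term that appears equally often in both classes scores zero, while a term concentrated in one class scores highest; feature selection would then keep the top-scoring terms.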
Keywords/Search Tags:text classification, feature selection, bi-normal separation