Research On Algorithm Of Feature Selection And Weighting In Text Classification

Posted on: 2017-05-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Y Wang  Full Text: PDF
GTID: 2348330482976771  Subject: Computer technology
Abstract/Summary:
Text classification is an effective way to organize and manage text information, but it faces several difficult problems, including high dimensionality and a weak ability to distinguish between categories, which seriously degrade classification performance. To address the high-dimensionality problem, this paper compares several feature selection algorithms and selects the expected cross entropy algorithm, which performed best at dimension reduction. The paper then analyzes the frequency of features within a category and the distribution entropy of features within a category and among different categories, and proposes an expected cross entropy feature selection method based on information entropy to remedy the insufficient consideration of feature frequency in traditional expected cross entropy. Furthermore, the paper studies the TF-IDF feature weighting algorithm, analyzing the concentration of a feature item within a category and the uniformity of its distribution among different categories, and presents an improved TF-IDF feature weighting algorithm that likewise remedies the insufficient consideration of feature frequency. Based on this improved algorithm, we implemented a text classification system. Comparative experiments show that the improved feature selection algorithm based on information entropy resolves the high-dimensionality problem and selects the optimal feature subset accurately, improving classification performance. The improved TF-IDF algorithm strengthens the ability to distinguish categories and gives more accurate feature weights, improving classification accuracy.
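For orientation, the two baseline techniques named in the abstract can be sketched as follows. This is a minimal illustration of the traditional expected cross entropy score, ECE(t) = P(t) · Σ_c P(c|t) · log(P(c|t) / P(c)), and of plain TF-IDF weighting; it is not the thesis's improved method, whose exact formulas are not given here. The function names `expected_cross_entropy` and `tf_idf`, the token-list input format, and the use of document-level counts for the probabilities are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score each term with the traditional expected cross entropy.

    docs   : list of token lists, one per document (assumed input format)
    labels : list of category labels, one per document
    Probabilities are estimated from document presence only, so how often
    a term occurs inside a document is ignored. That is the limitation the
    abstract attributes to the traditional method.
    """
    n_docs = len(docs)
    class_count = Counter(labels)            # documents per category
    term_count = Counter()                   # documents containing each term
    term_class_count = defaultdict(Counter)  # term -> category -> documents
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_count[term] += 1
            term_class_count[term][label] += 1

    scores = {}
    for term, df in term_count.items():
        p_t = df / n_docs
        total = 0.0
        # Categories in which the term never appears contribute zero.
        for category, n_tc in term_class_count[term].items():
            p_c_given_t = n_tc / df
            p_c = class_count[category] / n_docs
            total += p_c_given_t * math.log(p_c_given_t / p_c)
        scores[term] = p_t * total
    return scores


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF.

    tf  = term count divided by document length
    idf = log(N / document frequency), with no smoothing
    """
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weights
```

A typical use would be to rank terms by their `expected_cross_entropy` score, keep the top-scoring subset as features, and then weight the retained terms in each document with `tf_idf` before passing them to a classifier.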
Keywords/Search Tags: text classification, feature selection, information entropy, expected cross entropy, feature weighting