Research And Improvement Of Feature Selection Algorithm In Text Classification

Posted on: 2017-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xu
GTID: 2348330482486925
Subject: Computer technology
Abstract/Summary:
As an effective method of managing and organizing text information, text classification has long been a research hotspot in the field of text mining. However, text classification suffers from problems such as high feature dimensionality, sparsity, and high class discretization, which seriously affect its accuracy. To address these problems, this thesis takes feature selection as its main research object and proposes improved feature selection algorithms based on mutual information and information gain. The improved mutual information algorithm defines a feature evaluation function that introduces the word frequency and the information distribution of features, removing the influence of low-frequency feature words and of within-class information distribution on classification, and thereby improving the accuracy of text classification. The improved information gain algorithm introduces feature frequency and discretization information to build an evaluation function that reduces the impact of the unbalanced distribution of feature words and of class discretization on text classification. The probability that a feature word does not appear is then removed from the evaluation function to optimize it further. As a result, both the accuracy of text feature selection and the effectiveness of text classification can be improved. The text classification system implemented in this thesis is based on the two improved algorithms above. Comparative text classification experiments indicate that the two proposed algorithms select the best feature subset accurately and are superior to traditional classification algorithms in the recall, precision, and F1 value of text classification.
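For orientation, the traditional (unimproved) forms of the two criteria named in the abstract can be sketched as follows. This is a minimal illustration of the textbook document-frequency definitions of information gain and pointwise mutual information only; the improved evaluation functions proposed in the thesis are not given in the abstract and are not reproduced here. The function names, the `docs` layout (a list of token-set and label pairs), and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """Traditional IG of a term: H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs is a list of (token_set, label) pairs (an assumed layout).
    """
    class_counts = Counter(label for _, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    # Counter subtraction keeps only positive counts.
    without_term = class_counts - with_term
    n = len(docs)
    n_t = sum(with_term.values())
    h_conditional = (
        (n_t / n) * entropy(with_term.values())
        + ((n - n_t) / n) * entropy(without_term.values())
    )
    return entropy(class_counts.values()) - h_conditional


def mutual_information(docs, term, label):
    """Traditional pointwise MI: log2( P(t, c) / (P(t) P(c)) )."""
    n = len(docs)
    n_tc = sum(1 for tokens, lab in docs if term in tokens and lab == label)
    n_t = sum(1 for tokens, _ in docs if term in tokens)
    n_c = sum(1 for _, lab in docs if lab == label)
    if n_tc == 0:
        # The term never co-occurs with the class.
        return float("-inf")
    return math.log2(n_tc * n / (n_t * n_c))


# Toy corpus, invented purely for illustration.
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"stock", "market"}, "finance"),
    ({"stock", "team"}, "finance"),
]
ranked = sorted({t for tokens, _ in docs for t in tokens},
                key=lambda t: information_gain(docs, t), reverse=True)
```

In this toy corpus, a word that appears in only one class (such as "goal") receives the highest information gain, while a word spread evenly across classes (such as "team") receives none. The weaknesses the abstract attributes to these baselines, such as sensitivity to low-frequency words, stem from the fact that both scores use only document counts and ignore how often a word occurs within each document.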
Keywords/Search Tags: text classification, feature selection, mutual information, text feature, information gain