
Research Of Feature Selection Algorithm In Short Text Classification

Posted on: 2014-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: T B Li
GTID: 2248330401451914
Subject: Computer software and theory
Abstract/Summary:
With the rise of the social web, a massive volume of short text has entered people's daily lives. Short text classification plays an important role in obtaining key information quickly and accurately, supports text mining and commercial mining of massive short text data, and has broad application prospects in user interest mining, hot topic tracking, buzzword analysis, early warning, and similar tasks. Short text classification consists of text preprocessing, feature representation, feature selection, feature weighting, and classifier construction. Feature selection is an especially important step: each short text is brief and carries little effective information, while the feature set as a whole is high-dimensional, so reducing the dimensionality accurately and effectively is crucial. Feature selection chooses, from the original feature set, the subset of features that contributes most to classification. An efficient feature selection algorithm not only reduces the dimension of the feature set but also improves classification performance, so designing such an algorithm is an important problem. To address it, this thesis carries out the following work.

This thesis first briefly introduces the research background and significance, analyzes and summarizes the state of research on short text classification, with emphasis on the results and current hot spots of feature selection research in short text classification, and briefly describes the theoretical foundations and related technologies of text classification.
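The first four stages of the pipeline listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the whitespace/regex tokenizer and the tiny stop list are placeholders (Chinese short texts would need a word segmenter), and document frequency stands in for the scoring function purely so the filter step is visible; it is not the evaluation function the thesis proposes. The names `preprocess`, `document_frequency`, `select_top_k`, and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "and", "of", "the", "to"}  # assumed, tiny illustrative stop list


def preprocess(text):
    # Text preprocessing: lowercase, split on non-word characters, drop stop
    # words. A real Chinese pipeline would call a word segmenter here instead.
    return [t for t in re.split(r"\W+", text.lower()) if t and t not in STOPWORDS]


def document_frequency(docs):
    # Feature representation: the vocabulary of all documents is the original
    # feature set; here each term is scored by how many documents contain it.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_top_k(scores, k):
    # Filter-style feature selection: keep the k best-scoring terms; ties are
    # broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def tfidf(docs, vocab):
    # Feature weighting: TF-IDF computed over the selected vocabulary only.
    n = len(docs)
    df = document_frequency(docs)
    return [[tokens.count(t) * math.log(n / df[t]) if df[t] else 0.0 for t in vocab]
            for tokens in docs]
```

For example, with the four texts "Cheap flights to Paris", "Paris hotel deals", "Football match tonight", and "Match highlights and goals", keeping k = 3 terms yields the vocabulary `["match", "paris", "cheap"]`. Note that k has to be fixed by hand here, which is exactly the difficulty the second algorithm in this thesis targets.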
Research on feature selection algorithms involves two main aspects: designing the evaluation function and determining the search strategy. Addressing the design of the evaluation function, this thesis proposes a short text feature selection algorithm based on fuzzy entropy. Taking the fuzziness of sample features into account, the algorithm applies fuzzy entropy to feature selection, using it to measure the weight of each feature, and, in keeping with the characteristics of short text classification, builds the membership function of the fuzzy entropy from intra-class dispersion and inter-class dispersion. Simulation experiments demonstrate the effectiveness of the algorithm.

In common feature selection algorithms, the threshold k on the size of the feature subset is difficult to determine, and different values of k can produce widely varying classification results. To address this, the thesis proposes a short text feature selection algorithm based on particle swarm optimization (PSO), which offers a simple concept, easy implementation, and strong global search ability. The algorithm first pre-selects features from the original text set by fuzzy entropy, then performs a second round of feature selection on this primary feature set using an improved PSO algorithm. To overcome premature convergence of the swarm, the strengths of the cloud model are exploited to determine the inertia weight dynamically. To improve search efficiency, the particle population is initialized according to the fuzzy entropy of the features, and an iteration threshold controls when the algorithm terminates. Simulation experiments demonstrate the effectiveness of the algorithm.
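The abstract does not give the membership function itself, so the Python sketch below uses a hypothetical one to show how fuzzy entropy can become a feature weight. For each class c, the distance between the class mean of a feature and its global mean (an inter-class dispersion term) is divided by the class's standard deviation plus a smoothing constant `EPS` (an intra-class dispersion term), then passed through a logistic function to give a membership degree μ. Each degree contributes the De Luca-Termini fuzzy entropy H(μ) = -[μ ln μ + (1 - μ) ln(1 - μ)] / ln 2, which is 0 for a crisp degree and 1 at μ = 0.5. The weight is one minus the average over classes, so features whose class memberships are crisp get weights near 1. The formula for μ, `EPS`, and the function names are assumptions, not the thesis's definitions.

```python
import math
from statistics import mean, pstdev

EPS = 0.1  # hypothetical smoothing term; keeps z finite when a class has zero spread


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuzzy_entropy_term(mu):
    # De Luca-Termini entropy of one membership degree, divided by ln 2 so it
    # lies in [0, 1]: 0 for a crisp degree (0 or 1), 1 for mu = 0.5.
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)) / math.log(2)


def feature_weight(values, labels):
    # values[i] is the feature's value in document i, labels[i] its class.
    global_mean = mean(values)
    classes = sorted(set(labels))
    entropy = 0.0
    for c in classes:
        in_class = [x for x, y in zip(values, labels) if y == c]
        # HYPOTHETICAL membership: deviation of the class centre from the global
        # centre (inter-class dispersion) scaled by the spread inside the class
        # (intra-class dispersion), squashed into [0, 1].
        z = (mean(in_class) - global_mean) / (pstdev(in_class) + EPS)
        entropy += fuzzy_entropy_term(sigmoid(z))
    # Low average fuzzy entropy means the feature separates the classes crisply.
    return 1.0 - entropy / len(classes)


def rank_features(matrix, labels):
    # matrix[i][j] is the value of feature j in document i.
    weights = [feature_weight([row[j] for row in matrix], labels)
               for j in range(len(matrix[0]))]
    order = sorted(range(len(weights)), key=lambda j: -weights[j])
    return order, weights
```

On a toy two-class matrix where feature 0 is present in every class-A document and absent from every class-B document, while feature 2 has the same distribution in both classes, feature 0 ranks first and feature 2 receives weight 0, because its memberships all sit at the maximally fuzzy value 0.5.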
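The second stage can be sketched as binary PSO over feature masks, with several details the abstract leaves open filled in by assumption. The inertia-weight rule is loosely modeled on cloud-model adaptive PSO variants: below-average particles keep the maximum weight to explore, while above-average particles draw a weight from a normal cloud with expectation at the best fitness, so particles near the best exploit with a small weight. Initialization selects each feature with probability proportional to its fuzzy-entropy weight, and the "iteration threshold" is read as a patience rule: stop after `patience` consecutive iterations whose improvement is at most `tol`. The constants `W_MIN`, `W_MAX`, `C1`, `C2`, the cloud parameters En = (f_best - f_avg)/3 and He = En/10, and `toy_fitness` are all hypothetical; a real setup would score each mask by classification results on short texts.

```python
import math
import random

W_MIN, W_MAX = 0.4, 0.9  # assumed inertia-weight bounds
C1 = C2 = 2.0            # assumed acceleration coefficients


def cloud_inertia(f_i, f_avg, f_best, rng):
    """Hypothetical cloud-model inertia weight for a maximisation problem."""
    if f_best <= f_avg or f_i < f_avg:
        return W_MAX  # below-average particle (or flat swarm): keep exploring
    # Normal cloud with expectation Ex = f_best, entropy En, hyper-entropy He.
    en = (f_best - f_avg) / 3.0
    he = en / 10.0
    en_prime = max(abs(rng.gauss(en, he)), 1e-12)  # random entropy per cloud drop
    drop = math.exp(-((f_i - f_best) ** 2) / (2.0 * en_prime ** 2))
    return W_MAX - (W_MAX - W_MIN) * drop  # near f_best -> weight near W_MIN


def pso_select(weights, fitness, n_particles=20, max_iter=100,
               tol=1e-6, patience=10, v_max=4.0, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_fitness, iterations)."""
    rng = random.Random(seed)
    n = len(weights)
    top = max(weights) if max(weights) > 0 else 1.0
    # Fuzzy-entropy-guided initialisation: higher weight -> more likely selected.
    xs = [[1 if rng.random() < max(w, 0.0) / top else 0 for w in weights]
          for _ in range(n_particles)]
    vs = [[0.0] * n for _ in range(n_particles)]
    fits = [fitness(x) for x in xs]
    pbest, pfit = [x[:] for x in xs], fits[:]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    stale = iterations = 0
    while iterations < max_iter:
        iterations += 1
        f_avg, f_max = sum(fits) / n_particles, max(fits)
        for i in range(n_particles):
            w = cloud_inertia(fits[i], f_avg, f_max, rng)
            for j in range(n):
                v = (w * vs[i][j]
                     + C1 * rng.random() * (pbest[i][j] - xs[i][j])
                     + C2 * rng.random() * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-v_max, min(v_max, v))
                # Binary PSO: sigmoid(velocity) is the probability the bit is 1.
                xs[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vs[i][j])) else 0
            fits[i] = fitness(xs[i])
            if fits[i] > pfit[i]:
                pbest[i], pfit[i] = xs[i][:], fits[i]
        g = max(range(n_particles), key=lambda i: pfit[i])
        improvement = pfit[g] - gfit
        if improvement > 0:
            gbest, gfit = pbest[g][:], pfit[g]
        # Assumed "iteration threshold": stop after `patience` iterations in a
        # row whose improvement of the global best is at most `tol`.
        stale = 0 if improvement > tol else stale + 1
        if stale >= patience:
            break
    return gbest, gfit, iterations


# Toy usage (hypothetical weights and fitness).
TOY_WEIGHTS = [0.94, 0.32, 0.0, 0.75, 0.05]


def toy_fitness(mask, penalty=0.2):
    # Placeholder for classification accuracy on the selected features:
    # reward the selected weights and penalise the subset size.
    return sum(w for w, m in zip(TOY_WEIGHTS, mask) if m) - penalty * sum(mask)


if __name__ == "__main__":
    print(pso_select(TOY_WEIGHTS, toy_fitness))
```

With this toy fitness the best attainable mask is `[1, 1, 0, 1, 0]` (every feature whose weight exceeds the 0.2 penalty); running the script prints the mask the swarm found, its fitness, and the number of iterations used. Unlike a fixed top-k cut, the subset size here falls out of the search rather than being chosen in advance.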
Keywords/Search Tags: Short Text Classification, Feature Selection, Fuzzy Entropy, Particle Swarm Optimization Algorithm