
An Invasive Weed Optimization Algorithm For Text Feature Selection

Posted on: 2014-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
GTID: 2248330398482553
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer information technology, people are being flooded with huge amounts of text information. There is an urgent need to mine valuable information and knowledge from these massive, heterogeneous text resources quickly and effectively, and data mining emerged to meet this need. Text categorization is an important research area in data mining, and text feature selection is its key technology and core issue. Researchers in China and abroad have proposed a variety of text feature selection methods. They fall broadly into several groups: methods based on evaluation functions, such as document frequency, term frequency, information gain, expected cross entropy, mutual information, odds ratio, and weight of evidence for text; methods based on semantic understanding; methods based on related characteristics; and methods based on genetic algorithms. These methods select text features by calculating a parameter value for each term and keeping the terms with the better values.
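The score-and-keep-the-best pattern described above can be illustrated with a minimal sketch. It uses document frequency as the evaluation function because it is the simplest one listed; the function names, the toy documents, and the alphabetical tie-breaking rule are assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df

def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]

# Toy corpus (hypothetical): each document is already split into terms.
docs = [
    ["weed", "optimization", "text"],
    ["text", "feature", "selection"],
    ["text", "feature", "weed"],
]
df = document_frequency(docs)
selected = select_top_k(df, 2)
```

With a hard cutoff like this, every term below the threshold is dropped regardless of what it contributes, which is the limitation the thesis sets out to address.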
However, some terms with low parameter values carry useful information, so directly discarding them limits how comprehensive the selected feature set can be. To make text feature selection more comprehensive and more accurate, this thesis proposes an invasive weed optimization (IWO) algorithm for text feature selection. The biggest advantage of IWO is its premise that non-feasible solutions may carry more important information than feasible ones, so non-feasible solutions are also given a chance of survival, albeit a smaller one. Offspring individuals are spread randomly near their parents according to a Gaussian (normal) distribution whose standard deviation is adjusted dynamically during the evolution process. As a result, the algorithm explores new areas aggressively in the early and middle iterations to maintain the diversity of the species, and strengthens the selection of the optimal individuals in the final iterations. This mechanism ensures steady convergence of the algorithm to the global optimal solution. We apply this mechanism to text feature selection to make it more comprehensive and thereby improve its accuracy. This research mainly includes the following aspects. First, we build a model for text feature selection based on the invasive weed optimization algorithm. IWO is a new type of numerical optimization method that provides favorable conditions and means for solving nonlinear problems.
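The dispersal mechanism described above can be sketched as follows. The abstract does not give the exact schedule for the standard deviation, so this sketch assumes the nonlinear decay commonly used in the general IWO literature; the names `sigma_at` and `disperse`, the exponent `n=3`, and all parameter values are illustrative assumptions.

```python
import random

def sigma_at(iteration, max_iter, sigma_init, sigma_final, n=3):
    """Standard deviation of seed dispersal at a given iteration.

    Assumed schedule: decays nonlinearly from sigma_init (iteration 0)
    to sigma_final (iteration max_iter), so early iterations explore
    widely and late iterations search close to the parents.
    """
    frac = (max_iter - iteration) / max_iter
    return frac ** n * (sigma_init - sigma_final) + sigma_final

def disperse(parent, num_seeds, sigma, rng):
    """Scatter num_seeds offspring around a parent with Gaussian noise."""
    return [
        [x + rng.gauss(0.0, sigma) for x in parent]
        for _ in range(num_seeds)
    ]

# Hypothetical usage: a two-dimensional parent, early in a 100-iteration run.
rng = random.Random(0)
early_sigma = sigma_at(10, 100, sigma_init=1.0, sigma_final=0.01)
seeds = disperse([0.5, 0.5], num_seeds=4, sigma=early_sigma, rng=rng)
```

Passing the random generator in explicitly keeps runs reproducible, which is convenient when comparing parameter settings.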
We then apply IWO to text feature selection. Second, text preprocessing: we divide the text into dimensions and segment it into terms, calculate the weight of each term in each dimension, convert synonymous terms, and calculate the full-text weight of each term. We studied the dimension division algorithm, the term segmentation algorithm, and the algorithm for calculating full-text term weights. Third, the initialization and reproduction of the term feature populations. During the execution of IWO, reproduction is an important part of population evolution, and it depends on the adaptability (fitness) of the population. We therefore need to compute each term's adaptability to guide the reproduction of feature terms. We studied the initialization of the feature term population and the computation of adaptability during reproduction. Experimental results show that the IWO-based text feature selection method gives terms with low weight values an opportunity to be selected while preserving the selection advantage of terms with high weight values. This makes text feature selection more comprehensive, thereby enhancing its completeness and accuracy.
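The link between adaptability and reproduction can be sketched as below. The abstract does not say how adaptability is computed, so this sketch assumes the full-text term weight serves as the fitness and uses the linear fitness-to-seed-count mapping that is typical of IWO in general; `seed_count`, the weights, and the bounds `s_min` and `s_max` are hypothetical.

```python
def seed_count(fitness, worst, best, s_min, s_max):
    """Linearly map a fitness value to a seed count in [s_min, s_max]."""
    if best == worst:
        return s_max  # all individuals equally fit
    ratio = (fitness - worst) / (best - worst)
    return s_min + int(ratio * (s_max - s_min))

# Hypothetical full-text weights used as fitness values.
weights = {"text": 0.9, "feature": 0.6, "weed": 0.1}
worst, best = min(weights.values()), max(weights.values())

# s_min >= 1 means even the lowest-weight term produces offspring,
# while higher-weight terms produce more.
counts = {
    term: seed_count(w, worst, best, s_min=1, s_max=5)
    for term, w in weights.items()
}
```

Setting `s_min` to at least 1 is what gives low-weight terms a chance of selection while high-weight terms keep their advantage.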
Keywords/Search Tags: Invasive Weed Optimization, Text Feature, Feature Selection, Term Feature, Term Feature Populations