
Application of Two Bionic Algorithms to Text Classification

Posted on: 2012-04-11; Degree: Master; Type: Thesis
Country: China; Candidate: Z Z Ning; Full Text: PDF
GTID: 2218330338970860; Subject: Computer application technology
Abstract/Summary:
With the development of information technology, users have access to an ever-increasing amount of information, most of which is text. Text mining, the technology for efficiently managing and effectively using such unstructured data, has been an active research field over the past few decades, and text classification is an important research direction within it. Since the 1990s, statistical and machine learning methods have been introduced into text categorization, replacing the earlier knowledge-engineering-based classification methods, and a large number of studies have emerged on its key technologies, including text preprocessing, feature selection, text representation models, classification algorithms, and classification performance evaluation. When processing the massive data brought by the development of the Internet, existing text processing methods run into difficulties: the amount of data is large, the resulting vector space model has very high dimensionality, preprocessing and computation take a long time, the data sets contain much noise, and classification algorithms have low accuracy. This thesis studies feature selection and classification algorithms for text categorization. The good point set genetic algorithm is a random search algorithm that redesigns the crossover operation using the theory of good point sets from number theory, guiding the search toward the "family" of ancestor schemata with higher fitness. Compared with the standard genetic algorithm, it improves accuracy and speed and avoids premature convergence. The covering algorithm starts from a geometric point of view: it maps the input sample vectors onto a sphere in a high-dimensional space and, through training, covers the samples of each class with as few regions as possible to form a classification network model.
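The sphere mapping and class-wise covering described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the lifting formula (append sqrt(R^2 - |x|^2) as an extra coordinate), the greedy choice of centers, the midpoint threshold, and the names `lift`, `build_covers`, and `predict` are all assumptions made for the example. It also assumes no two identical samples carry different labels, and the greedy loop does not guarantee the smallest possible number of covers.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def lift(x, radius):
    """Map an n-dim vector onto a sphere of the given radius in n+1 dims
    by appending sqrt(R^2 - |x|^2) (assumed mapping; clamped at 0)."""
    sq = dot(x, x)
    return list(x) + [math.sqrt(max(radius * radius - sq, 0.0))]

def build_covers(samples, labels):
    """Greedily cover each class with spherical caps (center, threshold, label)."""
    radius = max(math.sqrt(dot(x, x)) for x in samples)
    points = [lift(x, radius) for x in samples]
    uncovered = set(range(len(points)))
    covers = []
    while uncovered:
        c = min(uncovered)  # deterministic pick; a random pick is also common
        center, label = points[c], labels[c]
        # On a sphere, a larger inner product means a closer point.
        enemy = [dot(center, p) for p, l in zip(points, labels) if l != label]
        d1 = max(enemy) if enemy else -radius * radius  # nearest other-class sample
        friends = [dot(center, p) for p, l in zip(points, labels)
                   if l == label and dot(center, p) > d1]
        d2 = min(friends) if friends else d1  # farthest same-class sample inside d1
        threshold = (d1 + d2) / 2.0
        covered = {i for i in uncovered
                   if labels[i] == label and dot(center, points[i]) > threshold}
        covered.add(c)  # guarantees progress even in degenerate cases
        covers.append((center, threshold, label))
        uncovered -= covered
    return covers, radius

def predict(x, covers, radius):
    """Return the label of the cover that contains x, or the nearest cover otherwise."""
    p = lift(x, radius)
    best = max(covers, key=lambda cov: dot(p, cov[0]) - cov[1])
    return best[2]

if __name__ == "__main__":
    X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    y = ["a", "a", "b", "b"]
    covers, R = build_covers(X, y)
    print(len(covers), predict((0.2, 0.5), covers, R), predict((5.0, 5.5), covers, R))
```

Each cover acts like one hidden unit of the classification network: it fires when the inner product between the lifted input and the cover's center exceeds the cover's threshold.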
Particle swarm optimization is an evolutionary algorithm that simulates the behavior of migrating bird flocks. Like the genetic algorithm, it starts from a random initial population, searches iteratively for the best solution, and evaluates solution quality by fitness; unlike the genetic algorithm, it has no crossover or mutation operations in its iterations, and it is easy to implement, precise, and fast to converge. This thesis combines the good point set genetic algorithm's principle of searching for better samples among ancestor schemata with higher fitness with the simplicity and effectiveness of the K nearest neighbor algorithm, and proposes a feature selection method based on the good point set genetic algorithm. The covering algorithm handles high-dimensional data well, but it faces a trade-off between classification accuracy and generalization ability; to address this, the thesis combines the covering algorithm with particle swarm optimization and gives an improved particle-swarm-optimized covering algorithm. Finally, a text classification system is constructed. Experiments and comparative analysis on three groups of data, evaluated with the F1 measure, show that the proposed algorithms effectively improve classification accuracy and efficiency.
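For readers unfamiliar with particle swarm optimization, the iteration described above (random start, fitness evaluation, no crossover or mutation) can be sketched as below. This is a generic global-best variant with an inertia weight, not the improved covering variant proposed in the thesis. The function name `pso_minimize`, the parameter values, and the use of minimization (a fitness to be maximized can be negated) are assumptions made for illustration.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO sketch: each particle is pulled toward its own best
    position and the swarm's best position; there is no crossover or mutation."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # toy objective with minimum at 0
    best, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

In a setting like the one the abstract describes, the objective would instead score a candidate set of cover parameters, for example by classification error on held-out samples; that coupling is not specified in the abstract and is left out here.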
Keywords/Search Tags: good point set, feature selection, particle swarm optimization