Research And Application Of Text Classification Based On Heuristic Algorithm

Posted on: 2018-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
Full Text: PDF
GTID: 2348330512984826
Subject: Engineering
Abstract/Summary:
As the scale and applications of the Internet expand, vast amounts of information and resources are recorded in electronic form, and text is one of the most frequently used forms. The large quantity of text content presents new challenges for data retrieval, management, and data mining, and work in these areas has been dominated by text classification based on pattern recognition. In natural language processing and information filtering applications, texts exhibit complex correlations, diverse labels, and frequent change, while the conflicts among these factors intensify. Text mining therefore faces many difficulties and needs integrated solutions that combine high accuracy with low time and space consumption.

Heuristic algorithms derive from what people do in life and practice: they summarize the experience, rules, and methods for solving problems that come from observing natural laws and biological, physical, and social behavior. Heuristic models are flexible to apply and are efficient and reliable for solving combinatorial optimization problems, and they also provide new ways to solve classification problems. Existing text categorization models have already been optimized with heuristic methods such as local search and genetic algorithms. However, these approaches suffer from long iteration times and unsatisfactory accuracy.

This thesis presents improved text classification models based on heuristic algorithms and applies them to build an education network environment purification system. First, we propose a criterion, named LW, for evaluating how linearly separable a feature set is. LW is a linear, high-accuracy measure with low computational complexity and strong immunity to noise. The higher the LW value, the more linearly separable the feature set is, and the better the feature set performs in classification problems, especially in models with a linear objective function.
Second, by combining heuristic algorithms (genetic and simulated annealing algorithms) with text-mining techniques, we propose the LW-GA and LW-SA feature selection models. LW-GA combines LW with a genetic algorithm to address feature search in high-dimensional space and time-consuming iterative evaluation. LW-SA combines LW with simulated annealing to traverse the feature space and control when iteration stops. A series of experiments on a real data set achieved good results, preserving reliability while greatly reducing execution time. Finally, with text classification as the core technology, we design and implement an education network environment purification system, which is essentially an integrated solution for identifying, filtering, and blocking information. After deployment and testing in the field, the system can be used to purify the campus environment and protect the physical and mental health of adolescents.
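
To make the LW-SA idea concrete, the following is a minimal sketch of simulated annealing over feature subsets, not the thesis's implementation. The abstract does not define LW, so `separability_score` is a hypothetical stand-in (a Fisher-style ratio for two classes labeled 0 and 1); the function names, the cooling schedule, and all parameter values are likewise assumptions chosen for illustration.

```python
import math
import random


def separability_score(X, y, mask):
    """Hypothetical stand-in for LW (NOT the thesis's definition).

    Returns the mean Fisher-style ratio of the selected features for a
    two-class problem with labels 0 and 1. Higher means the classes are
    easier to separate along the selected features.
    """
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    total = 0.0
    for j in selected:
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
        var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
        # Small constant avoids division by zero for constant features.
        total += (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-9)
    return total / len(selected)


def anneal_features(X, y, t_start=1.0, t_end=1e-3, cooling=0.95, seed=0):
    """Simulated annealing over boolean feature masks.

    Each step flips one randomly chosen feature in or out of the subset.
    Improvements are always accepted; worse subsets are accepted with
    probability exp(delta / t). The temperature floor t_end acts as the
    stopping rule (an assumed, simple choice).
    """
    rng = random.Random(seed)
    n = len(X[0])
    current = [rng.random() < 0.5 for _ in range(n)]
    current_score = separability_score(X, y, current)
    best, best_score = current[:], current_score
    t = t_start
    while t > t_end:
        candidate = current[:]
        j = rng.randrange(n)
        candidate[j] = not candidate[j]
        cand_score = separability_score(X, y, candidate)
        delta = cand_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        t *= cooling  # geometric cooling
    return best, best_score
```

Calling `anneal_features(X, y)` with `X` as a list of numeric feature rows and `y` as 0/1 labels returns the best mask found and its score. A genetic-algorithm variant in the spirit of LW-GA would keep the same scoring function and replace the single-flip move with selection, crossover, and mutation over a population of masks.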
Keywords/Search Tags: text classification, heuristic algorithms, feature selection, genetic algorithm, simulated annealing algorithms