The Research And Simulation On The Key Techniques Of Text Mining

Posted on:2015-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chen

Full Text:PDF

GTID:2308330473953963

Subject:Software engineering

Abstract/Summary:

Text mining is a branch of data mining and an active research area within natural language processing. Faced with large and varied collections of text, it aims to organize and combine information effectively so that it can be retrieved and located accurately, helping users find useful data more efficiently. Based on an analysis of text mining and its overall framework, this paper studies the following three aspects of text mining:

(1) For text feature extraction, this paper presents a method based on an improved genetic algorithm. The approach takes full advantage of the global optimization ability of genetic algorithms. First, the mutual information (MI) feature extraction method is used to compute the fitness in the genetic algorithm, which strengthens the correlation between text features and categories and ultimately improves the accuracy of feature extraction. Then the ant colony algorithm is introduced into the selection step of the genetic algorithm to guide the search and reduce its high randomness, improving the efficiency of the algorithm and saving time. Finally, simulation experiments measure the accuracy of the feature extraction results and the execution time of the algorithm in order to evaluate its efficiency.

(2) For text clustering, this paper proposes a method based on an improved ant colony clustering model. The method makes full use of the self-organization of the ant colony clustering algorithm and its insensitivity to the input order of the data, while addressing its shortcomings. To solve the convergence problem of ant colony clustering, the agglomeration step of hierarchical clustering is added to reconstruct the clusters, and a global memory is added to provide overall control and prevent clustering from proceeding too slowly. At the same time, the parameters are tuned to improve the artificial ants' adaptation to the environment and ultimately the accuracy of the clustering results. Finally, precision, recall, and F1 are evaluated, and the results show the algorithm to be efficient.

(3) For text classification, this paper presents an improved KNN algorithm. KNN is a lazy algorithm that builds its classifier only during classification, which lowers classification efficiency. The proposed method makes KNN more efficient by trimming the training sample set. The algorithm is then shown to be more time-efficient than comparable algorithms.
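As an illustration of the idea in part (1), the sketch below scores terms by mutual information with a category and uses that score as the fitness of a candidate feature subset. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not give the exact MI formula or fitness definition, so the pointwise form log(P(t,c) / (P(t)P(c))), the function names `mutual_information` and `subset_fitness`, the bit-mask encoding of a subset, and the toy data are all hypothetical. The ant colony guided selection step is not shown.

```python
import math

def mutual_information(docs, labels, term, category):
    """Pointwise MI between a term and a category, from document counts.

    docs is a list of token sets, labels the category of each document.
    Returns -inf when the term and category never co-occur.
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0:
        return float("-inf")
    # log(P(t,c) / (P(t) * P(c))) with probabilities estimated as counts / n
    return math.log((n_tc * n) / (n_t * n_c))

def subset_fitness(mask, vocab, docs, labels):
    """Assumed fitness: mean over selected terms of their best MI with any category."""
    categories = sorted(set(labels))
    scores = []
    for bit, term in zip(mask, vocab):
        if bit:
            best = max(mutual_information(docs, labels, term, c) for c in categories)
            if best != float("-inf"):  # skip terms that occur in no document
                scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0

# Toy corpus (hypothetical): four documents, two categories.
docs = [{"ball", "goal"}, {"ball", "team"}, {"vote", "law"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
vocab = ["ball", "team", "vote"]

# "ball" and "vote" each identify one category; "team" is spread across both.
informative = subset_fitness([1, 0, 1], vocab, docs, labels)
uninformative = subset_fitness([0, 1, 0], vocab, docs, labels)
```

In a genetic algorithm, each individual would be one such mask, and individuals with higher `subset_fitness` would be favored during selection. Here the mask that picks "ball" and "vote" scores higher than the one that picks only "team".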

Keywords/Search Tags:

feature extraction, genetic algorithm, ant colony algorithm, text categorization, text clustering

Related items

1	Research On Key Problems In Text Mining
2	Research And Implementation Of Recommendation Algorithm Based On Association Rules And Text Categorization
3	Based On Rough Set Text Automatic Classification Study
4	Research On Text Categorization Based On Genetic Algorithm And Fuzzy Clustering
5	Text Classification Technology And Applied Research
6	Research On Text Clustering Based On Text Dimension Reduction And Ant Colony Algorithm
7	KNN Text Classification Algorithm Based On The Semantics Of The Center
8	Design And Implementation Of Kazak Text Categorization System
9	The Research Of Text Representation And Feature Selection In Text Categorization
10	Research And Application Of The Clustering Analysis Based On Improved RNA Genetic Algorithm