Research On Text Clustering Based On Swarm Intelligence Algorithm

Posted on:2016-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:S J Lv

GTID:2308330464464994

Subject:Computer Science and Technology

Abstract/Summary:

With the arrival of the Web 2.0 era, the amount of text on the Internet has grown explosively, faster than people can make effective use of it. Obtaining valuable information effectively, quickly, and accurately from vast, multi-source, heterogeneous, noisy, and highly time-sensitive data has become a pressing problem and a major challenge for computer science. Advances in Internet search and text analysis gave rise to text mining. Text clustering is an important means of organizing and navigating text information. It has been applied to information filtering, improving text classification results, and automatically organizing digital library services and document collections, and it is attracting growing attention from researchers. This thesis studies the standard artificial fish swarm algorithm (AFSA) and makes several improvements to it. On this basis, it proposes a hybrid algorithm that fuses the artificial fish swarm algorithm with K-means. Comparative experiments on UCI data sets compare the new algorithm with K-means, PSO, and AFSA. The results show that the hybrid algorithm alleviates, to a certain extent, the tendency of K-means to fall into local minima and its sensitivity to initial values, and that it improves convergence precision compared with both K-means and AFSA.

The thesis focuses on Web text clustering algorithms. The main work covers the following aspects:

(1) Starting from the relevant theory of text mining, three common text representation models and their corresponding text similarity measures are analyzed and compared. To address the complexity of the high-dimensional data space, techniques for reducing text vector dimensionality are presented. The new tasks facing text mining are summarized, and text clustering technology is introduced in detail.

(2) The standard artificial fish swarm algorithm is introduced.
Because AFSA converges slowly in its later stages, an adaptive visual-range strategy is proposed: in the early iterations the algorithm uses a fixed visual range, and as the number of iterations increases the visual range is reduced adaptively. In addition, to speed up optimization, the position of the global optimal individual is introduced, forming MAFSA. Function optimization experiments verify the superiority of the improved algorithm.

(3) K-means has low time complexity, is simple to implement, runs fast, and scales well to large data sets, but it is sensitive to initial values. AFSA, by contrast, is insensitive to parameters and initial values and converges quickly. Building on MAFSA, a new hybrid artificial fish swarm algorithm (KAFSA) is therefore proposed, in which K-means is embedded in the modified artificial fish swarm algorithm (MAFSA): after a randomly selected part of the artificial fish completes each AFSA iteration, one iteration of K-means is performed. Experiments on UCI data sets show that the hybrid algorithm clusters better than K-means, PSO, or the improved AFSA alone.

(4) The Newsgroup English document collection is selected as the experimental data source. After text preprocessing (word segmentation, removal of punctuation and other irrelevant symbols, and stop-word removal), the commonly used vector space model and TF-IDF-based dimensionality reduction are applied to cluster the texts, and evaluation results are output.
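
A minimal Python sketch of the hybrid idea in items (2) and (3), for illustration only and not the thesis's implementation. Several choices here are assumptions: each fish encodes k concatenated centroids, fitness is the clustering SSE (lower is better), the visual range shrinks geometrically after `t_fixed` iterations, the global best is used through a greedy step accepted only if it improves fitness, and only the prey behavior is modeled (swarm and follow behaviors are omitted). All function and parameter names (`kafsa`, `v0`, `decay`, `kmeans_frac`, and so on) are hypothetical. The sketch does keep the structure the abstract describes: a fixed visual range early and a shrinking one later, use of the global best position, and one K-means iteration applied to a random subset of the fish after each AFSA iteration.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def to_centroids(fish: Vec, k: int) -> List[Vec]:
    """Assumed encoding: a fish is k centroids of dimension d concatenated."""
    d = len(fish) // k
    return [fish[i * d:(i + 1) * d] for i in range(k)]


def sse(fish: Vec, data: List[Vec], k: int) -> float:
    """Objective to minimise: squared distance from each point to its nearest centroid."""
    cents = to_centroids(fish, k)
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in cents) for p in data)


def kmeans_step(fish: Vec, data: List[Vec], k: int) -> Vec:
    """One K-means (Lloyd) iteration on the centroids encoded in a fish."""
    cents = to_centroids(fish, k)
    groups: List[List[Vec]] = [[] for _ in range(k)]
    for p in data:
        j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cents[i])))
        groups[j].append(p)
    new: Vec = []
    for g, c in zip(groups, cents):
        # An empty cluster keeps its old centroid.
        new.extend([sum(col) / len(g) for col in zip(*g)] if g else c)
    return new


def step_toward(x: Vec, target: Vec, step: float, rng: random.Random) -> Vec:
    """Standard AFSA move: X + rand * Step * (target - X) / ||target - X||."""
    dist = math.sqrt(sum((t - a) ** 2 for a, t in zip(x, target)))
    if dist == 0:
        return list(x)
    r = rng.random() * step
    return [a + r * (t - a) / dist for a, t in zip(x, target)]


def kafsa(data: List[Vec], k: int, n_fish: int = 10, iters: int = 30,
          v0: float = 2.0, v_min: float = 0.1, t_fixed: int = 10, decay: float = 0.9,
          step: float = 0.5, tries: int = 5, kmeans_frac: float = 0.5,
          seed: int = 0) -> Tuple[List[Vec], float]:
    rng = random.Random(seed)
    # Each fish starts from k randomly chosen data points used as centroids.
    school = [[v for p in rng.sample(data, k) for v in p] for _ in range(n_fish)]
    best = list(min(school, key=lambda f: sse(f, data, k)))
    for t in range(iters):
        # Hypothetical adaptive schedule: fixed visual early, geometric shrink later.
        visual = v0 if t < t_fixed else max(v_min, v0 * decay ** (t - t_fixed))
        for i, x in enumerate(school):
            fx = sse(x, data, k)
            # Prey behaviour: probe random positions inside the visual range.
            for _ in range(tries):
                cand = [a + visual * rng.uniform(-1, 1) for a in x]
                if sse(cand, data, k) < fx:
                    x = step_toward(x, cand, step, rng)
                    break
            else:
                x = [a + visual * rng.uniform(-1, 1) for a in x]  # random move
            # Use the global best position; this greedy acceptance rule is an assumption.
            g = step_toward(x, best, step, rng)
            if sse(g, data, k) < sse(x, data, k):
                x = g
            school[i] = x
        # Hybrid step: one K-means iteration on a random subset of the school.
        for i in rng.sample(range(n_fish), max(1, int(kmeans_frac * n_fish))):
            school[i] = kmeans_step(school[i], data, k)
        leader = min(school, key=lambda f: sse(f, data, k))
        if sse(leader, data, k) < sse(best, data, k):
            best = list(leader)  # elitist tracking of the best school member
    return to_centroids(best, k), sse(best, data, k)


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
    print(kafsa(toy, k=2))
```

On the toy data (two well-separated groups of four points), the K-means step quickly pulls fish onto the group means, while the fish-swarm moves keep exploring around them; tracking the best fish across iterations means the returned SSE never gets worse.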

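The text pipeline in item (4), together with a similarity measure of the kind compared in item (1), can be sketched in Python as follows. This is a hedged illustration, not the thesis's code: the regex tokenizer, the tiny stop-word list, the tf * log(N/df) weighting, the rule that keeps only the `top_m` terms with the highest TF-IDF weight as a simple form of dimensionality reduction, and cosine similarity between document vectors are all assumptions, chosen because they are common with the vector space model. The names `preprocess`, `tfidf_vectors`, and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def preprocess(text: str) -> List[str]:
    """Lower-case, split on non-letters (drops punctuation/symbols), remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: List[str], top_m: int = 0) -> List[Dict[str, float]]:
    """Vector space model with tf * log(N / df) weights, stored as sparse dicts.

    If top_m > 0, keep only the top_m terms ranked by their maximum TF-IDF
    weight across the corpus (a simple TF-IDF-based dimension reduction).
    """
    tokenized = [preprocess(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    if top_m > 0:
        score: Counter = Counter()
        for v in vecs:
            for t, w in v.items():
                score[t] = max(score[t], w)
        keep = {t for t, _ in score.most_common(top_m)}
        vecs = [{t: w for t, w in v.items() if t in keep} for v in vecs]
    return vecs


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "A cat and a kitten!",
            "Stock markets fell in Tokyo.", "Markets and stock prices rose."]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share weighted terms get a positive cosine score, while documents with no terms in common score 0.0. Once densified over the kept vocabulary, such vectors could be passed to a clustering routine such as K-means.
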
Keywords/Search Tags:

Text clustering, K-means Algorithm, Hybrid Artificial Fish School Algorithm, adaptive strategies

Related items

1	Application And Research Of Artificial Fish School Algorithm In Dtc System
2	Research On The Modified Artificial Fish Swarm Intelligent Optimization Algorithm And Its Application
3	School Choice Of Auxiliary Information System For Rural Areas
4	Study On The Analysis Of Reader Behavior Using A Clustering Algorithm Based On Artificial Fish Swarm Algorithm
5	Research On Regular Expression Grouping Based On Artificial Fish School Algorithm
6	Analysis And Research On Improved Artificial Fish Swarm Algorithm
7	Research On The Fuzzy Time Series Forecasting Model Based On Improved Artificial Fish Swarm Algorithm
8	Research On Anomaly Detection Based On FCM With Adaptive Artificial Fish Swarm
9	The Cruise Strategy Of The MUUV Based On Artificial Fish School Algorithm
10	Tactic Of Optimization Of Ant Colony Algorithm And Its Application