Research On Text Clustering Based On Swarm Intelligence Algorithm

Posted on: 2016-06-21 | Degree: Master | Type: Thesis
Country: China | Candidate: S J Lv | Full Text: PDF
GTID: 2308330464464994 | Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the Web 2.0 era, text information on the network has grown explosively, at a rate that outpaces people's ability to make effective use of it. How to obtain valuable information effectively, quickly, and accurately from vast, multi-source, heterogeneous, noisy, and time-sensitive data has become a problem to be solved, and it is a major challenge for the field of computer science. Internet search technology and text analysis contributed to the birth of text mining. Text clustering, as an important means of effectively organizing and navigating text information, has been applied to information filtering, improving text classification results, and the automatic organization of digital library services and document collections, and it is receiving more and more attention from researchers.

This thesis studies the standard artificial fish swarm algorithm (AFSA) and improves on it. On this basis, it proposes a hybrid algorithm that fuses the artificial fish algorithm with the K-means algorithm. Comparison experiments on UCI data sets were carried out against K-means, PSO, and AFSA. The results show that the hybrid algorithm alleviates, to a certain extent, K-means's tendency to fall into local minima and its sensitivity to initial values, and that it improves the convergence precision of both K-means and AFSA.

The thesis focuses on Web text clustering algorithms. The main work covers the following aspects:

(1) Starting from the relevant theory of text mining, three common text representation models and their corresponding text similarity measures are analyzed and compared. To address the complexity of the high-dimensional data space, dimension reduction techniques for text vectors are presented. The new tasks facing text mining are summarized, and text clustering technology is introduced in detail.

(2) The standard artificial fish swarm algorithm is introduced. To address its slow convergence in the later stage, an adaptive visual-range strategy is proposed: in the early iterations the algorithm uses a fixed visual range, and as the number of iterations increases, an adaptively reduced visual range is used. In addition, to speed up optimization, the position of the global best individual is introduced, forming the modified algorithm MAFSA. The superiority of the improved algorithm is verified on function optimization problems.

(3) K-means has low time complexity, is simple to implement, is fast, and scales well to large data sets, but it is sensitive to initial values. AFSA, in contrast, is insensitive to parameters and initial values and converges quickly. Therefore, on the basis of MAFSA, a new artificial fish algorithm (KAFSA) is put forward, in which K-means is introduced into the modified artificial fish school algorithm (MAFSA). After a random subset of the artificial fish completes each iteration of the artificial fish algorithm, an iteration of K-means is performed. Experiments on UCI data sets show that the hybrid algorithm clusters better than K-means, PSO, and the improved AFSA used alone.

(4) The Newsgroup English document collection is selected as the experimental data source. After text preprocessing such as tokenization, removal of punctuation and other unrelated symbols, and stop-word removal, the commonly used vector space model and TF-IDF-based dimension reduction are applied to perform text clustering and output the evaluation results.
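The two mechanisms described in items (2) and (3) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption for illustration: the linear decay in `adaptive_visual`, the parameters `hold_fraction`, `visual_min`, and `fraction`, and the encoding of each artificial fish as a set of k cluster centroids. It is not the thesis's actual implementation.

```python
import numpy as np

def adaptive_visual(iteration, max_iter, visual0, hold_fraction=0.3, visual_min=0.01):
    """Hypothetical schedule: fixed visual range early, linear decay afterwards."""
    hold = int(hold_fraction * max_iter)
    if iteration < hold:
        return visual0
    progress = (iteration - hold) / max(1, max_iter - hold)
    return max(visual_min, visual0 * (1.0 - progress))

def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans_step(centroids, data):
    """One K-means iteration: assign points to centroids, then recompute centroids."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = data[labels == j]
        if len(members) > 0:  # an empty cluster keeps its previous centroid
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

def refine_random_fish(fish, data, fraction, rng):
    """Apply one K-means step to a random subset of fish.

    fish has shape (n_fish, k, dim): each fish is assumed to encode k centroids.
    The array is modified in place and also returned.
    """
    n_pick = max(1, int(fraction * len(fish)))
    picked = rng.choice(len(fish), size=n_pick, replace=False)
    for i in picked:
        fish[i] = kmeans_step(fish[i], data)
    return fish
```

In a full loop, `adaptive_visual` would set the visual range used by the fish behaviors at each iteration, and `refine_random_fish` would be called once after those behaviors. Because a K-means step never increases the within-cluster squared error, the refined fish are at least as good as before under that objective.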
Keywords/Search Tags: text clustering, K-means algorithm, hybrid artificial fish school algorithm, adaptive strategies