
Research on Text Clustering Technology Based on Colony Intelligence

Posted on: 2010-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Ma
Full Text: PDF
GTID: 2178360302959198
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the growing volume of electronic information available online has become a major source from which people obtain information. Faced with this mass of information, people need to organize and manage these resources effectively in order to facilitate topic discovery and information retrieval. As an unsupervised classification method, text clustering processes text collections automatically: according to the characteristics of the texts, a collection is divided into several categories so that similarity within a category is as large as possible and similarity between categories is as small as possible. Feature selection and the clustering algorithm are the two most important steps in text clustering, so this thesis focuses on these two issues.

First, because clustering lacks class information, unsupervised feature selection methods have difficulty selecting the most discriminative terms. To address this problem, this thesis proposes an integrated unsupervised feature selection method for text clustering, which brings a supervised feature selection method that has been applied successfully in text classification into text clustering. The method first runs the K-Means clustering algorithm with different values of K to obtain different sets of class information, and then uses the supervised CHIR feature selection method to select the optimal feature subset.

Second, because ants move randomly in ant-based text clustering algorithms, too many scattered points are left on the grid and convergence is too slow. To address this problem, this thesis proposes a fast ant-based text clustering approach using pheromone. The approach uses the pheromone left by ants to reduce the randomness of their movement: at each step an ant moves toward a direction with high pheromone concentration, which is the direction in which text vectors are relatively concentrated. This reduces the time ants need to find clusters, accelerates the convergence of the algorithm, and improves the accuracy of the clustering results.

Finally, an experimental platform based on the fast ant-based text clustering approach using pheromone was implemented with the VC++ development tool. The platform was used to test these research results and to analyze the performance of the clustering results. This research provides direction for further research.
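The pheromone-guided movement described above can be illustrated with a minimal Python sketch. It is not the thesis's VC++ implementation: the function names (`neighbours`, `next_cell`, `deposit`, `evaporate`), the square toroidal grid, the 8-cell neighbourhood, the choice of a neighbour with probability proportional to its pheromone level, and the evaporation rate are all assumptions made for illustration, since the abstract does not give the exact update or selection rule.

```python
import random

# The 8 neighbouring directions on a 2-D grid (Moore neighbourhood).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def neighbours(pos, size):
    """Return the 8 neighbouring cells of pos on a size x size toroidal grid."""
    row, col = pos
    return [((row + dr) % size, (col + dc) % size) for dr, dc in DIRECTIONS]


def next_cell(pos, pheromone, rng):
    """Pick the ant's next cell, biased toward high pheromone (assumed rule).

    Each neighbour is chosen with probability proportional to its pheromone
    level. If no neighbour carries pheromone, the ant falls back to a
    uniformly random move, as in the basic ant-based algorithm.
    """
    size = len(pheromone)
    cells = neighbours(pos, size)
    weights = [pheromone[r][c] for r, c in cells]
    if sum(weights) == 0:
        return rng.choice(cells)
    return rng.choices(cells, weights=weights, k=1)[0]


def deposit(pheromone, pos, amount=1.0):
    """Add pheromone at pos, e.g. where an ant drops a text vector."""
    row, col = pos
    pheromone[row][col] += amount


def evaporate(pheromone, rate=0.1):
    """Decay every cell by the given rate so that stale trails fade."""
    for row in pheromone:
        for col in range(len(row)):
            row[col] *= (1.0 - rate)
```

In this sketch, `pheromone` is a square list of lists of floats and `rng` is a `random.Random` instance. An ant standing next to a single marked cell always steps onto it, while an ant surrounded by unmarked cells still moves at random, so exploration is preserved until trails form.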
Keywords/Search Tags:Text clustering, Vector Space Model, Feature Selection, Ant-based Clustering Algorithm, Pheromone