Research Of Clustering Analysis And Its Application In Document Mining

Posted on: 2007-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Yang
GTID: 2178360182995490
Subject: Computer application technology
Abstract/Summary:
We now live in an information society, and information of every kind is growing explosively. Data mining and knowledge discovery emerged in response and have shown great vitality, because they help people use this information effectively. This thesis systematically studies data mining techniques, document mining, and clustering analysis, and proposes several improved algorithms.

Clustering analysis is an important part of data mining research. Clustering is the process of grouping a set of physical or abstract objects into classes, or clusters, so that objects within the same cluster are highly similar to one another but dissimilar to objects in other clusters. Because cluster analysis plays an important and distinctive role in data management, research in this field has advanced considerably in recent years, and many clustering algorithms have been proposed, such as partitioning methods and model-based methods.

The thesis first surveys the main kinds of clustering algorithms and analyzes their key techniques. It then presents two improved algorithms. The first is an improved SOM. Standard SOM initializes its connection weights randomly, and the network takes a long time to train. To address both problems, the thesis analyzes how the connection weights should be initialized and proposes initializing them with K representative points, which reduces the network's training time. The second is a hybrid clustering method that combines SOM with the K-means algorithm: SOM first clusters the data, and its clustering results are then used to initialize the cluster centers of K-means. This hybrid method improves clustering performance.

Finally, the thesis presents a document clustering system and verifies the improved algorithms on the Reuters-21578 collection and on Web data.
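The two improved algorithms can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say how the "K representative points" are chosen, so the hypothetical helper `representative_points` uses farthest-point traversal. The SOM is assumed to be a 1-D map with exactly K nodes, a linearly decaying learning rate, and a shrinking Gaussian neighborhood; all parameter defaults are illustrative guesses. The hybrid step simply hands the trained SOM weight vectors to standard K-means (Lloyd's algorithm) as its initial centers. For document clustering, the input vectors would be document representations such as TF-IDF term vectors (also an assumption). Here they are plain lists of floats.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representative_points(data, k):
    """Hypothetical choice of K 'representative points' (assumed heuristic):
    farthest-point traversal. Start from the sample nearest the data mean,
    then repeatedly add the sample farthest from all points chosen so far."""
    dim = len(data[0])
    mean = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    chosen = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(chosen) < k:
        chosen.append(max(data, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return [list(p) for p in chosen]  # copies, so training never mutates data


def train_som(data, k, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Online 1-D SOM with k nodes. Weights start at k representative points
    instead of random values; all defaults here are assumptions."""
    rng = random.Random(seed)
    weights = representative_points(data, k)
    if radius0 is None:
        radius0 = max(1.0, k / 2)  # initial neighborhood width on the lattice
    order = list(range(len(data)))
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
            winner = min(range(k), key=lambda j: sq_dist(x, weights[j]))
            for j in range(k):
                # Gaussian neighborhood measured on the 1-D node lattice
                h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for d in range(len(x)):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights


def kmeans(data, centers, max_iter=100):
    """Standard batch K-means (Lloyd's algorithm) from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if its cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def som_kmeans(data, k, **som_kwargs):
    """Hybrid sketch: SOM prototypes become the initial K-means centers."""
    prototypes = train_som(data, k, **som_kwargs)
    return kmeans(data, prototypes)


# Usage on a toy 2-D data set with two well-separated groups.
data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, labels = som_kmeans(data, 2)
print(labels)
print(centers)
```

Farthest-point selection spreads the initial weights across the data, which matches the intuition of replacing random initialization. However, it also tends to pick outliers, so a real system might filter them first. Once the SOM is trained, the K-means stage is deterministic, because its starting centers come from the SOM rather than from random samples.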
Keywords/Search Tags: Data Mining, Clustering Analysis, Document Mining, K-means, Self-Organizing Feature Maps (SOM)