Clustering Algorithms in Web Mining Applications

Posted on: 2008-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: M Fan
Full Text: PDF
GTID: 2208360212478912
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of information it carries keeps growing, and it has become the world's largest reservoir of information. A vast amount of information is now available online, but how to analyze its content remains an open problem. Clustering can address this problem to some extent: by grouping related content, it saves users time and greatly improves their efficiency. This thesis discusses the theory, algorithms and applications of clustering.

First, we describe the basic concepts of clustering and analyze the existing models. We then discuss the significance of cluster validity indices in clustering algorithms and propose a new clustering algorithm based on them. Unlike other algorithms, it runs without user-supplied clustering parameters. At each step we merge the pair of clusters whose merger yields the greatest increase (or smallest decrease) in the cluster validity index. We use a gravitation model to compute the similarity distance and handle the outliers that may appear. Experimental results show that our method detects the number of clusters in a given data set exactly, with lower error rates than the other clustering algorithms. It also outperforms, with higher efficiency, the other clustering algorithms based on cluster validity indices.

For hot topic detection, we propose an improved Longest Common Subsequence (LCS) method. We use it to deal with new words and nonstandard grammar and to calculate the similarity between documents. The system runs without word segmentation and needs no prior knowledge. Experiments show that a clustering system based on this algorithm can process large numbers of titles quickly and with high accuracy.

For community finding, we adopt the spectral method to cluster users who share a common topic of discussion on the network, while the users have the same reply to...
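The abstract does not spell out the improved LCS method, so the following is only a minimal sketch of the textbook Longest Common Subsequence computation it builds on, turned into a similarity score between two titles. The function names and the normalization by the longer string are assumptions made for illustration, not the thesis's definitions. Because the comparison works character by character, it needs no word segmentation, which matches the property the abstract describes.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)          # DP row for the previous character of a
    for ch_a in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length divided by the longer length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


# Hypothetical usage: compare two titles character by character.
score = lcs_similarity("web mining clustering", "clustering in web mining")
```

Titles whose pairwise score exceeds a chosen threshold could then be grouped together; the threshold and the grouping strategy are left open here because the abstract does not state them.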
Keywords/Search Tags:clustering analysis, clustering validity index, LCS, document clustering, community finding
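The merging rule in the abstract (at each step, merge the clusters that give the greatest increase or smallest decrease in the validity index) can be sketched as follows. This is a rough illustration under stated assumptions: the mean silhouette width stands in for the thesis's validity index, plain Euclidean distance replaces the gravitation model, outlier handling is omitted, and the names `silhouette` and `greedy_merge` are hypothetical. Returning the best-scoring partition seen during merging is one plausible way to obtain the number of clusters without supplying it as a parameter.

```python
from itertools import combinations
from math import dist


def silhouette(points, clusters):
    """Mean silhouette width; a stand-in validity index (higher is better)."""
    if len(clusters) < 2:
        return -1.0  # undefined for fewer than two clusters; rank it last
    total = 0.0
    for ci, cluster in enumerate(clusters):
        if len(cluster) == 1:
            continue  # a singleton contributes 0 by convention
        for i in cluster:
            # a: mean distance to the other members of the same cluster
            a = sum(dist(points[i], points[j]) for j in cluster if j != i)
            a /= len(cluster) - 1
            # b: smallest mean distance to any other cluster
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def greedy_merge(points, validity=silhouette):
    """Start from singletons; at each step apply the merge that yields the
    highest validity index, and return the best partition seen."""
    clusters = [[i] for i in range(len(points))]
    best, best_score = clusters, validity(points, clusters)
    while len(clusters) > 1:
        candidates = []
        for x, y in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (x, y)]
            merged.append(clusters[x] + clusters[y])
            candidates.append((validity(points, merged), merged))
        score, clusters = max(candidates, key=lambda t: t[0])
        if score > best_score:
            best, best_score = clusters, score
    return best, best_score


# Hypothetical data: two well-separated groups of three points each.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_clusters, demo_score = greedy_merge(demo_points)
```

The sketch re-evaluates the index for every candidate merge, which is simple but costly on large data sets; it is meant to show the selection rule, not to reflect the efficiency the thesis reports.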