
Research On Clustering Algorithm And Its Application In Page Clustering

Posted on: 2010-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Fang
Full Text: PDF
GTID: 2178360275478238
Subject: Information management and information systems
Abstract/Summary:
The Internet has become an enormous, distributed, global information service center. Providing users quickly, efficiently, and accurately with the information they need, in the form of clusters of highly relevant Web pages, has become an important problem in the field. An effective way to address it is to apply sound cluster analysis to Web pages, so that Web information can be classified, stored, indexed, and integrated more efficiently. The main purpose of this thesis is to improve clustering algorithms and apply them to page clustering.

The contributions of the dissertation are as follows:
(1) The theory, methods, and process of cluster analysis are studied.
(2) Two algorithms are selected for improvement, K-means from the partitioning-based methods and DBSCAN from the density-based methods, and both are evaluated experimentally on data sets. For K-means, the improvement targets the selection of the initial cluster centers, that is, how to find initial centers that agree with the spatial distribution of the data; the improved method selects the initial cluster centers according to the natural distribution of the data. For DBSCAN, the parameters ε and MinPts are selected automatically to reduce the influence of subjective choices.
(3) The steps of page clustering and a description of the algorithm are given.
(4) Data sets are selected as examples for the page clustering algorithms, and a system is implemented that carries out the page clustering process with both algorithms. The results are analyzed and evaluated. On the selected data sets, the experiments indicate that both improved algorithms achieve higher accuracy; that the accuracy of the two improved algorithms does not decrease as the experimental data sets are reduced; and that the improved K-means algorithm clearly outperforms the improved DBSCAN algorithm in running time.
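
As an illustration of the K-means improvement described in the abstract, the following is a minimal sketch of choosing initial cluster centers from the data's own distribution rather than at random. The abstract does not give the thesis's actual procedure, so everything specific here is an assumption made for illustration: the scoring rule (local density multiplied by distance to the seeds already chosen), the function names `density_seeds` and `kmeans`, the `radius` parameter, and the toy points.

```python
import math  # math.dist requires Python 3.8+


def density_seeds(points, k, radius):
    """Pick k initial centers that are dense and far from each other.

    Hypothetical rule, not taken from the thesis: the first seed is the
    point with the most neighbours within `radius`; each later seed
    maximises (density + 1) * distance to the nearest chosen seed.
    """
    n = len(points)
    # Local density: number of other points within `radius` of point i.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            gap = min(math.dist(points[i], points[c]) for c in chosen)
            return (density[i] + 1) * gap  # already-chosen points score 0
        chosen.append(max(range(n), key=score))
    return [points[i] for i in chosen]


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters


# Toy data (invented for this example): two tight groups and one stray point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
seeds = density_seeds(points, k=2, radius=1.5)
centers, clusters = kmeans(points, seeds)
print(seeds)
print(centers)
```

On this toy data the two seeds fall inside the two dense groups and not on the stray point at (5, 5), because that point has no neighbours within the radius. A sufficiently distant outlier could still score highly under this rule, so a practical version would probably need an explicit outlier filter.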
Keywords/Search Tags: Clustering analysis, K-means algorithm, DBSCAN algorithm, page clustering