Clustering Algorithms for Data of Different Types

Posted on: 2011-04-16 | Degree: Master | Type: Thesis
Country: China | Candidate: S W Liu
GTID: 2208360305997623 | Subject: Communication and Information System
Abstract/Summary:
With the rapid development of information technology, large volumes of data have become available, and these data are often of different types. This thesis aims to establish relationships between data of different types and to cluster them. We propose a two-step approach. The first step is data preprocessing, with an emphasis on web data. We propose a web page cleaning algorithm based on block templates and a duplicate web page elimination algorithm based on a Bloom filter. The cleaning algorithm extracts templates at a finer granularity, which yields higher precision, while the Bloom filter greatly reduces the time and space complexity of duplicate elimination. The second step is clustering. We apply a different algorithm to each type of data (K-means for web data, association analysis for form-based data) so as to take full advantage of the characteristics of each type. After clustering each data type separately, we integrate the clustering results and derive the relationships between the data. In our experiments, we use these algorithms to deliver advertisements to credit card customers. The results show that the algorithms can carry out web page preprocessing, establish relationships between web page data and form-based data, and provide better personalized advertising.
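The abstract does not describe how the Bloom filter is applied, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes each page is reduced to a whitespace- and case-normalized string and that exact matches of that string count as duplicates. The names `BloomFilter`, `normalize`, and `deduplicate`, along with the bit-array size and hash count, are hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        # Assumed parameters; a real system would size these from the
        # expected number of pages and the tolerated false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed notion of 'same page')."""
    return " ".join(text.lower().split())


def deduplicate(pages):
    """Keep the first occurrence of each page, dropping probable duplicates."""
    seen = BloomFilter()
    unique = []
    for page in pages:
        key = normalize(page)
        if key in seen:
            # Probably seen before; a Bloom filter can give false positives,
            # so a unique page is occasionally dropped, but never the reverse.
            continue
        seen.add(key)
        unique.append(page)
    return unique
```

Membership checks and insertions each touch a constant number of bits, and the filter stores no page text, which is the usual source of the time and space savings that a Bloom filter offers over keeping every page (or every full hash) in a set.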
Keywords/Search Tags:clustering, web page cleaning, block template, web page duplicate elimination, Bloom Filter, K-means, Association Analysis