Font Size: a A A

A Semi-supervised Kmeans Algorithm Based On Density Detection And Information Gain

Posted on:2016-06-11Degree:MasterType:Thesis
Country:ChinaCandidate:D GuoFull Text:PDF
GTID:2308330479477730Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
In recent years, The amount of information has increased explosively, huge amounts of data have become the norm. At the same time, we have little knowledge with so large amount of information. Traditional decision system can’t satisfy the urgent needs of people now. Data mining is one of the best ways to solve this problem. Cluster analysis is an important branch in the field of data mining. Semi-supervised clustering is a hot topic in clustering, by making full use of a small amount of labeled data, semi-supervised clustering algorithm takes advantages of both supervised learning and unsupervised learning, without marking a large amount of data. Semi-supervised clustering algorithm is easy to implement and closer to the actual application with high precision.This paper has the systematic research and improvement of the Seeded-kmeans clustering algorithm. Specific research work is organized as follows:(1) We discuss the background of data mining and technical support, and point out data mining’s study significance, application background and the future development. In order to have a better research on this topic, we make a presentation on kmeans algorithm and 2 kind of semi-supervised kmeans algorithm.(2)One disadvantage of Seeded-kmeans algorithm is that it equates the importance of dimension data for multidimensional data. So we introduce information gain to calculate the attributes’ weight before we do Seeded-kmeans. Another disadvantage of Seeded-kmeans is that it’s sensitive to isolated points and noisy points. So we use density detection to delete the isolated points and noise points thus improving the algorithm.(3)By synthesizing information gain and density detection, We get an improved algorithm. As it shows, the improved algorithm takes advantages of information and density detection. Experiments indicates that it works in higher cluster precision and small timecomplexity.Finally, we conclude the work and prospect for the future research direction.
Keywords/Search Tags:Clustering Kmeans, Semi-supervised, Seeded-kmeans algorithm, Information gain, Density detection
PDF Full Text Request
Related items