A Semi-supervised Kmeans Algorithm Based On Density Detection And Information Gain

Posted on:2016-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:D Guo

Full Text:PDF

GTID:2308330479477730

Subject:Communication and Information System

Abstract/Summary:

In recent years the amount of information has grown explosively, and huge volumes of data have become the norm. At the same time, little usable knowledge is extracted from so much information, and traditional decision systems can no longer meet people's urgent needs. Data mining is one of the best ways to address this problem, and cluster analysis is an important branch of data mining. Semi-supervised clustering is a hot topic within clustering: by making full use of a small amount of labeled data, a semi-supervised clustering algorithm combines the advantages of supervised and unsupervised learning without requiring a large amount of data to be labeled. Such algorithms are easy to implement, achieve high precision, and are closer to practical applications. This thesis systematically studies and improves the Seeded-kmeans clustering algorithm. The specific research work is organized as follows:

(1) We discuss the background of data mining and its supporting technologies, and point out its research significance, application background and future development. To lay the groundwork for this topic, we introduce the kmeans algorithm and two kinds of semi-supervised kmeans algorithms.

(2) One disadvantage of the Seeded-kmeans algorithm is that it treats every dimension of multidimensional data as equally important. We therefore introduce information gain to calculate the weight of each attribute before running Seeded-kmeans. Another disadvantage is that Seeded-kmeans is sensitive to isolated points and noise points. We therefore use density detection to remove isolated points and noise points, thus improving the algorithm.

(3) By combining information gain and density detection, we obtain an improved algorithm that takes advantage of both. Experiments indicate that it achieves higher clustering precision with low time complexity.

Finally, we conclude the work and discuss directions for future research.
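The three ingredients named in the abstract (information-gain attribute weights, density-based removal of isolated points, and Seeded-kmeans) can be sketched roughly as below. This is only an illustrative sketch, not the thesis's actual implementation: the abstract gives no formulas, so the equal-width binning used for information gain, the neighbor-count density test with its `eps` and `min_pts` parameters, the weighted Euclidean distance, and all function names are assumptions made here.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain_weights(X_seed, y_seed, bins=4):
    """Attribute weights from information gain on the labeled seeds.
    Assumption: continuous attributes are cut into equal-width bins."""
    base = entropy(y_seed)
    gains = np.zeros(X_seed.shape[1])
    for j in range(X_seed.shape[1]):
        col = X_seed[:, j]
        edges = np.linspace(col.min(), col.max(), bins + 1)[1:-1]
        binned = np.digitize(col, edges)
        cond = 0.0  # conditional entropy of the labels given the bin
        for b in np.unique(binned):
            mask = binned == b
            cond += mask.mean() * entropy(y_seed[mask])
        gains[j] = base - cond
    total = gains.sum()
    if total <= 0:  # no informative attribute: fall back to equal weights
        return np.full(len(gains), 1.0 / len(gains))
    return gains / total

def density_filter(X, eps, min_pts):
    """Mask of points with at least min_pts other points within eps.
    Assumption: a plain neighbor count stands in for 'density detection'."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = (d <= eps).sum(axis=1) - 1  # exclude the point itself
    return neighbors >= min_pts

def seeded_kmeans(X, X_seed, y_seed, weights, n_iter=100):
    """k-means whose initial centers are the per-class seed means,
    using a weighted Euclidean distance (assumed form of the weighting)."""
    classes = np.unique(y_seed)
    centers = np.array([X_seed[y_seed == c].mean(axis=0) for c in classes])
    w = np.sqrt(weights)  # scaling by sqrt(w) gives sum_j w_j * (x_j - c_j)^2
    for _ in range(n_iter):
        d = np.linalg.norm((X * w)[:, None, :] - (centers * w)[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        new_centers = np.array([
            X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(len(classes))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return classes[assign], centers

# Synthetic demo: two clusters separated along attribute 0, plus one far outlier.
rng = np.random.default_rng(0)
a = np.column_stack([rng.normal(0, 0.3, 50), rng.normal(0, 0.3, 50)])
b = np.column_stack([rng.normal(5, 0.3, 50), rng.normal(0, 0.3, 50)])
X = np.vstack([a, b, [[50.0, 50.0]]])
X_seed = np.vstack([a[:5], b[:5]])       # a handful of labeled seeds
y_seed = np.array([0] * 5 + [1] * 5)

weights = info_gain_weights(X_seed, y_seed)
keep = density_filter(X, eps=2.0, min_pts=3)
labels, centers = seeded_kmeans(X[keep], X_seed, y_seed, weights)
```

In this toy setup the isolated point is dropped before clustering, and the attribute that separates the seed classes receives a weight at least as large as the uninformative one. The pairwise-distance matrix makes the density step quadratic in the number of points, so a spatial index would be needed for larger data sets.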

Keywords/Search Tags:

Clustering, Kmeans, Semi-supervised, Seeded-kmeans algorithm, Information gain, Density detection

Related items

1	A Study On Semi-supervised Clustering Algorithm Based On Domain Knowledge
2	Improvement Of Kmeans Clustering Algorithm And Its Application In Information Retrieval System
3	Research On Chinese Short Text Classification Based On Semi-Supervised Clustering
4	Mesh Simplification Algorithm Based On Kmeans
5	Research On Text Clustering Techniques For Mobile Application Bug Report
6	An Improved Semi Supervised Clustering Of Given Density And Its Application In Lithology Identification
7	An Adaptive Semi-supervised Clustering Algorithm Based On Multiple Density Characteristics Of Data
8	Research On Intrusion Detection System Based On Classical Clustering Algorithm And Association Algorithm
9	Research And Implementation Of A Hybrid Recommendation System Based On Auto Encoder And Canopy-Kmeans Algorithm
10	Research On Risk Degree-Based Safe Semi-Supervised Fuzzy Clustering Algorithm