Variable Selection And Outlier Detection For Automated K-means Clustering

Posted on:2017-06-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Wu

Full Text:PDF

GTID:2348330503990903

Subject:Applied Statistics

Abstract/Summary:


With the rapid development of the social economy and information technology, data mining has attracted increasing attention, and clustering plays a very important role in it. Among the many clustering methods, k-means clustering is one of the most widely used. It is a partition-based method, but applying it raises several problems. First, variable selection and outlier detection are key to any clustering method. Second, k-means itself requires choosing the number of clusters and the initial centroids. To address these problems, this thesis proposes an automated k-means clustering procedure that incorporates variable selection and outlier detection. The procedure consists of three steps: (1) draw a random sample from the full data set, apply Ward's hierarchical clustering to it, and determine the number of clusters and the initial centroids according to Mojena's rule; (2) apply VS-KM (a variable-selection heuristic for k-means clustering) to select a suitable subset of variables that defines the cluster structure; (3) identify outliers with a hybrid approach that combines a clustering-based approach and a distance-based approach. Finally, the thesis analyzes data from a Wiki questionnaire to demonstrate the effectiveness of the proposed method.
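Step (1) can be illustrated with a short sketch, assuming SciPy's `linkage`/`fcluster` for Ward's method and scikit-learn's `KMeans` for the final pass. Mojena's rule is applied here in its common form: count the fusion heights that exceed their mean plus c standard deviations. The constant `c=2.75`, the sample size, and the helper names `mojena_k` and `ward_seeded_kmeans` are assumptions for illustration, not the thesis's settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans


def mojena_k(Z, c=2.75):
    """Mojena's stopping rule on a SciPy linkage matrix (c is an assumed constant).

    Column 2 of Z holds the fusion heights. Ward heights never decrease, so
    cutting the tree above mean + c * sd leaves (#heights above it) + 1 clusters.
    """
    heights = Z[:, 2]
    threshold = heights.mean() + c * heights.std(ddof=1)
    return int(np.sum(heights > threshold)) + 1


def ward_seeded_kmeans(X, sample_size=200, c=2.75, seed=0):
    """Hypothetical helper: Ward on a random sample seeds k-means on the full data."""
    rng = np.random.default_rng(seed)
    # Ward's clustering on a random sample of the rows.
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    sample = X[idx]
    Z = linkage(sample, method="ward")
    # Number of clusters from Mojena's rule, then cut the tree into that many groups.
    k = mojena_k(Z, c)
    labels = fcluster(Z, t=k, criterion="maxclust")
    # Initial centroids are the means of the Ward clusters found in the sample.
    centers = np.vstack([sample[labels == g].mean(axis=0) for g in np.unique(labels)])
    # A single k-means run on the full data, started from those centroids.
    return KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(X)
```

Seeding k-means from the Ward centroids removes the random choice of starting points, which is why only one k-means run (`n_init=1`) is needed in this sketch.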
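Step (2) relies on VS-KM, whose exact selection and stopping rules are not given in this abstract. The sketch below is therefore a simplified stand-in and not the published VS-KM procedure. It only shows the underlying idea of using the adjusted Rand index (ARI) to judge whether a variable shares the cluster structure of the current subset. The function names, the pair-based start, and the stopping threshold `min_ari=0.5` are all assumptions.

```python
import itertools

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def kmeans_labels(X, k, seed=0):
    """Hypothetical helper: k-means partition of the given columns."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)


def ari_forward_selection(X, k, min_ari=0.5, seed=0):
    """Simplified ARI-guided forward selection (illustrative, not VS-KM itself)."""
    p = X.shape[1]
    # Partition induced by each variable on its own.
    single = [kmeans_labels(X[:, [j]], k, seed) for j in range(p)]
    # Start from the pair of variables whose single-variable partitions agree most.
    i, j = max(itertools.combinations(range(p), 2),
               key=lambda ij: adjusted_rand_score(single[ij[0]], single[ij[1]]))
    selected = [i, j]
    remaining = [v for v in range(p) if v not in selected]
    while remaining:
        base = kmeans_labels(X[:, selected], k, seed)
        # Agreement between the current subset's partition and each candidate's.
        scores = {v: adjusted_rand_score(base, single[v]) for v in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < min_ari:  # no remaining variable shares the structure
            break
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```

A noise variable yields a partition that is nearly independent of the true groups, so its ARI against the current subset stays close to 0 and it is never added.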
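Step (3) combines a clustering-based and a distance-based criterion. The abstract does not say how the two are combined, so the sketch below assumes one simple rule. A point is flagged if it belongs to a very small cluster (clustering-based) or if it lies unusually far from its own centroid, judged by a median/MAD threshold within that cluster (distance-based). The inputs `labels` and `centers` are assumed to come from the k-means step; `min_frac=0.05`, `z=5.0`, and the name `hybrid_outliers` are hypothetical choices.

```python
import numpy as np


def hybrid_outliers(X, labels, centers, min_frac=0.05, z=5.0):
    """Return indices flagged by a small-cluster rule OR a far-from-centroid rule."""
    n = len(X)
    # Clustering-based part: members of clusters holding under min_frac of the data.
    sizes = np.bincount(labels, minlength=len(centers))
    in_small_cluster = sizes[labels] < min_frac * n
    # Distance-based part: Euclidean distance of each point to its own centroid.
    dist = np.linalg.norm(X - centers[labels], axis=1)
    far = np.zeros(n, dtype=bool)
    for g in range(len(centers)):
        members = labels == g
        if members.sum() < 3:  # too few points for a robust spread estimate
            continue
        d = dist[members]
        med = np.median(d)
        mad = np.median(np.abs(d - med))
        # 1.4826 * MAD estimates the standard deviation under normality.
        far[members] = d > med + z * 1.4826 * mad
    return np.flatnonzero(in_small_cluster | far)
```

The median and MAD are used instead of the mean and standard deviation so that the outliers being sought do not inflate their own threshold.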

Keywords/Search Tags:

Automated k-means clustering, Hierarchical clustering, Variable selection, Outlier detection, VS-KM, Adjusted Rand index


Related items

1	An Improved Hierarchical Clustering And Outlier-detecting Algorithm And Application On The Data-mining Platform
2	Study On The Application Of The Improved K-means Clustering Algorithm In Image Retrieval
3	New Non-hierarchical Clustering Objectives And The Algorithms To Optimal Clustering
4	Optimization Of K-means Clustering Algorithm For Data With Outliers
5	Optimization Of K-MEANS Clustering Algorithm For Data With Outliers
6	The Study And Development Of Hierarchical-K-means-Based Clustering Algorithm
7	Research On Text Clustering Based On Division And Hierarchy
8	Cluster Study Based On Functional Magnetic Resonance Imaging Data
9	Research On Hybrid Ant Colony Clustering Algorithm
10	Clustering-based And Density Outlier Detection Method