
Variable Selection And Outlier Detection For Automated K-means Clustering

Posted on: 2017-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wu
GTID: 2348330503990903
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the economy and of information technology, data mining is receiving more and more attention, and clustering methods play a central role in it. Among the many clustering methods, k-means clustering is one of the most widely used. It is a partition-based method, and applying it in practice raises several problems. First, variable selection and outlier detection are key issues for any clustering method. Second, k-means clustering has problems of its own, such as choosing the number of clusters and the initial centroids. To address these problems, this thesis proposes an automated k-means clustering procedure that incorporates variable selection and outlier detection. The procedure consists of three steps: (1) draw a random sample from the full data set, apply Ward's hierarchical clustering to it, and determine the number of clusters and the initial cluster centers according to Mojena's rule; (2) apply VS-KM (Variable-selection heuristic for K-means Clustering) to select a subset of variables that defines the cluster structure; (3) identify outliers with a hybrid approach that combines a clustering-based approach and a distance-based approach. Finally, the thesis analyzes data from a Wiki questionnaire to demonstrate the effectiveness of the proposed method.
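Step (1) can be illustrated with a minimal Python sketch, assuming NumPy and SciPy are available. It runs Ward's clustering on a random sample, applies Mojena's upper-tail rule (stop at the first fusion level that exceeds the mean plus c standard deviations of all fusion levels) to choose k, and then seeds k-means on the full data with the sample cluster means. The function names `mojena_k` and `automated_kmeans`, the sample size, and the constant c = 1.25 are assumptions for illustration; the abstract does not state which values the thesis uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def mojena_k(Z, c=1.25):
    """Number of clusters chosen by Mojena's upper-tail rule on a linkage matrix Z."""
    heights = Z[:, 2]                          # fusion levels (non-decreasing for Ward)
    threshold = heights.mean() + c * heights.std(ddof=1)
    n = Z.shape[0] + 1                         # number of clustered observations
    above = np.nonzero(heights > threshold)[0]
    # Cut the tree just before the first fusion level that exceeds the threshold.
    return 1 if above.size == 0 else int(n - above[0])


def automated_kmeans(X, sample_size=200, c=1.25, seed=0):
    """Ward + Mojena on a random sample, then k-means on all of X seeded from it."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    sample = X[idx]
    Z = linkage(sample, method="ward")         # Ward's hierarchical clustering
    k = mojena_k(Z, c)
    labels = fcluster(Z, t=k, criterion="maxclust")   # cluster ids 1..k
    # Initial centroids are the means of the sample clusters.
    init = np.array([sample[labels == j].mean(axis=0) for j in np.unique(labels)])
    # Lloyd iterations on the full data, starting from the Ward centroids.
    centroids, assign = kmeans2(X, init, minit="matrix")
    return len(init), centroids, assign
```

Because Ward's fusion levels never decrease, the first level above the threshold marks the point where well-separated groups start being merged, so the tree is cut just before it. Running the hierarchical step only on a sample keeps its quadratic memory cost small, while the final assignment still uses every observation.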
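For step (2), the published VS-KM heuristic is not reproduced here. The sketch below is a simplified, hypothetical screen that borrows only the idea of comparing partitions with the adjusted Rand index (ARI), which appears among the thesis keywords: each variable is clustered on its own, the pair of variables whose partitions agree most seeds the subset, and a further variable is added only if its own partition agrees with the partition of the current subset. The threshold tau = 0.5, the farthest-first initialization, and all function names are assumptions.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.vq import kmeans2


def _pairs(x):
    """Elementwise "x choose 2"."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie) between two label vectors."""
    _, a = np.unique(np.asarray(a), return_inverse=True)
    _, b = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)                 # contingency table of co-memberships
    sum_ij = _pairs(table).sum()
    sum_a = _pairs(table.sum(axis=1)).sum()
    sum_b = _pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / _pairs(len(a))
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:                   # degenerate partitions
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


def kmeans_labels(X, k):
    """k-means labels using a deterministic farthest-first start (illustrative choice)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    init = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(init) < k:
        gap = np.min([np.linalg.norm(X - c, axis=1) for c in init], axis=0)
        init.append(X[np.argmax(gap)])
    _, labels = kmeans2(X, np.array(init), minit="matrix")
    return labels


def screen_variables(X, k, tau=0.5):
    """Hypothetical ARI-based forward screen of clustering variables."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # put variables on a common scale
    p = X.shape[1]
    single = [kmeans_labels(X[:, j], k) for j in range(p)]
    # Seed with the pair of variables whose one-variable partitions agree most.
    seed_pair = max(combinations(range(p), 2),
                    key=lambda ij: adjusted_rand_index(single[ij[0]], single[ij[1]]))
    selected = list(seed_pair)
    for j in range(p):
        if j in selected:
            continue
        base = kmeans_labels(X[:, selected], k)
        if adjusted_rand_index(base, single[j]) >= tau:
            selected.append(j)
    return sorted(selected)
```

Variables that carry no cluster structure split the data arbitrarily, so their one-variable partitions agree neither with the informative variables nor with each other; the ARI comparison exploits exactly that. The ARI is 1 for identical partitions (up to relabeling) and close to 0 for unrelated ones.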
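Step (3) does not say how the two approaches are combined, so the following is only a hypothetical hybrid rule, assuming k-means centroids and 0-based cluster labels are already available. A point is flagged if its cluster holds fewer than `min_frac` of all points (the clustering-based part) or if its distance to its own centroid has a robust z-score above `z` within its cluster (the distance-based part). The name `hybrid_outliers` and the values `min_frac = 0.02` and `z = 3.5` are illustrative assumptions.

```python
import numpy as np


def hybrid_outliers(X, centroids, labels, min_frac=0.02, z=3.5):
    """Hypothetical hybrid rule: flag points in tiny clusters OR far from their centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X - centroids[labels], axis=1)   # distance to own centroid
    flags = np.zeros(len(X), dtype=bool)
    for j in np.unique(labels):
        members = labels == j
        # Clustering-based part: a cluster holding < min_frac of the data is anomalous.
        if members.sum() < min_frac * len(X):
            flags[members] = True
            continue
        # Distance-based part: robust z-score of distance-to-centroid within the cluster.
        d = dist[members]
        med = np.median(d)
        mad = 1.4826 * np.median(np.abs(d - med))   # MAD scaled to match sd under normality
        if mad > 0:
            flags[members] = (d - med) / mad > z
    return flags
```

Combining the two parts with "or" catches both small groups of anomalies, which a single centroid may absorb, and isolated points inside large clusters; a stricter variant could require both conditions before flagging a point.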
Keywords/Search Tags: Automated k-means clustering, Hierarchical clustering, Variable selection, Outlier detection, VS-KM, Adjusted Rand index