Research And Application Of Improved K-means Algorithm In Multivariate Analysis System

Posted on: 2017-11-03

Degree: Master

Type: Thesis

Country: China

Candidate: C H Wu

Full Text: PDF

GTID: 2428330566953020

Subject: Computer science and technology

Abstract/Summary:

Cluster analysis is an important step in data mining. It is an unsupervised process that uncovers the structure and characteristics of unlabeled data. With the rapid development of today's information society, data analysis increasingly guides production and daily life. K-means is a classic partition-based clustering algorithm that has been widely studied and applied because of its simplicity, efficiency, and scalability. It also has known problems. The number of clusters must be specified in advance. The initial cluster centers are selected at random, which makes the results unstable, lowers the efficiency of the algorithm, and often leads to a local optimum. Furthermore, the traditional K-means algorithm does not adapt well to large volumes of high-dimensional data. This thesis studies these problems and proposes improvements. The specific work is as follows:

1. Because the traditional K-means algorithm cannot determine the number of clusters by itself, this thesis studies the use of cluster validity indices to determine it. It mainly introduces the DB (Davies-Bouldin) index, the CH (Calinski-Harabasz) index, and the XB (Xie-Beni) index. Repeated experiments show that the DB index performs well.

2. For the selection of initial cluster centers, this thesis examines the K-means clustering process in detail and finds that the initial centers should be well separated from one another and as close as possible to the actual cluster centers. The proposed method partitions the data using a radius and selects the initial cluster centers in order. Multiple experiments show that the improved algorithm achieves better clustering quality and higher efficiency.

3. In the multivariate analysis system, the K-means algorithm is used in the cluster analysis module. This thesis presents a method for calculating the distance between objects with mixed attribute types. In addition, a single computer has limited computing power and cannot complete the cluster analysis of massive high-dimensional data. Because K-means parallelizes well, this thesis implements the algorithm on Hadoop. Comparison experiments show that the improved K-means algorithm on Hadoop is more efficient than the traditional K-means algorithm.
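
As an illustration of the first task, the following Python sketch selects the number of clusters by minimizing the DB (Davies-Bouldin) index over a range of candidate values of k. It is not the thesis's implementation. The function names (`farthest_first_centers`, `kmeans`, `davies_bouldin`, `choose_k`) and the demo data are hypothetical, and the farthest-first seeding is only a simple, deterministic stand-in for well-separated initial centers. It is not the radius-based selection of the thesis, whose details the abstract does not give.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first_centers(points, k):
    """Assumed seeding: start at the first point, then repeatedly add the
    point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain K-means (Lloyd iterations) from the seeding above."""
    centers = farthest_first_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index (lower is better); requires at least 2 clusters."""
    k = len(centers)
    # Scatter of a cluster: mean distance of its members to its center.
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dist(p, centers[j]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            separation = dist(centers[i], centers[j])
            if separation == 0.0:
                return math.inf  # coincident centers: treat as worst case
            worst = max(worst, (scatter[i] + scatter[j]) / separation)
        total += worst
    return total / k


def choose_k(points, k_min=2, k_max=6):
    """Return (k, score) with the smallest Davies-Bouldin index."""
    best_k, best_score = None, math.inf
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(points, k)
        score = davies_bouldin(points, centers, labels)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Demo data (made up): three tight, well-separated groups of four points.
demo_points = [(bx + dx, by + dy)
               for bx, by in [(0, 0), (10, 10), (20, 0)]
               for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
best_k, best_score = choose_k(demo_points, 2, 5)
print(best_k)  # selects k = 3 for this data
```

The index divides the combined within-cluster scatter of each pair of clusters by the distance between their centers, so compact and well-separated clusterings score low. Scanning k and keeping the minimum is one common way to use such an index. The CH and XB indices mentioned above could be substituted by replacing `davies_bouldin` and adjusting the direction of the comparison where the index is maximized rather than minimized.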

Keywords/Search Tags:

Cluster analysis, K-means algorithm, Initial cluster centers

Related items

1	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
2	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
3	Research And Application Of K-means Clustering Algorithm
4	Research And Application Of Improved K-means Algorithm In Multivariate Analysis System
5	Research On Initial Cluster Centers Choice Algorithm And Clustering For Imbalanced Data
6	Research And Application Of Fuzzy Clustering Algorithm
7	Research On Advertisement Recommendation System Based On Data Mining
8	Improvements And Implementation Of K-means Clustering Algorithm
9	Improved K-means Clustering Based On Genetic Algorithm
10	Differentially Private K-means Clustering