Research Of Clustering Algorithms

Posted on:2008-05-11

Degree:Master

Type:Thesis

Country:China

Candidate:D Q Li

GTID:2178360215959209

Subject:Computer application technology

Abstract/Summary:

Clustering is one of the most important techniques in data mining and is used to discover previously unknown classes of objects in a database. It has been studied extensively for many years, and most of that research concentrates on distance-based clustering, of which k-means is the most classical algorithm. Many scholars have proposed improvements that address the shortcomings of k-means. Although some of these improvements increase its efficiency or stability, work remains to be done, especially on the initialization algorithm and on the stability of k-means. This thesis proposes an improved initialization algorithm for k-means based on an extended binary sort tree, which improves the stability and accuracy of the clustering result. In addition, for clustering on large databases, it puts forward an improvement to the RA algorithm that increases the accuracy of the results.

The main work of the thesis is as follows:

First, the classical k-means clustering algorithm is studied, and its disadvantages are demonstrated through experimental analysis.

Second, because the k-means clustering result is unstable and tends to fall into a local optimum rather than the global optimum, a new initialization algorithm for k-means based on an extended binary sort tree is put forward. A large number of experiments are carried out to validate its performance.

Third, sampling is commonly used in clustering when the database is very large, and the RA algorithm is a classical sampling-based initialization algorithm. This thesis analyzes the merits and shortcomings of RA and improves it.

Finally, building on the preceding work, a clustering-based method for analyzing stocks using the financial data of listed companies is proposed and demonstrated with an example.
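The sensitivity of k-means to its initial centers, which motivates the initialization work described above, can be illustrated with a small sketch. This is not the thesis's extended binary sort tree method or the RA algorithm; it is only a plain Lloyd-style k-means written for illustration, and the function names (kmeans, assign, sse) and the four-point toy data set are assumptions introduced here. The same data clustered from two different starting points converges to two different results, one of which is a local optimum with a much larger sum of squared errors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(pt, centers[j]))
            for pt in points]

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(pt, centers[lab]) for pt, lab in zip(points, labels))

def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iteration starting from the given initial centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Hypothetical choice: leave an empty cluster's center in place.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    return centers, labels, sse(points, centers, labels)

# Toy data (illustrative only): corners of a wide rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one center on each short side: converges to left/right clusters.
good = kmeans(points, [(0, 0), (4, 0)])

# Start with both centers on the vertical midline: stuck with top/bottom clusters.
bad = kmeans(points, [(2, 0), (2, 1)])

print("good init SSE:", good[2])  # 1.0
print("bad init SSE:", bad[2])    # 16.0
```

Both runs stop at a fixed point of the iteration, but only the first reaches the partition with the lowest error, which is why the choice of initial centers matters.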

Keywords/Search Tags:

Data Mining, Clustering Analysis, K-means Algorithm, Extended Binary Sort Tree

Related items

1	K-means Based On Binary And Svm Decision Tree Algorithm Of Data Mining Research
2	Research And Application Of Data Mining In Insurance Customer Data
3	Research And Improvement Of K-means Clustering Algorithm
4	Research On Incremental Clustering Algorithm In Data Mining
5	KK-means Clustering Method Improved Based-on Minimum Cost Spanning Tree And Its Applications In Seismic Data
6	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis
7	The Improvement On The Fuzzy C-means Algorithm
8	Semi-supervised K-means Clustering Algorithm In Data Mining
9	Research Of Improving For K-means Clustering Algorithm
10	The Research Of K-means Clustering Algorithm In Data Mining