
Research Of Clustering Algorithms

Posted on: 2008-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: D Q Li
Full Text: PDF
GTID: 2178360215959209
Subject: Computer application technology
Abstract/Summary:
Clustering is one of the most important techniques in data mining; it is used to discover previously unknown classes of objects in a database. Although it has been researched extensively for many years, most of the work concentrates on distance-based clustering, of which k-means is the most classical algorithm. Many scholars have proposed improvements that address the shortcomings of k-means. Although some of these improvements enhance the efficiency or the stability of k-means, work remains to be done; in particular, its initialization and its stability still need improvement. This thesis proposes an improved initialization algorithm for k-means based on an extended binary sort tree, which enhances the stability and accuracy of the clustering result. In addition, for clustering on large databases, an improvement to the RA algorithm is put forward to increase the accuracy of the results.

The main work of the thesis is as follows:

Firstly, the classical k-means clustering algorithm is studied, and its disadvantages are demonstrated through experimental analysis.

Secondly, because the k-means clustering result is unstable and easily falls into a local optimum rather than the global optimum, a new initialization algorithm for k-means, based on an extended binary sort tree, is put forward. A large number of experiments are carried out to validate its performance.

Thirdly, sampling is commonly used in clustering when the database is very large, and the RA algorithm is a classical sampling-based initialization algorithm. This thesis analyzes the merits and shortcomings of RA and improves it.

Finally, building on the preceding work, a method is proposed for analyzing stocks by clustering the financial data of listed companies, and it is demonstrated with an example.
Keywords/Search Tags: Data Mining, Clustering Analysis, K-means Algorithm, Extended Binary Sort Tree
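
The abstract's point that k-means depends on its initial centers and can stop at a local optimum can be illustrated with a small sketch. This is not the thesis's extended-binary-sort-tree initialization or its improved RA algorithm, neither of which is described in enough detail here to reproduce. It is only a plain Lloyd-style k-means in pure Python, run on a hypothetical four-point data set chosen for illustration. The function names (`sq_dist`, `kmeans`) and the data are assumptions of this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd-style k-means starting from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                dim = len(center)
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)))
            else:
                # Keep an empty cluster's center where it is.
                new_centers.append(center)

        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers

    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: corners of a wide rectangle, two natural clusters
# (left pair and right pair).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers taken from opposite sides: finds the natural clusters.
good_centers, good_sse = kmeans(points, [(0, 0), (4, 0)])

# Initial centers taken from the same side: converges to a poor local
# optimum that splits the data into a top pair and a bottom pair.
bad_centers, bad_sse = kmeans(points, [(0, 0), (0, 1)])

print(good_sse)  # 1.0
print(bad_sse)   # 16.0
```

Both runs converge, but the second settles on a partition with a much larger SSE. This is the kind of instability that an initialization method aims to reduce.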