Data Mining, Cluster Analysis Algorithm Research And Application

Posted on:2008-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yan

Full Text:PDF

GTID:2208360245462080

Subject:Software engineering

Abstract/Summary:

Clustering is an important method of partitioning or grouping data, with applications in many fields, including data mining. Clustering algorithms fall into five kinds: partitional, hierarchical, density-based, grid-based, and model-based. These algorithms have many disadvantages, however: for example, some work only on numeric values, are inefficient, are sensitive to the initial starting conditions or to the order of the input data, do not reach the best solution, or rely on input parameters.

DBSCAN is a density-based clustering algorithm that can efficiently discover clusters of arbitrary shape and can handle noise effectively. It has two disadvantages that need to be overcome. One is that it requires a large amount of memory, especially when dealing with large-scale databases. The other is that it requires a global parameter, Eps, to be determined. If Eps is not appropriate, clustering quality is reduced, especially when cluster densities and the distances between clusters are uneven. In this paper, an improved DBSCAN algorithm based on data partitioning is presented.

K-means is a partitioning algorithm that partitions a database of n objects into a set of K clusters, where K is an input parameter. It uses an iterative procedure; when the procedure converges to one of numerous local minima, it terminates and outputs the result. The output is therefore especially sensitive to the initial starting conditions: the K initial starting points are selected at random, which can lead to poor solutions. In the classical k-means algorithm, the value of k must also be fixed in advance, and in practice it is difficult to determine k accurately. In this paper, a novel algorithm for choosing initial values for k-means document clustering is proposed.

Finally, cluster analysis methods are used to classify 48 second-year students of our department according to their exam scores in the 2005-2006 academic year.
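
Two of the weaknesses named in the abstract can be reproduced on toy data. The sketch below is a minimal illustration using textbook versions of k-means and DBSCAN, not the improved algorithms proposed in the thesis. The function names, the data sets, and the parameter name `min_pts` (the usual minimum-neighbour count that accompanies Eps) are assumptions made for the example.

```python
from math import dist  # Python 3.8+


def kmeans(points, centers, max_iter=100):
    """Lloyd's iteration from the given starting centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged, possibly to a local minimum
            break
        centers = new_centers
    sse = sum(min(dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force neighbourhood query; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# k-means: the same four corner points, two different pairs of starting points.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(kmeans(corners, [(0, 0), (4, 0)])[1])  # left/right split, SSE 1.0
print(kmeans(corners, [(0, 0), (0, 1)])[1])  # stuck in a top/bottom split, SSE 16.0

# DBSCAN: two dense clusters separated by a gap of 10, plus one sparse
# cluster whose own points are 10 apart.
data = [(x,) for x in (0, 1, 2, 3, 4, 14, 15, 16, 17, 18, 100, 110, 120, 130, 140)]
print(dbscan(data, eps=2, min_pts=3))   # sparse cluster is labelled as noise
print(dbscan(data, eps=10, min_pts=3))  # the two dense clusters are merged
```

In this toy data no single global Eps recovers all three groups: a value small enough to keep the dense clusters apart leaves the sparse cluster as noise, and a value large enough to connect the sparse cluster also bridges the gap between the dense ones. The k-means runs differ only in their starting points, yet one converges to a solution sixteen times worse in squared error.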

Keywords/Search Tags:

data mining, clustering, DBSCAN, K-means

Related items

1	Construct Of J2EE-Based Data Mining System And Research On Clustering Technology
2	The Study On The Clustering Algorithms
3	Data Mining, Cluster Analysis Algorithm Research And Application
4	A Study On Clustering-Based Method For Detecting Network Intrusions With New Types
5	Data Mining In The School Computer Room Information Management Application
6	The Study Of Application And Analysis About Clustering Algorithm In Data Mining
7	Research On Text Clustering Algorithm Based On DBSCAN
8	TDOA Localization Technique Based On Data Mining
9	Study And Application Of Clustering Analysis In Data Mining
10	Semi-supervised K-means Clustering Algorithm In Data Mining