
Data Mining, Cluster Analysis Algorithm Research And Application

Posted on: 2008-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
Full Text: PDF
GTID: 2208360245462080
Subject: Software engineering
Abstract/Summary:
Clustering is an important method for partitioning or grouping data and is applied in many fields, including data mining. Clustering algorithms fall into five categories: partitional, hierarchical, density-based, grid-based, and model-based. These algorithms have many disadvantages, however: some work only on numeric values, some are inefficient, some are sensitive to initial starting conditions or to the order of data input, some do not guarantee the best solution, and some rely on input parameters.

DBSCAN is a density-based clustering algorithm that can efficiently discover clusters of arbitrary shape and can effectively handle noise. It has two disadvantages that need to be overcome. One is that it requires a large amount of memory, especially when dealing with large-scale databases. The other is that it requires a global parameter, Eps, to be determined. If Eps is not appropriate, clustering quality is reduced, especially when cluster densities and the distances between clusters are uneven. In this paper, an improved DBSCAN algorithm based on data partitioning is presented.

K-means is a partitioning algorithm that partitions a database of n objects into a set of K clusters, where K is an input parameter. It uses an iterative procedure: when the algorithm converges to one of numerous local minima, it terminates and outputs the result. Because the K initial starting points are selected at random, the output is especially sensitive to the initial starting conditions, which can lead to poor solutions. In the classical k-means algorithm, the value of k must be fixed in advance, and it is difficult to determine the value of k accurately in practice.

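
The dependence of DBSCAN on the global parameter Eps can be illustrated with a small sketch. This is a plain textbook-style DBSCAN in pure Python, not the improved partition-based algorithm of the thesis; the function name `dbscan`, the parameter `min_pts`, and the toy data are assumptions made for illustration only.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Basic DBSCAN sketch: returns one label per point (0, 1, ... or NOISE)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


# Toy data (assumed): one dense group and one sparse group.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (6.0, 0.0), (7.0, 0.0)]

for eps in (0.5, 1.5, 5.0):
    print(eps, dbscan(points, eps, min_pts=3))
```

With a small Eps the sparse group is treated as noise, with a medium Eps both groups are found, and with a large Eps the two groups merge into one cluster. No single global value suits data of uneven density unless it happens to fall in the right range.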
In this paper, a novel algorithm for choosing initial values for k-means document clustering is proposed. Using the exam scores of 48 second-year students of our department in the 2005-2006 academic year, we classify the students with cluster analysis methods.
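
To see why the choice of initial values matters for k-means, consider the following minimal sketch. It implements the classical iterative procedure in pure Python and runs it from two different sets of starting centers; it does not reproduce the initialization algorithm proposed in the thesis. The function name `kmeans`, the toy points, and the starting centers are assumptions chosen for illustration.

```python
from math import dist


def kmeans(points, centers, max_iter=100):
    """Classical k-means sketch: returns (final centers, sum of squared errors)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    sse = sum(min(dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse


# Toy data (assumed): the four corners of a 4 x 1 rectangle, with K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_centers, good_sse = kmeans(points, [(0, 0), (4, 0)])
bad_centers, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_centers, good_sse)  # left/right split, SSE 1.0
print(bad_centers, bad_sse)    # top/bottom split, SSE 16.0
```

Both runs converge, but the second stops at a local minimum whose error is sixteen times larger than the first. This is the sensitivity to initial starting points that motivates a better initialization strategy.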
Keywords/Search Tags:data mining, clustering, DBSCAN, K-means