Study On Clustering Analysis And Clustering Result Evaluating Algorithms

Posted on: 2007-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Chen
Full Text: PDF
GTID: 2178360185477469
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, the need to analyze and manage tremendous amounts of data has become increasingly urgent. The same is true in the field of data mining. Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Cluster analysis is a primary method of data mining. Its main task is to partition data points into several clusters: data points that are close to each other are assigned to the same cluster, and data points that are far from each other are assigned to different clusters.

Many clustering algorithms have been proposed. They fall into several types: partitioning, hierarchical, mixed, and density-based. Each type has its own advantages and disadvantages, and each algorithm suits only certain data sets, so it is very hard to evaluate a clustering result. Although some clustering result evaluating algorithms can measure the quality of a clustering result, they cannot adjust or update the clustering algorithm so that it produces a better clustering. In addition, clustering parameters are typically very hard to tune. The shape of a data set is unknown before clustering, and different data sets need different parameters. If the user sets unsuitable parameters before clustering, the result will be of low quality. Selecting appropriate parameters is therefore crucial for clustering analysis.

This thesis proposes an effective clustering model and a novel clustering result evaluating model. The clustering model has two integer parameters, each ranging from zero to a certain integer. The evaluating model evaluates the clustering results produced by the clustering model and gives each a mark. We can assign different pairs of values to the parameters, cluster the data set repeatedly, and evaluate each clustering result. Finally, we choose the clustering result with the highest mark. This algorithm is called SECDU. Although SECDU can find the best clustering result, its efficiency is very low because many rounds of clustering are involved. An improved algorithm named SECDUF solves this problem: it greatly reduces the number of clustering operations by applying a hill-climbing algorithm. Both SECDU and SECDUF can tune parameters automatically and output high quality clustering results...
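
The two search strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `exhaustive_search` and `hill_climb`, the four-neighbour move set, the start point, and the stand-in `toy_score` are all assumptions made for illustration. Here `score(p, q)` stands for "cluster the data set with parameters (p, q), then return the mark given by the evaluating step".

```python
from itertools import product
from typing import Callable, Dict, Tuple

Params = Tuple[int, int]
Score = Callable[[int, int], float]


def exhaustive_search(score: Score, p_max: int, q_max: int) -> Tuple[Params, int]:
    """Evaluate every (p, q) pair in [0, p_max] x [0, q_max].

    Returns the best pair and the number of clustering rounds used.
    """
    marks: Dict[Params, float] = {
        pq: score(*pq) for pq in product(range(p_max + 1), range(q_max + 1))
    }
    best = max(marks, key=marks.get)
    return best, len(marks)


def hill_climb(score: Score, p_max: int, q_max: int,
               start: Params = (0, 0)) -> Tuple[Params, int]:
    """Greedy hill climbing on the same parameter grid.

    Moves to the best of the four axis neighbours until none improves
    the mark. Returns the final pair and the number of distinct pairs
    that were actually clustered and evaluated.
    """
    cache: Dict[Params, float] = {}

    def mark(pq: Params) -> float:
        # Each distinct pair is clustered and evaluated at most once.
        if pq not in cache:
            cache[pq] = score(*pq)
        return cache[pq]

    current = start
    while True:
        p, q = current
        neighbours = [
            (p + dp, q + dq)
            for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= p + dp <= p_max and 0 <= q + dq <= q_max
        ]
        if not neighbours:
            mark(current)
            return current, len(cache)
        candidate = max(neighbours, key=mark)
        if mark(candidate) <= mark(current):
            return current, len(cache)
        current = candidate


def toy_score(p: int, q: int) -> float:
    # Hypothetical stand-in for "cluster, then evaluate":
    # a single peak at (6, 3), not a real clustering quality measure.
    return -((p - 6) ** 2 + (q - 3) ** 2)


if __name__ == "__main__":
    print(exhaustive_search(toy_score, 10, 10))
    print(hill_climb(toy_score, 10, 10))
```

On this single-peaked toy score both strategies reach the same parameter pair, but the hill-climbing version evaluates far fewer pairs. In general, hill climbing can stop at a local optimum, so the saving in clustering rounds may come at the cost of missing the globally best mark.
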
Keywords/Search Tags: clustering analysis, density unit, quality of clustering result, evaluating, hill-climbing algorithm