Study On Clustering Analysis And Clustering Result Evaluating Algorithms

Posted on:2007-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Chen

GTID:2178360185477469

Subject:Computer application technology

Abstract/Summary:

With the rapid development of information technology, the need to analyze and manage tremendous amounts of data has become increasingly urgent, and the same holds in the field of data mining. Data mining is the non-trivial process of extracting implicit, previously unknown, and potentially useful information from data. Cluster analysis is one of the primary methods of data mining. Its main task is to partition data points into several clusters: points that are close to each other are assigned to the same cluster, and points that are far from each other are assigned to different clusters.

Many clustering algorithms have been proposed, and they fall into several types: partitioning, hierarchical, mixed, and density-based. Each type has its own strengths and weaknesses. Each algorithm suits only certain kinds of data sets, and evaluating clustering results is very hard. Although some clustering result evaluation algorithms can measure the quality of a clustering result, they cannot use that measurement to adjust the clustering algorithm so that it performs better. A further typical problem is that clustering parameters are very hard to tune. The distribution of a data set is unknown before clustering, and different data sets require different parameters. If the user sets unsuitable parameters before clustering, the result will be of low quality. Selecting appropriate parameters is therefore crucial for cluster analysis.

This thesis proposes an effective clustering model and a novel clustering result evaluation model. The clustering model has two integer parameters, each taking values between zero and a certain upper bound. The evaluation model evaluates each clustering result produced by the clustering model and assigns it a score. We can therefore cluster the data set repeatedly with different pairs of parameter values, evaluate each result, and finally choose the result with the highest score.
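The parameter-selection loop described above (called SECDU in the thesis) can be sketched as an exhaustive search over every parameter pair. The abstract does not define the clustering model or the scoring rule, so this minimal Python sketch substitutes hypothetical stand-ins: `density_unit_cluster` is a simple grid-based density clustering whose two integer parameters are the number of units per axis (`cells`) and the minimum number of points for a unit to count as dense (`min_pts`), and `mark` scores a result by the silhouette coefficients of its clustered points, penalizing noise. Neither is the thesis's actual clustering model or evaluation model.

```python
import math
from collections import deque
from itertools import product


def density_unit_cluster(points, cells, min_pts):
    """Hypothetical density-unit clustering (a stand-in, not the thesis's model).

    Splits the bounding box of 2-D points into cells x cells units, treats a
    unit holding at least min_pts points as dense, and merges edge-adjacent
    dense units into one cluster. Points in sparse units get label -1 (noise).
    """
    if cells < 1 or min_pts < 1:
        return [-1] * len(points)  # degenerate parameter values: no clusters
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    w = (max(p[0] for p in points) - x0) or 1.0
    h = (max(p[1] for p in points) - y0) or 1.0
    units = {}
    for i, (x, y) in enumerate(points):
        # Clamp so points on the upper edge fall into the last unit.
        cx = min(int((x - x0) / w * cells), cells - 1)
        cy = min(int((y - y0) / h * cells), cells - 1)
        units.setdefault((cx, cy), []).append(i)
    dense = {u for u, members in units.items() if len(members) >= min_pts}
    labels = [-1] * len(points)
    next_label = 0
    for start in sorted(dense):
        if labels[units[start][0]] != -1:
            continue  # unit already absorbed into an earlier cluster
        # Breadth-first search over edge-adjacent dense units.
        queue, seen = deque([start]), {start}
        while queue:
            cx, cy = queue.popleft()
            for i in units[(cx, cy)]:
                labels[i] = next_label
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        next_label += 1
    return labels


def mark(points, labels):
    """Hypothetical quality mark: sum of silhouette coefficients of clustered
    points divided by the total number of points, so noise lowers the mark.
    Returns -1.0 if fewer than two clusters were found."""
    clustered = [i for i, c in enumerate(labels) if c != -1]
    if len({labels[i] for i in clustered}) < 2:
        return -1.0
    total = 0.0
    for i in clustered:
        same, other = [], {}
        for j in clustered:
            if j != i:
                d = math.dist(points[i], points[j])
                if labels[j] == labels[i]:
                    same.append(d)
                else:
                    other.setdefault(labels[j], []).append(d)
        if not same:
            continue  # singleton cluster: silhouette taken as 0
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(sum(ds) / len(ds) for ds in other.values())  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def exhaustive_search(points, max_cells, max_min_pts):
    """SECDU-like selection: cluster once per parameter pair, keep the best mark."""
    best_score, best_params, best_labels = -math.inf, None, None
    runs = 0
    for cells, min_pts in product(range(max_cells + 1), range(max_min_pts + 1)):
        labels = density_unit_cluster(points, cells, min_pts)
        runs += 1
        score = mark(points, labels)
        if score > best_score:
            best_score, best_params, best_labels = score, (cells, min_pts), labels
    return best_score, best_params, best_labels, runs


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    score, params, labels, runs = exhaustive_search(pts, 4, 3)
    print(params, labels, round(score, 3), runs)
```

For the eight toy points the search performs 5 x 4 = 20 clustering runs. In general the number of runs is the product of the two parameter ranges, which is the inefficiency the abstract attributes to SECDU.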
The parameter-selection procedure proposed in this thesis is called SECDU. Although SECDU finds the best-scoring clustering result, its efficiency is very low because it involves many rounds of clustering. An improved algorithm, SECDUF, solves this problem: by applying a hill-climbing algorithm, it greatly reduces the number of clustering runs. Both SECDU and SECDUF tune their parameters automatically and output high-quality clustering results...
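SECDUF's use of hill-climbing can be sketched as follows, assuming the score behaves roughly unimodally over the grid of parameter pairs. The abstract does not say which neighbourhood, step rule, or starting point SECDUF uses, so the hypothetical `hill_climb` below is an assumption: steepest ascent over the four grid neighbours of the current pair, starting from the middle of each range, with scores cached so that no pair is clustered twice. The `evaluate(a, b)` callback is also hypothetical; in practice it would run one clustering with the pair (a, b) and return its score.

```python
def hill_climb(evaluate, max_a, max_b, start=None):
    """Steepest-ascent hill climbing over the integer grid [0, max_a] x [0, max_b].

    evaluate(a, b) is assumed to run one clustering with the parameter pair
    (a, b) and return its mark. Each pair is evaluated at most once (cached).
    The search stops at a local maximum, which need not be the global one.
    Returns (best_pair, best_mark, number_of_clustering_runs).
    """
    cache = {}

    def score(pair):
        if pair not in cache:
            cache[pair] = evaluate(*pair)
        return cache[pair]

    # Start in the middle of the parameter ranges unless told otherwise.
    current = start if start is not None else (max_a // 2, max_b // 2)
    while True:
        a, b = current
        neighbours = [
            (a + da, b + db)
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= a + da <= max_a and 0 <= b + db <= max_b
        ]
        best = max(neighbours, key=score, default=current)
        if score(best) <= score(current):
            return current, score(current), len(cache)  # no uphill neighbour left
        current = best


if __name__ == "__main__":
    # Toy mark with a single peak at (7, 3); a real run would cluster here.
    result = hill_climb(lambda a, b: -((a - 7) ** 2 + (b - 3) ** 2), 20, 20)
    print(result)
```

On a 21 x 21 parameter grid an exhaustive search needs 441 clustering runs, whereas the climb above evaluates only the pairs along its path and their neighbours. The trade-off is that hill climbing can stop at a local maximum; restarting from several starting pairs is a common mitigation.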

Keywords/Search Tags:

clustering analysis, density unit, quality of clustering result, evaluating, hill-climbing algorithm

Related items

1	Research On Density Peaks Clustering
2	Research On Clustering Algorithm Based On Density Analysis
3	Multi-Improvement On Density-Based Clustering Algorithm And Its Applications
4	Research Of Density-based Clustering Algorithm By KNN
5	Research On Clustering Algorithm Based On Grid Point Density Estimation
6	Research Of Web Text Clustering Technology And Clustering Result Visualization
7	Research On Dynamic Clustering And Incremental In Data Mining
8	Algorithm For Clustering Data Streams Based On Density Units Covered
9	An Improved FCM Algorithm And Its Application Research
10	Research And Improvement On Density-Based Clustering Algorithm