Cluster Analysis Algorithm And Its Application

Posted on:2011-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:L L Xu

Full Text:PDF

GTID:2178360305455436

Subject:Applied Mathematics

Abstract/Summary:

This thesis discusses the background and current state of cluster analysis algorithms and their applications, and examines the problems of clustering algorithms in detail. We first consider why cluster analysis is necessary and what problems its development faces in China. In recent years, with the rapid spread of the Internet, the Web and data-processing technology, people no longer obtain information only by hand or from a standalone computer; using the network has also become common. However, these tools only process data at the surface level. The internal structure of the data and the relationships among the data often remain unresolved, and people need a fast way to uncover them. Data mining (DM) is a very broad interdisciplinary field: it is the process of extracting previously unknown, actionable information from large amounts of data. Cluster analysis has a very wide range of applications, including meteorological analysis, image processing, fuzzy control, computer vision, weather forecasting, pattern recognition, biomedicine, chemistry, food inspection, biological species classification, market segmentation and performance assessment, and it plays an indispensable role in everyday life and work. Cluster analysis is an important method in data mining. The clustering problem is to divide a set of data into several groups so that objects in the same group are highly similar while objects in different groups differ considerably, and then to find the intrinsic relationships among the data in these groups. This is an unsupervised search for the optimal partition. Because the accuracy of this process affects all subsequent analysis of the data, the results of cluster analysis should be validated and evaluated.
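The clustering criterion stated above (high similarity within groups, large differences between groups) can be checked numerically for a given partition. The following Python sketch assumes Euclidean distance and uses a hypothetical helper, `intra_inter_distances`, that is not taken from the thesis; it simply compares the mean within-cluster and between-cluster distances and is an illustration, not one of the validity indices the thesis uses.

```python
import numpy as np

def intra_inter_distances(X, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Illustrative helper (hypothetical name). A partition that matches the
    clustering criterion has a small first value and a large second value.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all objects.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)  # ignore each object's distance to itself
    intra = D[same & off_diag].mean()
    inter = D[~same].mean()
    return intra, inter

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(intra_inter_distances(X, [0, 0, 1, 1]))  # grouping by proximity
print(intra_inter_distances(X, [0, 1, 0, 1]))  # an arbitrary grouping
```

For the grouping by proximity the within-cluster mean is far smaller than the between-cluster mean; for the arbitrary grouping the relation is reversed.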
Cluster validity can be assessed with the following indicators: measures of clustering quality, the suitability of a clustering algorithm for a given data set, and the optimal number of clusters.
Chapter Ⅲ explains the basic concepts and theorems of cluster analysis. Clustering groups the data in a data set in some way so that things with similar properties are placed in the same class. It is also the process of partitioning a large amount of data into groups, that is, dividing objects into several classes so that data objects in the same class are highly similar while data objects in different classes differ considerably. It plays a very important role in revealing the internal structure of a data set. The chapter covers the definition of a class and the nature and characteristics of cluster analysis, and presents in detail the distances and similarity coefficients used in cluster analysis, including the definitions of the Minkowski distance, Mahalanobis distance, Lance and Williams distance, Jeffreys distance and oblique-space distance, as well as several similarity coefficients such as the cosine of the angle, the correlation coefficient, the exponential similarity coefficient and nonparametric methods. Two important properties of cluster analysis, monotonicity and the contraction and dilation of space, are introduced and illustrated with examples. Six clustering algorithms are presented in total, including the hierarchical CURE algorithm (proposed by Guha et al. in 1998) and BIRCH algorithm, the partitioning K-means and CLARA algorithms, and the density-based DBSCAN algorithm.
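Several of the distance measures and similarity coefficients named above take only a few lines each. This is a sketch under stated assumptions, not the thesis's exact definitions: the Lance and Williams distance is written in its common Canberra form without the 1/p scaling some textbooks apply, and it assumes positive variables; the function names are illustrative.

```python
import numpy as np

def _vec(v):
    return np.asarray(v, dtype=float)

def minkowski(x, y, q=2.0):
    """Minkowski distance; q=1 is the city-block distance, q=2 the Euclidean."""
    x, y = _vec(x), _vec(y)
    return float(np.sum(np.abs(x - y) ** q) ** (1.0 / q))

def mahalanobis(x, y, cov):
    """Mahalanobis distance given the covariance matrix of the variables."""
    d = _vec(x) - _vec(y)
    return float(np.sqrt(d @ np.linalg.inv(_vec(cov)) @ d))

def lance_williams(x, y):
    """Assumed Canberra form: sum |x_k - y_k| / (x_k + y_k), for positive values."""
    x, y = _vec(x), _vec(y)
    return float(np.sum(np.abs(x - y) / (x + y)))

def cosine_similarity(x, y):
    """Cosine of the angle between the two vectors."""
    x, y = _vec(x), _vec(y)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def correlation(x, y):
    """Pearson correlation coefficient between the two vectors."""
    x, y = _vec(x), _vec(y)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(minkowski(x, y, q=1), minkowski(x, y, q=2))
print(mahalanobis(x, y, np.eye(3)))  # identity covariance reduces to Euclidean
print(lance_williams(x, y), cosine_similarity(x, y), correlation(x, y))
```

Because `y` is a scaled copy of `x`, both similarity coefficients equal 1, while the distances are still positive: similarity coefficients capture the shape of a profile, distances its magnitude.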
CURE selects a fixed number of representative points in the data space and uses them together to represent the corresponding cluster, so it can identify clusters with complex shapes and different sizes and handles outliers better. Because CURE represents each cluster by multiple points, it deals with these problems more effectively. When processing large amounts of data, it uses random sampling and partitioning to improve efficiency, so it can handle large data sets efficiently. The BIRCH algorithm is an integrated hierarchical clustering algorithm designed specifically for large data sets; it combines hierarchical agglomeration with iterative relocation. It first applies a bottom-up hierarchical algorithm and then refines the result by iterative relocation. Its main idea is to scan the database, build an initial clustering feature (CF) tree in memory, and then cluster the leaf nodes of the CF tree. A partitioning method usually takes a database of N elements and uses splitting to construct K groups, each group representing a cluster, K
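To illustrate the partitioning approach, here is a minimal Lloyd-style K-means sketch in Python. It is the standard textbook iteration, not necessarily the variant studied in the thesis; the function name `k_means`, the random initialisation from k distinct objects, and the `n_iter` and `seed` parameters are illustrative assumptions.

```python
import numpy as np

def _nearest(X, centres):
    """Index of the nearest centre (Euclidean distance) for every object."""
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def k_means(X, k, n_iter=100, seed=0):
    """Minimal Lloyd-style K-means (hypothetical helper); returns (labels, centres)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the k centres with k distinct objects chosen at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each object joins its nearest centre.
        labels = _nearest(X, centres)
        # Update step: move each centre to the mean of its members;
        # an empty cluster keeps its previous centre.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break
    # Final labels are consistent with the returned centres.
    return _nearest(X, centres), centres

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centres = k_means(X, k=2)
print(labels)
print(centres)
```

Each iteration alternates an assignment step and an update step, and the loop stops once the centres no longer move. Because the result depends on the initial centres, running the procedure with several seeds and keeping the partition with the smallest within-cluster sum of squares is a common safeguard.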

Keywords/Search Tags:

Cluster analysis, class, distance, similarity coefficient, algorithm

Related items

1	Research On Text Similarity Algorithm Based On WMD Distance
2	Analysis Of Hierarchical Architecture Based On Distance
3	Routing Mechanisms Based On Class-cluster For Named Data Network
4	Algorithm Research Of Cluster Analysis In Analytical Customer Relation Management
5	An analysis of semi-supervised learning with the Guelph Cluster Class algorithm
6	Improvement And Application Of K-means Algorithm
7	The Algorithm Of Filling Missing Data Based On Cluster Analysis
8	Study On The Class Merging Cluster Algorithm Based On FCM
9	Analysis Linux Cluster Technology And In The Application Of Distance Learning
10	Research On Cluster Algorithm Of Similarity Segmentation Based On Point Sorting