Cluster Analysis Algorithm And Its Application

Posted on: 2011-12-25, Degree: Master, Type: Thesis
Country: China, Candidate: L L Xu, Full Text: PDF
GTID: 2178360305455436, Subject: Applied Mathematics
Abstract/Summary:
This thesis discusses the background and current state of cluster analysis algorithms and their applications, and examines the clustering algorithms themselves in detail. We first consider why cluster analysis is needed and the state of its development in China. In recent years, with the rapid spread of the Internet, the Web, and data processing technology, the means by which people obtain information are no longer limited to manual work or a single computer, and network use is no longer uncommon. These tools, however, only process the surface of the data and often cannot reveal the internal structure of the relationships among the data, so a fast way of finding that structure is needed. Data mining (DM) is a very broad interdisciplinary field: it is the process of finding previously unknown, actionable information in large amounts of data. Cluster analysis has a very wide range of applications, including meteorological analysis, image processing, fuzzy control, computer vision, weather forecasting, pattern recognition, biomedicine, chemistry, food inspection, classification of biological species, market segmentation, and performance assessment, and it plays an indispensable role in people's life and work. Cluster analysis is an important data mining method. The clustering problem is to divide a set of data into several groups so that objects within a group are highly similar while objects in different groups differ greatly, thereby uncovering intrinsic links among the data. It is in effect a search for an optimal partition without supervision. Because the accuracy of this partition affects all subsequent analysis of the data, attention should be paid to verifying and evaluating the accuracy of the cluster analysis.
Cluster validity can be assessed with the following indicators: a measure of clustering quality, the degree to which a clustering algorithm suits a data set, and the best number of clusters for a partition. Chapter Ⅲ explains the basic concepts and theorems of cluster analysis. Clustering groups the data in a data set in some way, classifying things of a similar nature together. It is also the process of partitioning a large amount of data into groups, that is, dividing the objects into several classes so that objects in the same class are highly similar while objects in different classes differ considerably. It plays a very important role in discovering the internal structure of a data set. The chapter covers the definition of a class, the nature and characteristics of cluster analysis problems, and a detailed presentation of the distances and similarity coefficients used in cluster analysis. The distances introduced include the Minkowski distance, Mahalanobis distance, Lance distance, Jeffreys-Matusita distance, and oblique-space distance, together with the associated distance definitions. The similarity coefficients include the angle cosine, the correlation coefficient, the exponential similarity coefficient, nonparametric methods, and the even-out coefficient. Two important properties in cluster analysis, monotonicity and the contraction and dilation of space, are introduced and annotated in detail by means of examples. The thesis presents six clustering algorithms in total, including the hierarchical clustering algorithms CURE and BIRCH, the partitioning algorithms K-MEANS and CLARA, and the density-based algorithm DBSCAN. Guha et al. proposed the CURE algorithm in 1998.
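As a rough illustration of the distance and similarity measures named above, the following Python sketch implements a few of them for plain numeric vectors. It is not taken from the thesis: the function names are hypothetical, and the Lance distance is written in the common Canberra-style form, which is an assumption about the definition the thesis uses. The Mahalanobis distance is omitted because it additionally requires an inverse covariance matrix.

```python
import math


def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def lance(x, y):
    """Lance distance in Canberra-style form (assumed definition).

    Assumes every component of x and y is strictly positive,
    so that no denominator is zero.
    """
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def correlation(x, y):
    """Pearson correlation: the cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)
```

Distances grow as objects become less alike, whereas similarity coefficients such as the cosine and the correlation grow as objects become more alike, so a clustering procedure has to be told which of the two conventions it is given.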
CURE selects a fixed number of representative points in the data space to jointly represent each cluster, so it can identify clusters of complex shapes and different sizes and handle outliers more appropriately. Because it represents a cluster by multiple points rather than a single one, it deals with these problems better. When processing large amounts of data, it uses random sampling and partitioning to improve efficiency, so it can handle large data sets efficiently. BIRCH is a hierarchical clustering algorithm designed specifically for large-scale data sets; it combines agglomerative hierarchical clustering with iterative relocation. It first applies a bottom-up hierarchical algorithm and then uses iterative relocation to improve the result. Its main idea is to scan the database, build an initial clustering feature tree in memory, and then cluster the leaf nodes of that tree. A partitioning method usually means that, given a database of N elements, a splitting procedure is used to construct K groups, each group representing a cluster, K
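The partitioning idea (N elements divided into K groups, each group a cluster) can be illustrated with a minimal K-MEANS sketch in Python. This is the standard textbook iteration of assigning points to the nearest centre and recomputing the centres, not necessarily the formulation used in the thesis. The function names, the `iterations` and `seed` parameters, and the random choice of initial centres are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` (a list of equal-length tuples) into k clusters.

    Returns (labels, centers). Initial centres are k distinct input
    points drawn at random; this initialisation is an assumption.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                # An empty cluster keeps its previous centre.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Centres stopped moving, so the partition is stable.
        centers = new_centers
    return labels, centers
```

For example, calling `kmeans(points, 2)` on two well-separated blobs of points should place each blob in its own group. The result of K-MEANS in general depends on the initial centres, which is one reason validity indicators such as the best number of clusters matter.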
Keywords/Search Tags: Cluster analysis, class, distance, similarity coefficient, algorithm