Study On Partition-based Clustering Algorithm

Posted on: 2006-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: B J Zheng
Full Text: PDF
GTID: 2178360182477466
Subject: Computer technology
Abstract/Summary:
With the development of computerized data acquisition tools and relational database technology, many industries now need to store massive amounts of data. Traditional data analysis methods have difficulty handling data at this scale, which leads to an increasingly serious data overload. Data mining technology provides an effective way to solve this problem.

Data mining (Knowledge Discovery in Databases) is the discovery of implicit, useful, and previously unknown knowledge and information in a dataset. Compared with traditional statistical approaches, data mining is multidisciplinary: it brings together research results from artificial intelligence, pattern recognition, databases, machine learning, and information management systems. Data mining is a newly established frontier subject. It is already used extensively, and its prospects for future application are bright.

Clustering analysis is an important part of the overall data mining system. Clustering is the process of grouping data into classes or clusters so that objects within the same cluster are highly similar to one another but very dissimilar to objects in other clusters. Dissimilarity is assessed based on the attribute values that describe the objects. Clustering is always carried out without prior knowledge, so the main task is to determine how to obtain a clustering result under this premise.

The following problems are discussed:

(1) What data mining is, including the background to its emergence and its definition. Some important topics in data mining, both in China and abroad, are then introduced, such as association rules, data generalization, data classification, and data clustering. Finally, some challenges in the research and application of data mining are discussed; addressing them contributes to the further development of data mining.

(2) A comparison of the existing clustering algorithms.

(3) An improvement of the partition-based method. The partition-based method is a practical way to cluster a dataset, but its effectiveness depends strongly on prior knowledge; in particular, the number of clusters must be given in advance. This thesis presents a new method for dealing with this problem. It avoids the awkward situation in which the user must provide parameters that are difficult to determine. Moreover, it can discover clusters of arbitrary shape.
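To make the limitation described in (3) concrete, the following is a minimal sketch of a generic partition-based algorithm in Python. It assumes k-means as the representative technique and squared Euclidean distance as the dissimilarity measure; the abstract names neither, and this is not the new method proposed in the thesis. The function names (kmeans, squared_distance) and the sample points are hypothetical and chosen only for illustration. Note that the caller has to pass k, the number of clusters, before any structure in the data is known.

```python
import random


def squared_distance(p, q):
    """Dissimilarity between two objects, computed from their attribute values."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means (an assumed example of a partition-based method).

    `k` must be chosen by the caller in advance, which is the limitation
    the abstract describes. Returns (assignment, centroids).
    """
    rng = random.Random(seed)
    # Pick k distinct input points as the initial cluster centres.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(sample_points, k=2)
print(labels, centres)
```

Calling the same function with k=3 on the same data still returns three clusters even though only two groups are present, which illustrates why a poorly chosen cluster count degrades the result. Because each point is assigned to its nearest centre, this sketch also tends to produce compact, roughly convex clusters rather than clusters of arbitrary shape.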
Keywords/Search Tags: Data Mining, Cluster Analysis, Partition Method