Research Of K-means Clustering Algorithm

Posted on: 2008-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Feng
Full Text: PDF
GTID: 2178360242967567
Subject: Software engineering
Abstract/Summary:
Clustering is one of the most important techniques in data mining; it is used to discover previously unknown groupings in a data set. Clustering has a long research history, and its importance is widely recognized. Clustering algorithms are studied extensively in machine learning, data mining, and pattern recognition, and they play an important role in identifying the intrinsic relationships between objects. In pattern recognition, clustering is applied to speech recognition, character recognition, and similar tasks. In machine learning, clustering algorithms are applied to image segmentation and image processing, where they support data compression and information retrieval. Clustering is also applied in data mining, spatial databases, sequence and anomaly data analysis, and in other fields such as statistics, biology, geology, geography, and marketing.

This thesis studies the K-means algorithm. It first introduces some related clustering concepts and then focuses on K-means itself. K-means is a partitioning method with O(n) time complexity; it is easy to use and works well on large data sets. It has several drawbacks, however: the number of clusters K and the initial centroids must be specified in advance; the result is sensitive to the choice of initial centroids; and the algorithm easily falls into a local optimum and tends to produce round clusters.

This thesis addresses these problems. It proposes an improved algorithm, MMDBK (Max-Min and Davies-Bouldin Index based K-means), which sets the number of clusters K and the initial centroids according to the principle that similarity should be low between different clusters and high within the same cluster. The Davies-Bouldin Index is used to find the most suitable number of clusters K, and an improved Max-Min distance method is used to ensure low similarity between different clusters. Finally, the KDD99 data set is used in experiments to test the efficiency of MMDBK. The results show that MMDBK is effective for intrusion detection.
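
The general idea described in the abstract (max-min seeding of the initial centroids, followed by choosing K with the Davies-Bouldin Index) can be illustrated with a minimal Python sketch. This is not the thesis's MMDBK algorithm: the abstract does not specify the "improved" Max-Min distance, so the sketch assumes the classic farthest-first max-min rule, and every function name and parameter here (`max_min_init`, `kmeans`, `davies_bouldin`, `choose_k`, `k_min`, `k_max`) is hypothetical. It also assumes the data contain at least `k_max` distinct points.

```python
import math


def max_min_init(points, k):
    """Classic max-min (farthest-first) seeding; an assumption, not MMDBK's improved rule."""
    centroids = [points[0]]  # assumption: seed with the first point
    while len(centroids) < k:
        # Pick the point whose nearest already-chosen centroid is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def assign(points, centroids):
    """Assign each point to its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, centroids, max_iter=100):
    """Standard K-means iteration starting from the given centroids."""
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Recompute each centroid as the mean of its cluster; keep it if the cluster is empty.
        new = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def davies_bouldin(centroids, clusters):
    """Davies-Bouldin Index: lower values mean compact, well-separated clusters."""
    scatter = [
        sum(math.dist(p, c) for p in cl) / len(cl) if cl else 0.0
        for c, cl in zip(centroids, clusters)
    ]
    k = len(centroids)
    total = 0.0
    for i in range(k):
        # Worst-case ratio of within-cluster scatter to between-centroid distance.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


def choose_k(points, k_min=2, k_max=6):
    """Run K-means for each candidate K and keep the K with the lowest index value."""
    best = None
    for k in range(k_min, k_max + 1):
        centroids, clusters = kmeans(points, max_min_init(points, k))
        score = davies_bouldin(centroids, clusters)
        if best is None or score < best[0]:
            best = (score, k, centroids)
    return best
```

Calling `choose_k(points)` on a list of equal-length numeric tuples returns a `(score, k, centroids)` triple. Because the seeding is deterministic, repeated runs on the same data give the same result, which removes the sensitivity to random initial centroids, although the sketch can still converge to a local optimum.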
Keywords/Search Tags: Data Mining, Clustering Analysis, K-means, Intrusion Detection