Research Of K-means Clustering Algorithm

Posted on:2008-03-15

Degree:Master

Type:Thesis

Country:China

Candidate:C Feng

Full Text:PDF

GTID:2178360242967567

Subject:Software engineering

Abstract/Summary:

Clustering is one of the most important data mining techniques; it is used to discover unknown groupings in a data set. Clustering has a long research history, and its importance is widely recognized. Clustering algorithms are among the most extensively studied algorithms in machine learning, data mining and pattern recognition, and they play an important role in identifying the intrinsic relationships between objects. In pattern recognition, clustering is applied to speech recognition, character recognition and similar tasks. In machine learning, clustering algorithms are applied to image segmentation and image processing, where they support data compression and information retrieval. Other important application areas include data mining, spatial databases, and sequence and anomaly analysis, as well as fields such as statistics, biology, geology, geography and marketing.

This thesis studies the K-means algorithm. It first introduces the related concepts of clustering and then focuses on K-means. K-means is a partitioning method whose time complexity is O(n) in the number of objects; it is easy to use and works well on large data sets. However, it has several drawbacks: the number of clusters K and the initial centroids must be specified in advance; the result is sensitive to the choice of initial centroids; and the algorithm tends to converge to a local optimum and to produce round (spherical) clusters. This thesis addresses these problems by proposing an improved algorithm, Max-Min and Davies-Bouldin Index based K-means (MMDBK), which sets K and the initial centroids according to the principle of low similarity between different clusters and high similarity within the same cluster. The Davies-Bouldin Index is used to find the most suitable number of clusters K, and an improved max-min distance method is used to ensure low similarity between different clusters. Finally, experiments on the KDD99 data set are conducted to evaluate the effectiveness of MMDBK. The results show that MMDBK is effective for intrusion detection.
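The three ingredients named in the abstract (K-means itself, max-min distance seeding, and choosing K with the Davies-Bouldin Index) can be illustrated with a minimal Python sketch. This is not the thesis's MMDBK: the abstract does not describe its "improved" max-min rule, so the sketch assumes Euclidean distance, the classic max-min seeding (the first point is the first centre, then each new centre is the point farthest from its nearest chosen centre), and a hypothetical search range of K from 2 to 6 in which the run with the lowest Davies-Bouldin value wins. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumption; the abstract does not fix the metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Point, centres: List[Point]) -> int:
    """Index of the centre closest to p (ties go to the lower index)."""
    return min(range(len(centres)), key=lambda j: dist(p, centres[j]))


def max_min_init(points: List[Point], k: int) -> List[Point]:
    """Classic max-min distance seeding (NOT the thesis's improved variant):
    start from the first point, then repeatedly add the point whose distance
    to its nearest chosen centre is largest. Assumes k <= distinct points."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def kmeans(points: List[Point], centres: List[Point], max_iter: int = 100):
    """Lloyd's K-means from the given initial centres; returns (labels, centres)."""
    centres = list(centres)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]  # assignment step
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        updated = []
        for j, old in enumerate(centres):  # update step: centre -> member mean
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # empty cluster: keep its previous centre
        centres = updated
    return labels, centres


def davies_bouldin(points: List[Point], labels: List[int], centres: List[Point]) -> float:
    """DB = (1/K) * sum_i max_{j != i} (S_i + S_j) / M_ij, where S_i is the mean
    distance of cluster i's members to its centre and M_ij is the distance
    between centres i and j. Lower is better. Requires K >= 2."""
    k = len(centres)
    scatter = []
    for i in range(k):
        d = [dist(p, centres[i]) for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(d) / len(d) if d else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                m = dist(centres[i], centres[j])
                worst = max(worst, math.inf if m == 0 else (scatter[i] + scatter[j]) / m)
        total += worst
    return total / k


def choose_k(points: List[Point], k_min: int = 2, k_max: int = 6):
    """Run seeded K-means for each K in a hypothetical range and keep the run
    with the lowest Davies-Bouldin value: (score, k, labels, centres)."""
    best = None
    for k in range(k_min, k_max + 1):
        labels, centres = kmeans(points, max_min_init(points, k))
        score = davies_bouldin(points, labels, centres)
        if best is None or score < best[0]:
            best = (score, k, labels, centres)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D groups of four points each.
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    data = [(bx + dx, by + dy) for bx, by in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
    score, k, labels, _ = choose_k(data)
    print(f"chosen K = {k}, Davies-Bouldin = {score:.4f}, labels = {labels}")
```

A lower Davies-Bouldin value means compact, well-separated clusters, which mirrors the abstract's principle of high similarity within a cluster and low similarity between clusters; on the toy data above the sketch selects K = 3. Seeding with max-min distance also makes each run deterministic, which sidesteps the sensitivity to randomly chosen initial centroids listed among the drawbacks of plain K-means.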

Keywords/Search Tags:

Data Mining, Clustering Analysis, K-means, Intrusion Detection

Related items

1	Research Of K-means Clustering Algorithm
2	The Study Of Real-time Intrusion Detection Based On Data Mining
3	Research On Intrusion Detection Technology Based On Clustering Analysis
4	Research On Application Of Intrusion Detection Based On Improved FCM
5	Research On Intrusion Detection Based On Clustering Algorithm
6	The Research Of Algorithom On Data Mining And Application On Intrusion Detection
7	Research Of Intrusion Detection Based On Data Mining
8	Network Security Intrusion Detection Technology
9	Application Of Data Mining In Intrusion Detection System
10	Research And Application Of Clustering Algorithm Based On Feature Point Selection