Study On Distributed Clustering And Its Application In Intrusion Detection

Posted on: 2009-05-16    Degree: Master    Type: Thesis
Country: China    Candidate: M M Zheng
GTID: 2178360245975960    Subject: Computer application technology
Abstract/Summary:
Clustering is an important data mining task and has attracted tremendous interest among researchers. It is used in many fields, such as finance, telecommunications, insurance, market analysis, anomaly detection, network security, and scientific decision-making. Existing clustering algorithms, however, are not suited to distributed environments. Distributed clustering is a challenging research topic because of a variety of real-life constraints, including limited network bandwidth and limited memory at each site. Centralizing all of the local data produces a huge dataset and noticeably decreases clustering efficiency. This thesis studies distributed clustering and its application to intrusion detection, and makes the following contributions:

1. An efficient distributed K-Means clustering algorithm, DK-Means, is proposed to improve on the efficiency of the distributed clustering algorithm K-DMeans. Because each site in the distributed environment broadcasts only its cluster information, DK-Means effectively reduces network overhead. Both theoretical analysis and experimental results show that DK-Means is more efficient than K-DMeans and achieves the same clustering quality as K-Means.
2. A distributed clustering algorithm, α-DK-Means, is introduced. It automatically partitions the data set into a reasonable number of clusters by splitting and merging clusters. Experimental results show that α-DK-Means is more efficient than the other algorithms compared.
3. An efficient density-based distributed clustering algorithm, DBDC*, is proposed to improve on the efficiency of the distributed clustering algorithm DBDC. It effectively reduces network overhead, discovers clusters of arbitrary shape, and improves the quality of the global clustering. Experimental results show that DBDC* is more efficient than DBDC.
4. An effective clustering-based anomaly detection algorithm is proposed to handle data with mixed attributes. The algorithm builds cluster models by applying a clustering algorithm to unlabeled training data and defines a distance between each pair of values of a categorical attribute, so it can handle both numerical and categorical attributes efficiently. Theoretical analysis shows that it captures the essential relationships between different values of a categorical attribute while preserving the original dimensionality of the dataset. Experimental results show that the method detects intrusions more efficiently while maintaining a low false positive rate.
5. A novel clustering-based distributed anomaly detection algorithm, ID-DC, is proposed to implement a distributed intrusion detection system (DIDS). ID-DC first builds cluster models by applying a distributed clustering algorithm to unlabeled training data and then labels these models with the Double-Reference algorithm. It thereby removes the dependence on labeled training data that most current anomaly-based intrusion detection methods have, and it is designed to automatically partition the data set into a reasonable number of clusters. Experimental results show that the method detects intrusions efficiently while maintaining a low false positive rate.
6. A working intrusion detection system is implemented in Java, using the distributed clustering algorithms to train its intrusion detection models. Experimental results on this system show that the proposed algorithms detect attacks effectively and efficiently.
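The abstract does not give the DK-Means pseudocode, so the following is only a minimal sketch of the general idea behind contribution 1: sites exchange cluster summaries instead of raw records. It assumes that in each iteration every site reports, for the current global centroids, the per-cluster vector sum and point count of its local data, and that a coordinator merges these into new centroids. The names `local_summary`, `merge_summaries`, and `distributed_kmeans` and the summary format are hypothetical and are not the thesis's actual protocol. Because the new centroids are computed from exact sums and counts, each iteration matches one centralized K-Means (Lloyd) iteration on the pooled data, which is consistent with the claim that DK-Means reaches the same clustering quality as K-Means.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical per-site message: for each cluster, (vector sum, point count).
Summary = List[Tuple[List[float], int]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def local_summary(points: Sequence[Point], centroids: Sequence[Point]) -> Summary:
    """Assign a site's local points to the current global centroids and return
    only per-cluster sums and counts; raw points never leave the site."""
    dim = len(centroids[0])
    summary: Summary = [([0.0] * dim, 0) for _ in centroids]
    for p in points:
        j = nearest(p, centroids)
        s, n = summary[j]
        summary[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return summary


def merge_summaries(summaries: Sequence[Summary], old: Sequence[Point]) -> List[Point]:
    """Combine the summaries from all sites into new global centroids.
    A cluster that received no points keeps its previous centroid."""
    new: List[Point] = []
    for j, c in enumerate(old):
        total = [0.0] * len(c)
        count = 0
        for summ in summaries:
            s, n = summ[j]
            total = [a + b for a, b in zip(total, s)]
            count += n
        new.append(tuple(t / count for t in total) if count else tuple(c))
    return new


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       init: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Iterate until the global centroids stop changing (or max_iter is hit)."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        # One message of k (sum, count) pairs per site per iteration.
        summaries = [local_summary(pts, centroids) for pts in sites]
        new = merge_summaries(summaries, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    site_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    site_b = [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
    print(distributed_kmeans([site_a, site_b], init=[(0.0, 0.0), (10.0, 10.0)]))
```

In this sketch each site sends k vectors and k counts per iteration regardless of how many records it holds. That is the kind of saving the abstract attributes to broadcasting only cluster information, although how DK-Means differs in detail from K-DMeans is not stated here.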
Keywords/Search Tags: data mining, distributed data mining, clustering, distributed clustering, intrusion detection, distributed intrusion detection