
The Research And Realization Of Clustering Algorithm In Data Streams Mining

Posted on: 2013-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2248330371475287
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet and distributed computing, the data stream has emerged as a new kind of data type. It is widely used in Internet information monitoring, banking and securities analysis, wireless sensor networks, and weather forecasting and meteorological monitoring. Traditional data mining techniques apply only to static or small data sets and cannot easily be extended to data stream applications, where data arrives quickly, without bound, and changes continuously. Research on data stream theory and its applications has therefore become important, especially for data stream clustering algorithms and network intrusion detection.

This thesis first analyzes the state of research on data stream clustering algorithms in China and abroad, comparing the advantages and disadvantages of static data mining algorithms and data stream clustering algorithms, which lays a foundation for the algorithm developed later. Then, building on an in-depth study of existing data stream clustering algorithms, it implements ρ-Stream, a density-based data stream clustering algorithm. Drawing on the characteristics of the Minkowski distance and the cosine similarity measure, it introduces two new concepts, frequency and data summary information, and proposes a method for measuring similarity between data with multiple attributes. ρ-Stream uses a tree structure and a dynamic hash table to store nodes and pointers, which addresses the time and space complexity problem. So that the algorithm can run within a fixed amount of memory, a density threshold function is proposed for setting the clustering parameters. The efficiency problem of the offline clustering stage is addressed by a memory-sampling-based method for discovering clusters.

Finally, based on the characteristics of data streams, the thesis designs a network intrusion detection framework built on the data stream clustering algorithm, in which machine learning methods running in the background extend the dictionary of abnormal data in real time. Experiments on the KDD Cup 1999 dataset confirmed the superiority of the proposed algorithm and achieved the expected results.
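
The abstract names the Minkowski distance and the cosine similarity measure as the basis of its multi-attribute similarity method, but does not give the combined measure itself. The sketch below therefore shows only the two standard measures in plain Python; the function names and the zero-vector convention are assumptions made for illustration and are not taken from the thesis.

```python
import math


def minkowski_distance(x, y, p=2):
    """Standard Minkowski distance of order p between two equal-length vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors.

    Assumption for this sketch: if either vector is all zeros, return 0.0
    instead of dividing by zero.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```

The Minkowski distance is sensitive to the magnitude of attribute values, while cosine similarity depends only on direction, which may be why a measure for multi-attribute data would draw on both.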
Keywords/Search Tags: Data Stream Mining, Data Stream Clustering Algorithm, ρ-Stream Algorithm