Data Stream Clustering Algorithm And Its Application

Posted on: 2012-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Yu
GTID: 2218330338463063
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of information technology has given rise to a new data model, the data stream. Data streams often arise in dynamic environments such as web click tracking, network intrusion detection, real-time monitoring systems, and wireless sensor networks. Compared with traditional data sets, data streams are fast, continuous, varied, and potentially unbounded, so data stream mining faces new demands and challenges. Cluster analysis is an important data mining tool because it partitions unlabeled data into groups according to specified attributes, and it has been widely studied in recent years. This thesis studies data stream clustering algorithms and their application to anomaly detection. The main work is as follows:
(1) We summarize the data stream model and related clustering concepts, and describe the special requirements and algorithms of current data stream clustering. We then present the definition of anomaly detection, the existing methods, and the current challenges.
(2) In high-speed networks, data streams arrive at high speed and in bursts, which makes anomaly detection difficult. We propose a stream clustering algorithm based on an SSClu tree for anomaly detection in high-speed streams. The algorithm introduces an SSClu tree to maintain summary information about the data stream. To cope with the high arrival rate, it uses pre-aggregation and a caching mechanism. Pre-aggregation clusters incoming data objects in advance, before they are inserted into the SSClu tree, so that the algorithm can keep up with the high-speed stream. The caching mechanism temporarily stores the stream data currently being processed so that bursts of arriving data can be handled.
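The pre-aggregation and caching ideas in task (2) can be illustrated with a minimal Python sketch. The abstract does not describe the internals of the SSClu tree, so everything here is an assumption for illustration only: the `Summary` class (a count plus per-dimension sum), the `radius` threshold, and the `BurstBuffer` class with its `capacity` are hypothetical names and are not taken from the thesis.

```python
from collections import deque
import math


class Summary:
    """Running summary of a group of points: count and per-dimension sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def pre_aggregate(batch, radius):
    """Group a batch of points into summaries before any tree insertion.

    A point joins the nearest existing summary whose centroid lies within
    `radius` (a hypothetical threshold); otherwise it starts a new summary.
    """
    summaries = []
    for point in batch:
        best, best_dist = None, radius
        for s in summaries:
            d = math.dist(s.centroid(), point)
            if d <= best_dist:
                best, best_dist = s, d
        if best is None:
            summaries.append(Summary(point))
        else:
            best.absorb(point)
    return summaries


class BurstBuffer:
    """Bounded FIFO cache that holds arriving points during a burst."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, point):
        """Store a point; return False if the cache is already full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(point)
        return True

    def drain(self, max_items):
        """Remove and return up to `max_items` points in arrival order."""
        n = min(max_items, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]
```

In this sketch a consumer would call `drain` at its own pace and pass the result to `pre_aggregate`, so that only a few summaries, rather than every raw point, would need to be inserted into whatever tree structure maintains the stream summary.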
Simulation shows that the SSClu-tree-based algorithm not only handles high-speed data streams in a timely manner but also achieves high clustering accuracy, which in turn supports accurate anomaly detection.
(3) Taking into account the constraints of a distributed environment and of energy consumption in wireless sensor networks (WSNs), we propose a stream clustering algorithm based on a similarity flocking model (SCBSF) for outlier detection in WSNs. The algorithm uses a flocking model, which simulates swarm behavior, to form self-organizing data clusters, making it better suited to distributed environments with large collected data sets. Using the flocking rules, it forms clusters of arbitrary shape without the traditional two-stage clustering process, which reduces computation and storage complexity. To account for energy consumption in the WSN, communication energy is reduced by having collection nodes use initial cluster information, which is generated from the temporal similarity of the data. Simulation shows that the algorithm not only achieves good outlier detection results but also reduces the computation and the transmission energy consumed during clustering.
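One possible reading of the initial cluster information in task (3) is that a collection node groups consecutive, similar readings and forwards only a summary of each group. The Python sketch below illustrates that reading only. The function name `initial_clusters`, the scalar readings, and the `tolerance` threshold are assumptions made for illustration; the abstract does not give the actual rule used by SCBSF, and the flocking step itself is not shown.

```python
def initial_clusters(readings, tolerance):
    """Collapse time-ordered scalar readings into (mean, count) runs.

    A reading extends the current run if it lies within `tolerance`
    (a hypothetical threshold) of that run's mean; otherwise it starts
    a new run.
    """
    runs = []  # each entry is [sum, count] for one run
    for value in readings:
        if runs and abs(value - runs[-1][0] / runs[-1][1]) <= tolerance:
            runs[-1][0] += value
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(total / count, count) for total, count in runs]


# Six readings collapse into three summaries to transmit.
print(initial_clusters([20.0, 20.5, 19.5, 35.0, 20.0, 20.0], tolerance=1.0))
# prints [(20.0, 3), (35.0, 1), (20.0, 2)]
```

Under these assumptions the node would send three pairs instead of six raw values, and a short run that differs sharply from its neighbors, such as the single reading of 35.0, would be a natural candidate for closer inspection as an outlier.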
Keywords/Search Tags:Data Stream Model, Clustering Algorithm, Anomaly Detection, High-speed Stream, Wireless Sensor Networks