
Study Of Data Stream Anomaly Detection System

Posted on: 2009-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: R H Li
GTID: 2178360272960400
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, data streams, a novel form of data, have come into wide use in daily life. Traditional databases have long been used to store persistent data and to query it offline. Over the past few years, however, a growing number of applications have produced data in the form of continuous sequences, and the online monitoring and analysis of data streams has attracted increasing attention in database research.

Nowadays, most ISP enterprises face the challenge of managing huge volumes of network traffic data. In a telecom network, gathering and analyzing SNMP (Simple Network Management Protocol) traffic data is one of the most important methods administrators use to manage network performance and to find and solve network problems. To meet this requirement, we present RealMon, a real-time stream monitoring system that aims to find anomalies among thousands of network links. While designing and implementing this system, we found that the data streams from the telecom network are correlated with each other and that the SNMP data contain many data quality problems. In this thesis we therefore first propose an algorithm that detects outliers based on changes in the correlation between streams, and then present a novel framework for real-time data cleansing. Building on these results, we demonstrate RealMon, which analyzes online the SNMP data gathered from heavily loaded routers. The major contributions of this thesis are:

1. A novel algorithm that detects anomalies by continually monitoring changes in the correlation between streams. It uses Piecewise Aggregate Approximation (PAA) to transform the raw data into character strings and detects anomalies by calculating the edit distance between the strings of different streams. Extensive experiments verify the efficiency of the algorithm.

2. An extensible data stream cleaning framework, designed after a survey of common data stream quality problems. The framework achieves extensibility through a modular design in which different modules handle different quality problems separately, and several typical data cleaning algorithms are implemented within it.

3. RealMon, a data stream monitoring system implemented to detect anomalies among thousands of network links. Several well-known data stream analysis algorithms are implemented in the system to monitor the large volume of SNMP messages collected from routers in a telecom backbone network, and data cleansing algorithms are integrated to address the data quality problems in those messages. Experiments show that the system performs efficiently in a simulated environment.

We believe this work is a good example of integrating theory with practice: we not only provide key solutions for anomaly detection and data cleansing, but also implement a novel system that detects anomalies among thousands of network links. We consider the work a meaningful contribution to data stream research.
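The anomaly detection idea in the first contribution (symbolize each stream with PAA, then compare streams by edit distance) can be illustrated with a minimal Python sketch. The abstract gives no further details, so everything here is an assumption: each window is z-normalized, reduced with Piecewise Aggregate Approximation, mapped to letters using SAX-style Gaussian breakpoints for a four-letter alphabet, and a pair of streams is flagged when the edit distance between their words rises well above its recent history (mean plus `k` standard deviations). The function names, the alphabet, and the threshold rule are hypothetical, not the thesis's actual method.

```python
import statistics
from typing import List, Sequence

# ASSUMPTION: SAX-style breakpoints (standard normal quartiles) for a 4-letter alphabet.
BREAKPOINTS = (-0.67, 0.0, 0.67)
ALPHABET = "abcd"


def znormalize(series: Sequence[float]) -> List[float]:
    """Rescale a window to zero mean and unit variance (flat windows map to zeros)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: average each of `segments` equal-width frames."""
    if len(series) % segments != 0:
        raise ValueError("window length must be a multiple of the segment count")
    width = len(series) // segments
    return [statistics.fmean(series[i * width:(i + 1) * width]) for i in range(segments)]


def symbolize(values: Sequence[float]) -> str:
    """Map each PAA value to the letter of the breakpoint interval it falls into."""
    return "".join(ALPHABET[sum(v > b for b in BREAKPOINTS)] for v in values)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def window_to_word(window: Sequence[float], segments: int) -> str:
    """Hypothetical helper: raw window -> z-normalized -> PAA -> symbol string."""
    return symbolize(paa(znormalize(window), segments))


def correlation_changed(history: Sequence[int], current: int, k: float = 3.0) -> bool:
    """ASSUMED rule: flag a stream pair when the current distance exceeds
    mean + k * stdev of its past distances (history must be non-empty)."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return current > mean + k * std
```

Because each window is z-normalized first, two links whose traffic differs only in scale and offset map to the same word: a rising ramp such as `[1, 2, ..., 8]` and its scaled copy both become `"abcd"` (distance 0), while the reversed ramp becomes `"dcba"`, at edit distance 4. A jump like that, relative to the pair's usual distance, is what this sketch treats as a change in correlation.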
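The abstract describes the cleaning framework in the second contribution only as modules that solve different quality problems separately. One plausible reading is a pipeline of interchangeable modules, each of which receives a reading and passes it on, repairs it, or drops it. The minimal Python sketch below follows that reading. The `Reading` record, the two example modules (`RangeCheck`, `FillMissing`), and the `CleaningPipeline` class are hypothetical illustrations, not the thesis's actual cleaning algorithms.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, List, Optional, Protocol


@dataclass(frozen=True)
class Reading:
    """HYPOTHETICAL sample: a link id, a timestamp, and a traffic value (None if missing)."""
    link: str
    ts: int
    value: Optional[float]


class CleaningModule(Protocol):
    """Any object with this `process` method can be plugged into the pipeline."""
    def process(self, r: Reading) -> Optional[Reading]: ...


class RangeCheck:
    """Drop readings whose value lies outside [low, high]; missing values pass through."""
    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high

    def process(self, r: Reading) -> Optional[Reading]:
        if r.value is not None and not (self.low <= r.value <= self.high):
            return None
        return r


class FillMissing:
    """Replace a missing value with the last valid value seen on the same link."""
    def __init__(self) -> None:
        self.last: Dict[str, float] = {}

    def process(self, r: Reading) -> Optional[Reading]:
        if r.value is None:
            if r.link not in self.last:
                return None  # nothing to carry forward yet, so drop the reading
            return replace(r, value=self.last[r.link])
        self.last[r.link] = r.value
        return r


class CleaningPipeline:
    """Run each reading through the registered modules in order; any module may drop it."""
    def __init__(self) -> None:
        self.modules: List[CleaningModule] = []

    def register(self, module: CleaningModule) -> "CleaningPipeline":
        self.modules.append(module)
        return self  # allows chained registration

    def run(self, stream: Iterable[Reading]) -> Iterator[Reading]:
        for r in stream:
            current: Optional[Reading] = r
            for m in self.modules:
                if current is None:
                    break
                current = m.process(current)
            if current is not None:
                yield current
```

For example, `CleaningPipeline().register(RangeCheck(0.0, 1e9)).register(FillMissing())` discards negative traffic values and fills gaps on each link from its last valid value. A new quality check only needs another class with a `process` method; the pipeline itself does not change, which is the kind of extensibility the abstract describes.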
Keywords: data stream, data quality, data stream cleansing, data stream anomaly detection system