
Research on Stream Data Anomaly Detection Platform Based on Concept Drift

Posted on: 2020-07-03 | Degree: Master | Type: Thesis
Country: China | Candidate: B Xu
GTID: 2428330590473234 | Subject: Computer technology
Abstract/Summary:
With the development of sensor technology and Internet technology, and the deployment of IPv6, the Internet of Things will push Internet technology into a new era. In a world where everything is connected, users will place greater demands on the collection and sharing of data, so the amount of data grows rapidly. Compared with traditional batch data, stream data, as a new form of data, has three characteristics: first, its processing has strong real-time requirements; second, its distribution may change over time; third, its volume is extremely large. In fields such as financial stocks, network traffic monitoring, user shopping and browsing records, and wireless sensor networks, data exists in the form of streams. Because streaming data is so widely used in practice, research on its reliability has quickly attracted attention. As an important part of reliability analysis, anomaly detection has become one of the research hotspots for streaming data, with a wide range of applications such as intrusion detection, log analysis, fault detection in complex systems, and smart home alarms.

Anomaly detection on stream data differs greatly from traditional anomaly detection. The most important difference is that stream data exhibits concept drift, that is, its distribution changes over time. Traditional anomaly detection algorithms generally assume that the data distribution is stable. Therefore, if a traditional anomaly detection algorithm is applied directly to stream data, it cannot identify or handle concept drift, and its detection performance deteriorates continuously. This thesis studies anomaly detection algorithms for data streams with concept drift and implements them on the Storm platform. The main research content consists of the following three parts:

(1) Research on an anomaly detection algorithm based on concept drift. This part combines clustering, a Markov model, and window theory to reduce the lag in detecting new concepts that affects existing stream data anomaly detection algorithms.
(2) Research on a data completion algorithm based on context information. This part addresses two kinds of problems: classifying data while concept drift is occurring, and classifying data with the help of context information when too many data attributes are missing.
(3) A Storm-based distributed real-time anomaly detection platform. Combining the theoretical studies in parts (1) and (2), a real-time anomaly detection algorithm is deployed on the Storm platform.

The experimental results show that the improved anomaly detection algorithm produces more stable clustering and has a stronger ability to acquire new concepts, and that the data completion algorithm, by drawing on context information, makes it possible to classify records in which too many of the original classification features are missing.
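To illustrate why a detector that assumes a stable distribution falls behind under concept drift, here is a minimal sketch of window-based drift detection. It is not the algorithm from this thesis; the class name `WindowDriftDetector`, the window size, and the z-score threshold are illustrative assumptions. The sketch freezes a reference window as the "current concept", compares the mean of a sliding recent window against it, and, when the difference is too large, adopts the recent window as the new reference.

```python
from collections import deque
from statistics import mean, pstdev


class WindowDriftDetector:
    """Hypothetical sketch: frozen reference window vs. sliding recent window."""

    def __init__(self, window_size: int = 50, threshold: float = 3.0):
        self.window_size = window_size
        self.threshold = threshold            # assumed z-score cut-off
        self.reference = []                   # snapshot of the current concept
        self.recent = deque(maxlen=window_size)

    def update(self, x: float) -> bool:
        """Feed one value; return True when drift is signalled."""
        if len(self.reference) < self.window_size:
            self.reference.append(x)          # still collecting the first concept
            return False
        self.recent.append(x)                 # oldest value is dropped automatically
        if len(self.recent) < self.window_size:
            return False
        ref_mean = mean(self.reference)
        ref_std = pstdev(self.reference) or 1e-9   # avoid division by zero
        # z-score of the recent window's mean under the reference concept
        z = abs(mean(self.recent) - ref_mean) / (ref_std / self.window_size ** 0.5)
        if z > self.threshold:
            # Adopt the recent data as the new concept and start a fresh window.
            self.reference = list(self.recent)
            self.recent.clear()
            return True
        return False


# Usage: a stationary stream whose level jumps at index 100.
stream = [i % 2 for i in range(100)] + [5 + (i % 2) for i in range(100)]
det = WindowDriftDetector(window_size=50, threshold=3.0)
flags = [det.update(x) for x in stream]
print(flags.index(True))  # → 102
```

No drift is flagged during the stationary prefix, and the jump at index 100 is flagged two samples later. A larger window or higher threshold lowers false alarms but increases this detection lag, which is the kind of delay in recognizing a new concept that part (1) aims to reduce.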
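Part (1) combines clustering, a Markov model, and windowing, but the abstract gives no details. The following is therefore only a generic sketch of how those three ingredients can fit together, not the thesis's method. Assumptions: each point is mapped to its nearest centroid (the centroids are assumed to come from a prior clustering step such as k-means); cluster labels are the states of a first-order Markov chain; transition counts are kept only over a sliding window of recent transitions; and a transition whose Laplace-smoothed probability falls below a threshold is flagged as anomalous. The class `MarkovWindowDetector` and all parameter values are hypothetical.

```python
from collections import Counter, deque
from typing import Sequence, Tuple


class MarkovWindowDetector:
    """Hypothetical sketch: cluster labels as Markov states, counts over a sliding window."""

    def __init__(self, centroids: Sequence[Sequence[float]], window_size: int = 200,
                 threshold: float = 0.05, alpha: float = 1.0):
        self.centroids = [tuple(c) for c in centroids]  # assumed output of a prior clustering step
        self.k = len(self.centroids)
        self.window = deque(maxlen=window_size)  # recent (from_state, to_state) transitions
        self.counts = Counter()        # transition counts inside the window
        self.from_counts = Counter()   # outgoing-transition totals per state inside the window
        self.threshold = threshold
        self.alpha = alpha             # Laplace smoothing: unseen transitions get a small nonzero probability
        self.prev = None

    def _state(self, point: Sequence[float]) -> int:
        # Nearest centroid by squared Euclidean distance.
        return min(range(self.k),
                   key=lambda i: sum((p - c) ** 2 for p, c in zip(point, self.centroids[i])))

    def update(self, point: Sequence[float]) -> Tuple[int, bool]:
        """Return (state, is_anomalous) for one incoming point."""
        state = self._state(point)
        anomalous = False
        if self.prev is not None:
            t = (self.prev, state)
            prob = (self.counts[t] + self.alpha) / (self.from_counts[self.prev] + self.alpha * self.k)
            anomalous = prob < self.threshold
            # Remove the oldest transition from the counts before the deque drops it.
            if len(self.window) == self.window.maxlen:
                old = self.window[0]
                self.counts[old] -= 1
                self.from_counts[old[0]] -= 1
            # Flagged transitions are still learned, so a persistent new pattern is absorbed.
            self.window.append(t)
            self.counts[t] += 1
            self.from_counts[self.prev] += 1
        self.prev = state
        return state, anomalous


# Usage: alternate between two clusters, then jump to a third.
det = MarkovWindowDetector([(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)])
for p in [(0.1, -0.2), (9.8, 10.1)] * 50:
    det.update(p)
print(det.update((19.5, 0.3)))  # → (2, True)
```

Because flagged transitions are still added to the window, a new pattern that keeps recurring stops being flagged once it is frequent enough, while transitions that no longer occur are forgotten as they slide out. The window size therefore controls how quickly such a model adopts a new concept.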
Keywords/Search Tags: concept drift, anomaly detection, clustering algorithm, Markov model, sliding window