
The Research On Data Stream Outlier Detection On The Context Of Concept Drift

Posted on: 2018-02-18; Degree: Master; Type: Thesis
Country: China; Candidate: F Huang; Full Text: PDF
GTID: 2348330512483326; Subject: Computer software and theory
Abstract/Summary:
Outlier detection is a fundamental data mining technique that is widely applied in network anomaly detection, credit card fraud detection, video surveillance, and other fields. To date, many outlier detection methods have been proposed from different viewpoints, such as distance, density, and angle. Most of these algorithms are designed for static data sets that can be accessed randomly, and they aim to address detection efficiency, high dimensionality, and detection accuracy. With the arrival of the big data era, however, more and more data sets appear in the form of data streams, such as online transaction data and sensor data. A data stream is a potentially infinite sequence of data points with special characteristics, such as high speed, uncertainty, and dynamic evolution. These characteristics make outlier detection on stream data more challenging than on regular (static) data.

This thesis takes outlier detection on data streams as its research target and focuses mainly on the problem of concept drift. Current methods for outlier detection on data streams mainly either extend traditional outlier detection algorithms to data streams or use a sliding window to focus on recent data and simplify the stream. However, these methods cannot capture the current concept accurately and, more importantly, they do not consider the internal relationship between concept drift and outlier detection. In light of these problems, this thesis first proposes a data stream outlier detection algorithm based on prototype learning. It dynamically maintains representative data instances for the data stream instead of discarding historical data as the time window model does. Then, from the viewpoint of the reciprocal relationship between data stream outlier detection and concept drift detection, the thesis proposes a novel data stream classification framework that couples these two modules by quantifying the degree of new instances in real time.

These two pieces of work form the core of the thesis, whose major contributions are as follows. First, the thesis proposes a prototype-based data stream outlier detection algorithm. It dynamically learns the representativeness of data on the stream by considering data density, and then applies strategies such as synchronization-based compression to maintain a two-level data collection that is representative of the current data stream. Using this collection, the algorithm detects distance-based outliers. Second, the thesis proposes a novel data stream classification framework. The framework contains a data stream outlier detection method based on data compression and a concept drift detection method that captures the current concept by comparing model parameters learned from two consecutive time windows. The framework combines these two algorithms by quantifying the degree of data instances on the stream. The two algorithms reinforce each other, form a virtuous circle, and ultimately yield satisfactory outlier detection results. Finally, the thesis reports extensive experiments on a series of data sets against many state-of-the-art algorithms. The results verify the efficiency and effectiveness of the proposed algorithms.
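
The two ideas named in the abstract, a distance-based outlier test against a set of representative points and drift detection by comparing model parameters from two consecutive windows, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the use of a plain prototype list (instead of the two-level, density-weighted collection), the window mean as the "model parameter", and all thresholds are assumptions chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_distance_outlier(point, prototypes, radius, min_neighbors):
    """Classic distance-based rule (an assumption, not the thesis's exact test):
    a point is an outlier if fewer than `min_neighbors` prototypes lie
    within `radius` of it."""
    neighbors = sum(1 for p in prototypes if euclidean(point, p) <= radius)
    return neighbors < min_neighbors


def window_mean(window):
    """Per-dimension mean of a non-empty window; stands in for a learned
    model parameter in this sketch."""
    dim = len(window[0])
    return [sum(p[i] for p in window) / len(window) for i in range(dim)]


def drift_detected(prev_window, curr_window, threshold):
    """Flag drift when the stand-in parameters of two consecutive windows
    differ by more than `threshold` (hypothetical criterion)."""
    return euclidean(window_mean(prev_window), window_mean(curr_window)) > threshold


# Hypothetical data: four prototypes near the origin.
prototypes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
near = is_distance_outlier((0.05, 0.05), prototypes, radius=0.5, min_neighbors=3)
far = is_distance_outlier((5.0, 5.0), prototypes, radius=0.5, min_neighbors=3)

old_window = [(0.0, 0.0), (1.0, 1.0)]
new_window = [(10.0, 10.0), (11.0, 11.0)]
shifted = drift_detected(old_window, new_window, threshold=2.0)
stable = drift_detected(old_window, old_window, threshold=2.0)
```

In a coupled framework like the one the abstract describes, a drift signal of this kind would presumably trigger an update of the prototype set, so that points from a new concept are not reported as outliers indefinitely. How the thesis realizes that coupling is not specified here.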
Keywords/Search Tags: outlier detection, data stream, dynamic, concept drift, prototype learning