Research On Outlier Detection In Evolving Data Streams

Posted on: 2010-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hu
Full Text: PDF
GTID: 2178360275469140
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of computer and network communication technology and the expansion of its applications, the data to be processed in fields such as sensor network management, financial risk analysis, internet traffic management and network intrusion detection is no longer a finite stored data set but an evolving data stream, characterized by short arrival intervals and dynamic change over time. Traditional database techniques cannot process massive, high-speed stream data in limited space or extract useful information from it in real time. How to detect outliers accurately and in real time in these settings, while meeting the requirements of the application, has become a hot research topic in data stream mining. Because evolving data streams arrive quickly and can be traversed only once, one of the biggest challenges for outlier detection in data streams is to capture changes in the stream as they happen, respond in a timely manner, and produce approximate detection results.

This paper gives a comprehensive survey of outlier detection in data streams. To improve the speed and accuracy of detection, the LOF algorithm is combined with the SR tree. An algorithm based on tensor decomposition is proposed for high-dimensional data streams. Building on an analysis of existing research results, the paper models distributed outlier detection in data streams, gives two novel formal definitions of abnormal points, and designs an outlier detection algorithm that uses the micro-cluster data structure.

The contributions of this paper include the following. For centralized, normal-dimensional data streams, the SR_IncLOF algorithm is proposed to improve the speed and accuracy of detection. The algorithm quickly finds an approximate neighbor set for each data point using the block structure of the SR tree index and describes the degree of outlierness by the local outlier factor. It copes with the rapid arrival and single-pass traversal of data streams, has lower complexity, and supports detection on normal-dimensional data streams.

For multi-dimensional data streams, high-dimensional indexing techniques based on tensor decomposition are analyzed and an outlier detection algorithm is proposed. The algorithm views the evolving stream as a tensor, decomposes the tensor, approximates the distribution of the stream data, and obtains the best approximation of the data stream matrix through self-adaptive sampling.

A further outlier detection technique is based on kernel density. Two novel formal definitions of abnormal points are given in terms of abnormal distributions of distance and density. Using kernel density estimation, an algorithm is introduced that quickly obtains an approximate distribution of the stream data, and exponential decay is used to handle the evolution of the stream over time. An outlier detection algorithm that uses the micro-cluster data structure is designed to handle data partitioning problems.

In summary, this paper proposes different solutions aimed at the characteristics of evolving data streams. Theoretical analysis and experiments show that the proposed algorithms have higher precision and response rate and lower time and space complexity, and are better suited to evolving data streams.
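To make the local outlier factor mentioned in the abstract concrete, the following is a minimal batch sketch of the standard LOF definition in Python. It is not the thesis's SR_IncLOF algorithm: the neighbor search here is brute force rather than an SR tree index, nothing is incremental, and the function names `k_nearest` and `local_outlier_factors` are hypothetical names chosen for this illustration.

```python
import math


def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i].

    Brute-force search; an index such as an SR tree would replace this step.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def local_outlier_factors(points, k):
    """Compute the local outlier factor of every point (standard batch LOF)."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        # Guard against division by zero when neighbours are exact duplicates
        lrd.append(1.0 / max(mean_reach, 1e-12))

    # LOF: mean ratio of the neighbours' density to the point's own density
    lof = []
    for i in range(n):
        ratios = [lrd[j] / lrd[i] for _, j in neighbours[i]]
        lof.append(sum(ratios) / len(ratios))
    return lof
```

Calling `local_outlier_factors(points, k)` on a list of coordinate tuples returns one score per point. Scores near 1 indicate a point about as dense as its neighbors, and scores well above 1 indicate a likely outlier. A streaming variant would update only the scores affected by each newly arrived point instead of recomputing everything.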
Keywords/Search Tags: Evolving Data Streams, Outlier Detection, SR Tree, Local Outlier Factor, Tensor Decomposition, Kernel Density Estimate