Online Outlier Detection In Parameter Space Over Large-scale Data Streams

Posted on: 2020-03-16  Degree: Master  Type: Thesis
Country: China  Candidate: G Z Zhao  Full Text: PDF
GTID: 2428330590478177  Subject: Engineering
Abstract/Summary:
In recent years, massive, high-speed, real-time data streams have been generated in diverse applications such as social networks, location-based online services, mobile payments, traffic monitoring, and various smartphone applications. Real-time response to complex outlier analytics over high-speed streaming data has been recognized as critical in many domains. In many complex application environments, requirements differ, so outlier conditions under different parameter settings often need to be monitored in real time, which poses a major challenge to fast query response. This thesis therefore addresses two problems: detecting users' abnormal check-ins in mobile social networks, and streaming outlier detection in parameter space.

First, we propose an online check-in outlier detection method for mobile social networks based on users' mobility behaviors. We define two outlier models on the basis of the distance-based outlier: the History location based Outlier (H-Outlier) and the Friendship based Outlier (F-Outlier). We then give an optimized detection algorithm for each model. For H-Outlier, we propose an optimized algorithm called H-Opt, which uses the proposed status model of check-in locations and an optimized neighbor-search mechanism to reduce computation cost. For F-Outlier, we propose a trigger-based optimized detection algorithm called F-Opt, which transforms continuous online outlier detection into trigger-based outlier detection. Extensive experiments on a real-world check-in dataset demonstrate the effectiveness and efficiency of the proposed methods. The experimental results show that F-Opt significantly reduces the error of outlier detection. In addition, compared to LUE, F-Opt and H-Opt achieve 2.34-fold and 2.45-fold improvements in efficiency, respectively.

Second, we propose a parameter space framework, called PSOD, for online outlier detection over sliding-window streaming big data. PSOD supports a large variety of query requests in parameter space with diverse settings of both pattern parameters and window parameters. We design a neighbor table that records the neighbors of each point in different distance intervals and different slides, which allows the already acquired neighbor information to be reused as much as possible across the entire parameter space. In addition, we propose a series of sharing strategies for the sliding-window environment that minimize processing cost by eliminating redundant query requests. Moreover, PSOD transforms the query group in the 4-D parameter space into a periodic query group in a 3-D parameter space to minimize the number of queries. Our experimental study on three real-world streaming datasets demonstrates that PSOD reduces CPU cost by more than 100-fold compared with the state-of-the-art method.
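The neighbor-table idea described above can be illustrated with a minimal sketch. It assumes the standard distance-based outlier definition (a point is an outlier for a query (r, k) if fewer than k other points in the window lie within distance r) and handles a single static window only; it omits the per-slide bookkeeping and the query-sharing strategies of PSOD. The names `neighbor_table` and `answer_queries` are hypothetical and are not taken from the thesis.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def neighbor_table(window, radii):
    """Count, for each point, its neighbors in each distance interval.

    `radii` must be sorted ascending. Bucket 0 holds neighbors with
    d <= radii[0]; bucket i holds neighbors with radii[i-1] < d <= radii[i].
    Neighbors farther than the largest radius are not recorded.
    """
    table = [[0] * len(radii) for _ in window]
    for i, p in enumerate(window):
        for j in range(i + 1, len(window)):
            d = math.dist(p, window[j])
            bucket = bisect_left(radii, d)  # first radius that is >= d
            if bucket < len(radii):
                table[i][bucket] += 1
                table[j][bucket] += 1
    return table


def answer_queries(window, queries):
    """Answer every (r, k) query from one shared neighbor table.

    Returns a dict mapping each (r, k) to the indices of its outliers.
    """
    radii = sorted({r for r, _ in queries})
    table = neighbor_table(window, radii)
    # Prefix sums turn per-interval counts into "neighbors within r" counts.
    within = [list(accumulate(row)) for row in table]
    result = {}
    for r, k in queries:
        col = radii.index(r)
        result[(r, k)] = [i for i, row in enumerate(within) if row[col] < k]
    return result


window = [(0, 0), (0, 1), (1, 0), (10, 10)]
queries = [(1.5, 2), (1.5, 3), (20.0, 3)]
outliers = answer_queries(window, queries)
```

Pairwise distances are computed once, and every query with a different distance threshold or neighbor count is then answered by a table lookup rather than a new scan, which is the kind of reuse the abstract attributes to the neighbor table.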
Keywords/Search Tags:Outlier detection, Streaming big data, Parameter space, Sliding window, Multi-query