
Research And Implementation Of Outlier Detection Algorithm For Multi-dimensional Stream Data

Posted on: 2020-05-22 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Zhu | Full Text: PDF
GTID: 2428330578469607 | Subject: Engineering
Abstract/Summary:
Outlier detection over streaming data plays an important role in many modern applications, such as credit card fraud detection and stock investment planning, and is also an important problem in the field of data management. Distance-based outlier detection, the most widely used approach, has been studied extensively. However, existing techniques cannot efficiently detect outliers in multi-dimensional streaming data. The root cause is the high cost of range queries and of maintaining candidates. To address these problems, this paper proposes two query processing frameworks: PIOD (Partition-Index based Outlier Detection) and ISOD (Index based Slide-query Outlier Detection).

This paper first studies kNN-based outlier detection under the sliding window model and proposes the PIOD framework for it. PIOD first uses a sharding technique to partition the sliding window. It then proposes the ZPH-Tree, an index based on the Z curve, to manage streaming data, and adds a buffered update mechanism to improve the applicability of the index. Furthermore, PIOD puts forward a candidate outlier maintenance algorithm based on the ZPH-Tree, which uses the sharding technique and index space filtering to avoid maintaining the k-nearest neighbors of all objects. In addition, this paper proposes the CSM (Candidate-Set Maintain) algorithm, based on the EM-tree index, which reduces the maintenance cost of candidates and achieves efficient maintenance of the candidate set by maintaining the positional and score relationships among candidates. Theoretical analysis and experiments verify the efficiency and stability of PIOD.

This paper then studies threshold-based outlier detection under the sliding window model and puts forward the ISOD framework for it. First, ISOD proposes the ZPT-Tree, an index based on the Z curve, to manage streaming data. The index maintains both the positional relationships and the temporal relationships among streaming data. Furthermore, ISOD proposes a minimum search principle based on the ZPT-Tree. By screening out safe points and by querying and maintaining the best neighbors of candidates, this principle avoids redundant computation, reduces the number of range queries, and achieves efficient distance-based outlier detection. Theoretical analysis and experiments verify the efficiency and stability of ISOD.
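
For orientation, the two query types named in the abstract can be sketched as naive baselines over a count-based sliding window. This is not PIOD or ISOD, and it does not reproduce the ZPH-Tree or ZPT-Tree. It is a brute-force illustration under two assumptions the abstract does not spell out: a threshold-based outlier is taken to be an object with fewer than k other objects within distance r, and a kNN-based outlier is taken to be one of the n objects whose distance to its k-th nearest neighbor is largest. The function names, the parameters `r`, `k`, `n`, and `bits`, and the sample stream are all hypothetical. The `z_order` helper shows only the generic bit interleaving behind a Z curve, not the thesis's index layout.

```python
from collections import deque
from math import dist


def z_order(point, bits=8):
    """Morton (Z-curve) code: interleave the bits of non-negative integer coordinates."""
    code = 0
    dims = len(point)
    for bit in range(bits):
        for d, coord in enumerate(point):
            code |= ((coord >> bit) & 1) << (bit * dims + d)
    return code


def threshold_outliers(window, r, k):
    """Assumed definition: objects with fewer than k other objects within distance r."""
    pts = list(window)
    result = []
    for i, p in enumerate(pts):
        neighbors = sum(
            1 for j, q in enumerate(pts) if i != j and dist(p, q) <= r
        )
        if neighbors < k:
            result.append(p)
    return result


def knn_outliers(window, k, n):
    """Assumed definition: the n objects with the largest k-th nearest neighbor distance."""
    pts = list(window)
    scored = []
    for i, p in enumerate(pts):
        dists = sorted(dist(p, q) for j, q in enumerate(pts) if i != j)
        scored.append((dists[k - 1], p))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [p for _, p in scored[:n]]


# Hypothetical 2-D stream; the window keeps only the 5 most recent objects.
stream = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (1, 3)]
window = deque(maxlen=5)
for obj in stream:
    window.append(obj)  # the oldest object expires automatically

print(threshold_outliers(window, r=1.5, k=2))  # [(9, 9)]
print(knn_outliers(window, k=2, n=1))          # [(9, 9)]
print(z_order((2, 3)))                         # 14
```

Each call above recomputes all pairwise distances, so the cost is quadratic in the window size on every slide. That repeated range-query and neighbor-maintenance cost is what the abstract identifies as the bottleneck that the index-based frameworks aim to reduce.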
Keywords/Search Tags:Streaming Data, Data Management, Outlier Detection, Objects Maintenance