
Research And Implementation Of Approximate Algorithm For Outlier Detection Based On Probability Model

Posted on: 2020-06-23  Degree: Master  Type: Thesis
Country: China  Candidate: X L Ji  Full Text: PDF
GTID: 2428330578969606  Subject: Engineering
Abstract/Summary:
With the continuous development of information technology, stream data has gradually become a dominant data type. It is characterized by large volume and high transmission speed, and these characteristics make it difficult to manage stream data efficiently. Outlier detection is an important data mining technique that is widely used in stream environments. Existing algorithms suffer from high computational cost and high space cost, so they cannot work efficiently on high-speed streams and cannot meet users' real-time demands. This paper studies the problem of approximate outlier detection over stream data. At the cost of a small loss of precision, the approach greatly reduces the cost of query processing and meets users' real-time demands. The contributions of this paper are summarized as follows.

First, this paper studies approximate outlier detection based on a distance threshold under the sliding window model. For this problem, a query processing framework, PBOAD (Partition-Based Outlier Approximate Detection), is presented. PBOAD first divides the sliding window into partitions using a sharding technique. On this basis, PBOAD proposes an index, PMT (Partition based M-Tree), built on the M-Tree, to manage the data in each partition. Finally, PBOAD filters out safe objects using an outlier prediction algorithm with a probabilistic error guarantee, which reduces the complexity of the algorithm.

Second, this paper studies approximate outlier detection based on kNN average distance under the sliding window model. For this problem, a query processing framework, GAOAD (Grid-based Average Outlier Approximate Detection), is provided to support such queries. GAOAD first puts forward a grid-based index that manages summary information about the distribution of the stream data, and studies a rule for adaptive adjustment of cell granularity: the grid granularity is adjusted adaptively according to the kNN average distance of the outliers so as to achieve efficient filtering. GAOAD then proposes a cell search algorithm based on a min-heap to maintain safe objects. Finally, a candidate object maintenance strategy is proposed to support outlier detection.

Starting from the existing problems of outlier detection in stream environments, this paper studies the key technologies of outlier detection. Theoretical analysis and experiments verify the efficiency and accuracy of PBOAD and GAOAD. The results of this research can lay a foundation for effectively supporting outlier detection in data stream environments.
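The abstract does not spell out the two outlier definitions it builds on, so the following is only a minimal sketch under assumed, commonly used definitions: a distance-threshold outlier is taken to be an object with fewer than k neighbors within distance r in the current window, and the kNN average distance of an object is taken to be the mean distance to its k nearest neighbors. The code is an exact brute-force baseline over a count-based sliding window, not PBOAD or GAOAD; the function names and the parameters r, k, and window_size are hypothetical.

```python
from collections import deque
from math import dist


def distance_threshold_outliers(window, r, k):
    """Return the points that have fewer than k neighbors within distance r.

    Assumed definition; brute force, O(n^2) per window.
    """
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and dist(p, q) <= r
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def knn_average_distances(window, k):
    """Return, for each point, the mean distance to its k nearest neighbors.

    Assumed definition; a larger score means a more outlying point.
    """
    points = list(window)
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def slide(stream, window_size, r, k):
    """Yield the distance-threshold outliers each time the window is full."""
    window = deque(maxlen=window_size)  # the oldest point expires automatically
    for point in stream:
        window.append(point)
        if len(window) == window_size:
            yield distance_threshold_outliers(window, r, k)


if __name__ == "__main__":
    stream = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for outliers in slide(stream, window_size=5, r=1.5, k=2):
        print(outliers)
    print(knn_average_distances(stream, k=2))
```

Recomputing every neighbor count on each slide is exactly the cost that index-based and approximate methods such as those described above aim to avoid, so a baseline of this kind is mainly useful for checking the accuracy of an approximate detector on small inputs.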
Keywords/Search Tags:Stream Data, Outliers, Indexes, Error Guarantee