Research On Outlier Detection For Stream Data Based On Sliding-window Model

Posted on:2013-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:X L Zhao

Full Text:PDF

GTID:2248330362474843

Subject:Computer software and theory

Abstract/Summary:

Data stream mining is a hot research topic with broad prospects for development. As one of the basic tasks of data mining, outlier detection is highly valuable and has long attracted the attention of researchers. Outlier detection on traditional static data sets has produced fruitful results, but those approaches cannot be applied directly to data streams, so outlier detection in data streams is a problem that urgently needs solving. Drawing on existing approaches to data stream mining, this thesis proposes an outlier detection algorithm for stream data based on the sliding-window model, and provides some further optimizations. The main achievements of this work are as follows:

① First, this thesis reviews the current status of data mining and describes its function, process, and value. It focuses on outlier detection, one of the basic tasks of data mining, and summarizes outlier detection methods. The data stream is a special form of data, and its ever-growing size has made data stream mining increasingly significant in practice. To distinguish data streams from traditional static data sets, this thesis analyzes their characteristics and summarizes the processing models useful for mining them.

② The most important outlier detection methods for stream data, including clustering-based and distance-based methods, are briefly introduced, and their strengths and weaknesses are analyzed and summarized.

③ Building on previous studies, an outlier detection method for data streams, SODS, is proposed, based on the sliding-window model. With acceptable accuracy, this method uses a simple sliding window to manage the evolution of the data stream effectively. The data structures used by the algorithm cut down the time needed to compute neighbor-set statistics, and the concept of safe inliers reduces outlier query processing time; both make the algorithm perform better than some other algorithms.

④ Based on the SODS algorithm, the pruning algorithm SODS1 eliminates some redundant computation to improve processing time. By adding a buffering mechanism to the sliding window, the BSODS algorithm avoids the unfairness in neighbor-set statistics for points near the borders of the sliding window, which effectively reduces the algorithm's false positive rate.

Real data sets are used in the experiments on the algorithms proposed in this thesis. A comparative analysis is made with respect to true detection rate, false positive rate, average processing time per point, and query response time. The experimental results show that the SODS algorithm has good accuracy and time performance. The time performance of SODS1 gradually surpasses that of SODS as the window width increases. Although the response time of BSODS is longer than that of SODS1, its misjudgment rate is kept significantly lower than that of SODS1.
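
The abstract does not give the details of SODS, so the following is only a generic sketch of distance-based outlier detection over a count-based sliding window, not the thesis's algorithm. It assumes that a point is an outlier when fewer than k other points in the window lie within distance r of it, and that a "safe inlier" is, as the term is commonly used in the sliding-window literature, a point with at least k neighbors that arrived after it (those neighbors cannot expire before the point itself does). The class name, parameter names, and data layout are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from math import dist


@dataclass
class Entry:
    seq: int                                   # arrival index of the point
    point: tuple                               # coordinates
    prec: list = field(default_factory=list)   # arrival indices of earlier neighbors
    succ: int = 0                              # number of later neighbors


class SlidingWindowOutlierDetector:
    """Illustrative sketch only; not the SODS algorithm from the thesis."""

    def __init__(self, window_size, radius, k):
        self.window_size = window_size
        self.radius = radius
        self.k = k
        self.window = deque()
        self.next_seq = 0

    def insert(self, point):
        # Slide the window: drop the oldest point once the window is full.
        if len(self.window) == self.window_size:
            self.window.popleft()
        entry = Entry(self.next_seq, point)
        # One pass over the window updates neighbor statistics on both sides.
        for other in self.window:
            if dist(other.point, point) <= self.radius:
                other.succ += 1               # new point is a later neighbor
                entry.prec.append(other.seq)  # other is an earlier neighbor
        self.window.append(entry)
        self.next_seq += 1

    def outliers(self):
        if not self.window:
            return []
        oldest = self.window[0].seq
        result = []
        for e in self.window:
            # Assumed safe-inlier shortcut: k later neighbors outlive the point,
            # so it can be skipped without inspecting its earlier neighbors.
            if e.succ >= self.k:
                continue
            live_prec = sum(1 for s in e.prec if s >= oldest)
            if live_prec + e.succ < self.k:
                result.append(e.point)
        return result


detector = SlidingWindowOutlierDetector(window_size=4, radius=1.0, k=2)
for p in [(0, 0), (0.5, 0), (0, 0.5), (10, 10)]:
    detector.insert(p)
print(detector.outliers())
```

Storing arrival indices for earlier neighbors lets a query discard expired neighbors by comparison with the oldest index in the window, while later neighbors only need a counter because they always expire after the point itself.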

Keywords/Search Tags:

Data Mining, Data Stream, Outlier, Sliding-window

Related items

1	Research And Implementation On Key Technology Of Data Stream Mining
2	Research On Data Stream Reverse K Nearest Neighbors Outlier Mining Algorithm Based On X~* Tree
3	Research On Outliers Mining Algorithm Based On Data Streams With Different Attributes
4	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
5	Mining Association Rules Over A Stream Sliding Window
6	Research And Application Of Frequent-pattern Mining Methods In Data Stream
7	Research On Mining Frequent Closed Itemsets From A Sliding Window Over Data Streams
8	Research On Optimization Of Data Stream Frequent Itemsets Mining Algorithm Based On Sliding Window
9	The Bank On The Net Data Stream Based On Sliding Window Frequent Pattern Study
10	Outlier Detection Technique On Probabilistic Stream