
Research On Outlier Detection For Stream Data Based On Sliding-window Model

Posted on: 2013-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhao
GTID: 2248330362474843
Subject: Computer software and theory
Abstract/Summary:
Data stream mining is currently an active research topic with broad prospects for development. Outlier detection, one of the basic tasks of data mining, is highly valuable and has long attracted the attention of researchers. Outlier detection on traditional static data sets has produced many results, but those approaches cannot be applied directly to data streams, so outlier detection for data streams is a problem that urgently needs to be solved. Drawing on existing approaches to data stream mining, this thesis proposes an outlier detection algorithm for stream data based on the sliding-window model and then optimizes it further. The main contributions of this work are as follows:

① The thesis first reviews the current state of data mining and describes its functions, process, and value. It focuses on outlier detection, one of the basic tasks of data mining, and summarizes the existing outlier detection methods. Data streams are a special form of data, and their ever-growing volume makes data stream mining increasingly important in practice. To distinguish data streams from traditional static data sets, the thesis analyzes the characteristics of data streams and summarizes the processing models that are useful for mining them.

② It briefly introduces the most important methods for outlier detection in stream data, including clustering-based and distance-based methods, and then analyzes and summarizes their strengths and weaknesses.

③ Building on previous studies, it proposes SODS, an outlier detection method for data streams based on the sliding-window model. With acceptable accuracy, SODS uses a simple sliding window to track the evolution of the data stream effectively. The data structures used by the algorithm reduce the time needed to compute neighbor-set statistics, and the concept of safe inliers reduces the processing time of outlier queries; together, these make the algorithm perform better than some other algorithms.

④ Building on SODS, the pruning algorithm SODS1 eliminates some redundant computation to improve processing time. The BSOD algorithm adds a buffering mechanism to the sliding window, which avoids the unfair neighbor-set statistics of points near the borders of the sliding window and thereby effectively reduces the false positive rate.

The proposed algorithms are evaluated on real data sets and compared in terms of true detection rate, false positive rate, average processing time per point, and query response time. The experimental results show that SODS has good accuracy and time performance. As the window width increases, the time performance of SODS1 gradually overtakes that of SODS. Although BSODS has a longer response time than SODS1, its misjudgment rate is well controlled and significantly lower than that of SODS1.
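The idea behind item ③, which maintains per-point neighbor statistics inside a sliding window and skips "safe inliers" at query time, can be illustrated with a minimal Python sketch. The abstract does not give the data structures or exact definitions used by SODS, so this sketch makes several assumptions. It uses the common distance-based definition: a point is an outlier if fewer than `k` other points in the current count-based window lie within distance `radius` of it. It also treats a point as a safe inlier once it has at least `k` neighbors that arrived after it. Those neighbors expire no earlier than the point itself, so the point cannot become an outlier while it remains in the window. The class `SlidingWindowOutlierDetector` and its parameters are hypothetical, and the sketch is not the thesis's implementation.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class _Entry:
    seq: int                                   # arrival index in the stream
    value: tuple                               # coordinates of the point
    succ: int = 0                              # neighbors that arrived later
    prec: list = field(default_factory=list)   # seq ids of earlier neighbors


class SlidingWindowOutlierDetector:
    """Hypothetical sketch: distance-based outliers over a count-based window.

    A point is reported as an outlier when fewer than `k` other points in
    the current window lie within distance `radius` of it.
    """

    def __init__(self, window: int, radius: float, k: int):
        self.window = window
        self.radius = radius
        self.k = k
        self.points = deque()   # _Entry objects, oldest first
        self.next_seq = 0

    def insert(self, value):
        # Slide the window: evict the oldest point once the window is full.
        if len(self.points) == self.window:
            self.points.popleft()
        new = _Entry(self.next_seq, tuple(value))
        self.next_seq += 1
        # Update neighbor statistics incrementally, one scan per arrival.
        for old in self.points:
            if math.dist(old.value, new.value) <= self.radius:
                new.prec.append(old.seq)
                old.succ += 1
                if old.succ >= self.k:
                    # Safe inlier (assumed definition): earlier neighbors
                    # are no longer needed to decide its status.
                    old.prec.clear()
        self.points.append(new)

    def outliers(self):
        if not self.points:
            return []
        oldest = self.points[0].seq
        result = []
        for e in self.points:
            if e.succ >= self.k:
                continue  # safe inlier: skip without counting neighbors
            # Drop earlier neighbors that have already left the window.
            e.prec = [s for s in e.prec if s >= oldest]
            if e.succ + len(e.prec) < self.k:
                result.append(e.value)
        return result


if __name__ == "__main__":
    det = SlidingWindowOutlierDetector(window=4, radius=1.0, k=2)
    for p in [(0.0,), (0.5,), (10.0,), (0.2,)]:
        det.insert(p)
    print(det.outliers())  # [(10.0,)]
```

Each arrival here costs a linear scan of the window. Reaching the time savings the abstract reports would presumably need an index over the window, and the border unfairness that the buffering mechanism targets is not handled in this sketch.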
Keywords: Data Mining, Data Stream, Outlier, Sliding-window