
Research On Outlier Mining Method Based On Deviation Characteristic

Posted on: 2018-04-20    Degree: Master    Type: Thesis
Country: China    Candidate: X L Yin
GTID: 2348330542490946    Subject: Engineering
Abstract/Summary:
Data mining is a multidisciplinary field that spans machine learning, databases, statistics, knowledge-based systems, artificial intelligence, high-performance computing, and related areas. Put simply, data mining applies suitable algorithms to analyze patterns in data in order to extract important knowledge from it. Outlier mining is an important research direction within data mining: it focuses on rare events and, by detecting and analyzing them, uncovers the valuable knowledge they contain. Outlier mining is applied in log analysis, intrusion detection, quality control, and other fields, contributing to scientific progress and social development.

For static data sets, this thesis proposes a fast LOF detection algorithm based on a deviation characteristic, namely local density. The traditional LOF algorithm computes the local outlier factor of each data point over the entire data set, which requires a great deal of computation time. To address this, the proposed algorithm partitions the data space into grid cells and computes the local outlier factors of data points from the centroids of the cells. Because there are fewer grid cells than data points, the time complexity is substantially reduced at the cost of an acceptable error; the method is therefore best suited to low- to medium-dimensional data with a large number of points. The algorithm can also be used for real-time outlier detection: because it reuses the grid structure built from the existing data points, computing the LOF of a new data point only requires identifying the grid cell in which the point falls, with no further calculation. Comparative experiments between the traditional LOF algorithm and the fast LOF algorithm show that the proposed algorithm reduces computation time and improves efficiency while achieving comparable accuracy.

For dynamic data streams, this thesis proposes a fast IncLOF detection algorithm, also based on a deviation characteristic. When detecting outliers in a data stream, the traditional IncLOF algorithm must retain all previous data points in order to compute the local outlier factor of each new point, which is impractical under memory limitations. To address this, the proposed algorithm accumulates the historical information of the stream in limited memory through summarization, merging, and insertion steps, and computes the local outlier factor of each new data point from this stored information. Comparative experiments between the traditional IncLOF algorithm and the fast IncLOF algorithm show that the proposed algorithm requires substantially less computation time and memory than the traditional IncLOF algorithm while achieving comparable accuracy, and that it is extensible.
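The grid-based idea for static data can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the class name `GridLOF`, the cell width `cell_size`, the use of unweighted cell centroids, and the fallback for a new point that lands in an empty cell (scoring it against the centroids with the standard LOF formula) are all hypothetical choices. Each occupied cell's centroid receives an LOF score, every point inherits the score of its cell, and a new point in an occupied cell is scored by a single cell lookup.

```python
import math
from collections import defaultdict


def lof_scores(points, k):
    """Standard LOF (Breunig et al.) over a small list of points, brute force.

    Simplification: exactly k neighbours are used (ties broken by index).
    Returns (lof, lrd, kdist) lists aligned with `points`; needs len(points) > k.
    """
    neigh = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neigh.append(d[:k])
    # k-distance of each point: distance to its k-th nearest neighbour
    kdist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for nb in neigh:
        reach = [max(kdist[j], dist) for dist, j in nb]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean neighbour density divided by the point's own density
    lof = [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(neigh)]
    return lof, lrd, kdist


class GridLOF:
    """Hypothetical grid-based LOF approximation (illustrative, not the thesis code)."""

    def __init__(self, points, cell_size, k):
        self.cell_size = cell_size
        self.k = k
        cells = defaultdict(list)
        for p in points:
            cells[self._cell(p)].append(p)
        # one centroid per non-empty cell (dict order keeps keys and values aligned)
        self.centroids = [tuple(sum(c) / len(pts) for c in zip(*pts))
                          for pts in cells.values()]
        lof, self.lrd, self.kdist = lof_scores(self.centroids, k)
        self.cell_lof = dict(zip(cells.keys(), lof))

    def _cell(self, p):
        # integer grid coordinates of the cell containing p
        return tuple(math.floor(x / self.cell_size) for x in p)

    def score(self, p):
        """LOF of point p, reusing the grid built from the existing points."""
        key = self._cell(p)
        if key in self.cell_lof:
            return self.cell_lof[key]  # occupied cell: a lookup, no distances
        # Assumed fallback for an empty cell: score p against the centroids.
        nb = sorted((math.dist(p, c), j) for j, c in enumerate(self.centroids))[: self.k]
        reach = [max(self.kdist[j], dist) for dist, j in nb]
        lrd_p = len(reach) / max(sum(reach), 1e-12)
        return sum(self.lrd[j] for _, j in nb) / (len(nb) * lrd_p)


if __name__ == "__main__":
    pts = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)] + [(20.5, 20.5)]
    model = GridLOF(pts, cell_size=1.0, k=3)
    print(model.score((2.4, 2.6)), model.score((20.4, 20.6)))
```

With a 5 × 5 block of unit-spaced points plus one distant point, the interior cell scores 1.0 and the distant cell scores far above 1. Because LOF is computed only over occupied cells, the cost grows with the number of cells rather than the number of points, which is consistent with the abstract's note that the approach suits low- to medium-dimensional data, where the number of occupied cells stays small.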
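The abstract does not specify how the summarizing and merging steps work, so the following is only a hypothetical sketch of the bounded-memory idea, not the fast IncLOF algorithm itself. It assumes that the stream history is kept as at most `max_summaries` count-weighted centroids; the class name `StreamLOF`, the parameter `max_summaries`, and the closest-pair merge rule are illustrative choices. Each arriving point is first scored with LOF against the current summaries, then inserted as a new summary, and whenever the budget is exceeded the two closest summaries are merged into their weighted centroid.

```python
import math


def _knn(q, refs, k, skip=None):
    """k nearest references of q as (distance, index) pairs, optionally skipping one index."""
    d = sorted((math.dist(q, r), j) for j, r in enumerate(refs) if j != skip)
    return d[:k]


class StreamLOF:
    """Hypothetical bounded-memory incremental LOF sketch (illustrative only)."""

    def __init__(self, k, max_summaries):
        self.k = k
        self.max_summaries = max_summaries
        self.centers = []  # summary centroids standing in for past points
        self.counts = []   # number of raw points each summary has absorbed

    def _lrd_table(self):
        # k-distance and local reachability density of every summary
        refs = self.centers
        kdist = [_knn(c, refs, self.k, skip=i)[-1][0] for i, c in enumerate(refs)]
        lrd = []
        for i, c in enumerate(refs):
            nb = _knn(c, refs, self.k, skip=i)
            lrd.append(len(nb) / max(sum(max(kdist[j], d) for d, j in nb), 1e-12))
        return kdist, lrd

    def score(self, p):
        """LOF of p against the current summaries (None while warming up)."""
        if len(self.centers) <= self.k:
            return None
        kdist, lrd = self._lrd_table()
        nb = _knn(p, self.centers, self.k)
        lrd_p = len(nb) / max(sum(max(kdist[j], d) for d, j in nb), 1e-12)
        return sum(lrd[j] for _, j in nb) / (len(nb) * lrd_p)

    def insert(self, p):
        """Insert p as a summary; merge the closest pair if over the memory budget."""
        self.centers.append(tuple(p))
        self.counts.append(1)
        if len(self.centers) > self.max_summaries:
            m = len(self.centers)
            _, a, b = min((math.dist(self.centers[i], self.centers[j]), i, j)
                          for i in range(m) for j in range(i + 1, m))
            wa, wb = self.counts[a], self.counts[b]
            merged = tuple((wa * x + wb * y) / (wa + wb)
                           for x, y in zip(self.centers[a], self.centers[b]))
            for idx in (b, a):  # delete the larger index first so `a` stays valid
                del self.centers[idx]
                del self.counts[idx]
            self.centers.append(merged)
            self.counts.append(wa + wb)

    def process(self, p):
        """Score a new stream point, then absorb it into the summaries."""
        s = self.score(p)
        self.insert(p)
        return s
```

Memory stays at `max_summaries` centroids regardless of stream length, while the counts preserve how many raw points each summary represents. In this sketch, scoring recomputes the summaries' densities on every call (quadratic in the number of summaries); a practical version would likely cache them between insertions.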
Keywords/Search Tags: Stream Data Mining, Outlier Detection, Local Outlier Factor, Deviation Characteristic, LOF Algorithm