Font Size: a A A

Efficient Top-n Local Outlier Detection For Big Data

Posted on:2021-01-25Degree:MasterType:Thesis
Country:ChinaCandidate:F LiuFull Text:PDF
GTID:2428330623474898Subject:Engineering
Abstract/Summary:PDF Full Text Request
In recent years,with the widespread popularity of various types of smart mobile devices,broad applications such as social networks,online shopping,mobile payment,and location services have continued to emerge.Various types of big data have been collected and processed,and mining and analysis services for these big data has suddenly become a unique emerging industry.As one of the most important tasks of data mining,anomaly detection is considered to be of vital importance in various applications such as network monitoring and credit card fraud.In addition,the distributions of the data tend to be skewed in the real world,and local outlier factor effectively addresses the problem of outlier detection in skewed datasets,which has been shown remarkable detection performance in variety of applications.Therefore,local outlier detection has received more and more attention in both academia and industry.In order to more efficiently and quickly detect abnormal objects in large amounts of data,two density-based local outlier detection algorithms are proposed in this thesis.The mian research content is as follows(1)The thesis proposes an efficient top-n local outlier detection algorithm for fast detecting top-n local outliers in large-scale datasets.First,the thesis proposes four LOF upper bounds that are closer to the real LOF value to avoid the computations of LOF values,and analyze their computational complexity theoretically.Second,by combining with index structure and the upper bounds UB1 and UB2,the thesis proposes a two-layer Cell pruning strategy,not only adopting the global Cell pruning strategy,but also introducing a local pruning strategy based on the internal data objects of Cells,which effectively prunes the high-density regions.Third,the thesis proposes two more reasonable and effective data object pruning strategies using the proposed upper bounds UB3 and UB4.UB3 and UB4 are closer to the real LOF value,which benefits to pruning more data objects.On the other hand,the upper bound calculation method based on computation reuse greatly reduces the computational cost.Finally,the thesis optimizes the selecting method of initial top-n local outliers leveraging the established index structure.Specifically,the thesis selects the initial top-n local outliers in sparse regions,which is conducive to selecting the data objects with a larger LOF value as the initial local outliers.Experimental study on seven real-world datasets demonstrates the efficiency and scalability of our proposed method—up to 3.5 times faster than the state-of-the-art methods(2)Considering the detection of local outliers over high-volume data streams,the thesis proposes a top-n local outlier detection method based on Kernel Density Estimation(KDE)over large-scale high-volume data streams.First,the thesis defines a KDE-based Outlier Factor(KOF)to measure the local outlierness score for the data points.Then,the thesis proposes the upper bounds of the KOF and an upper-bound-based pruning strategy to quickly eliminate the majority of the inlier points by leveraging the upper bounds without computing the expensive KOF scores.Moreover,the thesis designs an Upper-bound pruning-based top-n KOF detection method(UKOF)over data streams to efficiently address the data updates in a sliding window environment.Furthermore,the thesis proposes a Lazy update method of UKOF(LUKOF)for bulk updates in high-speed large-scale data streams to further minimize the computation cost.Experimental study demonstrates that the proposed method outperforms the state-of-the-art methods by up to 3600 times in speed,while achieving the best performance in detecting local outliers over data streams.
Keywords/Search Tags:Outlier detection, Local outlier detection, Data streams, Kernel density estimation, Top-n, Pruning strategy
PDF Full Text Request
Related items