Research On Outlier Detection Algorithm For High Dimensional Big Data

Posted on:2019-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhao

GTID:2348330545981041

Subject:Computer Science and Technology

Abstract/Summary:

Outlier detection is one of the main tasks of data mining. With the development of big data technology, data dimensionality and dataset sparsity keep increasing, and traditional detection methods face serious efficiency problems or even become invalid. Because of the "curse of dimensionality", local outliers can be concealed by redundant attribute dimensions and cannot be detected in the full-dimensional space. Using dimensionality reduction to find outlier subspaces and detect the local outliers within them has therefore become the main approach in outlier detection algorithms for high-dimensional big data. This thesis discusses the problems that arise in high-dimensional outlier detection and studies two techniques: outlier detection in high-dimensional relevant subspaces and angle-based outlier detection. First, the thesis proposes an outlier detection algorithm based on relevant subspaces. It uses a local density distribution matrix to select the dimensions that carry relevant attributes, constructs the relevant subspace from them, and then detects the local outliers hidden in that subspace. Experiments on synthetic and real datasets show that this method outperforms other subspace detection methods for outlier detection in high-dimensional big data. Second, the thesis proposes a modification of angle-based outlier detection and applies it to relevant subspace detection. As distances between high-dimensional data objects become sparse and similar, comparing distance measurements loses its significance; angle-based detection, however, suffers from high algorithmic complexity and low accuracy on unevenly distributed datasets. Using a grid to prune normal data and applying outlier detection only to the remaining candidate outlier set significantly increases the efficiency of the algorithm. Experiments show that introducing grid density as a local distribution weight increases the accuracy of the outlier detection algorithm on non-circular data. Overall, the thesis aims to use subspace techniques and modified traditional detection methods to achieve more efficient and accurate outlier detection for high-dimensional big data.
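
To make the grid pruning and angle-based scoring described in the abstract more concrete, here is a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the exact formulas, so the score below is the commonly used weighted angle-variance factor, and the relevant-subspace selection and the grid-density weighting are left out. The function names and the parameters `cell_width` and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def grid_cell(point, cell_width):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(x / cell_width) for x in point)


def prune_dense_cells(points, cell_width, min_count):
    """Return indices of points lying in sparse cells (candidate outliers).

    Points in cells holding at least `min_count` points are treated as
    normal and pruned. This threshold rule is an assumption of the sketch.
    """
    cells = [grid_cell(p, cell_width) for p in points]
    counts = Counter(cells)
    return [i for i, c in enumerate(cells) if counts[c] < min_count]


def angle_variance(points, i):
    """Weighted variance of the angles seen from points[i].

    For every pair of other points (B, C), the term
    <AB, AC> / (|AB|^2 |AC|^2) is weighted by 1 / (|AB| |AC|).
    A small variance means the other points lie in similar directions,
    which is typical for an outlier.
    """
    a = points[i]
    diffs = []
    for j, b in enumerate(points):
        if j == i:
            continue
        d = [bj - aj for aj, bj in zip(a, b)]
        norm_sq = sum(x * x for x in d)
        if norm_sq > 0:  # skip exact duplicates of points[i]
            diffs.append((d, norm_sq))

    w_sum = wf_sum = wf2_sum = 0.0
    for (u, nu2), (v, nv2) in combinations(diffs, 2):
        dot = sum(x * y for x, y in zip(u, v))
        f = dot / (nu2 * nv2)
        w = 1.0 / math.sqrt(nu2 * nv2)
        w_sum += w
        wf_sum += w * f
        wf2_sum += w * f * f
    if w_sum == 0:
        return 0.0
    mean = wf_sum / w_sum
    return max(0.0, wf2_sum / w_sum - mean * mean)


def rank_outliers(points, cell_width, min_count):
    """Score only the unpruned candidates; most outlying index first."""
    candidates = prune_dense_cells(points, cell_width, min_count)
    scored = [(angle_variance(points, i), i) for i in candidates]
    return [i for _, i in sorted(scored)]


if __name__ == "__main__":
    # A tight 5 x 5 cluster near the origin plus one isolated point.
    cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    data = cluster + [(10.0, 10.0)]
    print(rank_outliers(data, cell_width=1.0, min_count=3))
```

In this toy data set all cluster points fall into one dense cell and are pruned, so the angle-based score is computed only for the isolated point. The score costs time quadratic in the number of points per candidate, which is why pruning normal data first pays off.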

Keywords/Search Tags:

high dimensional big data, outlier detection, relevant subspace, angle-variance, grid partition

Related items

1	Research And Application On Outlier Detection Algorithm For High-dimensional Data Stream
2	Outlier Detection Methods For Complex Data Types
3	Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy
4	Analysis And Research Of Outlier Detection Algorithm For High Dimensional Data
5	Study On Outlier Detection In Subspace
6	Study On Algorithms For Fast Outlier Detection
7	High-dimensional data mining: Subspace clustering, outlier detection and applications to classification
8	Scalable Mining Of Contextual Outliers Based On Relevant Subspace
9	A Study On Outlier Detection Algorithms For High Dimensional Data
10	Local Outlier Data Mining And Application-related Subspace