
Research On Outlier Detection Algorithm For High Dimensional Big Data

Posted on: 2019-06-03  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhao  Full Text: PDF
GTID: 2348330545981041  Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection is one of the main tasks of data mining. With the development of big data technology, data dimensionality and dataset sparsity keep increasing, so traditional detection methods face serious efficiency problems and can even become invalid. Under the "curse of dimensionality", local outliers can be concealed by redundant attributes and cannot be detected in the full-dimensional space. Using dimensionality reduction to find outlier subspaces and detect the local outliers within them is therefore becoming the main approach in outlier detection algorithms for high dimensional big data. This thesis discusses the problems that arise in high-dimensional outlier detection and studies two techniques: outlier detection in relevant subspaces and angle-based outlier detection.

First, the thesis proposes an outlier detection algorithm based on the relevant subspace. It applies a local density distribution matrix to select the dimensions that carry relevant attributes, constructs the relevant subspace from those dimensions, and then detects the local outliers hidden in that subspace. Experiments on synthetic and real datasets show that this method outperforms other subspace detection methods in outlier detection for high dimensional big data.

Second, the thesis offers a modification of angle-based outlier detection and applies it to relevant subspace detection. Because distances between high-dimensional data objects become sparse and similar, comparing distance measurements is no longer meaningful; the angle-based approach, however, has high algorithmic complexity and low accuracy on unevenly distributed datasets. Using a grid to prune normal data and applying outlier detection only to the remaining candidate outlier set increases the efficiency of the algorithm significantly. The experiments show that adding grid density as a local distribution weight increases the accuracy of the outlier detection algorithm on non-circularly distributed data.

Overall, the thesis aims to use subspace techniques and to modify traditional detection methods in order to achieve more efficient and accurate outlier detection for high dimensional big data.
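
The abstract does not give the formulas for the modified angle-based method, so the following is only a minimal sketch of the general angle-variance idea it builds on: a point whose angles to all pairs of other points vary little is likely to sit outside the data. The function names `angle_variance` and `rank_by_angle_variance` are hypothetical, the score is an unweighted variance of cosines, and the grid pruning and grid-density weighting described in the thesis are not included.

```python
from itertools import combinations
from math import sqrt
from statistics import pvariance


def angle_variance(points, index):
    """Variance of the cosine of the angle at points[index] over all pairs
    of other points. Simplified, unweighted score (an assumption, not the
    thesis's formula): a low value suggests an outlier."""
    p = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    cosines = []
    for a, b in combinations(others, 2):
        u = [x - y for x, y in zip(a, p)]  # vector from p to a
        v = [x - y for x, y in zip(b, p)]  # vector from p to b
        norm_u = sqrt(sum(x * x for x in u))
        norm_v = sqrt(sum(x * x for x in v))
        if norm_u == 0 or norm_v == 0:
            continue  # skip exact duplicates of p
        cosines.append(sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v))
    return pvariance(cosines) if len(cosines) > 1 else 0.0


def rank_by_angle_variance(points):
    """Return point indices sorted from most to least outlier-like."""
    scores = [angle_variance(points, i) for i in range(len(points))]
    return sorted(range(len(points)), key=lambda i: scores[i])
```

This brute-force version examines every pair of points for every point, which is cubic in the dataset size. That cost is the kind of complexity the abstract says grid pruning is meant to reduce, by scoring only the candidate outliers that survive pruning.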
Keywords/Search Tags: high dimensional big data, outlier detection, relevant subspace, angle-variance, grid partition