
Contextual Outlier Mining And Parallelization Based On Weighted Probability Density

Posted on: 2022-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: H Bai  Full Text: PDF
GTID: 2518306521496784  Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection is an important branch of data mining. Its task is to identify data objects that differ significantly from the vast majority of objects in a data set and to reveal the meaningful information and knowledge hidden behind them. With the development of information technology, the volume of data is growing rapidly and its dimensionality keeps increasing. The "curse of dimensionality" has become one of the main factors degrading the effectiveness of traditional outlier detection algorithms, making them difficult to apply to big data analysis tasks. This thesis uses weighted probability density and relevant subspaces to study contextual outlier detection and its parallelization. The main results are as follows.

(1) A contextual outlier detection algorithm based on weighted probability density is presented. First, a Gaussian mixture model and a sparsity matrix are used to determine the relevant subspace. Second, within the relevant subspace, the weighted probability density is used to compute a local outlier factor, which effectively reflects and describes the degree of inconsistency between a data object and its surrounding objects. Then, the N data objects with the largest outlier factors are reported as outliers, and the outlier factors, the attribute values of the relevant subspace, and the local data sets are provided as contextual information to improve the interpretability and comprehensibility of the detected outliers. Finally, the effectiveness of the algorithm is verified by experiments on synthetic and UCI data sets.

(2) Based on the Spark parallel computing platform, a parallel outlier detection algorithm based on weighted probability density is presented. First, resilient distributed datasets (RDDs) are used to keep the intermediate results produced for the local data sets, the sparsity matrix, the attribute weights, and the relevant-subspace matrix in memory. Second, the outlier score of each data object is computed on each compute node. Finally, the scalability of the proposed algorithm is verified by experiments on synthetic data sets.
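The abstract does not give the exact formulas behind contribution (1), so the sketch below is only illustrative: it assumes k-nearest-neighbor neighborhoods as the local data sets, replaces the Gaussian-mixture/sparsity-matrix step with a simpler variance-ratio heuristic for choosing the relevant subspace, and uses an attribute-weighted Gaussian kernel density as the "weighted probability density". The function and parameter names (contextual_outliers, sparsity_threshold, and so on) are hypothetical, not taken from the thesis.

```python
# Minimal sketch (assumed formulas, not the thesis's method): contextual
# outlier scoring with a per-object relevant subspace and a weighted density.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def contextual_outliers(X, k=20, top_n=10, sparsity_threshold=0.5):
    n, d = X.shape
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)            # idx[i] = object i plus its k nearest neighbors
    global_var = X.var(axis=0) + 1e-12

    results = []
    for i in range(n):
        local = X[idx[i]]                  # local data set of object i
        local_var = local.var(axis=0)
        # Assumed sparsity criterion: an attribute is "relevant" when the
        # neighborhood is much more concentrated there than the full data set.
        relevance = 1.0 - np.clip(local_var / global_var, 0.0, 1.0)
        subspace = np.where(relevance >= sparsity_threshold)[0]
        if subspace.size == 0:             # fall back to the full attribute space
            subspace = np.arange(d)
            w = np.full(d, 1.0 / d)
        else:
            w = relevance[subspace] / relevance[subspace].sum()

        # Assumed weighted probability density: attribute-weighted Gaussian
        # kernel density of object i over its local data set.
        h = local[:, subspace].std(axis=0) + 1e-12        # per-attribute bandwidth
        diffs = (local[1:, subspace] - X[i, subspace]) / h
        density = np.mean(np.exp(-0.5 * np.sum(w * diffs ** 2, axis=1)))
        outlier_factor = 1.0 / (density + 1e-12)          # low density -> high score

        # Context information kept alongside the score for interpretability.
        results.append({
            "index": i,
            "outlier_factor": outlier_factor,
            "subspace": subspace.tolist(),
            "subspace_values": X[i, subspace].tolist(),
            "local_set": idx[i, 1:].tolist(),
        })
    results.sort(key=lambda r: r["outlier_factor"], reverse=True)
    return results[:top_n]                 # the N objects with the largest factors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    X[:3, :2] += 6.0                       # shift a few points so some stand out
    for r in contextual_outliers(X, k=30, top_n=3):
        print(r["index"], round(r["outlier_factor"], 2), r["subspace"])
```

Returning the subspace, its attribute values, and the local data set together with the score is what makes the result "contextual": a reader can see not only that an object is unusual, but in which attributes and relative to which neighbors.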
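For contribution (2), the following PySpark sketch shows only the general shape of such a parallelization, not the thesis's implementation: the reference data is broadcast to the executors, each partition scores its own objects with an assumed kernel-density-based factor, intermediate results are cached in memory, and the N largest scores are collected at the driver. The names score_partition and "wpd-outliers", the use of a k-d tree, and the scoring formula are all assumptions.

```python
# Minimal PySpark sketch (assumed structure): partition-wise outlier scoring
# with a broadcast reference data set and cached intermediate results.
import numpy as np
from pyspark.sql import SparkSession

def score_partition(rows, X_ref, k=20):
    """Score every (index, vector) pair in one partition against X_ref."""
    from scipy.spatial import cKDTree     # imported on the executor
    tree = cKDTree(X_ref)
    for i, x in rows:
        dists, _ = tree.query(x, k=k + 1)  # local data set by distance
        # Assumed score: inverse of a Gaussian-kernel density over the k neighbors.
        h = np.median(dists[1:]) + 1e-12
        density = np.mean(np.exp(-0.5 * (dists[1:] / h) ** 2))
        yield i, 1.0 / (density + 1e-12)

if __name__ == "__main__":
    spark = SparkSession.builder.appName("wpd-outliers").getOrCreate()
    sc = spark.sparkContext

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10000, 8))
    X_bc = sc.broadcast(X)                 # reference data shared with every executor

    scores = (
        sc.parallelize(list(enumerate(X)), numSlices=8)
          .mapPartitions(lambda rows: score_partition(rows, X_bc.value))
          .cache()                         # keep intermediate scores in memory
    )
    top = scores.takeOrdered(10, key=lambda t: -t[1])   # N largest outlier factors
    print(top)
    spark.stop()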
Keywords/Search Tags: Outlier mining, Subspace, Weighted probability density, Contextual information, Spark