Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting

Posted on:2022-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:J J Liu

Full Text:PDF

GTID:2518306536996619

Subject:Computer technology

Abstract/Summary:

With the advent of the big data era, data mining has become an active research area. Outlier mining, an important part of the data mining field, has likewise received extensive attention. Owing to its distinctive mechanism and the valuable information it reveals, outlier detection has played an important role in the development of intelligent data systems. At present, outlier detection is widely used in fraud detection, medical diagnosis, public safety and other fields, and researchers in China and abroad have proposed many specific detection methods. To address the limitations and instability of outlier detection on high-dimensional data sets, this thesis proposes an improvement strategy. The research proceeds in two directions, subspace clustering and outlier mining, and consists mainly of the following parts.

First, the problem of low detection efficiency on high-dimensional data sets is addressed. In the data preprocessing stage, the optimal subspace is searched for: the dimensions of the data set are preliminarily screened by dimensional entropy, and redundant attributes are filtered out. Then, because mutual information can describe the correlation between dimensions, an index for measuring the quality of subspace clustering is redefined, and the objective function of the clustering subspace is optimized to obtain the optimal subspace for detecting outliers. In the outlier detection stage, an entropy outlier score is proposed as the metric, following the idea of division by information entropy increment, and outlier detection is performed in the optimal subspace. On this basis, an optimal subspace outlier detection algorithm based on entropy increment is proposed, and its correctness and complexity are analyzed.

Second, the limitations of current density-based outlier detection algorithms in the detection stage are studied further. With the optimal subspace found using information entropy, the outlier attributes of each data object are determined in the outlier detection stage through dimensional information entropy. A weighted distance is defined to describe the distance between data objects, and related definitions such as the weighted k-distance, reverse weighted k-distance and weighted k-neighborhood are given. A Gaussian kernel function is then introduced to describe the neighborhood kernel density of a data object, from which its outlier degree is further described. An outlier detection algorithm based on information entropy and local attribute weighting is proposed, and its correctness and complexity are analyzed.

Finally, the two algorithms proposed in this thesis are verified on real UCI data sets and compared with other related outlier detection algorithms, confirming their effectiveness and feasibility.
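
As a rough illustration of the ingredients named for the second algorithm (entropy-derived attribute weights, a weighted distance, a weighted k-neighborhood and a Gaussian kernel density), the following Python sketch may help. The abstract gives no formulas, so everything concrete here is an assumption rather than the thesis's method: the equal-width binning used to estimate entropy, the rule that lower-entropy attributes receive larger weights, the LOF-style density ratio used as the score, and all function names and parameters (`bins`, `k`, `bandwidth`) are hypothetical.

```python
import math
from collections import Counter


def dimension_entropy(values, bins=10):
    """Shannon entropy (bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # Clamp the index so the maximum value falls into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def entropy_weights(rows, bins=10):
    """Assumed rule: the lower an attribute's entropy, the larger its weight."""
    columns = list(zip(*rows))
    # Gap between the maximum possible entropy and the observed entropy.
    gaps = [math.log2(bins) - dimension_entropy(c, bins) + 1e-12 for c in columns]
    total = sum(gaps)
    return [g / total for g in gaps]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two data objects."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def kernel_outlier_scores(rows, k=3, bandwidth=1.0, bins=10):
    """LOF-style score: mean neighbor density divided by own density."""
    weights = entropy_weights(rows, bins)
    n = len(rows)
    neighbors, density = [], []
    for i in range(n):
        # Weighted k-neighborhood: the k closest other objects.
        dists = sorted(
            (weighted_distance(rows[i], rows[j], weights), j)
            for j in range(n) if j != i
        )[:k]
        neighbors.append([j for _, j in dists])
        # Gaussian kernel evaluated at each neighbor distance, then averaged.
        kernel = [math.exp(-d * d / (2 * bandwidth ** 2)) for d, _ in dists]
        density.append(max(sum(kernel) / len(kernel), 1e-12))  # floor avoids /0
    return [
        sum(density[j] for j in neighbors[i]) / len(neighbors[i]) / density[i]
        for i in range(n)
    ]


# Toy data (made up for illustration): a tight cluster plus one distant point.
data = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2],
    [0.8, 0.8], [1.0, 1.3], [1.3, 1.0], [9.0, 9.0],
]
scores = kernel_outlier_scores(data, k=3)
print("most outlying index:", scores.index(max(scores)))
```

A score near 1 means an object is about as dense as its neighbors, while a much larger score marks a candidate outlier. The bandwidth and the weighting rule strongly affect the ranking, so both would need tuning or replacement to match any particular definition.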

Keywords/Search Tags:

Data mining, outliers, high-dimensional data, information entropy, Gaussian kernel function

Related items

1	The Outlier Mining Algorithm Based On Gaussian Kernel Function And Local Density
2	Mining Association Rules Among Outliers Based On Histogram And FP-growth
3	The Research On A Few Key Issues In High Dimensional Data Mining
4	Research On Visual Analysis Mechanism Of High-dimensional Data Based On Information Entropy
5	Research On Outliers Detection In Data Stream Based On Unsupervised Learning
6	Intelligent data mining using kernel functions and information criteria
7	Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy
8	Research On Robust Kernel Low-rank Representation Algorithm Of High-dimensional Data By Tensor Decomposition
9	A Research On Outliers Mining Algorithm Based On Heat Metering Data
10	The Research Of High-dimensional Data Mining Technology For Big Data