
Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting

Posted on: 2022-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: J J Liu
GTID: 2518306536996619
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, data mining has become an active research topic. Outlier mining, an important part of the data mining field, has likewise received extensive attention. Because of its distinctive mechanism and the valuable information it uncovers, outlier detection plays an important role in data-driven intelligent systems. It is now widely used in fraud detection, medical diagnosis, public safety and other fields, and researchers in China and abroad have proposed many specific detection methods. To address the limitations and instability of outlier detection on high-dimensional data sets, this thesis proposes an improvement strategy. The research proceeds in two directions, subspace clustering and outlier mining, and consists mainly of the following parts.

First, the problem of low detection efficiency on high-dimensional data sets is addressed. In the data preprocessing stage, the optimal subspace is searched for: the dimensions of the data set are preliminarily screened by dimensional entropy, and redundant attributes are filtered out. Then, because mutual information can describe the correlation between dimensions, the index used to measure the quality of subspace clustering is redefined, and the objective function of the clustering subspace is optimized to obtain the optimal subspace for detecting outliers. In the outlier detection stage, an entropy outlier score based on the idea of information entropy increment is proposed as the metric, and outlier detection is performed in the optimal subspace. On this basis, an optimal subspace outlier detection algorithm based on entropy increment is proposed, and its correctness and complexity are analyzed.

Second, the limitations of current density-based outlier detection algorithms in the detection stage are studied further. With the optimal subspace found using information entropy, the outlier attributes of each data object are determined in the detection stage through the dimensional information entropy. A weighted distance is defined to describe the distance between data objects, together with related definitions such as the weighted k-distance, the reverse weighted k-distance and the weighted k-neighborhood. The Gaussian kernel function is then introduced to describe the neighborhood kernel density of a data object, from which its outlier degree is obtained. An information-entropy-based outlier detection algorithm with local attribute weighting is proposed, and its correctness and complexity are analyzed.

Finally, the two algorithms proposed in this thesis are verified on real UCI data sets and compared with other related outlier detection algorithms. The experiments confirm the effectiveness and feasibility of both algorithms.
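The abstract does not give the formal definitions, so the following Python sketch only illustrates the general shape of the ideas it names: per-dimension entropy, an entropy-weighted distance, and a Gaussian kernel density over a point's nearest neighbors. Everything specific here is an assumption made for illustration, not the thesis's method: the equal-width binning and the `bins` parameter, the choice of global weights proportional to each dimension's entropy (the thesis describes local attribute weighting), the neighbor count `k`, the bandwidth `h`, and all function names.

```python
import math
from collections import Counter


def dimension_entropy(values, bins=4):
    """Shannon entropy (in bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(data, bins=4):
    """Assumed weighting: each dimension's share of the total entropy."""
    entropies = [dimension_entropy(col, bins) for col in zip(*data)]
    total = sum(entropies)
    if total == 0:
        return [1.0 / len(entropies)] * len(entropies)
    return [e / total for e in entropies]


def weighted_distance(p, q, weights):
    """Euclidean distance with one weight per attribute."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def kernel_density(data, weights, k=3, h=1.0):
    """Mean Gaussian kernel value over each point's k nearest neighbors."""
    densities = []
    for i, p in enumerate(data):
        nearest = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(data) if j != i
        )[:k]
        densities.append(
            sum(math.exp(-(d * d) / (2 * h * h)) for d in nearest) / k
        )
    return densities


# Hypothetical toy data: five clustered points and one distant point.
data = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.2, 2.0), (1.0, 2.2), (8.0, 9.0)]
weights = entropy_weights(data)
densities = kernel_density(data, weights)
# The point with the lowest neighborhood density is the outlier candidate.
candidate = data[densities.index(min(densities))]
print(candidate)
```

In this sketch a low kernel density marks a point as an outlier candidate; how the thesis turns density into an outlier degree, and how it screens dimensions before this step, is not specified in the abstract.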
Keywords/Search Tags: Data mining, outliers, high-dimensional data, information entropy, Gaussian kernel function