
Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy

Posted on: 2021-02-23  Degree: Master  Type: Thesis
Country: China  Candidate: Q Y Wen  Full Text: PDF
GTID: 2428330647463364  Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection has long been an important research topic in data mining. It aims to find the data points in a data set that deviate markedly from the majority of normal data and do not conform to the general pattern. In general, outliers are generated by a different mechanism from normal objects. As the dimensionality of the data increases, the data become increasingly sparse. Traditional outlier detection algorithms rely on Euclidean distance to measure the positional relationship between data objects, and in high-dimensional space this makes the difference between outliers and normal objects smaller, so detection accuracy is low. Therefore, this dissertation proposes a high-dimensional outlier detection algorithm based on angle and entropy. Because Euclidean distance degrades the accuracy of outlier detection in high-dimensional space, the angle cosine is used instead to measure the positional relationship between data objects. Compared with Euclidean distance, the angle cosine better represents the positional relationship between points in a high-dimensional space. Taking the angle cosine as its basis, this dissertation introduces the concept of information entropy to quantify the degree of outlierness of each data object. Building on a study of the current typical outlier detection algorithms for high-dimensional space, it addresses the problem of outlier detection accuracy in high-dimensional space, using the angle entropy as the measurement index of each data object. The main research results are as follows:

1. A high-dimensional data outlier detection algorithm based on angle and entropy is proposed. The existing angle-based high-dimensional outlier detection algorithm still relies, to a certain extent, on Euclidean distance as a weight in its calculation. To address this problem, a high-dimensional outlier detection algorithm based on angle entropy is proposed on the basis of the original algorithm. The algorithm exploits the fixed value range of the angle cosine: it divides the range into an appropriate number of segments and then calculates the probability that the angle cosine falls in each segment. Using the principle of information entropy, the information entropy of each data point is calculated from these probabilities, and the entropy value serves as the measure of whether the object is an outlier.

2. An algorithm for detecting outliers in high-dimensional data based on angle and entropy combined with subspace is proposed. In high-dimensional data, local outliers are easily masked by redundant dimensions, so detecting outliers in the full-dimensional space gives poor results. To further improve detection accuracy, this dissertation selects a suitable subspace extraction algorithm. The relevant attribute dimensions are extracted from the full-dimensional space to form the relevant subspace, in which the hidden local outliers are detected by the angle-and-entropy-based high-dimensional outlier detection method. Extracting the relevant subspace and exploiting the additivity of information entropy values effectively improves the accuracy of outlier detection.

3. Extensive simulation experiments were carried out on the improved algorithms proposed in this dissertation, and the results show that the proposed methods achieve a high detection rate. Comparative experiments on seven real data sets show that the average accuracy of the angle-and-entropy-based high-dimensional data outlier detection algorithm is 14.3% higher than that of the conventional angle-based high-dimensional outlier detection algorithm; the average accuracy of the angle-and-entropy-based algorithm combined with subspace is 31.5% higher than that of the angle-based algorithm; and the subspace-combined algorithm is on average 15% higher than the angle-and-entropy-based algorithm without subspace.

This confirms the reliability and effectiveness of the two improved algorithms proposed in this dissertation for outlier detection in high-dimensional data.
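
The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the angle-entropy idea, under stated assumptions rather than the dissertation's implementation. For each point, it takes the cosines of the angles formed with every pair of other points, bins them over the fixed range [-1, 1], and computes the Shannon entropy of the bin probabilities. The function name `angle_entropy_scores`, the parameter `n_bins`, the use of equal-width bins, and the reading that a lower entropy suggests an outlier (by analogy with angle-based detection, where an outlier sees the other points in a narrow range of directions) are all assumptions made for illustration.

```python
import numpy as np

def angle_entropy_scores(X, n_bins=10):
    """Shannon entropy (in bits) of the binned angle cosines seen from each point.

    Hypothetical helper: equal-width bins over [-1, 1] and the pairwise
    enumeration are illustrative choices, not taken from the dissertation.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        # Difference vectors from point i to every other point.
        diffs = np.delete(X, i, axis=0) - X[i]
        norms = np.linalg.norm(diffs, axis=1)
        keep = norms > 0  # skip exact duplicates of point i
        units = diffs[keep] / norms[keep][:, None]
        # Cosine of the angle at point i for every unordered pair of other points.
        cos = units @ units.T
        upper = np.triu_indices(len(units), k=1)
        cosines = np.clip(cos[upper], -1.0, 1.0)
        # Probability of the cosine falling in each segment of [-1, 1].
        counts, _ = np.histogram(cosines, bins=n_bins, range=(-1.0, 1.0))
        p = counts[counts > 0] / counts.sum()
        scores[i] = -np.sum(p * np.log2(p))
    return scores
```

In this sketch the cost is quadratic in the number of points for each scored point, so it is suitable only for small data sets. For the subspace variant described in item 2, one possible reading is to call the function on each selected column subset and add the resulting entropies, but the subspace extraction step itself is not specified in the abstract.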
Keywords/Search Tags: High-Dimensional Data, Angular Cosine, Information Entropy, Subspace, Outlier Detection