The Outlier Detection Algorithm Based on Adaptive Clustering and Gaussian Kernel Density

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: M F Zhu
GTID: 2568306104471304
Subject: Computer Science and Technology

Abstract/Summary:
Outlier detection and data clustering analysis are research hotspots in data mining. Detecting outliers can reveal important knowledge hidden in the data and provide better support for decision making. Outlier detection has already been applied successfully to intrusion detection, fraud detection, medical health, ecology, and other fields, but its accuracy and efficiency still face challenges. This thesis studies in depth how to set the parameters of cluster-based outlier detection algorithms and how to improve the recall and accuracy of cluster-based and density-based outlier detection algorithms. The main research content covers the following two aspects.

First, the state of research on density-based and clustering-based outlier detection algorithms is analyzed, and two problems are identified: density is characterized improperly, and parameters are difficult to determine. To solve these problems, this thesis proposes an outlier detection algorithm based on non-parameter and clustering boundary. The algorithm uses mutual neighbors and a parameter-free cluster search method to obtain its parameters adaptively. To suppress the "misclustering" phenomenon in boundary areas, it introduces the concept of outlier cluster boundary regions, which describes how outlying the data points lying between clusters are. It also proposes a local deviation factor that measures the outlier degree of local outliers. No parameters need to be set manually at any point, and the algorithm can accurately find both global and local outliers in data with different distribution characteristics.

Second, outlier detection based on density-based spatial clustering is studied. Such algorithms require their parameters to be determined manually, which leads to poor clustering quality and low outlier detection efficiency. To address this, an outlier detection algorithm based on Gaussian kernel density and candidate set is proposed. The algorithm uses the k-distance matrix and mathematical expectation to obtain its parameters and the candidate set. The Gaussian kernel density function and the distance domain are then used to characterize how tightly packed the data objects in the candidate set are, and a new outlier factor is proposed to characterize the outlier degree of each object in the candidate set. This reduces the influence of boundary points and helps detect true outliers.

Finally, the outlier detection algorithm based on non-parameter and clustering boundary and the outlier detection algorithm based on Gaussian kernel density and candidate set are both verified experimentally on relevant datasets. Performance comparisons with related algorithms demonstrate the effectiveness and broad applicability of the proposed algorithms.
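The abstract does not spell out how the first algorithm's parameter-free search uses mutual neighbors. As a rough illustration only, the sketch below assumes a natural-neighbor style search: the neighborhood size r grows until every point is some other point's r-nearest neighbor, or until the number of points without such a "reverse neighbor" stops changing; mutual neighbors are then pairs that appear in each other's r-nearest-neighbor lists. The function name `natural_neighbor_search` and the stopping rule are assumptions, not the thesis's exact procedure.

```python
import numpy as np


def natural_neighbor_search(X):
    """Hypothetical sketch of a parameter-free neighbor search.

    Returns (lam, mutual): lam is the adaptively chosen neighborhood size and
    mutual[i] is the set of indices that are mutual neighbors of point i.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; the diagonal is set to inf so a point is
    # never its own neighbor (it sorts to the last column).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)  # times each point was picked as a neighbor
    prev_zero = -1
    lam = 1
    for r in range(1, n):
        # Every point "votes" for its r-th nearest neighbor.
        np.add.at(reverse_count, order[:, r - 1], 1)
        zero = int(np.sum(reverse_count == 0))
        lam = r
        # Assumed stopping rule: every point has a reverse neighbor, or the
        # number of points without one did not change in this round.
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero

    knn = [set(order[i, :lam].tolist()) for i in range(n)]
    # Mutual neighbors: j is in i's list and i is in j's list.
    mutual = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, mutual
```

On two unit squares of four points each plus one distant point, this search stops at `lam == 3`, and the distant point ends up with an empty mutual-neighbor set, which is the kind of signal a cluster-boundary method could use to flag it without a user-supplied k.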
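The abstract names the ingredients of the Gaussian-kernel-density algorithm (k-distance matrix, mathematical expectation, candidate set, Gaussian kernel density, distance domain, outlier factor) but not the formulas. The hypothetical sketch below combines them under loudly stated assumptions: the radius `eps` is the mean of all k-distances, the candidate set is the points whose k-distance exceeds `eps`, the "distance domain" is the `eps`-neighborhood, the kernel bandwidth equals `eps`, and the outlier factor is the mean density of a candidate's k nearest neighbors divided by its own density. The name `kde_candidate_outlier_scores` and the parameter `k` are illustrative only; the thesis's actual definitions may differ.

```python
import numpy as np


def kde_candidate_outlier_scores(X, k=3):
    """Hypothetical sketch: Gaussian-kernel-density outlier factor on a candidate set.

    Returns (candidates, scores), where scores maps each candidate index to its factor.
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # diagonal is 0
    off = dist.copy()
    np.fill_diagonal(off, np.inf)  # exclude a point from its own neighbor list

    # k-distance of each point, read from the row-sorted distance matrix.
    k_dist = np.sort(off, axis=1)[:, k - 1]
    # Assumption: the radius is the expectation (mean) of the k-distances.
    eps = float(k_dist.mean())
    if eps == 0.0:
        return np.array([], dtype=int), {}
    # Assumption: candidates are points whose k-distance is above average.
    candidates = np.flatnonzero(k_dist > eps)

    # Gaussian kernel density restricted to the eps-neighborhood ("distance domain"),
    # bandwidth = eps. Each point adds weight 1 to its own density, so density > 0.
    weights = np.exp(-(dist ** 2) / (2.0 * eps ** 2))
    weights[dist > eps] = 0.0
    density = weights.sum(axis=1)

    scores = {}
    for i in candidates:
        nbrs = np.argsort(off[i], kind="stable")[:k]
        # Assumed factor: mean neighbor density / own density (larger = more outlying).
        scores[int(i)] = float(density[nbrs].mean() / density[i])
    return candidates, scores
```

For two unit squares of four points each plus one point far away, only the far point has an above-average k-distance, so it is the only candidate scored; its factor is close to 4 because its own density is just its self-weight while each of its neighbors sums four nearby kernel weights. Scoring only the candidate set keeps dense interior points out of the factor computation.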
Keywords/Search Tags: data mining, outliers, clustering, adaptive, Gaussian kernel density