| Outlier detection is a task with a long history. Large amounts of data are constantly generated in modern production and daily life, but not all of it is valuable, so useful information must be extracted from massive datasets. This need has led to increasingly widespread application of outlier detection, and many different types of outlier detection techniques have emerged. An outlier is generally defined as follows: if the attribute values of a data object differ significantly from those of the other data objects in a given dataset, the object is called an "outlier". This paper focuses on improving the detection accuracy of density-based and cluster-based outlier detection algorithms. The main research contents are as follows. First, this paper proposes an outlier detection algorithm based on kernel local density estimation. It addresses two problems: the low detection accuracy for local outliers in datasets with uneven density distribution and irregular shape, and the sensitivity of many distance-based and density-based outlier detection algorithms to the choice of the parameter k. The algorithm uses the natural neighbor search algorithm to set k automatically, estimates the local density of each data object by incorporating the object's neighborhood information into Gaussian kernel density estimation, and defines the k-object average distance to characterize the distribution around each data object. By combining the local density with the k-object average distance, the algorithm defines a local deviation factor that detects local outliers more accurately. Second, for datasets with large density differences and large distances between subclusters and clusters, this paper proposes an outlier detection algorithm based on K-Medoids clustering and density peaks. The algorithm uses the silhouette coefficient method to determine the optimal number of clusters for each dataset, the max-min algorithm to select the initial cluster centers, and the K-Medoids clustering algorithm to divide the dataset into multiple clusters. It then redefines the cut-off distance following the idea of density peak clustering, combines the two density-peak indicators, and proposes the cluster object deviation degree to find the outliers within each cluster more accurately. Finally, this paper selects widely used and effective outlier detection algorithms from recent years and compares them experimentally with the proposed algorithms on synthetic and real datasets to validate their efficacy. |
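
The max-min selection of starting points mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common farthest-first reading of max-min: each new starting point is the object whose distance to its nearest already-chosen starting point is largest. The function name `max_min_init`, the `first` parameter, the use of Euclidean distance, and the sample points are all hypothetical choices made for illustration; the abstract does not specify them, and the paper's actual procedure may differ.

```python
import math


def max_min_init(points, k, first=0):
    """Pick k starting indices by farthest-first (max-min) selection.

    Assumption: Euclidean distance, and the first start is given by `first`.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [first]
    # min_dist[i] = distance from point i to its nearest chosen start so far
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # Next start: the point farthest from every start chosen so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Refresh nearest-start distances with the newly added start.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
starts = max_min_init(points, 3)
print(starts)
```

Spreading the starting points this way tends to place one start in each well-separated group, which is presumably why it is paired with K-Medoids here; the returned indices would then serve as the initial medoids.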