Research On Spectral Clustering And Outlier Detection Algorithms Based On Natural Neighbors

Posted on: 2020-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: J X Shi
GTID: 2428330596993893
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, data streams in steadily from every industry. A central task of data mining today is to extract valuable information from these data and to provide decision support for the people who rely on it. Cluster analysis and outlier detection are important components of data mining. They are widely used in pattern recognition, artificial intelligence, credit card fraud detection, video surveillance, and other fields, where they promote social progress and industry development.

Cluster analysis uses the similarity between data points to uncover latent relationships in the data. Spectral clustering has a solid theoretical basis and good clustering performance, and it has attracted the attention of a growing number of researchers. A spectral clustering algorithm can converge to the global optimum without making any assumptions about the global structure of the data, but it still faces several problems: the selection of the scale parameter, the choice of similarity measure, and the determination of the number of clusters. Outlier detection is mainly used to find abnormal data or patterns that deviate from normal behavior. Density-based outlier detection is a commonly used strategy, but algorithms of this type usually require a neighborhood parameter to be chosen and tend to misdetect outliers in data sets with varying density.

To address neighborhood parameter selection in both spectral clustering and outlier detection, this thesis introduces a neighborhood search method that requires no manually set parameters: the natural neighbor algorithm. The algorithm adapts automatically to the distribution of the data points by continuously expanding the neighborhood search range. To address the other problems mentioned above, this thesis proposes two improved algorithms that build on the natural neighbor algorithm, as follows.

An adaptive spectral clustering algorithm based on shared natural neighbors is proposed. To address the selection of the neighborhood scale parameter in spectral clustering, the algorithm first obtains adaptive neighborhood parameters through the natural neighbor algorithm. Then, because points from different clusters are misassigned to the same cluster on some manifold data sets, the similarity between data points is redefined using shared neighbors together with the adaptive neighborhood parameters, so that the intrinsic connections between points are described effectively. Finally, the eigengap idea is used to determine the number of clusters from the eigenvectors, which completes the clustering. Comparative experiments on synthetic and real data sets show that the proposed algorithm performs better on manifold clustering than existing algorithms, even when those algorithms are given appropriate input parameters.

An outlier detection algorithm based on natural neighbors is proposed. To address neighborhood parameter selection in outlier detection, this thesis first improves the natural neighbor algorithm to obtain natural feature values and construct a natural feature neighborhood graph. The information in this graph is then used to reflect how compact the data are, which resolves the density difference problem in the data set and, at the same time, yields the global outliers. Next, a new outlier factor is defined and the objects are ranked by it; the objects with the highest factors are selected as outliers. Finally, experiments on synthetic and real data sets verify the effectiveness of the algorithm for outlier detection.
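
The neighborhood search that expands round by round can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on assumptions the abstract does not state: the stopping rule (stop once the number of points that nobody has chosen as a neighbor stops changing), the definition of natural neighbors as mutual neighbors at the final round, and the use of a plain shared-neighbor count as a stand-in for the redefined similarity. The function names `natural_neighbors` and `shared_neighbor_counts` are hypothetical.

```python
import math


def natural_neighbors(points, max_rounds=None):
    """Parameter-free neighbor search (illustrative sketch).

    Returns (r, neighbors): r is the number of rounds performed and
    neighbors[i] is the set of points that are mutual neighbors of
    point i within the first r nearest neighbors.
    """
    n = len(points)
    # order[i]: all other indices, sorted by Euclidean distance from i.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # forward neighbors found so far
    reverse_count = [0] * n           # times each point was chosen by others
    limit = n - 1 if max_rounds is None else min(max_rounds, n - 1)
    prev_orphans = None
    r = 0
    while r < limit:
        # Round r + 1: every point adds its next-nearest neighbor.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        # Assumed stopping rule: the count of points with no reverse
        # neighbor is zero or did not change since the previous round.
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Assumed definition: i and j are natural neighbors if each one
    # appears in the other's neighbor set.
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


def shared_neighbor_counts(neighbors):
    """Number of natural neighbors shared by each pair of points."""
    n = len(neighbors)
    return [
        [len(neighbors[i] & neighbors[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two small groups and one far-away point (made-up data).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    rounds, nbrs = natural_neighbors(data)
    print("rounds:", rounds)
    print("natural neighbors:", nbrs)
    print("shared counts for point 0:", shared_neighbor_counts(nbrs)[0])
```

In this sketch no neighborhood size is supplied by the caller; the number of rounds is derived from the data. A point that ends up with no natural neighbors, like the isolated point in the example data, is a natural candidate for closer inspection by an outlier detector, although the outlier factor defined in the thesis is not reproduced here.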
Keywords/Search Tags: Spectral Clustering, Outlier Detection, Natural Neighbors, Shared Natural Neighbors