Self-representation Clustering Algorithm Based On Local Structure Information And Its Application

Posted on: 2022-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gui
Full Text: PDF
GTID: 2518306752987039
Subject: Automation Technology
Abstract/Summary:
With the rapid development of information technology, the amount of information stored and available is constantly increasing. In the real world, most data have no natural labels, or labeling them manually is costly, so unsupervised learning becomes a powerful tool. Clustering, a typical unsupervised learning model, is often used to discover the underlying structure of data and to partition the data into groups. Outliers are also known as anomalies, and outlier detection is a fundamental problem in machine learning. Many clustering algorithms obtain nearly perfect results when the dataset contains no outliers, but when outliers are present, the original assumptions of the problem are broken and the algorithms become significantly less effective.

Self-representation clustering uses other data points to represent each target data point and rejects the noise in the dataset based on prior information, thereby reconstructing and correcting the target data points. Algorithms based on local structure information additionally combine the local neighborhood information of each data point with the global information, which makes the relationships between points in the data network more realistic and further enhances clustering performance. An outlier detection method based on self-representation clustering can identify outliers while clustering, so the two tasks reinforce and complement each other. This "multi-task" learning strategy improves the quality of clustering through outlier detection, and the improved clustering in turn raises the accuracy of outlier identification.

This thesis first analyzes and studies existing self-representation clustering algorithms and outlier detection methods. Second, building on two self-representation clustering frameworks, it proposes an improved algorithm for each that uses the local structure information of the data. Finally, corresponding outlier detection methods are designed based on the newly proposed self-representation clustering algorithms. Specifically, the main contents of this thesis are as follows.

(1) A k-far-neighbor based heuristic clustering algorithm (KFNHC) is proposed. The heuristic clustering algorithm based on centroid learning can adapt to datasets with varying cluster shapes and densities, and because of its hierarchical and shrinkage nature, it can gradually separate outlier clusters from real clusters and thus identify global outliers. Based on this clustering framework, the thesis proposes a k-far-neighbor similarity measure built on the k-far-neighbor point set, which captures local density information and effectively distinguishes near-neighbor points from outliers. The KFNHC algorithm combines the good properties of local structure information and heuristic clustering and achieves superior clustering performance.

(2) A subspace clustering algorithm based on the representation of common neighbors (RCNSC) is proposed. The algorithm uses post-processing techniques to control the self-expressiveness of subspace clustering. By using common neighbors in the representation, the algorithm eliminates outliers in the mapping space and reconstructs the representation matrix to enhance its sparsity and connectivity. The proposed post-processing is general and can be applied to existing subspace clustering algorithms to improve their performance.

(3) A self-representation clustering-based outlier detection algorithm (SRCOD) is proposed. The algorithm combines self-representation clustering with outlier detection and uses a multi-stage rejection strategy to reject outliers in a robust and orderly manner. During clustering, outliers are identified based on shrinkage distance clustering, and a "back-off" mechanism is used to correct the "wrong points" caused by the influence of extreme outliers (or outlier clusters).

The experimental results show that the proposed outlier detection framework based on self-representation clustering can effectively detect both global and local outliers.
Keywords/Search Tags:heuristic clustering, subspace clustering, local structure information, outlier detection
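
The common-neighbor post-processing in contribution (2) can be illustrated with a small sketch. The abstract does not state the exact rule RCNSC uses, so this example assumes a simple shared-neighbor criterion: a coefficient C[i][j] in the representation matrix is kept only if point j is among the k strongest nonzero representers of point i, and i and j share at least `min_shared` such representers. The function names `top_k_neighbors` and `common_neighbor_filter`, the parameters `k` and `min_shared`, and the toy matrix are all hypothetical and are not taken from the thesis.

```python
def top_k_neighbors(C, k):
    """For each point i, return the set of up to k indices j != i
    with the largest nonzero |C[i][j]| (its strongest representers)."""
    n = len(C)
    neighbors = []
    for i in range(n):
        candidates = [j for j in range(n) if j != i and C[i][j] != 0]
        candidates.sort(key=lambda j: abs(C[i][j]), reverse=True)
        neighbors.append(set(candidates[:k]))
    return neighbors


def common_neighbor_filter(C, k, min_shared=1):
    """Assumed rule (not the thesis's): keep C[i][j] only when j is a
    top-k representer of i and the two points share at least
    `min_shared` representers. All other entries are set to zero."""
    n = len(C)
    nbrs = top_k_neighbors(C, k)
    filtered = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            if len(nbrs[i] & nbrs[j]) >= min_shared:
                filtered[i][j] = C[i][j]
    return filtered


# Toy representation matrix: points 0-2 form one group, points 3-5 another.
# C[2][3] = 0.5 is a spurious cross-group coefficient.
C = [
    [0.0, 0.6, 0.4, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.4, 0.6, 0.0],
]
filtered = common_neighbor_filter(C, k=3)
```

In this toy case, points 2 and 3 have no representers in common, so the cross-group entry is zeroed while the within-group coefficients are left unchanged. In a typical subspace clustering pipeline, the filtered matrix would then be symmetrized and passed to spectral clustering. That step is standard practice and is not described in the abstract.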