Font Size: a A A

Outlier Detection And Semi-supervised Clustering Algorithm Based On Shared Nearest Neighbors

Posted on:2012-09-05Degree:MasterType:Thesis
Country:ChinaCandidate:L Z ZhengFull Text:PDF
GTID:2218330368493363Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Clustering analysis is an important method in the world of data mining, Clustering results not only depend on distance or similarity, but also the outliers affect the effectiveness. Traditional clustering analysis is unsupervised without any prior knowledge which can help to solve hard labeled problem in practical application, so the semi-supervised clustering emerged.Firstly, this paper introduces some relevant knowledge about clustering analysis such as similarity, then analyzes traditional clustering analysis and its classification, compares the performance of main algorithm. It also elaborates on the learning frame of semi-supervised clustering and the difference between traditional and semi-supervised clustering.Secondly, the paper proposes an outlier detection algorithm based on nearest neighbors. It analyzes the importance of outlier detection and introduces the method of how to determine the nearest neighbors'sets and the step of algorithm, then proves the good results of algorithm on both synthetic and real datasets.Thirdly, the paper proposes the semi-supervised clustering algorithm based on shared nearest neighbors. It elaborates on prior knowledge and introduces how to gain and describe the prior knowledge, then proposes the method of expansion based on transitive of constraints sets and the characteristic of datasets. The new algorithm combines the expanded constraints sets and the graph based on the similarity of SNN, the clustering results is the sub graph which gains by separating the SNN graph. The simulation experiment verifies the good performance of expansive method and the improved results of the semi-supervised clustering algorithm.Last, it combines the algorithm of outlier detection and semi-supervised clustering based on the nearest neighbors. The experiment uses a dataset with outliers which first be operated by detecting the outliers and then the semi-supervised clustering, the result proves the two algorithms have better performance than other algorithms.
Keywords/Search Tags:similarity, outliers, prior knowledge, shared nearest neighbors, semi-supervised clustering
PDF Full Text Request
Related items