
Research And Implementation Of Clustering And Outlier Detection Algorithms

Posted on: 2008-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Zheng
Full Text: PDF
GTID: 2178360272977154
Subject: Computer application technology
Abstract/Summary:
Data mining techniques can uncover potentially useful knowledge in vast amounts of data. With the rapid development of these techniques, clustering analysis and outlier detection have been widely applied in fields such as pattern recognition, data analysis, image processing, and market research. Research on clustering and outlier detection algorithms has become a highly active topic in data mining.

This thesis introduces the theory of data mining and analyzes clustering and outlier detection algorithms in depth. Based on an analysis of density-based clustering and outlier detection algorithms, we present the Outlier Detection algorithm Based on Symmetric Neighborhood (ODBSN) and the r-Neighborhood Based Clustering algorithm (RNBC).

In the ODBSN algorithm, we introduce the concept of reverse k nearest neighbors. Based on this concept, we design an outlier detection algorithm based on the symmetric neighborhood to improve the efficiency of density-based outlier detection. ODBSN does not need to compute the reachability distance and reachability density, so the computation cost is greatly reduced. At the same time, detection based on the Symmetric Neighborhood-based Outlier Factor (SNOF) identifies outliers more accurately.

In the RNBC algorithm, we introduce the concept of the relative density factor. Based on this concept, we design a new density-based clustering algorithm. Compared with the DBSCAN clustering algorithm, it has two advantages. First, it uses the relative density factor to distinguish local core points from local border points, so datasets can be clustered according to the local data distribution; in this way, multi-density clusters can be found. Second, the algorithm can detect outliers during the clustering process by measuring the outlierness of a data object with the relative density factor.

We have implemented the ODBSN, RNBC, LOF, and DBSCAN algorithms in Java. The experimental results show that ODBSN and RNBC correctly discover outliers and clusters, respectively, and that they are more effective and more efficient than LOF and DBSCAN, respectively.
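
The reverse k nearest neighbor concept used by ODBSN can be illustrated with a small Java sketch. This is not the thesis's implementation: the abstract does not give the SNOF formula, so the code only computes, by brute force, each point's k nearest neighbors and the reverse relation (q is a reverse neighbor of p when p appears among q's k nearest neighbors). The class name `ReverseKnnSketch`, its method names, and the choice of Euclidean distance are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: names and distance metric are assumptions,
// not taken from the thesis.
final class ReverseKnnSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Brute-force k nearest neighbors: knn[p] holds the indices of the
    // k points closest to point p (p itself excluded).
    static int[][] kNearestNeighbors(double[][] points, int k) {
        int n = points.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n - 1]");
        }
        int[][] knn = new int[n][];
        for (int p = 0; p < n; p++) {
            final int self = p;
            Integer[] others = new Integer[n - 1];
            int idx = 0;
            for (int q = 0; q < n; q++) {
                if (q != p) {
                    others[idx++] = q;
                }
            }
            // Sort the other points by distance to p; ties keep index order.
            Arrays.sort(others,
                    Comparator.comparingDouble(q -> distance(points[self], points[q])));
            knn[p] = new int[k];
            for (int j = 0; j < k; j++) {
                knn[p][j] = others[j];
            }
        }
        return knn;
    }

    // Reverse k nearest neighbors: q is added to rknn(p) whenever p is in knn(q).
    static List<List<Integer>> reverseKNearestNeighbors(int[][] knn) {
        List<List<Integer>> rknn = new ArrayList<>();
        for (int i = 0; i < knn.length; i++) {
            rknn.add(new ArrayList<>());
        }
        for (int q = 0; q < knn.length; q++) {
            for (int p : knn[q]) {
                rknn.get(p).add(q);
            }
        }
        return rknn;
    }
}
```

For example, calling `kNearestNeighbors` with k = 1 on the four points (0, 0), (1, 0), (2.5, 0), and (10, 0), then passing the result to `reverseKNearestNeighbors`, leaves the isolated point (10, 0) with an empty reverse neighbor list, while (1, 0) is a reverse neighbor target of two points. A point that few or no other points count among their nearest neighbors is a natural outlier candidate; how ODBSN turns this into the SNOF score is not specified in the abstract.
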
Keywords/Search Tags:Data Mining, Clustering Analysis, Outlier Detection, Symmetric Neighborhood, r-Neighborhood