Study On The Density-Based Local Outlier Mining Algorithm

Posted on: 2012-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2218330371958015
Subject: Computer application technology
Abstract/Summary:
Outlier detection is an important part of data mining: it identifies, within large data sets, the data instances that differ most from the rest and behave abnormally, which are called outliers. Outliers often carry valuable information. Building on a study of existing outlier detection methods, we present two new approaches: one for mining outliers in large data sets and one for spatial data sets.

In large, high-dimensional data sets, the algorithm for identifying density-based local outliers (the LOF algorithm) has very high computational complexity. To address this, we propose an outlier detection algorithm based on kernel k-means clustering. Kernel k-means clustering is first applied to the data set, and the resulting clusters are evaluated with a measure function. Data instances in clusters with a high measure value are taken as outlier candidates, and only these candidates are then examined with the LOF algorithm. Because LOF is evaluated only for the candidates rather than for every instance, the algorithm reduces the amount of neighborhood computation over the data set and shortens execution time.

In spatial data sets, the conventional method treats all attributes alike as one multidimensional space and therefore cannot identify all outliers correctly. In our local outlier detection method based on Voronoi diagrams, neighborhoods are determined by the spatial attributes, while the non-spatial attributes, which are used to compute the outlier degree, are weighted according to local entropy theory.

Both algorithms were run on real data sets. Theoretical analysis and experimental results show that the proposed algorithms are reasonable and efficient.
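The kernel k-means pruning step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: kernel k-means clusters the data, each instance receives a score, the highest-scoring fraction becomes the candidate set, and LOF is computed only for those candidates. The abstract does not specify the kernel, the initialization, or the measure function, so the sketch assumes a Gaussian (RBF) kernel with bandwidth `gamma`, deterministic farthest-first seeding, and, as a hypothetical stand-in for the measure function, each instance's squared feature-space distance to its cluster centroid. All function and parameter names (`kernel_kmeans`, `lof_scores`, `detect_outliers`, `candidate_frac`, `top`) are hypothetical.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_kmeans(X, k, gamma=0.5, iters=50):
    """Kernel k-means. Returns (labels, dist2), where dist2[i] is the squared
    feature-space distance from point i to the centroid of its cluster."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]

    def fdist(i, s):
        # Squared feature-space distance between points i and s.
        return K[i][i] - 2 * K[i][s] + K[s][s]

    # Hypothetical deterministic seeding: farthest-first in feature space.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(fdist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: fdist(i, seeds[c])) for i in range(n)]

    dist2 = [0.0] * n
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Centroid self-term: (1/|C|^2) * sum of K over all pairs in C.
        self_term = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
                     for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = 0, math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # empty clusters cannot attract points
                d = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + self_term[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
            dist2[i] = best_d
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, dist2


def lof_scores(X, candidates, k=3):
    """LOF for the candidate indices only. k-NN lists and lrd values are
    cached, so they are computed for the candidates and their neighbors
    instead of for all n points. Simplification: exactly k neighbors are
    used (ties at the k-distance are not expanded)."""
    n = len(X)
    knn_cache, lrd_cache = {}, {}

    def knn(i):
        # Brute-force k nearest neighbors as (distance, index) pairs.
        if i not in knn_cache:
            pairs = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
            knn_cache[i] = pairs[:k]
        return knn_cache[i]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance reach(i, j) = max(k-distance(j), d(i, j)).
        if i not in lrd_cache:
            reach = [max(knn(j)[-1][0], d) for d, j in knn(i)]
            lrd_cache[i] = 1.0 / max(sum(reach) / len(reach), 1e-12)
        return lrd_cache[i]

    scores = {}
    for i in candidates:
        neighbors = knn(i)
        scores[i] = sum(lrd(j) for _, j in neighbors) / (len(neighbors) * lrd(i))
    return scores


def detect_outliers(X, n_clusters=2, gamma=0.5, candidate_frac=0.2, k=3, top=1):
    """Hypothetical pipeline: kernel k-means prunes, LOF ranks the survivors.
    Distance to the cluster centroid stands in for the unspecified measure."""
    _, dist2 = kernel_kmeans(X, n_clusters, gamma)
    m = max(1, int(len(X) * candidate_frac))
    candidates = sorted(range(len(X)), key=lambda i: dist2[i], reverse=True)[:m]
    scores = lof_scores(X, candidates, k)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Two tight groups plus one isolated point (index 10).
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05),
     (2.5, 9)]
print(detect_outliers(X))  # expected: [10]
```

Note that this sketch still builds the full n×n Gram matrix for kernel k-means, so it illustrates the pruning logic (LOF evaluated for 2 of 11 points here) rather than reproducing the running-time savings reported in the abstract.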
Keywords/Search Tags: outlier mining, LOF, kernel k-means, local entropy theory