Mining Association Rules Among Outliers Based On Histogram And FP-growth

Posted on: 2014-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: L J Li
Full Text: PDF
GTID: 2268330425994651
Subject: Computer application technology
Abstract/Summary:
Outlier detection in high-dimensional space is one of the difficult problems in data mining because of data sparseness and the curse of dimensionality. Building on existing methods for high-dimensional space, this paper presents a new outlier mining method based on histograms and FP-growth (frequent-pattern growth) that discovers association rules among outliers, which better explains the outliers and the relationships among them.

In recent years, many researchers have focused on outlier detection in high-dimensional and very large datasets and have proposed many approaches, such as data reduction, projection, and feature selection. These methods improve on some traditional approaches, but problems remain: the computational cost is high, and the causes of outliers and the mechanism that generates them have not been studied in depth. To address these problems, the proposed method mines association rules among the outliers in three steps. First, the KNN (K-Nearest Neighbors) distance is calculated and used to form a histogram in each dimension. Then, global outliers, local outliers, and border outliers are distinguished in order to reduce the computational complexity. Finally, FP-growth finds the association rules that meet the support and confidence thresholds among the dimensions in which outliers occur frequently, which explains the relationships among the outliers.

Experiments on three synthetic datasets and three real datasets indicate that the method improves computational efficiency and that the results explain the causes and regular patterns of the outliers well, which shows that the method is effective and meaningful.
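The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the histogram criterion, so the "first empty bin" rule below is a placeholder, and the separation into global, local, and border outliers is omitted. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def knn_distance_1d(values, k):
    """Distance from each value to its k-th nearest neighbour along one dimension."""
    return [sorted(abs(v - u) for j, u in enumerate(values) if j != i)[k - 1]
            for i, v in enumerate(values)]

def histogram_outliers(dists, bins=10):
    """Assumed rule (not from the thesis): flag points above the first empty bin."""
    lo, hi = min(dists), max(dists)
    if hi == lo:
        return set()
    width = (hi - lo) / bins
    index = [min(int((d - lo) / width), bins - 1) for d in dists]
    filled = set(index)
    gap = next((b for b in range(bins) if b not in filled), None)
    return set() if gap is None else {i for i, b in enumerate(index) if b > gap}

def outlier_transactions(rows, k=2, bins=10):
    """Map each flagged row index to the set of dimensions in which it is outlying."""
    flagged = defaultdict(set)
    for dim in range(len(rows[0])):
        column = [row[dim] for row in rows]
        for i in histogram_outliers(knn_distance_1d(column, k), bins):
            flagged[i].add(dim)
    return dict(flagged)

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return header table and item counts."""
    counts = Counter()
    for items, w in weighted:
        for item in set(items):
            counts[item] += w
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted:
        # Keep frequent items only, ordered by descending count (ties broken by item).
        kept = sorted((i for i in set(items) if counts[i] >= min_count),
                      key=lambda i: (-counts[i], i))
        node = root
        for item in kept:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += w
    return header, counts

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    header, counts = build_tree(weighted, min_count)
    found = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        found[itemset] = counts[item]
        base = []  # conditional pattern base: prefix paths leading to `item`
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        found.update(fp_growth(base, min_count, itemset))
    return found

def association_rules(freq, n, min_conf):
    """List (antecedent, consequent, support, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in freq.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                conf = count / freq[frozenset(lhs)]
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset) - set(lhs), count / n, conf))
    return rules

# Toy data (made up): 20 clustered rows plus four rows that deviate in some dimensions.
rows = [(i * 0.1, i * 0.1, i * 0.1) for i in range(20)]
rows += [(50, 50, 1.0), (60, 60, 0.5), (70, 70, 0.2), (0.5, 0.3, 90)]
tx = outlier_transactions(rows, k=2)
freq = fp_growth([(dims, 1) for dims in tx.values()], min_count=2)
rules = association_rules(freq, n=len(tx), min_conf=0.8)
for lhs, rhs, support, conf in rules:
    print(f"dims {sorted(lhs)} -> dims {sorted(rhs)}  support={support:.2f} conf={conf:.2f}")
```

Each outlier is treated as a transaction whose items are the dimensions in which it was flagged, so a mined rule such as "dimension 0 implies dimension 1" reads as "outliers that deviate in dimension 0 also tend to deviate in dimension 1". The thresholds `min_count` and `min_conf` stand in for the support and confidence thresholds mentioned in the abstract; their values here are arbitrary.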
Keywords/Search Tags: data mining, high-dimensional outliers, KNN distance, histogram, FP, association rules among outliers