An Improvement To The Angle-based Outlier Detection Algorithm

Posted on: 2016-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Xu
Full Text: PDF
GTID: 2348330488972882
Subject: Engineering
Abstract/Summary:
Unlike most data objects in a database, an outlier does not fit the general rules and deviates from most groups; its generating mechanism differs substantially from that of normal data. Many techniques exist for detecting outliers, and they rest on different ideas. Each detection scheme includes many specific algorithms, but most of them have shortcomings to some degree, e.g. the accuracy does not meet requirements or the time complexity is high.

Angle-based outlier detection is a method for detecting outliers in high-dimensional data sets. Like the local outlier detection method known as LOF, it assigns an outlier factor to each object in the data set and ranks the objects by that factor.

The contributions of this thesis are:

(1) To reduce the high time complexity of angle-based outlier detection, an improved method, the neighborhood approximate improvement, is proposed on the basis of the original principle. To calculate the approximate angle-based outlier factor (ABOF), it borrows the idea of “neighborhood” from LOF and examines the anomaly degree of an observation within a certain range. Specifically, during this examination the already-known distances are used to reduce, as far as possible, the number of data points that generate vectors, which improves computational efficiency.

(2) To reduce the time complexity further, a specific calculation scheme for detecting outliers is also proposed. The scheme prunes and then integrates the results of DBSCAN clustering run under multiple parameter sets to obtain a primary outlier data set. It first selects neighborhoods in the original data set and then applies the neighborhood approximate improvement described above to calculate the approximate angle-based outlier factor for the points in the primary outlier data set.

Simulation tests show that the improved strategy and the calculation scheme perform well on medium- and large-scale high-dimensional data sets, with higher accuracy than classical LOF and better efficiency than the original angle-based algorithm.
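The neighborhood idea in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the neighborhood is simply the k nearest neighbors, and it uses the distance-weighted variance of angle terms from the original angle-based formulation. The function name `approx_abof` and the parameter `k` are hypothetical, chosen for illustration only.

```python
import math
from itertools import combinations


def approx_abof(points, k):
    """Approximate ABOF per point, using only its k nearest neighbors.

    Low scores suggest outliers: the neighbors of an outlying point lie in
    similar directions, so the weighted angle terms vary little.
    """
    scores = []
    for i, a in enumerate(points):
        # Distances from a to all other points, computed once and reused
        # both for neighbor selection and for the angle terms below.
        dists = sorted((math.dist(a, p), j) for j, p in enumerate(points) if j != i)
        neighbors = [(d, j) for d, j in dists[:k] if d > 0]  # drop duplicates of a

        w_sum = wf_sum = wf2_sum = 0.0
        for (d_b, jb), (d_c, jc) in combinations(neighbors, 2):
            ab = [x - y for x, y in zip(points[jb], a)]
            ac = [x - y for x, y in zip(points[jc], a)]
            dot = sum(u * v for u, v in zip(ab, ac))
            f = dot / (d_b ** 2 * d_c ** 2)  # distance-scaled angle term
            w = 1.0 / (d_b * d_c)            # weight: nearer pairs count more
            w_sum += w
            wf_sum += w * f
            wf2_sum += w * f * f

        if w_sum == 0.0:
            scores.append(0.0)  # fewer than two usable neighbors
        else:
            mean = wf_sum / w_sum
            scores.append(wf2_sum / w_sum - mean * mean)  # weighted variance
    return scores


# Toy data (assumed for illustration): a 3x3 grid plus one distant point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(10.0, 10.0)]
scores = approx_abof(points, k=5)
# Indices ordered from most to least outlying (lowest score first).
ranking = sorted(range(len(points)), key=scores.__getitem__)
```

Restricting the pairs to k neighbors brings the per-point cost down from quadratic in the data set size to quadratic in k, at the price of an approximate score. The neighbor search here is brute force; the pruning scheme in contribution (2) is not modeled.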
Keywords/Search Tags:data mining, outlier detection, angle, outlier factor, approximate calculation