An Improvement To The Angle-based Outlier Detection Algorithm

Posted on: 2016-03-15

Degree: Master

Type: Thesis

Country: China

Candidate: Y W Xu

GTID: 2348330488972882

Subject: Engineering

Abstract/Summary:

In contrast to most data objects in a database, an outlier does not fit the general pattern and deviates from most groups. It is produced by a mechanism quite different from the one that generates normal data. Many outlier detection techniques exist, and they approach outliers in different ways. Each detection scheme includes many specific algorithms, but most of these algorithms have shortcomings to some degree. For example, their accuracy may not meet requirements, or their time complexity may be high.

Angle-based outlier detection is a method for detecting outliers in high-dimensional data sets. Like the local outlier factor (LOF) method, it assigns an outlier factor (here, the angle-based outlier factor, ABOF) to each object in the data set and ranks the objects by that factor. The contributions of this thesis are as follows:

(1) To reduce the high time complexity of angle-based outlier detection, an improved method called the neighborhood approximate improvement is proposed, built on the principle of the original method. To compute an approximate ABOF, it borrows the idea of a “neighborhood” from LOF and examines the anomaly degree of each observation within a certain range. Specifically, already-known distances are used to reduce, as far as possible, the number of data points that generate vectors. This improves computational efficiency.

(2) To reduce the time complexity further, a specific calculation scheme for detecting outliers is also proposed. The scheme runs DBSCAN clustering with multiple parameter sets, then prunes and integrates the results to obtain a primary outlier data set.
The scheme first selects neighborhoods in the original data set. It then applies the neighborhood approximate improvement described above to calculate the approximate angle-based outlier factor for the points in the primary outlier data set.

Simulation experiments show that both the improved strategy and the calculation scheme perform well on medium- and large-scale high-dimensional data sets. They achieve higher accuracy than classical LOF and better efficiency than the original angle-based method.
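The pipeline described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. It uses the standard ABOF definition: for a point A, take the weighted variance of the angle term ⟨AB, AC⟩ / (|AB|²|AC|²) over pairs of other points B and C, with weight 1/(|AB||AC|). A low value marks A as a likely outlier. The thesis does not spell out the exact rule for its neighborhood approximate improvement, so the sketch swaps in the k-nearest-neighbor approximation (known as FastABOD) from the original ABOD work. The helper names (`abof`, `knn`, `dbscan_noise`, `candidate_outliers`, `rank_outliers`) are hypothetical. The `min_votes` rule that stands in for "prune and integrate" over the multi-parameter DBSCAN runs is also hypothetical.

```python
import math
from itertools import combinations


def abof(points, i, ref):
    """Angle-based outlier factor of points[i] over reference indices `ref`.

    Weighted variance over pairs (B, C) of <AB, AC> / (|AB|^2 |AC|^2),
    with weight 1 / (|AB| |AC|). Smaller values suggest an outlier.
    """
    a = points[i]
    sw = swx = swx2 = 0.0
    for j, k in combinations(ref, 2):
        ab = [p - q for p, q in zip(points[j], a)]
        ac = [p - q for p, q in zip(points[k], a)]
        nab = sum(v * v for v in ab)  # |AB|^2
        nac = sum(v * v for v in ac)  # |AC|^2
        if nab == 0.0 or nac == 0.0:
            continue  # skip exact duplicates of A (angle undefined)
        w = 1.0 / math.sqrt(nab * nac)
        x = sum(p * q for p, q in zip(ab, ac)) / (nab * nac)
        sw += w
        swx += w * x
        swx2 += w * x * x
    if sw == 0.0:
        return 0.0
    mean = swx / sw
    # Guard against tiny negative values caused by floating-point rounding
    return max(0.0, swx2 / sw - mean * mean)


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def dbscan_noise(points, eps, min_pts):
    """Indices that DBSCAN labels as noise for one (eps, min_pts) setting.

    A point is noise if it is not a core point and no core point lies
    within eps of it. This set does not depend on cluster expansion order.
    """
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]  # count includes the point itself
    return {i for i in range(n)
            if not core[i] and not any(core[j] for j in neigh[i])}


def candidate_outliers(points, param_sets, min_votes):
    """Hypothetical prune-and-integrate rule: keep points flagged as noise
    in at least `min_votes` of the DBSCAN runs."""
    votes = {}
    for eps, min_pts in param_sets:
        for i in dbscan_noise(points, eps, min_pts):
            votes[i] = votes.get(i, 0) + 1
    return sorted(i for i, v in votes.items() if v >= min_votes)


def rank_outliers(points, param_sets, min_votes, k):
    """Score only the DBSCAN candidates with kNN-approximate ABOF and
    return them from most to least outlying (ascending ABOF)."""
    cands = candidate_outliers(points, param_sets, min_votes)
    scores = {i: abof(points, i, knn(points, i, k)) for i in cands}
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # 5 x 5 grid with spacing 0.1, plus one far-away point (index 25)
    pts = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)] + [(5.0, 5.0)]
    print(rank_outliers(pts, [(0.15, 3), (0.25, 4)], min_votes=2, k=8))  # [25]
```

Exact ABOF visits every pair (B, C) for every point A, so it costs O(n³) for n points. In this sketch, the savings come from two places: the reference set is restricted to k neighbors, and only the DBSCAN candidates are scored at all. Replacing `knn` with the thesis's own neighborhood rule would leave the rest of the pipeline unchanged.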

Keywords/Search Tags:

data mining, outlier detection, angle, outlier factor, approximate calculation

Related items

1	Research And Application Outlier Detection Method Based On Density&Distance
2	Research And Application Of Outlier Detection Algorithm
3	Outlier Mining Algorithm Research And Application
4	Research On Outlier Mining Method Based On Deviation Characteristic
5	Outlier Mining Method Based On Gini Indexes And Sub-space Research
6	Research And Application Of Outlier Data Mining Algorithm Based On Deep Forest
7	The Outlier Detection Algorithm Based On Decision Outlier Factor And Markov Model
8	Study On Spatial Outlier Mining
9	Research Of Detection Outlier Based On Outlier Degree
10	Analysis And Research Of Outlier Detecting Algorithm Based On Ensemble Methods