Local sparsity coefficient-based mining of outliers

Posted on: 2003-01-26

Degree: M.Sc.

Type: Thesis

University: University of Windsor (Canada)

Candidate: Agyemang, Malik

GTID: 2468390011481664

Subject: Computer Science

Abstract/Summary:

Data mining is the process of discovering interesting, useful and previously unknown knowledge from very large databases. Outlier (or exception) mining focuses on finding patterns that apply to only a small percentage of data objects. Outliers are observations whose characteristics differ so markedly from those of other data objects that they arouse suspicion of having been generated by a different mechanism. Density-based algorithms for mining outliers are the most effective in finding all forms of outliers. They determine outliers from the concentration of data objects at a location and declare objects with few neighbours to be outliers. However, existing density-based algorithms have the following drawbacks: (1) they compute the local reachability distance and density for every object before the few outliers are found; (2) they compute the local outlier factor (LOF) for every object in the dataset before declaring those with very high LOF to be outliers. These computations are very expensive, given that outliers form only a small fraction of the entire data.

This thesis proposes the Local Sparsity Coefficient (LSC) and Enhanced Local Sparsity Coefficient (ELSC) algorithms, which are based on the distance of an object and those of its k-nearest neighbours and do not compute the reachability distance and density of every object. This reduces the number of computations and comparisons relative to the LOF technique. In ELSC, data objects that cannot possibly be outliers are pruned (removed) based on their neighbourhood distances. The remaining objects constitute the candidate set on which outliers are determined, resulting in improved performance over LSC and LOF.
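The abstract does not give the LSC formulas, so the following Python sketch only illustrates the general idea: score objects from k-nearest-neighbour distances alone and prune objects in dense neighbourhoods before scoring. Everything specific here is an assumption rather than the thesis's definition: the function names `k_nearest` and `sparsity_scores`, the ratio "neighbours per unit of summed neighbour distance", and the use of the dataset-wide ratio as the pruning threshold.

```python
import math


def k_nearest(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result


def sparsity_scores(points, k, prune=True):
    """Return {index: score}; a score well above 1 suggests a sparse neighbourhood.

    Assumes no point has all k nearest neighbours at distance zero
    (that would cause a division by zero below).
    """
    neighbours = k_nearest(points, k)

    # Assumed density-like ratio: neighbours per unit of summed neighbour distance.
    ratio = [len(n) / sum(d for d, _ in n) for n in neighbours]

    # Assumed pruning threshold: the same ratio computed over the whole dataset.
    total_count = sum(len(n) for n in neighbours)
    total_dist = sum(d for n in neighbours for d, _ in n)
    threshold = total_count / total_dist

    scores = {}
    for i, n in enumerate(neighbours):
        if prune and ratio[i] >= threshold:
            continue  # dense neighbourhood: skipped as an unlikely outlier
        # Average of the neighbours' ratios relative to this point's own ratio.
        scores[i] = sum(ratio[j] / ratio[i] for _, j in n) / len(n)
    return scores


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(sparsity_scores(data, k=2))               # only candidates are scored
    print(sparsity_scores(data, k=2, prune=False))  # every object is scored
```

With pruning enabled, only the candidate set is scored, which mirrors the saving the abstract attributes to ELSC; with `prune=False`, every object is scored, as in the unpruned case.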

Keywords/Search Tags:

Outliers, Mining, Local sparsity, LOF, Data

Related items

1	Research On Outliers Mining Method To Web Content
2	The Local Outlier Mining Algorithm Based On Conditional Cumulative Holoentropy And Global Neighbourhood
3	Study And Improvement Of Local Outliers Mining Based On Density
4	Research Of Outliers Mining Applied In Snort System Improvement
5	Study Of Mining Outliers Based On Interestingness
6	A Research On Outliers Mining Algorithm Based On Heat Metering Data
7	Mining Association Rules Among Outliers Based On Histogram And FP-growth
8	Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting
9	Research On The Outliers Detection Algorithm
10	A Study On Local Outliers Mining Algorithm Based On Weighted-Attribute