Study On Outlier Detection In Subspace

Posted on:2011-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2178330338491056

Subject:Computer software and theory

Abstract/Summary:

Outlier detection has become an active research topic in data mining. As its range of applications has expanded, traditional outlier detection algorithms have run into a major obstacle: they cannot cope with the characteristics of high-dimensional data. Researchers have proposed several approaches to this problem, and among them subspace mining is an effective method for mining high-dimensional data. The subspace outlier detection algorithms proposed so far still have many problems. For instance, their accuracy is low, and their parameters are difficult to choose, which leads to unstable results. This thesis studies subspace outlier detection algorithms with these problems in mind.

First, the algorithm for outlier detection in axis-parallel subspaces of high-dimensional data (SOD) is introduced, and an improved algorithm is proposed to address its deficiencies. On the one hand, by quantifying the aggregation of each dimension, the reference value of each dimension can be fixed, which reduces the impact of parameter settings on the algorithm's results. On the other hand, using a relative distance to express the degree of deviation makes it easier to detect outliers in subspaces of different densities.

Second, because the number of clusters in a data set is unknown, a relevant-subspace measure based on Gini-entropy is proposed, and the relevant-subspace outlier degree is defined. On this basis, a new outlier detection algorithm based on relevant subspaces, RSOD, is proposed. This algorithm reduces the amount of prior knowledge required about the data set and is not limited by the number of clusters it contains. Whether the data set contains one cluster or several, the algorithm can effectively select relevant subspaces and detect outliers.

Finally, four data sets, including both synthetic and real data, are used to validate the two algorithms proposed in this thesis.
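To make the axis-parallel subspace idea concrete, the following is a minimal sketch of a SOD-style score, not the improved algorithm or RSOD from the thesis, whose details the abstract does not give. It assumes a plain k-nearest-neighbour reference set (the published SOD uses shared nearest neighbours), and the names `sod_like_scores`, `k`, and `alpha` are hypothetical. For each point, the axes on which its reference set has low variance are treated as the relevant subspace, and the score is the point's distance to the reference mean in that subspace, divided by the number of relevant axes.

```python
import math


def sod_like_scores(points, k=5, alpha=0.8):
    """Simplified SOD-style outlier scores (illustrative sketch only).

    Assumption: the reference set is the k nearest neighbours in the
    full space; the published SOD uses shared nearest neighbours.
    """
    d = len(points[0])
    scores = []
    for i, p in enumerate(points):
        # Reference set: the k points closest to p, excluding p itself.
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        ref = others[:k]

        # Per-axis mean and variance of the reference set.
        mean = [sum(q[a] for q in ref) / len(ref) for a in range(d)]
        var = [sum((q[a] - mean[a]) ** 2 for q in ref) / len(ref)
               for a in range(d)]
        avg_var = sum(var) / d

        # Relevant axes: the reference set is tightly packed along them.
        relevant = [a for a in range(d) if var[a] < alpha * avg_var]
        if not relevant:
            scores.append(0.0)  # no axis stands out, so no subspace deviation
            continue

        # Deviation from the reference mean, measured only on relevant axes.
        dist = math.sqrt(sum((p[a] - mean[a]) ** 2 for a in relevant))
        scores.append(dist / len(relevant))
    return scores


# Ten points on the x-axis plus one point lifted off it along y.
points = [(float(i), 0.0) for i in range(10)] + [(4.5, 3.0)]
scores = sod_like_scores(points, k=4, alpha=0.8)
print(scores.index(max(scores)))  # index of the most deviating point
```

The threshold `alpha` illustrates the parameter-sensitivity issue the abstract mentions: it decides which axes count as relevant, so changing it can change which points are flagged.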

Keywords/Search Tags:

Data Mining, Outlier, High-dimensional data, Subspace, Entropy

Related items

1	Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting
2	Research On Outlier Detection Algorithm For High-Dimensional Data Based On Angle And Entropy
3	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
4	Local Outlier Data Mining And Application-related Subspace
5	High-dimensional data mining: Subspace clustering, outlier detection and applications to classification
6	Research On Outlier Detection Algorithm For High Dimensional Big Data
7	Analysis And Research Of Outlier Detection Algorithm For High Dimensional Data
8	Outlier Detection Methods For Complex Data Types
9	The Research On A Few Key Issues In High Dimensional Data Mining
10	Research On Outlier Data Mining In High Dimensional Space