Local Outlier Data Mining Based On Related Subspace And Its Application

Posted on:2015-03-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Li

Full Text:PDF

GTID:2268330428477817

Subject:Computer application technology

Abstract/Summary:

With the rapid development of technology, data volumes are exploding and mankind has entered the era of big data. As data grow in both size and dimensionality, obtaining the desired information from large high-dimensional data sets accurately and efficiently has become an active research topic in data mining. This thesis studies outlier mining algorithms based on the related subspace, together with their parallelization under the MapReduce programming model. The main results are as follows:

1) A local outlier mining algorithm based on the related subspace is presented, which uses local sparse differences and local density differences as its measure factors. The algorithm determines the local data set of each data object by k-NN and generates global and local sparse factor matrices from the sparse factors of the attribute values, which effectively reflect the local sparsity of the data objects. After computing the local sparse difference factor for each attribute dimension of a data object, the object's subspace definition vector is derived from the local sparse factor matrix. In this way the algorithm can characterize an arbitrary related subspace for each data object, and that subspace is used to determine the object's local density difference, expressed as a Gaussian error function. As a result, the "curse of dimensionality" is significantly alleviated. The outlier measure in a related subspace is independent of the dimensionality of the data set, and the outlierness of a data object can be measured from the perspective of any related subspace. If an object has no related subspace, its local density difference is set to zero to indicate that it is a normal object. The data objects with the largest local density difference (outlier degree) are selected as local outliers. Finally, UCI and stellar spectral data sets are used to verify the effectiveness of the algorithm.

2) A parallel local outlier mining algorithm based on the related subspace and the MapReduce programming model is proposed. First, the parallelization of PLOF is analyzed and its MapReduce implementation is given. Then a parallel local outlier mining algorithm based on the MapReduce programming model is proposed, which adopts an LSH distribution strategy. Finally, artificial and stellar spectral data sets are used to verify the effectiveness and scalability of the parallel algorithm.

3) Based on the above results, a visualization process for mining outliers in astronomical spectra based on the related subspace is designed and implemented with JDK development tools, and the implementation techniques are described in detail. This provides a new way to find unknown special objects.
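The pipeline in item 1) (k-NN local set, per-object related subspace, error-function score, zero score when no subspace is found) can be illustrated with a small sketch. The abstract does not give the thesis's sparse-factor formulas, so this is not the author's algorithm: the variance-ratio test used to pick dimensions, the `ratio` threshold, the z-score aggregation, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math
from statistics import pvariance


def knn_indices(data, i, k):
    """Indices of the k nearest neighbours of data[i] (Euclidean, full space)."""
    dists = sorted((math.dist(data[i], p), j) for j, p in enumerate(data) if j != i)
    return [j for _, j in dists[:k]]


def related_subspace(data, neighbours, ratio=0.5):
    """Binary subspace vector for one object.

    ASSUMPTION: a dimension counts as 'related' when the neighbours are much
    less spread out in it than the whole data set (local variance below
    ratio * global variance). This stands in for the thesis's sparse factors.
    """
    vector = []
    for d in range(len(data[0])):
        global_var = pvariance([p[d] for p in data])
        local_var = pvariance([data[j][d] for j in neighbours])
        vector.append(1 if global_var > 0 and local_var < ratio * global_var else 0)
    return vector


def outlier_score(data, i, k=5, ratio=0.5):
    """Score in [0, 1]; 0.0 means no related subspace, i.e. treated as normal."""
    neighbours = knn_indices(data, i, k)
    dims = [d for d, flag in enumerate(related_subspace(data, neighbours, ratio)) if flag]
    if not dims:
        return 0.0
    squared = 0.0
    for d in dims:
        values = [data[j][d] for j in neighbours]
        mean = sum(values) / len(values)
        std = math.sqrt(pvariance(values)) or 1e-12  # guard against zero spread
        squared += ((data[i][d] - mean) / std) ** 2
    # Average over the selected dimensions so the score does not grow with
    # the number of dimensions, then squash with the Gaussian error function.
    z = math.sqrt(squared / len(dims))
    return math.erf(z / math.sqrt(2))


# Toy data: two thin vertical strips (x near 0 and x near 10) plus one point
# that sits between them, slightly off the first strip.
offsets = [0.0, 0.1, -0.1, 0.05, -0.05] * 2
data = [(dx, float(y)) for y, dx in enumerate(offsets)]
data += [(10.0 + dx, float(y)) for y, dx in enumerate(offsets)]
data.append((1.5, 4.5))

scores = [outlier_score(data, i) for i in range(len(data))]
top = max(range(len(data)), key=scores.__getitem__)
print(top, round(scores[top], 3))
```

Averaging the squared deviations over the selected dimensions is one simple way to keep the score comparable across objects whose related subspaces have different sizes, which is the property the abstract attributes to the related-subspace measure.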

Keywords/Search Tags:

Related items

1	Outlier Detection Methods For Complex Data Types
2	Efficient Top-n Local Outlier Detection For Big Data
3	Research On Local Outlier Detection Algorithm Based On Subspace
4	Research And Improvement Of Local Outlier Detecting Algorithm Based On Density
5	Optimal Subspace Outlier Mining Algorithm Based On Entropy Increment And Local Attribute Weighting
6	Study On Outlier Detection In Subspace
7	Research On Outlier Detection In Data Stream Based On Density
8	Research On Outlier Detection Algorithm For High Dimensional Big Data
9	Research Of Subspace-clustering Algorithms Based On Density Over High-dimensional Data
10	Analysis And Research On Density-based Local Outlier Detection