
Fractal Analysis Of Datasets Using Distributed Computing

Posted on: 2017-08-16    Degree: Master    Type: Thesis
Country: China    Candidate: Iakovleva Tatiana    Full Text: PDF
GTID: 2348330566956137    Subject: Software Engineering
Abstract/Summary:
Clustering is a key task in advanced data analysis applications. Many clustering algorithms exist, but not all of them work well with large datasets. When clustering algorithms are applied to real databases, the number of dimensions can be very large; as the dimensionality grows, the amount of data to be processed grows rapidly, and so does the running time. This is the well-known curse of dimensionality, which makes dimensionality reduction very important for data mining.

First, I present a new algorithm for reducing the dimensionality of a big dataset, based on fractal theory and implemented with MapReduce, and I apply the algorithm to a big dataset to reduce its dimensionality.

As a second step, I consider a clustering algorithm to show how the results produced by my algorithm can be used. One well-known algorithm is Halite, a method for correlation clustering proposed by Robson L. F. Cordeiro, Caetano Traina and Christos Faloutsos. Halite is a fast and scalable density-based clustering algorithm for multi-dimensional data that is able to analyze large datasets. In this thesis I also propose two additions to the method: a) when detecting dense regions of space as candidates for cluster centers, apply Laplacian filters with special convolution-mask values to the cells located on the edges and corners of the space, as is done, for example, in photographic image processing; this eliminates false triggering of the filter at the borders and corners of the space. b) When analyzing the boundaries of a cluster and selecting its relevant axes with MDL (Minimum Description Length), do not use the suggested "average over all axes" model, which implies a relationship between the axes that may not exist, but instead use a discrete function with a threshold equal to the density probability of the analyzed region.

The main contribution of this work is an approach to reducing the dimensionality of a dataset using the concept of fractal dimension and distributed computing, together with suggestions for improving the grid-based clustering algorithm Halite.
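The abstract gives no pseudocode, so the sketch below only illustrates the fractal-dimension idea behind the proposed reduction step: a single-machine, Python/NumPy estimate of the box-counting dimension of a dataset. The function name box_counting_dimension, the choice of grid scales and the synthetic example are illustrative assumptions of mine, not the author's implementation; the thesis distributes the counting step with MapReduce, which is not shown here.

    import numpy as np

    def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
        """Estimate the box-counting (fractal) dimension of a point set.

        points is an (n_points, n_dims) array assumed to be normalised to the
        unit hypercube; scales are the grid resolutions (cells per axis) to try.
        """
        points = np.asarray(points, dtype=float)
        log_r, log_n = [], []
        for k in scales:
            # Assign every point to a grid cell of side 1/k and count the
            # distinct non-empty cells at this resolution.
            cells = np.clip(np.floor(points * k).astype(int), 0, k - 1)
            occupied = len({tuple(c) for c in cells})
            log_r.append(np.log(k))
            log_n.append(np.log(occupied))
        # The fractal dimension is the slope of log(occupied cells)
        # versus log(grid resolution).
        slope, _ = np.polyfit(log_r, log_n, 1)
        return slope

    if __name__ == "__main__":
        # A plane embedded in 3-D: the intrinsic (fractal) dimension should be
        # close to 2 even though the embedding dimension is 3 - the kind of gap
        # a fractal-based reduction step can exploit to discard attributes.
        rng = np.random.default_rng(0)
        xy = rng.random((20000, 2))
        data = np.column_stack([xy, 0.5 * xy[:, 0] + 0.5 * xy[:, 1]])
        print(box_counting_dimension(data))

One way such an estimate can support dimensionality reduction, consistent with the abstract's claim, is to drop attributes one at a time as long as the estimated fractal dimension of the remaining dataset stays roughly unchanged.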
Keywords/Search Tags: clustering, big datasets, dimensionality reduction, database, fractals, MDL, Laplacian mask