High-dimensional Data Clustering Method Research Based On The Super Network

Posted on:2016-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

Full Text:PDF

GTID:2308330470451340

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet, large-scale network data with complex structure and multiple attributes is accumulating at an increasing rate. A common feature of such data is its high dimensionality; examples include e-commerce transaction data, web text data, and gene expression data. Traditional clustering methods that are commonly used in low-dimensional spaces usually fail to produce satisfactory results when applied to high-dimensional spaces. The search for efficient high-dimensional data clustering methods has therefore become a hot topic in the field, and overcoming the effect of the "curse of dimensionality" on clustering is the main research difficulty.

Meanwhile, with the development of networks and the explosion of high-dimensional data, many complex networks have emerged. Such networks consist of numerous nodes and edges with complicated structure, and their formation is random rather than governed by fixed rules; examples include the Internet (the world's largest network), gene networks, and knowledge networks. In these cases an ordinary network graph cannot depict the characteristics of the real-world network. The super network model arose in response and can vividly characterize complex networks composed of high-dimensional data.

This thesis presents a series of studies on high-dimensional data clustering based on the super network. The main research work is as follows:

1. The super network model and the main high-dimensional data clustering methods are studied in depth, forming a basic theoretical foundation for the later work.

2. Traditional clustering algorithms and the super network model are studied, and an improved high-dimensional data clustering algorithm based on the super network is proposed. First, the high-dimensional data is mapped to a large-scale weighted network. Second, the edge weights of the super network are defined. Third, the weighted network is partitioned with an optimized hypergraph partitioning method. Finally, the clustering of the high-dimensional data is obtained. The method effectively filters out noise in the data being clustered and avoids the defects that traditional clustering methods incur during dimension reduction. Experiments show that the algorithm is effective and accurate.

3. The MapReduce model and some of its associated algorithms are examined in detail, and the current state of research is analyzed. The k-means algorithm depends heavily on the initial cluster centers, converges slowly, and runs short of memory when processing huge amounts of data. To address these limitations, the thesis presents a new hybrid clustering algorithm for large data sets, super-k-means, which combines the improved super-network-based high-dimensional data clustering algorithm with k-means and runs in parallel through MapReduce on a Hadoop cluster. The experimental results show that the algorithm has good speedup and scalability, and that both convergence and clustering accuracy are improved.
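To make the MapReduce part of the third item more concrete, the following is a minimal sketch of one way a plain k-means iteration can be written as a map step and a reduce step. It is not the super-k-means algorithm of the thesis, which the abstract does not specify in detail, and it runs in a single Python process rather than on Hadoop. The function names (`nearest_center`, `map_phase`, `reduce_phase`, `kmeans`) and the fixed iteration count are assumptions made for illustration only.

```python
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (cluster index, (point, 1)) for every input point."""
    for point in points:
        yield nearest_center(point, centers), (point, 1)


def reduce_phase(pairs, centers):
    """Reduce step: sum the points emitted for each cluster index, then average."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx in sums:
            sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        else:
            sums[idx] = list(point)
        counts[idx] += n
    # A cluster that received no points keeps its previous center.
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(centers[i])
        for i in range(len(centers))
    ]


def kmeans(points, centers, iterations=10):
    """Repeat map and reduce for a fixed number of iterations (hypothetical stopping rule)."""
    for _ in range(iterations):
        centers = reduce_phase(map_phase(points, centers), centers)
    return centers
```

For example, `kmeans([[0, 0], [0, 2], [10, 10], [10, 12]], [[0, 0], [10, 10]])` groups the first two points and the last two points and returns their means as the new centers. In a real Hadoop deployment the map and reduce steps would run on separate workers, and the choice of initial centers, which the abstract identifies as a weakness of k-means, would matter far more than it does in this toy input.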

Keywords/Search Tags:

big data, super network, clustering, MapReduce

Related items

1	Research On The Clustering Algorithm Of Parallel Partition Based On MapReduce
2	Research, Design And Application Of Clustering Algorithm Using Mapreduce
3	Parallel Clustering Algorithm Based On MapReduce
4	Research And Implementation Of Mapreduce-based Graph Clustering Algorithm
5	Research On Clustering Algorithms Of Location Big Data Based On MapReduce
6	Research And Application Of Clustering Mining Algorithm Oriented Big Data Based On MapReduce
7	MapReduce-enabled scalable nature-inspired approaches for clustering
8	Research On Parallelization Of Clustering Algorithm Based On MapReduce
9	Research On Parallelization Of Clustering Algorithm Based On Mapreduce
10	Research On Distributed Fast Clustering Algorithm Based On Mapreduce