High-dimensional Data Clustering Method Research Based On The Super Network

Posted on: 2016-10-13  Degree: Master  Type: Thesis
Country: China  Candidate: X Zhang  Full Text: PDF
GTID: 2308330470451340  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, large-scale network data with complex structure and multiple attributes is accumulating at an increasing pace. The common feature of such data is its high dimensionality; examples include e-commerce transaction data, web text data, and gene expression data. Traditional clustering methods that are commonly used in low-dimensional spaces usually fail to achieve satisfactory results when applied to high-dimensional spaces. The search for efficient high-dimensional data clustering methods has therefore become a hot topic in the field, and overcoming the effect of the "curse of dimensionality" on clustering is the central research difficulty.

Also, with the development of networks and the explosion of high-dimensional data, many complex networks have emerged. Their nodes and edges are numerous and intricately structured, and they form randomly rather than according to fixed rules; examples include the Internet (the world's largest network), genetic networks, and knowledge networks. An ordinary network graph cannot depict the characteristics of such real-world networks. The super network model arose in response, and it can vividly characterize complex networks composed of high-dimensional data.

This thesis presents a series of studies on high-dimensional data clustering methods based on the super network. The main research work is as follows:

1. The super network model and the main high-dimensional data clustering methods are studied carefully, forming a basic theoretical foundation for the later work.

2. Building on a study of traditional clustering algorithms and the super network model, an improved high-dimensional data clustering algorithm based on the super network is proposed. First, the high-dimensional data is mapped to a large weighted network. Second, the edge weights of the super network are defined. Third, the weighted network is divided with an optimized hypergraph partitioning method. Finally, the clustering of the high-dimensional data is obtained. The method effectively filters noise out of the data to be clustered and avoids the defects that dimension reduction introduces in traditional clustering methods. Experiments show that the algorithm is effective and accurate.

3. The MapReduce model and some of its associated algorithms are examined in detail, and the current state of research is analyzed. The k-means algorithm depends heavily on its initial cluster centers, converges slowly, and runs short of memory when processing very large data sets. To address these limitations, this thesis presents super-k-means, a new hybrid clustering algorithm for large data sets. It combines the improved super-network-based high-dimensional data clustering algorithm with k-means and is deployed in parallel through MapReduce on a Hadoop cluster, where it achieves the desired clustering results. The experimental results show that the algorithm achieves good speedup and scalability and that both convergence and clustering accuracy are improved.
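
The abstract does not give the details of super-k-means, so the following is only a minimal single-machine sketch of the standard way a k-means iteration is split into a map step and a reduce step. It is not the thesis's algorithm and does not use Hadoop. The function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) and the parameters `max_iter` and `tol` are hypothetical. The initial centers are passed in as an argument, because the abstract does not say how the super-network clustering and k-means are combined.

```python
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (cluster index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centers), (point, 1)


def reduce_phase(pairs, centers):
    """Reduce step: average the points emitted under each cluster index."""
    sums = {}
    counts = defaultdict(int)
    for key, (point, n) in pairs:
        if key in sums:
            sums[key] = [s + p for s, p in zip(sums[key], point)]
        else:
            sums[key] = list(point)
        counts[key] += n
    new_centers = []
    for i, old in enumerate(centers):
        if counts[i]:
            new_centers.append(tuple(s / counts[i] for s in sums[i]))
        else:
            # Assumption: an empty cluster simply keeps its previous center.
            new_centers.append(tuple(old))
    return new_centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no center moves by more than tol (squared)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In an actual MapReduce deployment the map step would run in parallel over partitions of the data and the reduce step would run once per cluster index. Because the result still depends on the centers supplied at the start, a good initialization matters, which is the limitation of k-means that the abstract points out.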
Keywords/Search Tags:big data, super network, clustering, MapReduce