
Research On Big Data Clustering Based On Complex Network

Posted on: 2018-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2370330518966956
Subject: Computer application technology
Abstract/Summary:
With the rapid development of communication and information technology, networks keep growing in scale and their structure becomes increasingly complex, producing massive amounts of data, that is, big data. The emergence of big data marks humanity's transition from the information age to the era of big data. In this era, network data is often complex, diverse and heterogeneous. In real networks, community structure (also known as the clustering property) is an important feature of complex-network big data: connections within a community are relatively dense, while connections between communities are relatively sparse. Identifying community structure is a key foundation and an important step in analyzing network big data, and it has considerable research value and scientific significance. Community detection has therefore become one of the most challenging research topics in data mining and many other fields. This dissertation analyzes and studies community detection algorithms for homogeneous and heterogeneous networks. The main research work is as follows:

(1) To effectively uncover overlapping community structure in complex networks, this dissertation proposes an overlapping community detection algorithm based on the connection similarity of maximal cliques. The algorithm uses maximal cliques to initialize the community structure of the network and quantifies the connectivity between communities according to their shared neighbor nodes and inter-cluster bridges. On this basis, the cliques are merged to obtain a reasonable overlapping community structure. The algorithm is evaluated against the CPM (Clique Percolation Method) algorithm on four real network datasets. The experimental results show that the overlapping community structure obtained by the proposed algorithm is reasonable and that the mined community structure improves in terms of accuracy, coverage and modularity. The proposed algorithm is thus an effective overlapping community detection algorithm.

(2) Traditional homogeneous-network community detection algorithms cannot make full use of heterogeneous information. To address this problem, this dissertation proposes a heterogeneous network community detection algorithm based on semantic paths, which takes full account of the information carried by the heterogeneous nodes and edges of the network. First, semantic paths are selected with the FindPath method. Next, the similarity matrix of the objects under each semantic path is computed. Finally, the object features derived from the different semantic paths are extracted and merged, and the K-Means algorithm is applied to obtain the final community partition. Experiments on a real dataset demonstrate the effectiveness of the algorithm.

(3) Existing community detection algorithms for heterogeneous networks do not adequately preserve the original structure and information of the network, and they rarely consider heterogeneous nodes that belong to the same community. To address these problems, this dissertation proposes a heterogeneous network community detection algorithm based on maximal bipartite cliques. First, the largest maximal bipartite clique containing a key node is taken as the initial community. Then, the community is expanded by quantifying the similarity between each neighbor node of the community and the initial community. Finally, a reasonable community structure is obtained. Simulation experiments on artificial and real heterogeneous networks show that the algorithm achieves relatively high accuracy and modularity in community detection, which demonstrates its rationality and validity.
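The clique-merging procedure in (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the abstract does not give the connection-similarity formula, so `connection_similarity` is a hypothetical score that adds the number of shared nodes to the density of bridge edges between the non-shared parts of two communities and divides by the size of the smaller one, and the `min_size` and `threshold` parameters are likewise illustrative. Maximal cliques are enumerated with the Bron-Kerbosch algorithm with pivoting, and the most similar pair of communities is merged greedily until no pair reaches the threshold.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def connection_similarity(a, b, adj):
    """Hypothetical score: shared nodes plus bridge-edge density, normalised by the smaller set."""
    shared = len(a & b)
    only_a, only_b = a - b, b - a
    bridges = sum(1 for u in only_a for v in only_b if v in adj[u])
    bridge_density = bridges / max(1, len(only_a) * len(only_b))
    return (shared + bridge_density) / min(len(a), len(b))


def overlapping_communities(adj, min_size=3, threshold=0.5):
    """Seed with maximal cliques, then greedily merge the most similar pair."""
    communities = [set(c) for c in maximal_cliques(adj) if len(c) >= min_size]
    while True:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            s = connection_similarity(communities[i], communities[j], adj)
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return communities
        _, i, j = best
        merged = communities[i] | communities[j]
        communities = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities.append(merged)


# Two 4-cliques sharing only node 3 (hypothetical toy graph).
adj = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                 (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)])
print(sorted(sorted(c) for c in overlapping_communities(adj)))  # [[0, 1, 2, 3], [3, 4, 5, 6]]
```

Because the two cliques score only 0.25 and are never merged, node 3 stays in both communities, which is where the overlap comes from; two triangles sharing an edge, by contrast, score 2/3 and are merged into one community.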
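The pipeline in (2) can be illustrated on a toy bibliographic network. Everything concrete here is an assumption: the author/paper/venue schema and its data are hypothetical, the two semantic paths (author-paper-author and author-paper-venue-paper-author) are fixed by hand rather than chosen by the FindPath method, path similarity uses a PathSim-style normalisation substituted for the dissertation's unstated measure, and "merging" features is taken to mean concatenating each author's similarity rows before running a plain K-Means with deterministic farthest-first seeding.

```python
def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def pathsim(commuting):
    """PathSim-style normalisation of a symmetric semantic-path commuting matrix."""
    n = len(commuting)
    return [[2 * commuting[i][j] / (commuting[i][i] + commuting[j][j])
             if commuting[i][i] + commuting[j][j] else 0.0
             for j in range(n)] for i in range(n)]


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain K-Means with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy network: 4 authors, 4 papers, 2 venues.
W_AP = [[1, 1, 0, 0],   # author 0 wrote papers 0 and 1
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 1]]
W_PV = [[1, 0], [1, 0], [0, 1], [0, 1]]   # papers 0, 1 at venue 0; papers 2, 3 at venue 1

# Similarity matrices under two hand-picked semantic paths.
W_AV = matmul(W_AP, W_PV)
s_apa = pathsim(matmul(W_AP, transpose(W_AP)))
s_apvpa = pathsim(matmul(W_AV, transpose(W_AV)))

# Merge per-path features by concatenation, then cluster.
features = [row_a + row_b for row_a, row_b in zip(s_apa, s_apvpa)]
labels = kmeans(features, k=2)
print(labels)  # [0, 0, 1, 1]
```

Concatenation gives every semantic path equal weight; a weighted combination would let some paths count more than others, but the abstract does not say which scheme the dissertation uses.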
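The seed-and-expand procedure in (3) can be sketched for a bipartite (two-type) network. Several details are assumptions because the abstract leaves them open: the key node is taken to be the highest-degree node not yet assigned, the "largest" maximal bipartite clique is the one with the most edges, found by brute force over subsets of the key node's neighbours (practical only for small graphs), and the hypothetical similarity of a neighbour node to the community is the fraction of its links that land inside the community, compared against an illustrative `threshold` of 0.5.

```python
from itertools import combinations


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbours(adj, nodes):
    """Nodes adjacent to every node in `nodes` (must be non-empty)."""
    nodes = list(nodes)
    result = set(adj[nodes[0]])
    for n in nodes[1:]:
        result &= adj[n]
    return result


def largest_biclique(adj, key):
    """Brute force: the maximal biclique containing `key` with the most edges."""
    best, best_score = {key}, -1
    nbrs = sorted(adj[key])
    for r in range(1, len(nbrs) + 1):
        for subset in combinations(nbrs, r):
            same_side = common_neighbours(adj, subset)       # contains key
            other_side = common_neighbours(adj, same_side)   # closes the biclique
            score = len(same_side) * len(other_side)
            if score > best_score:
                best, best_score = same_side | other_side, score
    return best


def expand(adj, community, threshold=0.5):
    """Absorb neighbour nodes whose links mostly point into the community."""
    community = set(community)
    changed = True
    while changed:
        changed = False
        frontier = {v for u in community for v in adj[u]} - community
        for v in sorted(frontier):
            # Hypothetical similarity: share of v's links that land inside the community.
            if len(adj[v] & community) / len(adj[v]) >= threshold:
                community.add(v)
                changed = True
    return community


def detect(adj, threshold=0.5):
    """Repeatedly seed from the highest-degree unassigned node, then expand."""
    unassigned = set(adj)
    communities = []
    while unassigned:
        key = max(sorted(unassigned), key=lambda n: len(adj[n]))
        community = expand(adj, largest_biclique(adj, key), threshold)
        communities.append(community)
        unassigned -= community
    return communities


# Hypothetical user-item network: u1-u3 share items i1, i2; u4, u5 share i3, i4;
# the single link u3-i3 bridges the two groups.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i1"), ("u3", "i2"),
         ("u3", "i3"), ("u4", "i3"), ("u4", "i4"), ("u5", "i3"), ("u5", "i4")]
for community in detect(build_adj(edges)):
    print(sorted(community))
# ['i1', 'i2', 'u1', 'u2', 'u3']
# ['i3', 'i4', 'u4', 'u5']
```

For a non-empty subset S of the key node's neighbours, N(S) and N(N(S)) always form a maximal biclique, so enumerating such subsets reaches every maximal biclique that contains the key node. Each resulting community mixes both node types, matching the goal of grouping heterogeneous nodes together.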
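Modularity, used to evaluate (1) and (3), measures how much denser the links inside communities are than would be expected at random. For a non-overlapping partition of a graph with m edges, Newman's modularity is Q = Σ_c [L_c / m - (d_c / 2m)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The abstract does not say which variant the dissertation uses (the overlapping communities in (1) require an extended definition), so this minimal sketch covers only the standard non-overlapping case on a hypothetical toy graph.

```python
def modularity(edges, communities):
    """Newman modularity Q of a non-overlapping partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, community in enumerate(communities) for node in community}
    internal = [0] * len(communities)   # L_c: edges with both ends in community c
    degree = [0] * len(communities)     # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l_c / m - (d_c / (2 * m)) ** 2 for l_c, d_c in zip(internal, degree))


# Two triangles joined by a single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))  # about 0.357, i.e. 5/14
```

Putting every node into one community always gives Q = 0, so positive values indicate more internal structure than the degree sequence alone would produce.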
Keywords/Search Tags: Big Data, Complex Networks, Community Detection, Heterogeneous Networks, Maximum Clique