Font Size: a A A

Study On Graph Sampling Algorithm For Graph Clustering Characteristic

Posted on:2017-05-29Degree:MasterType:Thesis
Country:ChinaCandidate:Y Q QiFull Text:PDF
GTID:2308330482499745Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In the era of high information technology, the rapid development of social networks, such as Facebook, Twitter, Tencent and other social networks have greatly promoted the social services and marketing, but also to promote people’s social network data mining research. These large scale networks can be abstracted as a graph with a large number of nodes and edges. Research on large scale graph can help users to get the information of the actual network, which is of great significance to large scale network analysis.As one of the characteristics of the graph, the clustering property is the coefficient of the degree of node aggregation in a graph. Through the clustering coefficient of the graph clustering analysis in the field of network data mining and network community classification has a wide range of applications, but also the current research focus of researchers.Due to the high complexity of large-scale graph space, large scale graph analysis method based on memory is not feasible in time complexity. To this end, the researchers put forward a large number of graph summary technology, and then approximate to obtain large-scale analysis results. In order to efficiently compute the clustering coefficients of large scale graphs, a typical approach is to sample a representative sub network (sample graph) from a large complex network. Figure sampling provides an effective and low cost approach to social network analysis. However, sampling figure aggregation coefficient with the original clustering coefficient distributed nonlinear variation, distort the aggregation properties of the original image. In view of this, this paper focuses on the sampling map of the aggregation coefficient to maintain the technology, including:First of all, this paper for the original clustering coefficient formula in the application of sampling plan, acquisition of accurate degree will by the acquisition the influence of the degree of node size and different deviation, modified clustering coefficient formula is proposed reduces the clustering coefficient dependence on node degree.Secondly, the data flow graph sampling algorithm is proposed based on the modified clustering coefficient. In dealing with the huge amount of data the data flow network in the form of graph and through this algorithm only read a data stream of the collected samples, is sub graphs, save the acquisition of multiple samples and multiple sampling inconvenience. And each sub graph are calculates how much we want to study the clustering coefficient, the sub image are calculated, the results are more in line with the original, thus obtained the clustering coefficient will be more in line with the original.At the end of this paper, through the experiment, verify the modified clustering coefficient formula and optimization of graph sampling algorithm, the experimental data obtained with the original more matching and data space is greatly reduced, accuracy has also been well represented.
Keywords/Search Tags:graph sampling, clustering coefficient, data flow, social network, Large graphs
PDF Full Text Request
Related items