Design And Implementation Of Social Analysis System Based On Spark

Posted on: 2018-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y C Cui  Full Text: PDF
GTID: 2348330518994427  Subject: Computer technology
Abstract/Summary:
The rapid development of information technology has driven rapid development across all walks of life, producing large volumes of complex network data that are difficult to manage and analyze. How to analyze large-scale complex networks is therefore worth studying. In recent years, the rapid development of the big data platform Spark has provided a new approach to complex network analysis. Because Spark computes in memory, it performs well on iterative workloads and can greatly improve the efficiency of graph computation. This paper designs and implements a social network analysis system based on Spark. The main work of this paper is as follows:

This paper implements parallel computation of nine representative social network analysis metrics, including clustering coefficient, HITS, and network density. With GraphX, complex network data are stored as graph structures represented by an edge RDD and a node RDD. This distributed structure is easy to operate on and compute with, so graph analysis and evaluation can be completed quickly even on large data. In addition, this paper parallelizes three classical community detection algorithms: LPA, BGLL, and MNS. These algorithms require many iterations, and Spark's RDD cache mechanism reduces the I/O operations across those iterations, which shortens their running time. The Spark-based parallel community detection algorithms therefore achieve better performance.

This paper designs the social network analysis system in detail, including the basic system management design, the component function design, and the visual display design. Because current social network analysis data formats are not uniform, the system provides format conversion along with data and workflow management. The social network analysis metrics and the community detection algorithms are implemented and managed uniformly through workflows. ECharts and D3.js are used to display the results in six layouts.

This paper carries out module testing and algorithm parallelization testing. The module test results show that the system handles the data effectively and provides a rich visual display. The parallelization test results show that the implemented algorithms have good parallel performance and can meet the performance requirements of social network analysis in a big data context.
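To make two of the metrics named in the abstract concrete, the following is a minimal single-machine sketch of network density and the local clustering coefficient on an undirected graph. It is plain Python, not Spark or GraphX code, and it is not the thesis's implementation. The function names and the sample edge list are assumptions chosen for illustration. The edge list and the adjacency map only loosely mirror the edge RDD and node RDD described above.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops.

    Nodes that appear in no edge are not represented (an assumption of this sketch).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def network_density(adj):
    """Density of a simple undirected graph: 2m / (n * (n - 1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    # Each undirected edge is counted once from each endpoint.
    m = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Hypothetical sample graph: a triangle (1, 2, 3) with a pendant node 4.
sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
sample_adj = build_adjacency(sample_edges)
```

In a distributed setting the same per-node quantities would be computed over partitioned edge and vertex collections rather than an in-memory dictionary. The formulas themselves do not change.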
Keywords/Search Tags: Social Analysis, Spark, GraphX, Parallelization