Design And Implementation Of Distributed Social Analysis System

Posted on: 2018-06-27  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Guo  Full Text: PDF
GTID: 2348330518995901  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of online social networking and big data analysis, social analysis is becoming more valuable and is attracting growing attention. The modularization and functionalization of social analysis deserve in-depth study and implementation. In the era of big data, the scale of social data keeps growing, and traditional social analysis tools and methods struggle to support large-scale data storage and computing. At the same time, distributed platforms such as Hadoop and Spark make it simple to distribute big data storage and computing while providing good fault tolerance. These distributed tools offer new ideas and methods for large-scale social analysis. Distributed social analysis tools are a major trend, yet complete systems or solutions for social analysis remain rare.

This thesis presents a distributed system for large-scale social analysis, built from the bottom up, that covers data storage, computation, algorithm implementation, application, and performance analysis. The system supports analyzing different types of data with different social analysis methods. In addition, its own performance analysis function can promptly identify the performance bottlenecks of different workloads in the system, which provides guidance for system parameter optimization and application optimization.

This thesis starts from the overall architecture of the system and gives a hierarchical design of the distributed social analysis system. The system is divided into five layers from bottom to top: the data storage layer, the distributed computing layer, the social analysis engine, the application layer, and the performance analysis layer. Taking all aspects of social analysis into account, we compared the strengths and weaknesses of different technologies and selected a technology for each layer. In the social analysis engine layer, we divided the social analysis functions into four modules and implemented the functions of each module in a distributed manner.
To address the data skew and shuffle problems that arose during implementation, we proposed an effective solution. In the application layer, we provide command-line tools and online interactive analysis tools. In the performance analysis layer, we not only monitor the resource utilization of the cluster in real time, but also propose a method for quantifying performance bottlenecks and design the BRP tool to obtain the performance bottleneck ratios of workloads. Furthermore, we designed a benchmark tool, TrainBench, to generate training data and model the performance bottleneck ratio. Finally, we obtain a general performance bottleneck model, G-BRM.
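The abstract does not say how the thesis handles data skew. As a point of reference only, the sketch below shows one widely used mitigation, two-stage aggregation with salted keys, in plain Python rather than Spark. The function name `salted_count` and the parameters `num_salts` and `seed` are hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict


def salted_count(records, num_salts=4, seed=0):
    """Count records per key in two stages (illustrative sketch, not the thesis's method).

    Stage 1 appends a random salt to every key, so a single hot key is
    spread over up to `num_salts` sub-keys (and, in a distributed engine,
    over several tasks). Stage 2 strips the salt and merges the partials.
    """
    rng = random.Random(seed)

    # Stage 1: partial counts per (key, salt) pair.
    partial = defaultdict(int)
    for key in records:
        partial[(key, rng.randrange(num_salts))] += 1

    # Stage 2: drop the salt and combine the partial counts per original key.
    totals = defaultdict(int)
    for (key, _salt), count in partial.items():
        totals[key] += count

    return dict(totals), dict(partial)


# A skewed input: key "a" dominates.
records = ["a"] * 1000 + ["b"] * 3
totals, partial = salted_count(records)
```

In a distributed engine, the first stage would run as a shuffle on the salted keys and the second as a much smaller shuffle on the original keys, so no single task has to process all records of the hot key.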
Keywords/Search Tags: distributed system, social analysis, performance bottleneck