Parallel Massive Data Processing Platform Based On Graph Computing

Posted on: 2021-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Zhou
GTID: 2428330620964248
Subject: Engineering
Abstract/Summary:
The development of the Internet has produced vast amounts of data, and techniques for analyzing and mining big data have developed alongside it. Traditionally, data volumes were small and data came in a single format, so a single server was generally used to mine or compute over the data. However, as Internet data volumes and formats have grown rapidly, traditional data processing systems are no longer efficient for such varied workloads, and a more general distributed data processing platform is lacking. How to design a basic distributed data processing platform, with both computing and storage functions, that handles different data formats has therefore become a problem worth studying. This thesis designs a distributed massive data computing platform based on graph computing. The platform performs distributed computation and storage for data in various formats, including graph data; supports user-defined abstraction and processing of data and tasks; and provides task analysis, task execution, task scheduling, and data storage. The main work is as follows:

1) A distributed parallel massive data processing platform is designed and implemented. The GraphMaster node is responsible for system task scheduling and resource management, while GraphSlave execution nodes and GraphWorker computing nodes are responsible for task execution and resource statistics.
2) A relay data management model for the distributed system is designed. The whole graph computation process is defined by a user-defined dynamic link library together with a topology file describing the graph data execution flow. This decouples business-specific code from the computing platform, so the platform can serve general data processing. A consistent-hashing disk storage protocol model is designed to give the system an efficient, general-purpose distributed storage engine, and a consistency protocol between primary and secondary nodes is designed to improve the reliability of the GraphMaster node.
3) A resource allocation and scheduling algorithm model is designed, comprising an initial resource scheduling algorithm, a resource reconfiguration scheduling algorithm, and a disaster recovery scheduling algorithm. The model dynamically schedules tasks according to system hardware resource usage, detects server or task execution failures, and schedules disaster recovery.
4) The complete distributed system is built and tested: the function and performance of all key modules are tested, and the results are analyzed in detail.

This thesis provides a general distributed processing platform for various types of data, including graph data. The platform offers good fault tolerance, sound resource scheduling, high network throughput, and strong generality, and it provides a design and solution for computing on and storing many types of massive data.
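The abstract names a consistent-hash disk storage protocol without describing it. As a rough illustration only, here is the generic consistent-hashing technique that such a storage layer typically builds on: storage nodes and data keys are hashed onto the same ring, each key is stored on the first node clockwise from its hash, and virtual nodes spread the load more evenly. The class `ConsistentHashRing`, the node names `slave-1` to `slave-3`, the `block-N` keys, the use of MD5, and the default of 100 virtual nodes per server are all hypothetical choices for this sketch, not the thesis's actual design.

```python
import bisect
import hashlib
from typing import Iterable


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes   # virtual ring points per physical node (assumed value)
        self._keys = []        # sorted hash positions on the ring
        self._owner = {}       # hash position -> physical node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only to spread values uniformly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h in self._owner:   # skip the (astronomically rare) collision
                continue
            bisect.insort(self._keys, h)
            self._owner[h] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                self._keys.pop(bisect.bisect_left(self._keys, h))

    def get_node(self, key: str) -> str:
        if not self._keys:
            raise LookupError("ring is empty")
        h = self._hash(key)
        # First ring position clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_right(self._keys, h) % len(self._keys)
        return self._owner[self._keys[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["slave-1", "slave-2", "slave-3"])
    blocks = [f"block-{i}" for i in range(1000)]
    before = {b: ring.get_node(b) for b in blocks}
    ring.remove_node("slave-2")  # simulate a failed storage server
    after = {b: ring.get_node(b) for b in blocks}
    moved = [b for b in blocks if before[b] != after[b]]
    # Only blocks that lived on the removed server are reassigned.
    assert all(before[b] == "slave-2" for b in moved)
    print(f"{len(moved)} of {len(blocks)} blocks moved")
```

The final check shows why the technique suits a storage engine that must tolerate server failures: removing one server remaps only the blocks that server held, while every other block keeps its placement.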
Keywords/Search Tags: Distributed System, Relay Data Management, Resource Scheduling Algorithm Model, Graph Data Computing