Parallel Massive Data Processing Platform Based On Graph Computing

Posted on:2021-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Zhou

Full Text:PDF

GTID:2428330620964248

Subject:Engineering

Abstract/Summary:

The growth of the Internet has produced vast amounts of data, and data analysis and data mining techniques built on big data have developed alongside it. Traditionally, data volumes were small and formats uniform, so a single server was usually enough to mine or process the data. As the volume and variety of Internet data have grown rapidly, however, traditional data processing systems can no longer handle diverse data efficiently, and a more general distributed data processing platform is lacking. How to design a foundational distributed data processing platform that supports different data formats and provides both computing and storage has therefore become a problem worth studying.

This thesis designs a distributed massive data computing platform based on graph computing. The platform performs distributed computing and storage for data in various formats, including graph data, supports user-defined abstraction and processing of data and tasks, and provides task analysis, task execution, task scheduling, data storage and other functions. The main work is as follows:

1) A distributed parallel massive data processing platform is designed and implemented. The GraphMaster node is responsible for system task scheduling and resource management, while the GraphSlave execution node and GraphWorker computing node are responsible for task execution and resource statistics.
2) A relay data management model for the distributed system is designed. The entire graph computation process is defined by a user-defined dynamic link library and a topology file describing the graph data execution flow, which decouples business-specific code from the computing platform and makes the data processing platform general-purpose. A consistent-hash disk storage protocol model is designed to give the system an efficient, general distributed storage engine. A consistency protocol between primary and secondary nodes is designed to improve the reliability of the GraphMaster node.
3) A resource allocation and scheduling algorithm model is designed, comprising a system initialization resource scheduling algorithm, a resource reconfiguration scheduling algorithm and a disaster recovery scheduling algorithm. The model dynamically schedules tasks according to the usage of system hardware resources, detects server or task execution exceptions, and schedules disaster recovery.
4) The whole distributed system is built and tested: the function and performance of all key modules are tested, and the test results are analyzed in detail.

This thesis provides a general distributed processing platform for various types of data, including graph data. The platform features good fault tolerance, reasonable resource scheduling, high network throughput and strong generality, and offers a design scheme and solution for computing and storing various types of massive data.
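
The abstract does not describe the consistent-hash storage protocol in detail, so the following is only a minimal sketch of the general consistent hashing technique, not the thesis's actual design. The class name `ConsistentHashRing`, the node names, the use of MD5 as the ring hash, and the figure of 100 virtual nodes per physical node are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps data keys to storage nodes on a hash ring (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node (assumed value)
        self._ring = {}            # ring position -> node name
        self._positions = []       # sorted list of ring positions
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used here only to spread positions evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place several virtual nodes on the ring for each physical node.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if pos in self._ring:
                continue  # skip the (extremely unlikely) position collision
            self._ring[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        # Remove every virtual node that belongs to the given physical node.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if self._ring.get(pos) == node:
                del self._ring[pos]
                del self._positions[bisect.bisect_left(self._positions, pos)]

    def get_node(self, key):
        # A key is owned by the first virtual node clockwise from its hash.
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._ring[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["worker-a", "worker-b", "worker-c"])
    print(ring.get_node("block-42"))
```

The property that makes this scheme attractive for a distributed storage engine is that removing or adding one node only remaps the keys that node owned; keys held by the other nodes stay where they are.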

Keywords/Search Tags:

Distributed System, Relay Data Management, Resource Scheduling Algorithm Model, Graph Data Computing

Related items

1	Resource Scheduling For Wireless Multihop Relay Networks
2	Research On Optimization Of MapReduce For Interactive Analysis On Big Data
3	Distributed Computing Application Oriented Resource Scheduling Mechanisms In Optically Interconnected Data Center
4	Research And Implementation Of A Data Resource Management Platform For Strategic Consulting Based On Graph Database
5	Research On Grid Scheduling
6	Distributed Data Process In Graph Database
7	Research On Resource Scheduling Technology Of Tracking And Data Relay Satellite System
8	Joint Scheduling Of Data And Computation In Geo-distributed Cloud Systems
9	Research On Data Extraction And Distributed Graph Data Management
10	Research And Implementation Of Distributed Data Center Resource Scheduling Algorithm