Design And Implementation Of Data Visualization System For Distributed Offline Computing Platform

Posted on:2019-04-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Wang

Full Text:PDF

GTID:2348330569488928

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet and the advent of the big data era, various cluster computing frameworks have emerged. Allocating resources and scheduling jobs across these frameworks, and visualizing the data generated in the process, are problems that large Internet companies urgently need to solve. To meet Baidu's urgent need for data visualization on its distributed offline computing platform Neptune, this thesis designed and implemented PaaSWeb, a scalable visualization system.

The thesis studied Baidu's Neptune platform and the characteristics of its distributed data, used RPC to obtain remote data, and cleaned the data according to the visualization requirements. A cache was then built and refreshed periodically. In addition, statistical data was constructed, and general information was extracted from the configuration file to improve the usability and extensibility of the system. Finally, the job execution data and resource scheduling data of the distributed offline computing platform were visualized, achieving the research goal of supporting decisions on the overall allocation of company resources.

The main work of this thesis is as follows.

Firstly, the thesis introduced the research background and significance, explained the Baidu-specific terms it uses, and discussed the problems faced by Baidu's offline distributed computing resources along with the countermeasures adopted. It then briefly described the research content and structure of the thesis.

Secondly, the system architecture, operating principle and data analysis of the distributed offline computing platform Neptune were briefly described. On this basis, the requirements and feasibility of the PaaSWeb system were analyzed, the system architecture and the division into functional modules were proposed, and the interface for visual display was designed.

Thirdly, the detailed design and implementation of the PaaSWeb system were discussed. The work can be divided into three parts:

(1) Data acquisition and cleaning. This includes developing the Protobuf communication protocol, encapsulating the client-side RPC functions, acquiring the raw data produced by job execution, and encapsulating local interfaces for data cleaning; deserializing the meta file on disk and reading the persisted historical data of job execution; building a filter rule, dynamically generating the database table name, and obtaining and cleaning the resource scheduling data in the Logstash database; and acquiring resource scheduling data by invoking the Casio system interface.

(2) Storing data in a double-buffered queue and updating it periodically, to cope with the high generation rate of job execution data and with concurrent access.

(3) Encapsulating the acquired data as a JSON API, reading the data from the API through Ajax, and using the Bootstrap and ECharts frameworks to visualize it.

Finally, the system operating environment was set up, and the operating results were analyzed and displayed. The results show that the PaaSWeb system lets users intuitively view and analyze the job execution status and resource usage of the distributed offline computing platform, which demonstrates the effectiveness and practicality of this work.
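The double-buffered queue in part (2) is only named in the abstract, not specified. The following is a minimal Python sketch of the general technique, not the thesis's implementation: readers always see a complete "front" buffer while a periodic refresher fills the "back" buffer and then swaps the two. The names `DoubleBuffer`, `fetch`, `snapshot` and `start_periodic_refresh` are hypothetical and chosen only for illustration.

```python
import threading
from typing import Callable, Dict, List

Record = Dict[str, object]


class DoubleBuffer:
    """Two buffers: readers use the front one, the refresher fills the back one."""

    def __init__(self, fetch: Callable[[], List[Record]]) -> None:
        # `fetch` is a hypothetical callable standing in for the RPC data source.
        self._fetch = fetch
        self._buffers: List[List[Record]] = [[], []]
        self._front = 0  # index of the buffer currently visible to readers
        self._lock = threading.Lock()

    def refresh(self) -> None:
        # Fetch outside the lock so a slow data source never blocks readers.
        fresh = self._fetch()
        with self._lock:
            back = 1 - self._front
            self._buffers[back] = fresh
            self._front = back  # swap: the new data becomes visible at once

    def snapshot(self) -> List[Record]:
        # Readers get a reference to a complete buffer, never a half-filled one.
        with self._lock:
            return self._buffers[self._front]


def start_periodic_refresh(
    buffer: DoubleBuffer, interval_s: float, stop: threading.Event
) -> threading.Thread:
    """Refresh `buffer` every `interval_s` seconds until `stop` is set."""

    def loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval_s):
            buffer.refresh()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because the swap only replaces an index under a short lock, a web request that calls `snapshot()` is never delayed by a slow fetch. A snapshot taken before a refresh stays valid after it, since each refresh installs a new list instead of mutating the old one.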

Keywords/Search Tags:

Data visualization, Distributed offline computing, RPC, Logstash, Scalable configuration, Double buffered queues

Related items

1	The Design And Implementation Of A Task Scheduling And Monitoring Platform For Big Data Offline Applications
2	Design And Implementation Of Scalable Method For Data Visualization System Framework
3	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
4	Design And Implementation Of Data Processing And Visualization Based On Memory Computing In Eole System
5	Nimbus: Scalable, distributed, in-memory data storage
6	Research And Development Of University Search Engine Based On Scalable Distributed Architecture
7	Cluster Computing And Visualization On The MASSIVE Grid Environment
8	Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments
9	Visual Analytics in Scalable Visualization Environments
10	Towards scalable and privacy-preserving integration of distributed heterogeneous data