Design And Implementation Of Data Visualization System For Distributed Offline Computing Platform

Posted on: 2019-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Wang
GTID: 2348330569488928
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the advent of the big data era, various cluster computing frameworks have emerged. Allocating resources and scheduling jobs across these frameworks, and visualizing the data generated in the process, are problems that large Internet companies urgently need to solve. To meet Baidu's urgent need for data visualization on its distributed offline computing platform, Neptune, this thesis designed and implemented a scalable PaaSWeb visualization system. The thesis studied the Neptune platform and the characteristics of its distributed data, used RPC to obtain remote data, and cleaned the data according to the visualization requirements. A cache was then built and updated regularly. In addition, statistical data was constructed and general information was extracted from the configuration file to improve the usability and extensibility of the system. Finally, the job execution data and resource scheduling data of the distributed offline computing platform were visualized, achieving the research goal of supporting decisions on the overall allocation of company resources.

The main work of this thesis is as follows.

First, the thesis introduces the research background and significance, explains the Baidu-specific terms used, and describes the problems Baidu faces with its offline distributed computing resources and the countermeasures adopted. It then briefly outlines the research content and structure of the thesis.

Second, the system architecture, operating principle, and data analysis of the distributed offline computing platform Neptune are described briefly. On this basis, the requirements and feasibility of the PaaSWeb system are analyzed, the system architecture and the division into functional modules are proposed, and the interface for visual display is designed.

Third, the detailed design and implementation of the PaaSWeb system are discussed. The work can be divided into three parts:
(1) Data acquisition and cleaning. This includes developing the Protobuf communication protocol, encapsulating the client-side RPC functions, acquiring the raw data produced by job execution, and encapsulating local interfaces for data cleaning; deserializing the meta file on disk and reading the persisted history data of job execution; building a filter rule, dynamically generating the database table name, and obtaining and cleaning the resource scheduling data in the Logstash library; and acquiring resource scheduling data by invoking the Casio system interface.
(2) Storing data in a double-buffered queue and updating it periodically, which addresses both the high generation rate of job execution data and concurrent access.
(3) Exposing the acquired data as a JSON API, reading the data from the API through Ajax, and using the Bootstrap and ECharts frameworks to visualize it.

Finally, the system operating environment was set up, and the operating results were analyzed and displayed. The results show that the PaaSWeb system described in this thesis lets users intuitively view and analyze the job execution status and resource usage of the distributed offline computing platform, which demonstrates the effectiveness and practicality of this work.
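The double-buffered storage with periodic updates mentioned in part (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `DoubleBufferedCache`, the `fetch` callable (standing in for whatever RPC or database wrapper supplies the data), and the refresh interval are all assumptions made for illustration. The idea shown is that readers always see a complete front buffer while fresh data is prepared separately and swapped in under a short lock.

```python
import threading
from typing import Callable, Dict, List


class DoubleBufferedCache:
    """Two buffers: readers see the front one while a refresher fills the back one."""

    def __init__(self, fetch: Callable[[], List[Dict]]):
        self._fetch = fetch            # hypothetical data source, e.g. an RPC wrapper
        self._buffers: List[List[Dict]] = [[], []]
        self._front = 0                # index of the buffer readers currently see
        self._lock = threading.Lock()

    def refresh(self) -> None:
        """Fetch fresh data outside the lock, then swap it in as the new front."""
        fresh = list(self._fetch())    # slow work does not block readers
        with self._lock:
            back = 1 - self._front
            self._buffers[back] = fresh
            self._front = back

    def snapshot(self) -> List[Dict]:
        """Return the current front buffer; it is replaced, never mutated in place."""
        with self._lock:
            return self._buffers[self._front]

    def start_periodic_refresh(self, interval_s: float,
                               stop: threading.Event) -> threading.Thread:
        """Call refresh() every interval_s seconds until the stop event is set."""
        def loop() -> None:
            while not stop.wait(interval_s):
                self.refresh()

        worker = threading.Thread(target=loop, daemon=True)
        worker.start()
        return worker
```

Under these assumptions, a request handler could serve the cached data as JSON with `json.dumps(cache.snapshot())`, so page requests never wait on the remote data source.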
Keywords/Search Tags: Data visualization, Distributed offline computing, RPC, Logstash, Scalable configuration, Double buffered queues