Research And Implementation Of Acceleration Technologies For Big Data Platform

Posted on:2020-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:L G Xu

Full Text:PDF

GTID:2428330596475452

Subject:Software engineering

Abstract/Summary:


The main processing flow of a big data system can be divided into five stages: data collection, data forwarding, data processing, data storage, and data visualization. Among these core stages, the stream computing pipeline built on Flume, Kafka, and Flink has been widely adopted in industry and deployed in many production environments. Even so, these technologies still have various problems in different business scenarios. This thesis addresses these problems with targeted optimizations for Flume, Kafka, and Flink.

First, Flume must run on the client server, where it occupies server resources and competes with the host's own workloads. To address this, this thesis designs a message collection system based on a non-JVM process. Replacing the traditional Flume agent with this system effectively reduces the client-server resources consumed by data collection and improves the performance of the collection stage. In addition, based on the read and write characteristics of the LSM (Log-Structured Merge) tree and of Kafka messages, this thesis proposes a caching strategy based on the C0 layer of the LSM tree to improve Kafka's read and write performance.

Second, simply migrating Kafka to SSDs does not fully exploit the characteristics of flash memory. This thesis therefore replaces Kafka's original persistence mechanism with one based on a new flash medium, Open-Channel SSDs. The mechanism uses an improved LSM tree as the storage engine for Kafka persistence and applies key-value separation to mitigate the read and write amplification of the LSM tree. In addition, messages processed within the Kafka cluster are prioritized, and an IO priority scheduling scheme based on dynamic negative feedback is designed to optimize the processing performance of the cluster's internal messages.

Finally, because the load balancing strategy built into Flink has limitations, this thesis proposes a dynamic load balancing adjustment mechanism. The mechanism collects statistics on the historical load status of each node to predict future load, and uses the prediction as the basis for migrating load between nodes, thereby improving the overall execution performance of the Flink cluster.

In summary, this thesis focuses on accelerating Flume, Kafka, and Flink in big data computing platforms. Across the five stages of big data processing, it identifies the problems that affect overall performance and selects appropriate solutions to optimize them, thereby improving the overall performance of the big data stream computing system.
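The key-value separation idea behind the Kafka persistence engine can be illustrated with a minimal in-memory sketch. It is not the thesis's Open-Channel SSD engine: the class name `KVSeparatedStore`, the `memtable_limit` parameter, and the toy flush and compaction logic are hypothetical, and the value log is a plain `bytearray` rather than flash storage. Values are appended to the log once, while the index (standing in for the LSM tree, with an in-memory C0 table and flushed sorted runs) stores only `(offset, length)` pointers.

```python
import bisect


class KVSeparatedStore:
    """Toy key-value-separated store (hypothetical): values live in an
    append-only log; the LSM-like index holds only key -> (offset, length)."""

    def __init__(self, memtable_limit: int = 4):
        self.value_log = bytearray()   # append-only value log (in memory here)
        self.memtable = {}             # C0 layer: key -> (offset, length)
        self.runs = []                 # flushed sorted runs, oldest first: (keys, pointers)
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: bytes) -> None:
        offset = len(self.value_log)
        self.value_log += value        # value is written once; compaction never rewrites it
        self.memtable[key] = (offset, len(value))
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Turn the C0 table into an immutable sorted run of pointers.
        keys = sorted(self.memtable)
        self.runs.append((keys, [self.memtable[k] for k in keys]))
        self.memtable = {}

    def get(self, key: str):
        ptr = self.memtable.get(key)
        if ptr is None:
            for keys, ptrs in reversed(self.runs):  # newest run first
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    ptr = ptrs[i]
                    break
        if ptr is None:
            return None
        offset, length = ptr
        return bytes(self.value_log[offset:offset + length])

    def compact(self) -> None:
        # Merge runs oldest to newest so newer pointers win; only small
        # pointers are rewritten, never the values themselves.
        merged = {}
        for keys, ptrs in self.runs:
            merged.update(zip(keys, ptrs))
        keys = sorted(merged)
        self.runs = [(keys, [merged[k] for k in keys])] if keys else []
```

Because `compact()` merges only pointers, the value log does not grow during compaction, which is how key-value separation reduces write amplification. A real engine would also need garbage collection for stale entries in the value log, which this sketch omits.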
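The IO priority scheduling scheme based on dynamic negative feedback is only named in the abstract, so the following is an assumed illustration rather than the thesis's design. It shows a two-class scheduler (`FeedbackIOScheduler`, hypothetical) in which the share of contested dispatch slots given to high-priority requests is corrected in the direction opposite to the latency error. The proportional controller, the target latency, the gain, and the share bounds are all placeholder choices.

```python
from collections import deque


class FeedbackIOScheduler:
    """Hypothetical two-class IO scheduler whose high-priority share is
    tuned by negative feedback on observed high-priority latency."""

    def __init__(self, target_latency_ms=10.0, gain=0.01, min_share=0.5, max_share=0.95):
        self.queues = {"high": deque(), "low": deque()}
        self.share = min_share       # fraction of contested slots for the high class
        self.target = target_latency_ms
        self.gain = gain             # proportional gain (placeholder value)
        self.min_share, self.max_share = min_share, max_share
        self._credit = 0.0

    def submit(self, request, high_priority: bool) -> None:
        self.queues["high" if high_priority else "low"].append(request)

    def observe(self, high_latency_ms: float) -> None:
        # Negative feedback: latency above target raises the high-priority
        # share, latency below target lowers it; the result is clamped.
        error = high_latency_ms - self.target
        self.share = min(self.max_share, max(self.min_share, self.share + self.gain * error))

    def next_request(self):
        high, low = self.queues["high"], self.queues["low"]
        if not high and not low:
            return None
        if not low:
            return high.popleft()
        if not high:
            return low.popleft()
        # Both classes waiting: accumulate `share` per dispatch and serve the
        # high class whenever the credit reaches 1, so it gets ~share of slots.
        self._credit += self.share
        if self._credit >= 1.0:
            self._credit -= 1.0
            return high.popleft()
        return low.popleft()
```

The credit counter acts as a simple weighted round robin, so low-priority internal traffic is slowed but not starved while the feedback loop pushes high-priority latency back toward the target.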
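The Flink load balancing mechanism follows a forecast-then-migrate pattern, which can be sketched as below. The abstract does not name the prediction model, so this sketch assumes simple exponential smoothing over each node's load history. `NodeLoad`, `plan_migrations`, the `high`/`low` thresholds, and the node names are hypothetical, and the function only returns (source, destination) pairs; it does not call any Flink API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeLoad:
    name: str
    history: list[float] = field(default_factory=list)  # past load samples, e.g. utilization in [0, 1]

    def predict(self, alpha: float = 0.5) -> float:
        """Exponentially smoothed forecast of the next load sample (assumed model)."""
        if not self.history:
            return 0.0
        estimate = self.history[0]
        for sample in self.history[1:]:
            estimate = alpha * sample + (1 - alpha) * estimate
        return estimate


def plan_migrations(nodes, high=0.8, low=0.5, alpha=0.5):
    """Pair each node forecast above `high` (most loaded first) with the
    least-loaded remaining node forecast below `low`."""
    forecast = {n.name: n.predict(alpha) for n in nodes}
    overloaded = sorted((n for n in forecast if forecast[n] > high), key=forecast.get, reverse=True)
    underloaded = sorted((n for n in forecast if forecast[n] < low), key=forecast.get)
    plan = []
    for src in overloaded:
        if not underloaded:
            break
        plan.append((src, underloaded.pop(0)))
    return plan
```

With the default thresholds, a node with history `[0.9, 0.95, 0.9]` forecasts about 0.91 and is paired with a node whose history `[0.2, 0.3, 0.25]` forecasts 0.25, while a node steady at 0.6 is left alone. Using a forecast instead of the latest sample keeps a single spike from triggering a migration.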

Keywords/Search Tags:

Big data, Data collection, Data processing, Data forwarding, Performance optimization


Related items

1	Research And Application Of Performance Optimization Mechanism For Big Data System
2	Design And Implementation Of Police Communication Data Processing Platform Based On Big Data Technologies
3	Research On Job Runtime Characteristics Based Performance Optimization In Big Data Processing System
4	AUV Relay Position And Performance Optimization For Underwater Data Collection
5	Data Collection And Processing System For A Ship Structural Health Monitoring
6	Research On Techniques And Systems For Big Data Processing
7	The Design And Implementation Of A Data Forwarding Engine Based On MPLS
8	The Study And Application Of Data-collection And Data Processing Based On ISO15693 Technology
9	Research On Methods Of Performance Optimization And Energy Saving In Big Data Processing System
10	Precision Data Acquisition Of Laser 3D Imaging Radar