Research And Implementation Of Acceleration Technologies For Big Data Platform

Posted on: 2020-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: L G Xu
GTID: 2428330596475452
Subject: Software engineering
Abstract/Summary:
The main processing flow of a big data system can be divided into five stages: data collection, data forwarding, data processing, data storage, and data visualization. Across these stages, the stream computing pipeline represented by Flume, Kafka, and Flink has been widely adopted in industry and deployed in many production environments. Even so, these technologies still show problems in particular business scenarios. This thesis focuses on those problems and makes targeted optimizations to Flume, Kafka, and Flink.

First, Flume must run on the client server, where it occupies server resources and competes with the host's own workload. To address this, the thesis designs a message collection system based on a non-JVM process. Replacing traditional Flume with this system reduces the client server resources consumed by collection and improves the performance of the data collection phase. In addition, based on the characteristics of the LSM tree (Log-Structured Merge-tree) and of Kafka message reads and writes, the thesis proposes a caching strategy built on the C0 layer of the LSM tree to improve Kafka's read and write performance.

Second, simply migrating Kafka to SSDs cannot fully exploit the characteristics of flash memory. The thesis therefore introduces a mechanism based on a newer class of flash device, Open-Channel SSDs, to replace Kafka's original persistence mechanism. This mechanism applies key-value separation to reduce the read and write amplification of the LSM tree, and uses the improved LSM tree as the storage engine for Kafka persistence. Furthermore, messages processed within the Kafka cluster are assigned priorities, and an I/O priority scheduling scheme based on dynamic negative feedback is proposed to optimize the processing of the cluster's internal messages.

Finally, because the load balancing strategy that Flink itself provides is limited, the thesis proposes a dynamic load balancing adjustment mechanism. The mechanism predicts load from statistics on the historical load status of the nodes and uses the prediction as the basis for migrating load between nodes, thereby improving the overall execution performance of the Flink cluster.

In summary, this thesis focuses on accelerating Flume, Kafka, and Flink for big data computing platforms. For the five stages of big data processing, it identifies the problems that affect overall performance and applies appropriate solutions to them, improving the overall performance of the big data stream computing system.
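
The load balancing idea described for Flink (predict load from historical node statistics, then migrate load based on the prediction) can be illustrated with a minimal sketch. The abstract does not say which prediction model or migration rule the thesis uses, so everything here is an assumption made for illustration: the sliding-window mean as predictor, the imbalance threshold, and the hypothetical names NodeLoadHistory and plan_migration. None of it is Flink API.

```python
from collections import deque
from statistics import mean


class NodeLoadHistory:
    """Sliding window of recent load samples for one node (hypothetical helper)."""

    def __init__(self, window=5):
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def record(self, load):
        """Store one load sample, e.g. a utilization value in [0, 1]."""
        self.samples.append(load)

    def predict(self):
        # Assumption: the next load is estimated as the mean of the window.
        # The thesis may use a different predictor.
        return mean(self.samples) if self.samples else 0.0


def plan_migration(histories, threshold=0.2):
    """Return (source, target) when the predicted imbalance exceeds
    `threshold`, otherwise None. `histories` maps node name to NodeLoadHistory."""
    predicted = {node: h.predict() for node, h in histories.items()}
    if len(predicted) < 2:
        return None
    busiest = max(predicted, key=predicted.get)
    idlest = min(predicted, key=predicted.get)
    if predicted[busiest] - predicted[idlest] > threshold:
        return busiest, idlest
    return None
```

In this sketch a scheduler would call record() periodically for each node and call plan_migration() to decide whether to move work from the node with the highest predicted load to the one with the lowest. Deciding on predicted rather than instantaneous load is meant to avoid reacting to short spikes.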
Keywords/Search Tags: Big data, Data collection, Data processing, Data forwarding, Performance optimization