Research On Performance Tuning Algorithm Of Apache Kafka

Posted on:2021-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:W K Xie

Full Text:PDF

GTID:2428330614963939

Subject:Software engineering

Abstract/Summary:

Kafka, a high-throughput, low-latency, highly fault-tolerant message queue system, accepts requests from both senders and receivers, serves as a data buffer pool, and persists data. However, when Kafka is connected to large-scale IoT sensors, the data may be skewed. This skew creates hot spots, which cause slow data transmission, abnormal resource usage, and even downtime. This thesis focuses on Kafka's hot-spot problem when it is connected to large-scale producers and addresses the following three aspects.

First, data is distributed unevenly when Kafka is connected to large-scale producers. The thesis considers three points: 1) the clustering algorithm is computationally complex; 2) the insufficient utilization of cluster resources is analyzed; 3) the SDG (Sensor Dependency Graph) is distributed by a secondary clustering algorithm. The proposed method reduces the number of similarity calculations between clusters, and it also reduces the SDG construction work and its computational complexity. Simulation results show that, compared with the classic SDG and traditional hierarchical clustering algorithms, the proposed DASDG improves Kafka's throughput, reduces the resource consumption of the Kafka server, and shortens the clustering time relative to SDG clustering.

Second, to address the Kafka tuning problem in distributed clusters, a sampling-based adaptive performance tuning algorithm for Kafka, ENLHS, is proposed. Latin hypercube sampling first generates a data set, which is used to train a performance model. An elastic net model is then fitted to the data set to refine the Latin hypercube sampling, and the process iterates toward the optimal solution to obtain the best-performing configuration. The corresponding experimental results show that the configuration set collected by the ENLHS algorithm improves Kafka's throughput, reduces latency, and yields smaller errors.

Finally, a middleware prototype system based on Kafka is designed and implemented. When the system is connected to large-scale producers, producer data is distributed effectively, and Kafka maintains efficient and stable operation in the cluster through adaptive performance optimization. Compared with the default data distribution and with Kafka's performance under its default configuration, the prototype system outperforms the open-source version of Kafka in throughput, latency, and data skew.
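As a rough illustration of the Latin hypercube sampling step that the abstract names as the starting point of ENLHS, the sketch below draws candidate Kafka configurations so that each parameter's range is covered evenly. It is not the thesis's implementation. The parameter names are real Kafka producer settings, but the choice of parameters, their ranges, and the helper names `latin_hypercube` and `stratum` are assumptions made for illustration. The elastic net fitting and the iterative refinement are not shown.

```python
import random

# Hypothetical search space: real Kafka producer setting names, but the
# ranges are illustrative assumptions and are not taken from the thesis.
SEARCH_SPACE = {
    "batch.size": (16_384, 1_048_576),            # bytes
    "linger.ms": (0, 100),                        # milliseconds
    "buffer.memory": (33_554_432, 268_435_456),   # bytes
}


def latin_hypercube(space, n_samples, seed=None):
    """Draw n_samples configurations with one sample per stratum per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in space.items():
        # Split [0, 1) into n_samples equal strata and pick one point in each.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across parameters.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in space} for i in range(n_samples)]


def stratum(value, low, high, n_samples):
    """Return the index of the stratum that a sampled value falls into."""
    return min(int((value - low) / (high - low) * n_samples), n_samples - 1)


if __name__ == "__main__":
    for config in latin_hypercube(SEARCH_SPACE, n_samples=5, seed=42):
        print(config)
```

The sampler returns floating-point values, so integer-valued settings would need rounding before being applied to a real producer. In a tuning loop of the kind the abstract outlines, each sampled configuration would be benchmarked and the measured throughput and latency would become the training data for the performance model.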

Keywords/Search Tags:

Message Queue, Kafka, Hot Spot, Sensor Dependency Graph, Weighted Sampling

Related items

1	Research On Reliability Of Kafka Messaging System
2	The Research And Implementation Of Performance Modeling And Optimization Technology Of A Distributed Message System Named Kafka
3	Design And Research Of Message Transmission System Based On Message Queue
4	Design And Achieve Of Billing And Accounting System Message Queue
5	Design And Implementation Of Billing Aggregation System Based On Message Queue
6	The Design And Implementation Of A Message Publish-subscribe Service Based On Kafka
7	Design And Implementation Of Message Queue In Cloud Encryption System
8	Design And Implementation Of Multi-source Sensing And Emergency Linkage System For Smart City
9	Design And Implementation Of Customized Distributed Web Crawler
10	Integration Testing Method Based On Module Dependency Graph And AOP