Research On Performance Tuning Algorithm Of Apache Kafka

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: W K Xie
Full Text: PDF
GTID: 2428330614963939
Subject: Software engineering
Abstract/Summary:
Kafka is a high-throughput, low-latency, highly fault-tolerant message queue system that accepts requests from both senders and receivers, serves as a data buffer pool, and persists data. However, when Kafka is connected to large-scale IoT sensors, the data may be skewed. The skew creates hot spots, which cause slow data transmission, abnormal resource usage, and even downtime. This thesis focuses on Kafka's hot-spot problem when it is connected to large numbers of producers and addresses three aspects.

First, the thesis addresses the uneven data distribution that arises when Kafka is connected to large-scale producers: 1) the calculation of the clustering algorithm is complicated; 2) the insufficient utilization of cluster resources is analyzed; 3) the SDG (Sensor Dependency Graph) is distributed by a secondary clustering algorithm. The proposed method reduces the similarity calculation between clusters, and it also reduces the cost of SDG construction and its computational complexity. Simulation results show that, compared with the classic SDG and traditional hierarchical clustering algorithms, the proposed DASDG improves Kafka's throughput, reduces the resource consumption of the Kafka server, and shortens clustering time relative to SDG clustering.

Second, in response to the Kafka tuning problem in distributed clusters, a sampling-based adaptive performance tuning algorithm for Kafka, ENLHS, is proposed. Latin hypercube sampling is first used to generate a data set, which is used to train a performance model. An elastic net model is then fitted to the data set to refine the Latin hypercube sampling, and the procedure iterates toward the optimal solution to obtain the best-performing configuration. The corresponding experimental results show that the configuration set collected by the ENLHS algorithm improves Kafka's throughput, reduces latency, and yields smaller errors.

Finally, a middleware prototype system based on Kafka is designed and implemented. When the system is connected to large-scale producers, producer data is distributed effectively, and Kafka maintains efficient and stable operation in the cluster through adaptive performance optimization. Compared with the default data distribution and with Kafka's performance under its default configuration, the prototype system outperforms the open-source version of Kafka in throughput, latency, and data skew.
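
The Latin hypercube sampling step that the abstract names as the starting point of ENLHS can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `latin_hypercube`, the choice of the two producer settings `batch.size` and `linger.ms`, and their value ranges are assumptions made for illustration only. The sketch shows only the defining property of the technique: each parameter range is split into as many equal strata as there are samples, and every stratum is used exactly once.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of the unit interval [0, 1).
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(points)
        # Rescale from [0, 1) to the parameter's own range.
        columns.append([low + p * (high - low) for p in points])
    # Transpose the per-dimension columns into one tuple per sample.
    return list(zip(*columns))


# Hypothetical search ranges (assumed, not taken from the thesis):
# batch.size in bytes and linger.ms in milliseconds.
bounds = [(1024, 1048576), (0, 100)]
samples = latin_hypercube(8, bounds, seed=0)

for batch_size, linger_ms in samples:
    print(f"batch.size={int(batch_size)}  linger.ms={linger_ms:.1f}")
```

Each sampled configuration would then be benchmarked, and the measured throughput and latency would form the data set to which a performance model such as the elastic net is fitted.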
Keywords/Search Tags:Message Queue, Kafka, Hot Spot, Sensor Dependency Graph, Weighted Sampling