The Research And Implementation Of Performance Modeling And Optimization Technology Of A Distributed Message System Named Kafka

Posted on: 2018-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 2348330521950905
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of cloud computing, virtualization, and the Internet, especially the mobile Internet in recent years, the big data era has arrived. Hadoop, MapReduce, and related technologies have been proposed and well studied, and they deal effectively with massive offline and historical data. With the spread of 4G networks, the growing number of Internet users, and the introduction of machine learning, many Internet companies now face the problem of how to access massive real-time data efficiently. To solve this problem, many message middleware systems have been developed and improved to facilitate access to massive real-time data. However, the performance of these middleware systems still needs to be analyzed and may need to be improved. This thesis surveys and analyzes the practical applications of, and related research on, various message middleware systems, and chooses the distributed messaging system Kafka as its research object. To analyze the relationship between Kafka's performance and its characteristics, it builds a distributed Kafka real-time data access platform and applies machine learning and genetic algorithms. The main work is as follows.

(1) Based on the official definitions of Kafka's characteristics and the idea of orthogonal experimental design, this thesis filters the characteristics and generates training samples. The filtering is done in two steps. The first step uses the official annotations and feature definitions to remove features that are unrelated to performance. The second step uses the degree of impact on performance, together with expert recommendations, to remove low-impact characteristics. Because only part of the possible experiments can be run, an orthogonal design is used to select a representative set of configurations as the training sample.

(2) This thesis uses the training samples to construct a performance prediction model with an appropriate algorithm. To complete the training samples, it designs and runs the experiments needed to measure throughput. After studying and comparing several machine learning algorithms, and considering the relationship between performance and characteristics, it selects LASSO to learn from the samples. LASSO is then used both to select characteristics and to construct the corresponding prediction model.

(3) Based on the prediction model and a genetic algorithm, this thesis searches for the combination of characteristics that gives the best performance. First, it designs a coding scheme for the features. Then, it optimizes the crossover step. Finally, it implements the optimization process of the genetic algorithm by choosing an appropriate fitness function and appropriate selection, crossover, and mutation operators.

(4) To analyze the performance prediction model and the improved genetic algorithm, this thesis carries out a large number of experiments. First, it analyzes and verifies the soundness of the prediction model in terms of variance, deviation, and correlation. Then, it compares the resulting performance optimization with that of the particle swarm algorithm, the differential evolution (DE) algorithm, and the standard genetic algorithm. Finally, the experimental results show that the approach proposed in this thesis obtains the best performance when resources are limited.
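The optimization described in step (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the parameter names, their levels, and the linear coefficients below are hypothetical stand-ins for a fitted LASSO model, and the operators (tournament selection, uniform crossover, random-reset mutation, elitism) are common textbook choices rather than the ones the thesis reports. Each chromosome stores one level index per feature, and the surrogate model serves as the fitness function.

```python
import random

# Hypothetical discrete levels for three Kafka-style parameters (illustrative only).
LEVELS = {
    "batch_size": [16384, 65536, 262144],
    "linger_ms": [0, 5, 20],
    "num_io_threads": [4, 8, 16],
}
NAMES = list(LEVELS)

# Hypothetical sparse linear surrogate standing in for a fitted LASSO model.
# A zero coefficient mimics a feature that LASSO has dropped.
COEF = {"batch_size": 3.0, "linger_ms": 1.5, "num_io_threads": 0.0}
INTERCEPT = 10.0


def predict(chrom):
    """Fitness: predicted throughput for a chromosome of level indices."""
    return INTERCEPT + sum(COEF[n] * g for n, g in zip(NAMES, chrom))


def tournament(pop, rng, k=2):
    """Selection: pick k random individuals and keep the fittest."""
    return max(rng.sample(pop, k), key=predict)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(chrom, rng, rate=0.1):
    """Random-reset mutation: occasionally redraw a gene from its valid levels."""
    return [
        rng.randrange(len(LEVELS[n])) if rng.random() < rate else g
        for n, g in zip(NAMES, chrom)
    ]


def optimize(pop_size=20, generations=30, seed=0):
    """Run the genetic algorithm and return (best chromosome, its fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(LEVELS[n])) for n in NAMES] for _ in range(pop_size)]
    best = max(pop, key=predict)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        best = max(pop, key=predict)
    return best, predict(best)


def decode(chrom):
    """Map level indices back to concrete parameter values."""
    return {n: LEVELS[n][g] for n, g in zip(NAMES, chrom)}


if __name__ == "__main__":
    best, score = optimize()
    print(decode(best), score)
```

Encoding level indices rather than raw values keeps every chromosome valid after crossover and mutation, which matches the discrete levels that an orthogonal design works with. In practice the fitness function would be the prediction model trained on measured throughput rather than the fixed coefficients assumed here.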
Keywords/Search Tags: Distributed message middleware, Kafka, Real-time data acquisition, Machine learning, Genetic algorithm