With the rapid development of the Internet, big data frameworks for real-time computing such as Storm, S4, and Spark Streaming are widely used in real-time monitoring, real-time recommendation, and real-time trading systems. To consume data streams in real time, the Kafka message system has been widely deployed. However, as the real-time demands on data increase, the reliability of the Kafka message system faces enormous challenges. At present, the Kafka message system mainly achieves real-time data distribution through message queues, and its reliability is mainly ensured by data consistency. Data inconsistency in the message queue currently has two main causes. On the one hand, data becomes inconsistent between the master and slave replicas in the cluster. To keep the data consistent, the Follower usually backs up data synchronously, but this incurs considerable extra network, disk, and memory overhead, which increases the cluster load. On the other hand, because the message production rate does not match the rate at which messages are drained from the buffer, the data buffer overflows and data is lost. Although synchronous message production can solve the data loss problem, it reduces throughput.

In response to the above problems, this paper analyzes the existing Kafka message queue mechanism, studies the reliability of the Kafka message system, and completes the following tasks:

1. To address the increased cluster load during replica data synchronization in the Kafka message queue, a replica adaptive synchronization strategy based on message popularity is proposed. Under this strategy, the replica chooses how to synchronize data by estimating message popularity, dynamically switching between a strong consistency policy and a weak consistency policy. A Kafka cluster is set up and the strategy is compared with the replica data synchronization strategy of Kafka 2.10. The results show that the adaptive consistency synchronization strategy not only guarantees the consistency of message replicas, but also significantly reduces the additional resource overhead and increases the cluster throughput.

2. To address the data loss caused by buffer overflow when the Kafka message queue distributes messages asynchronously through its message caching mechanism, a multi-level cache mechanism based on message popularity is proposed. It uses the WFQ (weighted fair queuing) multi-queue scheduling algorithm to achieve fair queue scheduling and data scheduling. Compared with Kafka 2.10, the results show that the multi-queue caching strategy based on message popularity not only reduces the delay of data consumption, but also guarantees the reliability of the message queue.

In summary, studying the master-slave replica data consistency policy and the message distribution mechanism of the Kafka message system is an effective way to ensure its reliability, and has both theoretical significance and practical reference value.
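
To make the scheduling idea in the second task more concrete, the following is a minimal sketch of weighted fair queuing over several cache levels. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the class name `WeightedFairQueue`, the level names `hot` and `cold`, and the weights are all hypothetical, and virtual time is tracked with the simpler self-clocked approximation (the finish tag of the last dequeued message) rather than the exact WFQ virtual clock.

```python
import heapq
from itertools import count


class WeightedFairQueue:
    """Simplified, self-clocked weighted fair queue over several cache levels.

    Hypothetical sketch: level names and weights are illustrative assumptions.
    """

    def __init__(self, weights):
        # Cache level name -> positive weight (a larger weight gets a larger share).
        self.weights = dict(weights)
        # Finish tag of the most recently enqueued message on each level.
        self.last_finish = {level: 0.0 for level in self.weights}
        # Self-clocked virtual time: finish tag of the last dequeued message.
        self.virtual_time = 0.0
        self._heap = []
        # Monotonic sequence number breaks ties between equal finish tags.
        self._seq = count()

    def enqueue(self, level, message, size=1.0):
        # A message starts service no earlier than the current virtual time
        # and no earlier than the previous message of the same level finishes.
        start = max(self.virtual_time, self.last_finish[level])
        finish = start + size / self.weights[level]
        self.last_finish[level] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), level, message))

    def dequeue(self):
        # Serve the message with the smallest virtual finish tag.
        finish, _, level, message = heapq.heappop(self._heap)
        self.virtual_time = finish
        return level, message

    def __len__(self):
        return len(self._heap)
```

With weights of 2 for a hypothetical `hot` level and 1 for a `cold` level, a backlog on both levels is served roughly two hot messages for every cold one, so low-popularity messages are delayed but never starved. How messages are assigned to levels from their estimated popularity is left open here, since the abstract does not specify it.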