
Design And Development Of Lightweight And High-performance Log Collector

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: M H Zhou
Full Text: PDF
GTID: 2518306773497814
Subject: Library Science and Digital Library
Abstract/Summary:
With the rapid development of distributed systems and the continuous horizontal scaling of complex applications, logs are scattered across many different machines, and the volume of logs generated by running applications grows exponentially. This brings new challenges to log collection, storage, and analysis. The log collection schemes commonly available today cannot simultaneously address poor analyzability, poor performance, unreliability, and poor extensibility; in particular, no breakthrough has been made on performance, which cannot keep up with rapid business growth. This thesis therefore focuses on the performance problem while also ensuring reliability, analyzability, and scalability, and provides a better log collection solution for such scenarios.

For analyzability, log data is structured through JSON serialization to facilitate later analysis, and a log sorting field is added to preserve the order of logs generated within the same millisecond, which simplifies troubleshooting. For scalability, the data stored in MDC is captured automatically when a log record is assembled, so each business line can put its own custom attributes into MDC and thereby extend the log content; Kafka and Pulsar, as distributed and horizontally scalable components, solve the scalability problem of the log center service.

For reliability, logs are temporarily stored on the local disk when the log center is unavailable; a heartbeat mechanism avoids log loss caused by asynchronous network transmission; and a send-confirmation mechanism avoids losing in-memory logs when the service restarts, provided those in-memory logs were read from disk. To support local storage, this thesis implements a log-persistence file system that provides temporary storage and reading of logs, supports different compression algorithms, guarantees that logs are read in order, allows the size of a single log file and the number of log files to be configured, and can locate the index positions of the last read and write by itself after a service restart.

For performance, batching is introduced in network transmission to save network I/O; the Disruptor high-performance queue replaces the blocking queues that ship with the JDK; the JSON serialization path is rewritten to reduce intermediate steps and temporary object creation; memory reuse reduces the overhead of allocating and releasing memory; and zero-copy techniques reduce the overhead of memory copying. The custom JSON serialization tool implemented in this thesis achieves zero copy and zero GC, and supports timestamp formatting, truncation of overly long strings, and reuse of byte arrays. The rewritten timestamp formatting is seven times faster than Java's date formatting tools.

Testing shows that system performance improves significantly: throughput is 18 times that of Filebeat's file-based collection and 5 times that of an open-source network collection method from GitHub, and on an ordinary 4-core, 8 GB machine the collector can sustain a peak log output rate of up to one million entries per second. The log collector designed and implemented in this thesis has been put into production in the company's application systems with good results.
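To make the analyzability and extensibility points concrete, the following minimal Java sketch assembles a structured log record that carries a sorting sequence number and the MDC attributes contributed by each business line. The LogRecordAssembler class and the field names are illustrative assumptions, not the thesis's actual schema or code.

```java
import org.slf4j.MDC;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative assembly of a structured log record; names are assumptions.
public final class LogRecordAssembler {
    // Monotonic counter used as the sorting field, so logs produced within
    // the same millisecond keep a stable order.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    public static Map<String, Object> assemble(String level, String message) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("timestamp", System.currentTimeMillis());
        record.put("seq", SEQUENCE.getAndIncrement()); // sorting field
        record.put("level", level);
        record.put("message", message);

        // Business lines extend the log content with MDC.put("key", "value");
        // the copy is null when nothing has been put into MDC on this thread.
        Map<String, String> mdc = MDC.getCopyOfContextMap();
        if (mdc != null) {
            record.putAll(mdc);
        }
        return record; // the collector would serialize this map to JSON
    }
}
```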
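The next sketch shows how the LMAX Disruptor can stand in for a JDK blocking queue as the collector's in-memory queue. The LogEvent class, buffer size, and consumer logic are assumptions for illustration, not the thesis's implementation.

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public final class LogQueueSketch {
    // Mutable event reused by the ring buffer, so no garbage per log line.
    public static class LogEvent {
        private String line;
        public void set(String line) { this.line = line; }
        public String get() { return line; }
    }

    public static void main(String[] args) {
        int bufferSize = 1 << 16; // ring buffer size must be a power of two
        Disruptor<LogEvent> disruptor = new Disruptor<>(
                LogEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

        // Consumer: here it just prints; a real collector would batch the
        // events and send them to Kafka or Pulsar.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println(event.get()));
        disruptor.start();

        RingBuffer<LogEvent> ringBuffer = disruptor.getRingBuffer();
        // Producer side: copy the message into the pre-allocated slot.
        ringBuffer.publishEvent((event, seq, msg) -> event.set(msg), "hello log");
    }
}
```

Because producers write into pre-allocated event slots instead of enqueueing freshly created objects, this style avoids both the locking of JDK blocking queues and per-message allocation, which is the motivation stated in the abstract.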
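As one plausible way to obtain a large speed-up over Java's standard date formatting, the sketch below caches the formatted second and patches only the millisecond digits into a reusable buffer on each call; the thesis's actual formatter may work differently, and the UTC zone and pattern are assumptions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative low-allocation timestamp formatter (not the thesis's code).
// Not thread-safe: each producer thread would hold its own instance.
public final class FastTimestampFormatter {
    private static final DateTimeFormatter SECOND_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private long cachedSecond = Long.MIN_VALUE;
    // 23 chars: "yyyy-MM-dd HH:mm:ss.SSS"
    private final char[] buffer = new char[23];

    public char[] format(long epochMillis) {
        long second = epochMillis / 1000;
        if (second != cachedSecond) {
            // Re-render the prefix only when the second changes.
            String prefix = Instant.ofEpochSecond(second)
                    .atOffset(ZoneOffset.UTC) // assumption: logs use UTC
                    .format(SECOND_FORMAT);
            prefix.getChars(0, 19, buffer, 0);
            buffer[19] = '.';
            cachedSecond = second;
        }
        int millis = (int) (epochMillis % 1000);
        buffer[20] = (char) ('0' + millis / 100);
        buffer[21] = (char) ('0' + (millis / 10) % 10);
        buffer[22] = (char) ('0' + millis % 10);
        return buffer; // caller copies the chars into the outgoing byte array
    }
}
```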
Keywords/Search Tags: Log Collection, High Concurrency Scenario, High-performance Memory Queue, Log Serialization, Garbage Collection (GC) Problem