
The Design And Implementation Of Business Log Collection System Agent Smith

Posted on: 2022-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2518306725984619
Subject: Master of Engineering
Abstract/Summary:
In recent years, the rapid development of big data technology has profoundly changed the business model of Internet enterprises. Company A, an emerging Internet enterprise, generates a large amount of data every day in each of its businesses. Some of this data needs to be stored in Kafka for real-time computing, and some needs to be stored in Hive for query analysis. Besides serving different scenarios, the data also comes in two formats, JSON and ProtoBuf, which poses challenges for data collection. Previously, each business unit implemented its own data collection logic according to its own needs, and every business system that had to transmit data to Kafka and HDFS integrated the corresponding client. This approach satisfied the need for data collection, but it resulted in a great deal of repetitive development. The tight coupling between data transmission and the service itself also increased the cost of operation and maintenance and reduced the stability of the system. With the rapid growth of the company's business, the drawbacks of the original log solution have become increasingly obvious.

After weighing the performance, reliability and versatility of various open source log collection frameworks, this thesis proposes the Business Log Collection System Agent Smith for corporate applications, built on the open source framework Flume. Agent Smith abstracts and encapsulates the data transmission logic of the various businesses, hiding the details of storage, parsing and transmission, and provides unified log collection capabilities for products in the business layer. Architecturally, it is divided into four main modules: task management, log output SDK, data transmission and schema monitoring. A business registers its data transmission task through the task management module; encapsulates, serializes and stores its business logs in a unified format through the log output SDK module; and delegates log parsing and transmission to the data transmission module. Meanwhile, the schema monitoring module manages the schema of the business logs and monitors its changes. In this way, the tight coupling between business logic and log collection is removed and the complexity of business development is reduced.

Since its launch, Agent Smith has supported more than 200 services, such as advertising billing, search advertising model collection, and growth push log collection, and has processed 1.9 million records per second at peak. It fully meets the business needs of the company and improves the effectiveness and ease of use of log collection.
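As a rough illustration of how the four modules described above might fit together, here is a minimal Python sketch. It is not the thesis's implementation: the abstract gives no API, so every class, function and field name below (Task, TaskManager, wrap, SchemaMonitor, transmit, the "task"/"body" envelope) is hypothetical. In-memory lists stand in for Kafka and Hive, and only JSON payloads are handled.

```python
import json
from dataclasses import dataclass

# Hypothetical sketch: none of these names come from the thesis.

@dataclass
class Task:
    name: str
    sink: str  # assumed destination label, e.g. "kafka" or "hive"

class TaskManager:
    """Task management: a business registers its transmission task here."""
    def __init__(self):
        self.tasks = {}

    def register(self, task):
        self.tasks[task.name] = task

def wrap(task_name, payload):
    """Log output SDK: wrap a business record in one unified envelope."""
    return json.dumps({"task": task_name, "body": payload}, sort_keys=True)

class SchemaMonitor:
    """Schema monitoring: remember each task's field set and flag changes."""
    def __init__(self):
        self.schemas = {}

    def check(self, task_name, body):
        fields = frozenset(body)
        previous = self.schemas.get(task_name)
        self.schemas[task_name] = fields
        return previous is not None and previous != fields

def transmit(line, manager, monitor, sinks):
    """Data transmission: parse the envelope and route it to the task's sink.

    Returns True when the record's field set differs from the last one seen.
    """
    record = json.loads(line)
    task = manager.tasks[record["task"]]
    changed = monitor.check(task.name, record["body"])
    sinks.setdefault(task.sink, []).append(record["body"])
    return changed

# Usage: the business code only registers a task and calls wrap().
manager = TaskManager()
manager.register(Task("ad_billing", "kafka"))
monitor = SchemaMonitor()
sinks = {}  # in-memory stand-in for the real storage systems

first = transmit(wrap("ad_billing", {"ad_id": 1, "cost": 0.5}),
                 manager, monitor, sinks)
second = transmit(wrap("ad_billing", {"ad_id": 2, "cost": 0.7, "region": "cn"}),
                  manager, monitor, sinks)
```

The point of the sketch is the decoupling the abstract describes: the business side never touches a storage client, and routing and schema tracking live entirely on the collection side.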
Keywords/Search Tags: Log Collection, Big Data, Flume