Logs are important data generated during the operation of a software system. Developers use them for debugging, operation and maintenance staff use them to diagnose problems, security staff use them to monitor abnormal traffic, and operations staff use them to understand how a product is being used. With the development of cloud computing and big data, enterprises are gradually moving to the cloud and decentralized applications are becoming increasingly common, so logs are scattered across many machines and their volume is huge. Under these circumstances, the existing distributed log analysis solution, the ELK stack (Elasticsearch, Logstash, Kibana), requires a company to purchase many machines to build it, to develop supporting functions such as login and permission management itself, and to invest human resources in customized development, operation, and maintenance.

In response to these problems, this thesis takes the company's actual project as its background. Aiming at the traffic imbalance caused by multi-tenancy and the I/O bottleneck caused by heavy reads and writes in the project, it designs flow-based routing at both ends of Kafka and an analysis module in which the active and standby nodes are both live with reads and writes separated, and finally realizes a back-end log analysis system for multi-tenant enterprises. The system provides a fully managed log service covering collection, analysis, and dumping; supports massive log ingestion; and keeps usage costs low through on-demand billing. The main work of this thesis includes the following aspects.

(1) Requirements analysis. The needs of enterprise back-end log analysis in the context of enterprise cloud migration are analyzed and modeled, the system's roles are determined, and requirement use cases are written as use case diagrams and use case tables according to the functional requirements. Page response speed, the processing speed of the preprocessing and analysis modules, and whether Kafka develops a backlog are chosen as the measures of system performance under different log reporting volumes within a single log stream. The entities, their attributes, and the relationships between entities are then organized into an ER diagram based on the requirements. Finally, through process modeling, the system is divided into five modules: collection, preprocessing, analysis, dump, and management.

(2) System design and implementation. First, the application architecture of the system is depicted according to the results of the requirements analysis; each module is designed with component diagrams, and the interfaces between modules are described. The physical structure of the database is then designed from the ER diagram. Finally, class diagrams and sequence diagrams depict the implementation of each module, and the system is implemented with technologies such as Kafka, Elasticsearch, Redis, and Kubernetes. The analysis module uses active and standby nodes to separate reads from writes: the standby node replicates data from the active node, and when the active node fails an active-standby switchover ensures that queries are not affected. In addition, routing is designed at both ends of Kafka to balance the log traffic on each topic; a producer-side sketch and a switchover sketch follow.
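The abstract does not reproduce the routing code, but the producer-side half of the idea can be sketched as a custom Kafka partitioner. The Java sketch below is illustrative only: the class name TenantFlowPartitioner, the convention of using the tenant ID as the record key, and the byte threshold are assumptions, and a real implementation would track a decaying traffic rate rather than a running total.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.LongAdder;

    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.PartitionInfo;

    // Hypothetical flow-based router: low-volume tenants stay on a sticky
    // partition (preserving per-tenant ordering), while tenants whose
    // accumulated traffic crosses a threshold are spread over all partitions.
    public class TenantFlowPartitioner implements Partitioner {

        // Illustrative threshold; a production system would decay this counter.
        private static final long HOT_TENANT_BYTES = 10_000_000L;

        private final Map<String, LongAdder> bytesPerTenant = new ConcurrentHashMap<>();

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
            int numPartitions = partitions.size();
            String tenant = (key == null) ? "unknown" : key.toString();

            LongAdder counter = bytesPerTenant.computeIfAbsent(tenant, t -> new LongAdder());
            counter.add(valueBytes == null ? 0 : valueBytes.length);

            if (counter.sum() > HOT_TENANT_BYTES) {
                // Hot tenant: scatter records to even out partition load.
                return ThreadLocalRandom.current().nextInt(numPartitions);
            }
            // Cold tenant: a stable hash keeps its records on one partition.
            return Math.floorMod(tenant.hashCode(), numPartitions);
        }

        @Override
        public void close() {}

        @Override
        public void configure(Map<String, ?> configs) {}
    }

A partitioner like this is registered on the producer through the standard partitioner.class setting (ProducerConfig.PARTITIONER_CLASS_CONFIG); the consumer-side half of the routing mentioned in the abstract would need a matching strategy for assigning topics and partitions to analysis workers.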
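The active-standby switchover can be sketched in the same hedged spirit. The node addresses and the use of Elasticsearch's /_cluster/health endpoint below are assumptions for illustration; the thesis's actual replication and switchover mechanism is not detailed in the abstract.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    // Minimal routing sketch: writes go to the active node, reads go to the
    // standby replica, and a periodic probe promotes the standby for writes
    // when the active node stops answering health checks.
    public class ActiveStandbyRouter {

        private static final String ACTIVE  = "http://es-active:9200";  // hypothetical address
        private static final String STANDBY = "http://es-standby:9200"; // hypothetical address

        private final HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();

        private volatile boolean activeHealthy = true;

        // Writes target the active node; on failure the standby is promoted.
        public String writeEndpoint() {
            return activeHealthy ? ACTIVE : STANDBY;
        }

        // Reads always go to the standby, so heavy analysis queries never
        // contend with log ingestion on the active node.
        public String readEndpoint() {
            return STANDBY;
        }

        // Called on a timer; a real deployment would also probe the standby
        // and fall back to the active node for reads if the standby is down.
        public void probe() {
            activeHealthy = isUp(ACTIVE);
        }

        private boolean isUp(String node) {
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create(node + "/_cluster/health"))
                        .timeout(Duration.ofSeconds(1))
                        .GET()
                        .build();
                return http.send(req, HttpResponse.BodyHandlers.ofString()).statusCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }
    }

Sending all reads to the standby replica is what separates the two I/O paths: ingestion writes never compete with user queries on the same node, which is the bottleneck the design above targets.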
(3) System testing. After coding was completed, the hardware and software test environments were prepared. Test cases and test procedures were then designed module by module according to the system's functions. Finally, the system was functionally tested against the test cases and performance tested against the performance baseline, and the results were analyzed.

The system fully satisfies the requirements, having undergone complete functional testing and targeted performance testing. Beyond functional completeness, it responds within seconds even when a single log stream reports logs at a rate above the performance baseline. The system has been put into actual use, has produced economic benefits, and has achieved the expected goals.
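One of the performance measures named above is whether Kafka develops a backlog. As a minimal sketch of how such a check could be scripted during performance testing (the broker address and the consumer group name analysis-module are hypothetical), the committed offsets of a consumer group can be compared with the latest offsets of its partitions:

    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Reports the total backlog (consumer lag) of one consumer group by
    // subtracting its committed offsets from the partitions' latest offsets.
    public class BacklogCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                Map<TopicPartition, OffsetAndMetadata> committed =
                        admin.listConsumerGroupOffsets("analysis-module")
                             .partitionsToOffsetAndMetadata().get();

                Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                        .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
                Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                        admin.listOffsets(latestSpec).all().get();

                long totalLag = 0;
                for (TopicPartition tp : committed.keySet()) {
                    totalLag += latest.get(tp).offset() - committed.get(tp).offset();
                }
                System.out.println("total backlog (records): " + totalLag);
            }
        }
    }

A lag that stays near zero while logs are reported above the baseline rate is the "no backlog" condition the tests measure.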