| Since Google published Site Reliability Engineering: How Google Runs Production Systems in 2016 to share its years of accumulated operations and maintenance experience, how to guide stability work with Service Level Objectives (SLOs) has been a hot topic among Site Reliability Engineers (SREs) at Internet companies at home and abroad. At the same time, there is an urgent need for an SLO-based Internet service quality monitoring system that can measure and drive work on service stability. This paper therefore designs and implements a service quality monitoring system based on SLOs to address these problems. The system is divided into five modules: infrastructure, monitored service, Service Level Indicator (SLI) access, service quality monitoring charts, and monitoring alarms. The infrastructure module is designed around Cloud Native concepts, which match current Internet production practice; it gives the whole system high flexibility and availability and standardizes the access baseline for the monitored service module. The monitored service module provides a practical basis for the entire monitoring system: through the design of the monitored service, users can better understand the SLO concept and how the service quality monitoring system is used. The SLI access module connects the service quality indicators of the monitored services to the monitoring system, giving it the ability to collect data samples. These SLIs are the foundation for SLO calculation. As the visual part of the system, the service quality monitoring chart module provides a monitoring interface centered on service quality, which helps SREs and the R&D personnel of monitored services discover, locate, and recover from faults. The monitoring alarm module is responsible for early warning, detecting declines in service quality, and grading faults according to the agreed SLOs. The system as a whole is designed around the SLO concept and gives monitored services a monitoring capability based on service quality. In testing, compared with the traditional monitoring method based on success rate, it better measures and drives the investment of SREs and R&D personnel of monitored services in stability, and it produces less alarm noise, fewer alarm items, and more standardized business protocols. |
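
The relationship between SLIs, SLOs, and fault grading described above can be illustrated with a minimal sketch. The abstract does not say how the system computes SLO attainment or grades faults, so everything here is an assumption: the sketch uses error-budget burn rate, a common technique from SRE practice, and the names `SLO`, `sli`, `burn_rate`, and `grade`, as well as the thresholds 10 and 2, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """An agreed objective, e.g. target=0.999 means 99.9% of requests must be good."""
    name: str
    target: float

    def __post_init__(self) -> None:
        # A target of 1.0 leaves no error budget, so burn rate would be undefined.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def sli(good: int, total: int) -> float:
    """SLI as the ratio of good events to all events (1.0 when there is no traffic)."""
    return 1.0 if total == 0 else good / total


def burn_rate(slo: SLO, good: int, total: int) -> float:
    """Observed error ratio divided by the error budget allowed by the SLO.

    A value of 1.0 means the budget is being consumed exactly as fast as allowed.
    """
    error_budget = 1.0 - slo.target
    return (1.0 - sli(good, total)) / error_budget


def grade(rate: float) -> str:
    """Map a burn rate to a fault grade (thresholds are illustrative assumptions)."""
    if rate >= 10.0:
        return "critical"
    if rate >= 2.0:
        return "warning"
    return "ok"


if __name__ == "__main__":
    availability = SLO(name="availability", target=0.999)
    # Hypothetical window: 10,000 requests, 30 of them failed.
    rate = burn_rate(availability, good=9970, total=10000)
    print(f"burn rate {rate:.1f} -> {grade(rate)}")
```

Grading on burn rate rather than on raw success rate ties each alarm to the agreed SLO, which is one plausible way to obtain the lower alarm noise the abstract reports; the paper's actual rules may differ.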