With the continuing development of cloud computing, service-oriented architecture has been widely adopted. In a service-oriented architecture, a large application is split into multiple services, each responsible for part of the functionality, and the services call one another to fulfill user requests. This architecture improves the efficiency of application development and reduces the cost of operation and maintenance. First, each service can be developed independently, which improves module reuse and development efficiency. Second, coupling between services is low, so a service that is upgraded or updated can be replaced directly, which reduces the operation and maintenance cost of the application. However, completing a user request requires frequent invocations between services. When a service fails, the fault spreads along the interactions between services and can lead to large-scale service failure. Moreover, service invocation relationships depend on user requests and change when those requests change. Complex and dynamic invocation relationships therefore make service fault propagation harder to analyze. To keep services running stably and reliably in a cloud computing environment, it is necessary to analyze the propagation of service faults, identify their propagation paths, and locate their root causes in a timely manner, so as to prevent large-scale service failures caused by fault diffusion.

Researchers have conducted many studies on service fault propagation in cloud computing environments, but several problems remain: (1) In measuring the abnormal events of a single service, the functions and technology choices of services may differ, so faults affect different services differently, and existing methods have difficulty judging whether a service is abnormal using a unified threshold. (2) In identifying service fault propagation paths, existing methods usually rely on the service call relationship. Because the call relationship changes constantly, the exact time of fault propagation is hard to determine, so the results of an analysis along the service call link can differ from the actual fault propagation. Some scholars instead use Bayesian network structure learning and similar algorithms to infer event dependence and build event causality graphs to identify fault propagation paths, but such methods often require events to be defined manually or incur a large time overhead. (3) In locating the root cause of service faults, existing methods rely mainly on the service dependency graph constructed from the service call relationship. When fault propagation does not follow the service calls, the located root cause is inaccurate.

To solve these problems, this paper proposes a service fault propagation analysis method based on causal graphs in the cloud computing environment. The research content of this paper is as follows:
1. A method for measuring service abnormal events is proposed. It measures the abnormal events caused by service faults by combining the deviation of indicators from their expected values in service running data with the correlation between indicators, and it standardizes the measurement results so that a unified threshold can determine whether a service is abnormal.
2. A method for identifying service fault propagation paths is proposed. The causal relationships between service abnormal events are analyzed and inferred with a causal graph model to construct the service fault propagation graph. In addition, the service fault propagation graph is updated locally based on changes in the service call link, which improves the efficiency of constructing it.
3. A method for locating the root causes of service faults is proposed. The influence range of each fault is determined from the strength of the causal effects in the service fault propagation graph. Within that range, the eigenvalue method is used to determine the number of fault root causes, exploratory factor analysis is used to extract fault factors, and the root-cause services are located according to the contribution of the fault factors to each service.
4. A service fault propagation analysis system for the cloud computing environment is designed and implemented, and experiments verify the effectiveness of the service abnormal event measurement method, the service fault propagation path identification method, and the service fault root cause location method proposed in this paper.
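
To make the first contribution more concrete, the following is a minimal sketch of a standardized anomaly score that combines each indicator's deviation from its expected value with the correlation between indicators. It assumes a Mahalanobis-style distance scaled by the number of indicators. That choice, along with the function names, the two synthetic indicators, and the threshold value, is an illustrative assumption and not the measurement method of this paper.

```python
import numpy as np


def fit_baseline(normal_samples):
    """Estimate indicator expectations and their correlation structure
    from fault-free running data (rows = samples, columns = indicators)."""
    x = np.asarray(normal_samples, dtype=float)
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # The pseudo-inverse tolerates perfectly correlated indicators.
    precision = np.linalg.pinv(cov)
    return mean, precision


def anomaly_score(observation, mean, precision):
    """Mahalanobis-style score (an assumption, not the paper's formula).

    Dividing by the number of indicators keeps the score on a comparable
    scale for services that expose different numbers of indicators.
    """
    diff = np.asarray(observation, dtype=float) - mean
    d2 = float(diff @ precision @ diff)
    return float(np.sqrt(max(d2, 0.0) / diff.size))


# Hypothetical service with two correlated indicators (synthetic data).
rng = np.random.default_rng(0)
latency = rng.normal(100.0, 10.0, size=500)
cpu = 0.5 * latency + rng.normal(0.0, 2.0, size=500)
mean, precision = fit_baseline(np.column_stack([latency, cpu]))

THRESHOLD = 3.0  # hypothetical unified threshold shared by all services

# High latency with matching CPU load: unusual, but the correlation holds.
consistent = anomaly_score([120.0, 60.0], mean, precision)
# Same latency with CPU far below what the correlation predicts.
broken_correlation = anomaly_score([120.0, 40.0], mean, precision)

print(consistent > THRESHOLD, broken_correlation > THRESHOLD)
```

In this sketch the second observation scores much higher than the first even though both have the same latency, because it also violates the learned correlation between the indicators. This is the kind of signal a per-indicator threshold would miss.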