Research On Key Techniques Of User Request Trace-Oriented Monitoring For Distributed Systems

Posted on: 2016-01-14 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J W Zhou
GTID: 1318330536467207 | Subject: Computer Science and Technology
Abstract/Summary:
Distributed systems benefit daily life and are increasingly important in many fields. As computing technology advances and user requirements grow, their development shows new trends: more users, more deployed infrastructure, and more complex architectures. These trends bring new opportunities but also new challenges, an important one being the growing number of anomalies in distributed systems. Anomalies harm the reliability and performance of distributed systems and may cause enormous financial loss; in safety-critical applications they may lead to catastrophic results.

System monitoring is a runtime technique that collects runtime information from a distributed system, detects and diagnoses anomalies based on that information, and recovers the system according to the analysis results, thereby improving reliability and performance. Runtime information takes different forms. Among them, user-request traces (or simply traces) record the execution paths of user requests and the context of each step along those paths, which makes them especially valuable for detecting and diagnosing anomalies. Trace-oriented monitoring is therefore widely used in academia and industry and is attracting growing attention.

However, trace-oriented monitoring faces serious challenges when applied to distributed systems that follow these trends: how to collect traces accurately, in real time, and with low overhead; how to present the collected traces effectively; how to detect anomalies accurately and efficiently in massive volumes of traces; how to locate fine-grained system faults quickly and precisely based on traces; and how to obtain satisfactory trace data from a distributed system. To address these questions, this dissertation studies the key techniques and the trace data of trace-oriented monitoring and makes the following main contributions:

(1) A lightweight white-box-based tracing method
This dissertation presents a lightweight white-box-based tracing method called MTracer. MTracer records accurate traces in real time and collects information at different levels, such as host, component, and function. A web-based visualization tool, MTracer-Viz, analyses the collected traces and presents the information along different dimensions, helping users understand and maintain their systems. The experimental results show that MTracer adds only a small overhead to the target system, that it has a high processing capacity, and that it can be used in real systems. Moreover, MTracer is easy to use and maintain and runs on different operating systems.

(2) A fine-grained, multi-scenario trace data set
This dissertation collects and publicly shares a fine-grained, multi-scenario trace data set called TraceBench. To the best of our knowledge, TraceBench is the first fine-grained, user-request-centric open trace data set. To make the target system exhibit different features, data collection covered different scenarios involving multiple cluster scales, different kinds of user requests, various workload speeds, and many types of injected faults. An extensive data analysis based on TraceBench validates the authenticity of the data set. By employing TraceBench in different applications, we show that it can be used for trace-oriented monitoring topics, such as anomaly detection and diagnosis, as well as for other techniques, such as temporal invariant mining.

(3) A runtime-verification-based anomaly detection method
This dissertation proposes a runtime-verification-based anomaly detection method. The method describes the features of system behavior as logic expressions and employs runtime verification techniques to transform these expressions into different formats, against which it checks traces and detects anomalies. The experimental results show that the logic languages used in the method can describe different system features exactly and flexibly, that the method detects anomalies quickly and precisely based on those features, and that it can be used in real-world applications.

(4) A segmentation-based performance anomaly diagnosis method
This dissertation proposes a segmentation-based performance anomaly diagnosis method called SegDiag. SegDiag first divides traces using an automatic segmentation algorithm in order to reduce the dimensionality of the trace data, enlarge the capacity of trace clusters, and ensure that the available information is utilized, all of which improve the accuracy and efficiency of the analysis. SegDiag repeats the analysis multiple times and terminates it under rational termination conditions to further improve accuracy. In addition, SegDiag adopts a comprehensive voting mechanism that effectively locates system faults according to user interests. The experimental results validate the accuracy and efficiency of SegDiag.
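
To make the idea behind the runtime-verification-based anomaly detection method described above more concrete, the following minimal Python sketch checks user-request traces against one simple temporal property: every occurrence of a trigger event must eventually be followed by a response event. It is an illustration only, not the dissertation's method. The `Event` schema, the event names, the function names, and the choice of property are all assumptions made for this example; the abstract does not specify the logic languages or formats actually used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One step of a user-request trace (hypothetical schema, not from the source)."""
    request_id: str
    name: str


def violates_response(trace: Iterable[Event], trigger: str, response: str) -> bool:
    """Return True if some `trigger` event is never followed by a `response`.

    This monitors the finite-trace property "always (trigger implies
    eventually response)" with a single flag: the flag is raised on each
    trigger and cleared on each response, so it is still raised at the end
    of the trace exactly when the last trigger has no later response.
    """
    pending = False
    for event in trace:
        if event.name == trigger:
            pending = True
        elif event.name == response:
            pending = False
    return pending


def find_anomalous_requests(
    traces: Dict[str, List[Event]], trigger: str, response: str
) -> List[str]:
    """Return the ids of requests whose traces violate the property."""
    return [
        request_id
        for request_id, trace in traces.items()
        if violates_response(trace, trigger, response)
    ]


if __name__ == "__main__":
    # Hypothetical traces: req-1 opens and closes a file, req-2 never closes it.
    example = {
        "req-1": [Event("req-1", "open"), Event("req-1", "write"), Event("req-1", "close")],
        "req-2": [Event("req-2", "open"), Event("req-2", "write")],
    }
    print(find_anomalous_requests(example, trigger="open", response="close"))
```

A real runtime verification tool would typically compile a formula written in a temporal logic into a monitor automatically; here the monitor for one fixed property shape is written by hand to keep the sketch short.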
Keywords/Search Tags: Distributed System, System Monitoring, User Request Trace, End-to-End Tracing, Anomaly Detection, Anomaly Diagnosis, Data Set