Research And Application Of Large-Scale Distributed System Monitoring Technology

Posted on: 2018-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: S C Feng
GTID: 2348330512983443
Subject: Computer Science and Technology
Abstract/Summary:
Distributed systems have become the mainstream architecture for large-scale websites and applications because of their scalability and fault tolerance. Distributed tracing systems and distributed performance monitoring systems play an important role in such systems, supporting failure diagnosis, resource monitoring, and system stability. However, distributed systems also pose fundamental monitoring challenges of their own, including inefficient failure diagnosis, collection of low-value data, and high overhead when searching monitoring data. The contributions of this thesis are as follows.

First, this thesis proposes a tail-based sampling scheme. In a large-scale distributed tracing system, abnormal data make up only a very small fraction of all traces, yet traditional sampling schemes incur high overhead because they must perform a reduce operation to assemble every call chain. In the tail-based scheme, each component judges its part of a call independently, so the cost of assembling the complete chain is paid only when an abnormality occurs.

Second, this thesis proposes a decision-tree-based failure diagnosis method for call chains. Call chain failures are difficult to diagnose quickly and accurately. The method extracts features from known abnormal call chain data and uses them to quickly diagnose the causes of newly collected failures.

Third, this thesis proposes an efficient data index mechanism that optimizes time-series data aggregation, since the performance monitoring data of a distributed system is time-series data. The mechanism builds on synopsis forest, an efficient time-series data aggregation algorithm, and combines it with HBase to speed up aggregation queries; a hashing mechanism is also designed to solve the HBase hotspot problem in distributed deployments.

In conclusion, the thesis presents JTang Tracer (Distributed Tracing System), which traces and analyzes call chains and displays them visually. It reduces the overhead of call chain collection and of time-series aggregation, and it proposes a scheme for failure diagnosis in distributed systems.
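To make the tail-based sampling idea more concrete, here is a minimal in-process sketch of one possible reading of it. Each component buffers its own spans and flags a trace as abnormal locally (an error or a latency over a per-component threshold), and the full chain is assembled only for flagged traces. Everything here is hypothetical and not JTang Tracer's actual design: the names `Span`, `Component`, `Collector`, `latency_threshold_ms`, and the `finish()` end-of-trace signal. Real components would also talk over RPC rather than through method calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    component: str
    duration_ms: float
    error: bool = False


class Component:
    """Hypothetical service instance: buffers its own spans and judges them locally."""

    def __init__(self, name, latency_threshold_ms):
        self.name = name
        self.latency_threshold_ms = latency_threshold_ms
        self.buffer = defaultdict(list)  # trace_id -> spans recorded by this component

    def record(self, span, collector):
        self.buffer[span.trace_id].append(span)
        # Local judgement only: no cross-component reduce is needed to decide.
        if span.error or span.duration_ms > self.latency_threshold_ms:
            collector.mark_abnormal(span.trace_id)

    def take(self, trace_id):
        return self.buffer.pop(trace_id, [])

    def drop(self, trace_id):
        self.buffer.pop(trace_id, None)


class Collector:
    """Hypothetical collector that assembles a chain only for flagged traces."""

    def __init__(self, components):
        self.components = components
        self.abnormal = set()

    def mark_abnormal(self, trace_id):
        self.abnormal.add(trace_id)

    def finish(self, trace_id):
        """Assumed end-of-trace signal: return the full chain if abnormal, else None."""
        if trace_id in self.abnormal:
            self.abnormal.discard(trace_id)
            chain = []
            for c in self.components:
                chain.extend(c.take(trace_id))  # the expensive gather happens only here
            return chain
        for c in self.components:
            c.drop(trace_id)  # normal trace: discard locally, nothing is shipped
        return None


if __name__ == "__main__":
    gateway, db = Component("gateway", 200), Component("db", 50)
    collector = Collector([gateway, db])
    gateway.record(Span("t1", "gateway", 120), collector)
    db.record(Span("t1", "db", 30), collector)
    print(collector.finish("t1"))  # None: nothing abnormal, chain never assembled
    gateway.record(Span("t2", "gateway", 150), collector)
    db.record(Span("t2", "db", 80), collector)  # exceeds db's 50 ms threshold
    print([s.component for s in collector.finish("t2")])  # ['gateway', 'db']
```

The design choice being illustrated is that the common case (a normal trace) costs only a local buffer write and a local discard, while the cross-component gather is reserved for the rare abnormal trace.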
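The decision-tree diagnosis step can be pictured with a small CART-style classifier that splits on Gini impurity. Each labelled abnormal call chain is reduced to a feature vector, and new failures are classified by walking the tree. This is a generic decision tree written in plain Python to keep it dependency-free. The thesis's actual feature set and tree-building algorithm are not given here, so the three features (`max_span_latency_ms`, `error_span_count`, `retry_count`), the cause labels, and the training rows are hypothetical.

```python
from collections import Counter

# Hypothetical features extracted from one abnormal call chain.
FEATURES = ["max_span_latency_ms", "error_span_count", "retry_count"]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini the most, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=4):
    """Internal node: (feature, threshold, left, right); leaf: a failure-cause label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority cause
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def diagnose(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical labelled abnormal chains: [max latency ms, error spans, retries].
train_rows = [[3000, 1, 0], [2800, 1, 0], [120, 3, 0],
              [150, 4, 0], [900, 0, 3], [1000, 0, 4]]
train_labels = ["db_timeout", "db_timeout", "service_exception",
                "service_exception", "network_jitter", "network_jitter"]
tree = build_tree(train_rows, train_labels)

if __name__ == "__main__":
    print(diagnose(tree, [2500, 1, 0]))  # db_timeout
```

In practice a library implementation (for example scikit-learn's `DecisionTreeClassifier`) would replace the hand-written tree. The part specific to tracing is the feature extraction from call chains.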
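Synopsis forest itself is not reproduced here. As a stand-in for the general idea of answering time-series aggregation queries from precomputed summaries rather than raw points, the sketch below stores a (count, sum, min, max) summary per fixed time bucket in a segment tree. A range aggregation then merges O(log n) precomputed nodes. The names `Summary`, `SummaryTree`, and the bucket layout are hypothetical, and this is a swapped-in technique, not the thesis's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    count: int
    total: float
    min_v: float
    max_v: float

    def merge(self, other):
        return Summary(self.count + other.count, self.total + other.total,
                       min(self.min_v, other.min_v), max(self.max_v, other.max_v))


EMPTY = Summary(0, 0.0, float("inf"), float("-inf"))


class SummaryTree:
    """Segment tree over fixed time buckets; each node summarises its bucket range."""

    def __init__(self, bucket_values):
        # bucket_values[i] holds the raw samples that fall into time bucket i.
        self.size = 1
        while self.size < len(bucket_values):
            self.size *= 2
        self.nodes = [EMPTY] * (2 * self.size)
        for i, samples in enumerate(bucket_values):
            s = EMPTY
            for v in samples:
                s = s.merge(Summary(1, v, v, v))
            self.nodes[self.size + i] = s
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = self.nodes[2 * i].merge(self.nodes[2 * i + 1])

    def query(self, lo, hi):
        """Aggregate buckets lo..hi-1 by merging precomputed nodes bottom-up."""
        left, right = EMPTY, EMPTY
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                left = left.merge(self.nodes[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.nodes[hi].merge(right)
            lo //= 2
            hi //= 2
        return left.merge(right)


if __name__ == "__main__":
    tree = SummaryTree([[1, 2], [3], [], [10, 0.5]])  # four hypothetical buckets
    s = tree.query(0, 4)
    print(s.count, s.total, s.min_v, s.max_v)  # 5 16.5 0.5 10
```

Any aggregate that can be merged from partial results (count, sum, min, max, and therefore mean) fits this pattern. Quantiles would need a mergeable sketch instead.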
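The HBase hotspot problem arises because HBase stores rows sorted by key and assigns contiguous key ranges to regions, so lexicographically close keys can all land on one region server. A common remedy, usually called key salting, prefixes each rowkey with a stable hash bucket. The sketch below shows that general technique, not the thesis's specific hash mechanism. The key layout, `NUM_SALT_BUCKETS`, and the helper names are hypothetical, and series IDs are assumed not to contain a zero byte.

```python
import hashlib
import struct

# Hypothetical bucket count; in practice it is tuned to the number of
# pre-split regions / region servers.
NUM_SALT_BUCKETS = 16


def salt(series_id: str, buckets: int = NUM_SALT_BUCKETS) -> int:
    """Stable salt in [0, buckets) derived from the series identity.

    hashlib is used instead of the built-in hash(), whose result for str is
    randomized per process and would differ between writer processes.
    """
    digest = hashlib.md5(series_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def row_key(series_id: str, timestamp_s: int,
            buckets: int = NUM_SALT_BUCKETS) -> bytes:
    """Hypothetical layout: [1-byte salt][series_id][0x00][8-byte big-endian hour base].

    The salt spreads series across regions; the big-endian hour base keeps
    one series' rows sorted by time inside its bucket.
    """
    hour_base = timestamp_s - timestamp_s % 3600
    return (bytes([salt(series_id, buckets)])
            + series_id.encode("utf-8") + b"\x00"
            + struct.pack(">Q", hour_base))


def scan_range(series_id: str, start_s: int, end_s: int,
               buckets: int = NUM_SALT_BUCKETS) -> tuple:
    """(start_row, stop_row) covering [start_s, end_s) for one series.

    The salt depends only on series_id, so a reader recomputes it and needs a
    single contiguous scan. The stop row is the hour after the one containing
    end_s - 1 and is exclusive, as HBase stop rows are.
    """
    return (row_key(series_id, start_s, buckets),
            row_key(series_id, end_s + 3599, buckets))
```

The trade-off is that queries spanning many series in one time window must fan out across all salt buckets, while single-series reads stay a single range scan.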
Keywords/Search Tags:Distributed tracing system, Call chain, Monitor sampling, Failure diagnosis, Aggregation