With the arrival of the big data era and the rapid development of Internet technology, distributed architectures have become increasingly popular among large companies. At the same time, however, they expose more and more problems. First, locating errors takes a long time. Second, engineers need permission to log in to production machines to view exception logs, which is also time-consuming. Finally, many problems whose causes are unclear end up being suspected as network issues. Although some simple monitoring tools exist, they are difficult to extend and cannot interoperate with one another, so finding the root cause of a problem remains slow and laborious. Distributed monitoring systems have emerged in response to this need. This thesis studies and analyzes Google's Dapper, Twitter's Zipkin, and Alibaba's Eagle-Eye, and on this basis builds the ETrace system, the distributed tracing system used at Eleme. ETrace adopts a stream-processing model, with Kafka as its streaming data platform. Each application in the company instruments its code (performs "bury-point" operations) through the API that ETrace exposes. The collected data is sent to the backend for real-time or offline aggregation and statistical processing, and is finally presented to users as reports. With ETrace, when an application fails, engineers can quickly locate the point of failure, and when an application's response time is long, they can quickly identify the system bottleneck. Because the system supports both real-time and offline processing, it largely meets users' needs across different scenarios.
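The pipeline described above can be illustrated with a minimal, self-contained Python sketch. The sketch covers the instrumentation API, the streaming platform, aggregation, and the reporting step. It is not ETrace's actual API. The names `Span`, `record`, `traced`, `aggregate`, `bottleneck`, and `failing` are hypothetical, and an in-memory `deque` stands in for a Kafka topic so that the example runs without a broker. Only the real-time path is shown. An offline path could apply the same per-key statistics to stored spans in batch.

```python
import time
from collections import defaultdict, deque
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Span:
    """One timed call inside a distributed request (fields are illustrative)."""
    trace_id: str      # shared by every span of the same end-to-end request
    service: str       # application that emitted the span
    operation: str     # e.g. an RPC method or SQL statement name
    duration_ms: float
    error: bool


# Hypothetical stand-in for a Kafka topic: an in-memory FIFO, not the Kafka client API.
topic: deque = deque()


def record(trace_id, service, operation, duration_ms, error=False):
    """Hypothetical "bury-point" call: publish one finished span to the stream."""
    topic.append(Span(trace_id, service, operation, duration_ms, error))


@contextmanager
def traced(trace_id, service, operation):
    """Time a block of code and publish it as a span, marking it failed if it raises."""
    start = time.perf_counter()
    error = False
    try:
        yield
    except Exception:
        error = True
        raise
    finally:
        record(trace_id, service, operation,
               (time.perf_counter() - start) * 1000.0, error)


def aggregate(stream):
    """Consume spans and keep count/error/latency statistics per (service, operation)."""
    stats = defaultdict(lambda: {"count": 0, "errors": 0, "total_ms": 0.0, "max_ms": 0.0})
    while stream:
        span = stream.popleft()
        s = stats[(span.service, span.operation)]
        s["count"] += 1
        s["errors"] += int(span.error)
        s["total_ms"] += span.duration_ms
        s["max_ms"] = max(s["max_ms"], span.duration_ms)
    return dict(stats)


def bottleneck(stats):
    """Return the (service, operation) with the highest average latency."""
    return max(stats, key=lambda k: stats[k]["total_ms"] / stats[k]["count"])


def failing(stats):
    """Return every (service, operation) that reported at least one error, sorted."""
    return sorted(k for k, s in stats.items() if s["errors"] > 0)
```

For example, suppose you record two `payment`/`charge` spans of 250 ms and 310 ms, with the second one failing. Suppose you also record two `order`/`create` spans of 12 ms and 15 ms. After calling `aggregate(topic)`, `bottleneck` returns `("payment", "charge")` and `failing` returns `[("payment", "charge")]`. These results mirror the two diagnostics the abstract mentions: locating a fault and finding a latency bottleneck.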