
Research On Dependable Monitoring In Large-scale Distributed Systems

Posted on: 2012-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G H Chang
Full Text: PDF
GTID: 1488303389966099
Subject: Computer Science and Technology
Abstract/Summary:
With the continuing development of network technology, the computing paradigm keeps changing, and typical large-scale systems such as Grid, P2P, and Cloud have emerged over the past decades. The evolution of network computing systems raises more and more challenges, and demand for higher-performance network application systems has appeared in many fields, including the modern military, finance, aerospace, industrial manufacturing, and civil Internet applications. These demands cover not only rapid processing but also the ability to deliver high-quality services continuously, with high reliability, high availability, and low cost. Yet even though computer applications have reached a highly sophisticated level and are deployed in almost every corner of modern society, a large number of service failures still occur. How to provide highly dependable network applications has therefore become a key problem in the development of distributed computing. In response to these requirements, academia and industry have carried out a great deal of fundamental research, and dependable computing in network environments has become a research focus in recent years. Dependable monitoring is one of the most important foundations for guaranteeing the dependability of computing systems.

This dissertation studies several key problems of dependable monitoring in large-scale distributed computing environments. Based on a survey and analysis of existing research results and relevant technologies, it proposes a dependable monitoring architecture and a corresponding strategy for large-scale distributed computing environments.
A related self-organized monitoring model was also designed, together with a series of algorithms: a self-organization algorithm, an exception message dissemination algorithm, and a system anomaly detection algorithm. Finally, a prototype system was implemented. The specific studies are as follows:

(1) Based on the characteristics of large-scale distributed computing systems, the problems of dependable monitoring were surveyed and analyzed. Based on the features of modern open distributed network systems, a distributed monitoring model for open network environments and a monitoring architecture that is separated from the applications were proposed. The architecture consists of modules for data collection, anomaly diagnosis, dependable strategy control, member management, and monitoring message dissemination. The operation flows of the monitoring system were also illustrated.

(2) Fault detection based on dependable monitoring was defined, and this generalized definition extends the meaning of traditional fault detection. An integrated dependable monitoring paradigm was put forward in which fault detection includes system anomaly detection based on time forecasting. A self-organized neighborhood-grouping fault detection protocol for large-scale distributed systems was proposed, and the corresponding algorithms were designed and analyzed. The advantages and disadvantages of traditional gossip protocols in fault detection were surveyed. A self-organized gossip monitoring algorithm was proposed and then modified to support asynchronous dissemination. Emulation showed that the grouped self-organized fault detection achieved better accuracy and lower communication overhead.
It also made the systems closer to real-time, because the time consumed by monitoring was reduced.

(3) Traditional fault detection methods monitor only fail-stop failures. To address this limitation, variations in system running status were examined from the perspective of pattern recognition. Taking the indicator vector of multi-dimensional dependable monitoring as input, and using the ideas of dimension reduction and maximum variance of the monitoring data, a PCA anomaly detection algorithm was designed. An anomaly discriminant algorithm called PCLPP was also proposed; it incorporates the idea of preserving the manifold structure when projecting from a high-dimensional data space to a lower-dimensional one. In addition, the anomaly detection algorithm DLPP was presented, which is based on supervised Fisher Discriminant Analysis and takes advantage of classification labels. The emulation results showed that these pattern-recognition-based algorithms could recognize many system anomalies efficiently and with high accuracy. The algorithms satisfy the practical requirements of dependable monitoring in distributed systems and offer several significant advantages in sample control.

(4) A dependable monitoring prototype for large-scale distributed systems was designed, and the implementation of every component of the prototype was described in detail. Several engineering design principles for large-scale distributed applications were also presented. A series of experiments was carried out on the prototype system, and the functionality of every component was verified.

In summary, modern dependable monitoring technologies and their key problems in large-scale distributed environments were analyzed in this dissertation.
A series of detection algorithms, self-organized grouping algorithms, and anomaly detection methods based on pattern recognition were designed and refined. These protocols and algorithms were shown to be functional and accurate through both theoretical analysis and experiments.
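
The abstract does not give the details of the PCA anomaly detection algorithm, so the following is only a minimal sketch of one common way such a detector can work, not the dissertation's method. It assumes three hypothetical monitoring indicators (CPU load, memory use, latency) that move together under normal operation, keeps a single principal component found by power iteration, and flags a sample when its squared residual from that component exceeds a quantile of the residuals seen on normal data. All function names, the synthetic data, and the 0.99 quantile are assumptions made for illustration.

```python
import math
import random


def synthetic_normal_rows(n, seed=0):
    """Hypothetical 'normal' indicator vectors: three metrics driven by one load factor."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        load = rng.uniform(0.2, 0.8)
        rows.append([
            load + rng.gauss(0, 0.01),        # assumed CPU indicator
            0.5 * load + rng.gauss(0, 0.01),  # assumed memory indicator
            2.0 * load + rng.gauss(0, 0.01),  # assumed latency indicator
        ])
    return rows


def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix of the indicator vectors."""
    n, d = len(rows), len(means)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=200):
    """Power iteration: approximates the direction of maximum variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def residual(sample, means, component):
    """Squared distance from the centered sample to the 1-D principal subspace."""
    c = [x - m for x, m in zip(sample, means)]
    score = sum(a * b for a, b in zip(c, component))
    return sum((a - score * b) ** 2 for a, b in zip(c, component))


def fit_detector(normal_rows, quantile=0.99):
    """Learn means, principal direction, and a residual threshold from normal data."""
    means = column_means(normal_rows)
    comp = top_component(covariance(normal_rows, means))
    residuals = sorted(residual(r, means, comp) for r in normal_rows)
    index = min(len(residuals) - 1, int(quantile * len(residuals)))
    return means, comp, residuals[index]


def is_anomalous(sample, detector):
    means, comp, threshold = detector
    return residual(sample, means, comp) > threshold


if __name__ == "__main__":
    detector = fit_detector(synthetic_normal_rows(200))
    print(is_anomalous([0.5, 0.25, 3.0], detector))  # latency far off the usual pattern
    print(is_anomalous([0.6, 0.3, 1.2], detector))   # consistent with the usual pattern
```

A residual-based test of this kind reacts to samples that break the usual correlation between indicators, even when each indicator on its own stays in a plausible range, which is the sort of non-fail-stop anomaly the abstract has in mind. A practical detector would usually keep more than one component and choose the threshold from validation data.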
Keywords/Search Tags: Large-scale distributed systems, dependable monitoring, self-organized detection, anomaly detection