A scalable self-diagnosing content distribution service with bounded latency

Posted on: 2008-04-08
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Huang, Chengdu
Full Text: PDF
GTID: 1448390005478683
Subject: Computer Science
Abstract/Summary:
Providing contractual performance assurances in distributed systems is an important and challenging problem. From the users' perspective, stringent timing requirements are becoming more critical. Meanwhile, from the system engineers' perspective, distributed systems are being driven toward ever larger scale, more integration, and higher complexity, which makes predictable system performance difficult.

In this dissertation, we present the design, implementation, and evaluation of a scalable self-diagnosing content distribution service that provides global bounded latencies on content access. Our solution first involves a decentralized replication scheme that dynamically selects subsets of the content distribution servers in wide-area networks for different classes of content so that per-class network latency bounds are achieved. The replication decisions are made autonomously by the servers based on dynamically measured network latencies and workload conditions. Content replication proceeds in a way that balances workload among servers, thereby fully utilizing system capacity and avoiding latency bound violations. The efficiency and decentralized nature of the replication scheme enable our solution to scale to very large content distribution networks.

The self-diagnosing capability of our service comes from the scalable learning-based performance problem diagnosis techniques we propose. The increasing complexity of systems has motivated the design of machine learning approaches to automate some system management tasks. However, as scale increases, current approaches suffer from serious scalability issues. We present two scalable learning-based techniques that automatically identify probable causes of performance problems in large server systems with multiple tiers and replicated sites. By incorporating a large number of diagnostic information sources using a temporal segmentation mechanism and applying transfer learning techniques, we achieve both scalability and improved diagnosis accuracy.
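To make the replica-selection idea in the abstract more concrete, the following is a minimal sketch and not the dissertation's algorithm. The abstract describes a decentralized scheme in which servers decide autonomously; this sketch swaps in a simple centralized greedy heuristic purely for illustration. The function name `select_replicas`, the input layout, and the tie-breaking rule are all assumptions introduced here.

```python
from typing import Dict, List, Set


def select_replicas(latency_ms: Dict[str, Dict[str, float]],
                    bound_ms: float,
                    load: Dict[str, float]) -> List[str]:
    """Greedily pick servers until every client has a replica within bound_ms.

    latency_ms[server][client] is a measured latency in milliseconds.
    load[server] is the server's current workload; among servers that cover
    the same number of clients, the lighter-loaded one is preferred.
    (Hypothetical interface, not taken from the dissertation.)
    """
    clients: Set[str] = {c for row in latency_ms.values() for c in row}
    # For each server, the clients it can reach within the latency bound.
    covers = {s: {c for c, lat in row.items() if lat <= bound_ms}
              for s, row in latency_ms.items()}

    uncovered = set(clients)
    chosen: List[str] = []
    while uncovered:
        # Prefer the most newly covered clients, then the lowest load,
        # with the server name as a final deterministic tie-break.
        best = max(covers,
                   key=lambda s: (len(covers[s] & uncovered),
                                  -load.get(s, 0.0),
                                  s))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError(
                f"no server reaches {sorted(uncovered)} within {bound_ms} ms")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For one content class with a 50 ms bound, a call might look like `select_replicas(latency, 50, load)`, where `latency` holds the measured server-to-client latencies and `load` the current per-server workload. A stricter class would pass a smaller bound and typically end up with more replicas.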
Keywords/Search Tags:Content distribution, Scalable, Self-diagnosing, Service, Performance, Systems