
Availability, scalability and cost-effectiveness of cluster-based Internet infrastructures

Posted on: 2002-10-11
Degree: Ph.D.
Type: Dissertation
University: Princeton University
Candidate: Ji, Minwen
Full Text: PDF
GTID: 1468390011994747
Subject: Computer Science
Abstract/Summary:
Clusters of commodity computers are a cost-effective system structure for large-scale Internet services. Availability and scalability are two major concerns in the design of such a system. My dissertation examines opportunities in data storage systems for improving the availability and scalability of cluster-based Internet infrastructures at low cost. The goal of availability is to maximize the percentage of client requests that succeed despite the failure of one or more servers in the cluster. The goal of scalability is to scale server throughput efficiently with cluster size. My basic approach is to investigate data distribution strategies across the nodes in the cluster, i.e., how to partition and replicate data on disk or in memory in order to achieve high availability and scalability.

Maintaining availability in the face of failures is a critical requirement for Internet services. Existing approaches in cluster-based data storage rely on redundancy to survive a small number of failures, but the system becomes largely unavailable if more failures occur. I study a failure isolation approach that partitions and replicates data and metadata across cluster nodes in such a way that the server on each node can deliver data to clients independently of failures on other nodes. This approach is complementary to existing redundancy-based methods: redundancy can mask the first few failures, and failure isolation can take over and maintain availability for the majority of clients if more failures occur.

I also study how to improve the performance of Internet application servers cost-effectively by using a cluster of in-memory databases as a cache for dynamic content. In particular, I investigate how to dynamically partition and replicate data across the individual databases in the cluster, and how to direct queries to the right databases, in order to maximize effective cache capacity and minimize synchronization cost. Despite conflicts across queries for dynamic content, I observe natural query affinity in a wide range of Internet applications, which cache management strategies could exploit.
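To make the availability goal and the failure isolation idea concrete, the Python sketch below compares the fraction of requests served as nodes fail under two hypothetical layouts. It is an illustration under stated assumptions, not the dissertation's design: `striped_availability` models a redundancy-based store that stripes every object across all nodes with `parity` redundant blocks, and `isolated_availability` models failure isolation by storing each object whole, together with its metadata, on `replicas` consecutive nodes. The function names, parameters, and placement rule are all assumptions chosen for illustration.

```python
from __future__ import annotations


def striped_availability(failed: set[int], parity: int = 1) -> float:
    """Hypothetical redundancy-based layout: every object is striped across all
    nodes with `parity` redundant blocks (RAID-like). Every object can be
    reconstructed while at most `parity` nodes are down; beyond that, every
    object is missing too many blocks, so no request succeeds."""
    return 1.0 if len(failed) <= parity else 0.0


def isolated_availability(num_objects: int, num_nodes: int, replicas: int,
                          failed: set[int]) -> float:
    """Hypothetical failure-isolated layout: object i and its metadata are stored
    whole on nodes i, i+1, ..., i+replicas-1 (mod num_nodes), so any one of
    those nodes can serve it without contacting the others."""
    served = 0
    for obj in range(num_objects):
        homes = {(obj + k) % num_nodes for k in range(replicas)}
        if homes - failed:  # at least one home node is still alive
            served += 1
    return served / num_objects


if __name__ == "__main__":
    nodes, objects = 8, 800
    for failed in [set(), {3}, {3, 4}, {0, 2, 4}]:
        print(sorted(failed),
              striped_availability(failed),
              isolated_availability(objects, nodes, 2, failed))
```

With 8 nodes and two copies of each object, losing nodes 3 and 4 drops the striped layout to 0.0, while the isolated layout still serves 0.875 of requests, because only objects whose both copies sat on those two nodes are lost. This mirrors the complementary roles the abstract describes: redundancy masks the first failure in both layouts, and isolation limits the damage once failures exceed what redundancy can mask. The two layouts are not matched in storage overhead, so the sketch shows the shape of the behavior, not a fair cost comparison.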
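The caching strategy for dynamic content can be sketched in the same spirit. The Python below assumes that queries are grouped by the set of tables they touch (a simple stand-in for the query affinity the abstract mentions), that each group is cached on one or more in-memory databases, that a read can go to any database caching its group, and that a write must update every copy. `AffinityRouter`, `affinity_key`, and the round-robin placement are hypothetical illustrations, not the dissertation's algorithm; they only show the trade-off it names, where replication spreads read load but duplicates cached data and raises synchronization cost, while partitioning keeps each group in one place.

```python
from __future__ import annotations

from itertools import cycle
from typing import Iterable, Iterator


def affinity_key(tables: Iterable[str]) -> str:
    """Hypothetical affinity grouping: queries touching the same set of tables
    share a key, so they are routed to the same cached data."""
    return "+".join(sorted(set(tables)))


class AffinityRouter:
    """Hypothetical router over a cluster of in-memory database caches."""

    def __init__(self, num_dbs: int) -> None:
        self.num_dbs = num_dbs
        self.placement: dict[str, list[int]] = {}     # group -> databases caching it
        self._readers: dict[str, Iterator[int]] = {}  # round-robin over replicas
        self._next_db = 0

    def place(self, group: str, replicas: int = 1) -> list[int]:
        # Partition: assign groups to databases round-robin, so distinct groups
        # land on different databases instead of every database caching everything.
        # More replicas trade cache capacity and sync cost for read throughput.
        replicas = min(replicas, self.num_dbs)
        dbs = [(self._next_db + k) % self.num_dbs for k in range(replicas)]
        self._next_db = (self._next_db + replicas) % self.num_dbs
        self.placement[group] = dbs
        self._readers[group] = cycle(dbs)
        return dbs

    def route_read(self, group: str) -> int:
        # A read goes to one database caching its group, alternating among replicas.
        return next(self._readers[group])

    def route_write(self, group: str) -> list[int]:
        # A write must be applied to every copy: sync cost grows with replicas.
        return list(self.placement[group])


if __name__ == "__main__":
    router = AffinityRouter(num_dbs=4)
    catalog = affinity_key(["item", "author"])
    router.place(catalog, replicas=2)  # read-mostly group, replicated
    router.place("orders")             # write-heavy group, single copy
    print([router.route_read(catalog) for _ in range(4)])
    print(router.route_write(catalog), router.route_write("orders"))
```

Here the replicated `author+item` group alternates reads between databases 0 and 1, but every write to it touches both, while `orders` lives only on database 2. A real system would choose placement and replica counts dynamically from observed load, as the abstract describes; the static choices here are only for illustration.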
Keywords/Search Tags: Internet, Availability, Cluster, Scalability