Availability, scalability and cost-effectiveness of cluster-based Internet infrastructures

Posted on:2002-10-11

Degree:Ph.D.

Type:Dissertation

University:Princeton University

Candidate:Ji, Minwen

GTID:1468390011994747

Subject:Computer Science

Abstract/Summary:

Clusters of commodity computers are a cost-effective system structure for large-scale Internet services. Availability and scalability are two major concerns in the design of such a system. My dissertation examines the opportunities in the data storage systems for improving the availability and scalability of cluster-based Internet infrastructures at a low cost. The goal of availability is to maximize the percentage of client requests that succeed despite the failure of one or more servers in the cluster. The goal of scalability is to efficiently scale the server throughput with the cluster size. My basic approach is to investigate the data distribution strategies across nodes in the cluster, i.e., how to partition and replicate data on disk or in memory in order to achieve high availability and scalability.

Maintaining availability in the face of failures is a critical requirement for Internet services. Existing approaches in cluster-based data storage rely on redundancy to survive a small number of failures, but the system becomes largely unavailable if more failures occur. I study a failure isolation approach that partitions and replicates data and metadata across cluster nodes in such a way that the server in each node can deliver data to clients independently of the failures in other nodes. This approach is complementary to existing redundancy-based methods: redundancy can mask the first few failures, and failure isolation can take over and maintain availability for the majority of clients if more failures occur.

I also study how to improve the performance of Internet application servers in a cost-effective way by using a cluster of in-memory databases as the cache for dynamic content. In particular, I investigate how to dynamically partition and replicate data across individual databases in the cluster and how to direct queries to the right databases in order to maximize effective cache capacity and minimize synchronization cost. Despite the conflicts across queries for dynamic content, I observe natural query affinity in a wide range of Internet applications, which could be exploited in management strategies.
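
The availability goal stated in the abstract (the percentage of client requests that succeed despite server failures) can be illustrated with a small sketch. It is not the dissertation's method: the modulo placement, the successor-node replication, and the names `replica_nodes` and `availability` are assumptions made only for this example. It also assumes that every live node can serve the data it holds without help from any other node, which is the property the failure isolation approach aims for.

```python
def replica_nodes(key, num_nodes, replicas):
    """Nodes holding a key: a primary chosen by modulo, plus its successors.

    This placement rule is a hypothetical choice for illustration only.
    """
    primary = key % num_nodes
    return [(primary + i) % num_nodes for i in range(replicas)]


def availability(keys, num_nodes, replicas, failed):
    """Fraction of requests (one per key) that reach at least one live replica."""
    keys = list(keys)
    if not keys:
        raise ValueError("at least one request is needed")
    served = sum(
        1
        for key in keys
        if any(node not in failed
               for node in replica_nodes(key, num_nodes, replicas))
    )
    return served / len(keys)


if __name__ == "__main__":
    # Four nodes, two replicas per key, eight requests with keys 0..7.
    for failed in (set(), {0}, {0, 1}, {0, 1, 2}):
        print(sorted(failed), availability(range(8), 4, 2, failed))
```

With these assumed parameters, one failure is fully masked by redundancy, and further failures reduce the fraction of served requests gradually rather than making the whole cluster unavailable.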

Keywords/Search Tags:

Internet, Availability, Cluster, Scalability

Related items

1	The Study Of High Availability Of Network Servers Cluster System And Management Software Realization
2	Availability And Scalability Issues In Internet Routing: A Weak Forwarding Correctness Approach
3	Design And Implementation Of Cluster Industry Application Gateway
4	Design And Implementation Of Cluster Industry Application GateWay
5	Cluster Operating System High Availability Services Research
6	Availability Management Software For Multi-node Cluster System Design And Implementation
7	Design Of Distributed System With High Availability And Scalability
8	High-availability Server Aggregation System Analysis And Design Method
9	Study And Design Of Linux-Based Scalable Cluster Server
10	Research And Implementation Of Scalable Server Cluster