A systematic approach to quantifying and improving the availability of Internet services

Posted on: 2007-12-28
Degree: Ph.D.
Type: Thesis
University: Rutgers, The State University of New Jersey - New Brunswick
Candidate: Nagaraja, Kiran
Full Text: PDF
GTID: 2458390005987053
Subject: Computer Science
Abstract/Summary:
As network services, in particular Internet services such as Yahoo or Google, pervade our daily lives, we demand higher availability from them. Yet studies show that these services achieve only 99% to 99.9% availability, which implies that they are unavailable for anywhere from a few hours to a few days each year (when accumulated). In comparison, the public switched telephone network is down for only a few minutes each year. To achieve a similar perception of availability for Internet services, significant effort is required. However, research in this context has focused primarily on performance and scalability, with less attention to availability. This thesis develops an availability-cognizant approach to building such multi-criteria Internet services.

Topics in this thesis are motivated by three key observations. First, there is a dearth of systematic techniques to evaluate service availability. While various techniques have been explored in the context of mission-critical systems, these are too detailed and expensive to apply to Internet services. Therefore, high availability in Internet services has mostly been achieved through ad hoc techniques developed with experience as a guide. We improve upon the status quo by presenting a base methodology that quantifies the average availability of a service under a given fault load and identifies the individual contribution of each fault type in the fault load to unavailability. We also present a novel formulation of a metric, called performability, that evaluates a service under multiple criteria, namely performance and availability.

Second, when choosing their platform, services often trade off availability against other factors such as cost and scalability. Clusters of commodity components, a popular choice among Internet services, experience regular component failures. While various high-availability techniques exist, their benefits, whether applied individually or in combination, are not well understood. This thesis shows how multiple such techniques can be applied in an evolutionary manner while understanding their benefits quantitatively. Additionally, we propose a novel technique called Fault Model Enforcement that effectively complements the implemented techniques by forcing recovery paths at run-time for faults not handled by the service.

Finally, this thesis tackles a critical observation: operator mistakes are a major cause of service unavailability. (Abstract shortened by UMI.)
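The downtime figures quoted in the abstract follow from simple arithmetic on the availability percentage. The sketch below is illustrative only and is not taken from the thesis: the function name `yearly_downtime_hours` is hypothetical, a non-leap year of 8760 hours is assumed, and the 99.999% row is an assumed value included only because it yields "a few minutes" per year, the order of magnitude the abstract attributes to the telephone network.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def yearly_downtime_hours(availability: float) -> float:
    """Return expected hours of unavailability per year.

    `availability` is a fraction in [0, 1], e.g. 0.999 for 99.9%.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # 99% and 99.9% come from the abstract; 99.999% is an assumed
    # comparison point, not a figure stated in the source.
    for label, fraction in [("99%", 0.99), ("99.9%", 0.999), ("99.999%", 0.99999)]:
        hours = yearly_downtime_hours(fraction)
        print(f"{label:>8}: {hours:7.2f} hours/year = {hours * 60:8.1f} minutes/year")
```

Under these assumptions, 99% availability corresponds to about 87.6 hours (roughly 3.65 days) of downtime per year and 99.9% to about 8.76 hours, which matches the "few hours to a few days" range stated above.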
Keywords/Search Tags: Internet services, Availability, Thesis