Font Size: a A A

Exploiting object usage nonuniformity in distributed data management

Posted on:2008-03-26Degree:Ph.DType:Dissertation
University:University of RochesterCandidate:Zhong, MingFull Text:PDF
GTID:1448390005468134Subject:Computer Science
Abstract/Summary:
Many emerging Internet services provide online, massive data service backed by distributed system infrastructures. Examples include web search engines, web-based email servers, media servers, online stores, etc. Compared with traditional distributed systems, these distributed data-intensive systems focus on data management and impose new challenges in order to achieve fast, reliable, and massive information service to many concurrent users.; It is well known that data objects hosted by Internet services often exhibit highly skewed usage probability distributions. Motivated by this, this dissertation studies the problem of exploiting data object usage popularity skewness to improve the data management of common distributed data-intensive applications. Our main results originate from the simple intuition that being biased towards popular objects may increase system performance compared with unbiased data management. However, choosing the optimal biased strategy is typically equivalent to an NP-hard problem. Furthermore, incorporating the chosen popularity-driven strategy into an existing application may demand significant system adjustment or rebuilding effort in practice.; On the algorithmic side, we present results on deriving optimal/near-optimal popularity-driven data management strategies for the applications we study. Specifically, the results include (1) problem formulation, which builds a model for each studied system to represent the relationship between its object popularities, resource constraints, and targeted performance metrics as a constrained optimization problem; (2) optimization, which derives the optimal or approximate solution to the problem by using techniques such as Lagrange multipliers, dynamic programming, linear programming relaxation, and random walks guided by the Metropolis-Hastings algorithm; and (3) analytical results on the complexity of the optimization problem, the asymptotic optimization time, and the approximation ratios of approximate solutions.; On the systems side, we explore the effectiveness and feasibility of popularity-driven data management on running traces collected from real, large systems. We also address a number of issues that are key to the implementation of popularity-driven strategies in practice, e.g., dynamic object popularity changes, offline computation cost, etc.
Keywords/Search Tags:Data, Distributed, Object, Usage, System, Popularity-driven
Related items