Exploiting object usage nonuniformity in distributed data management

Posted on:2008-03-26

Degree:Ph.D

Type:Dissertation

University:University of Rochester

Candidate:Zhong, Ming

Full Text:PDF

GTID:1448390005468134

Subject:Computer Science

Abstract/Summary:

Many emerging Internet services provide online, massive data service backed by distributed system infrastructures. Examples include web search engines, web-based email servers, media servers, online stores, etc. Compared with traditional distributed systems, these distributed data-intensive systems focus on data management and impose new challenges in order to achieve fast, reliable, and massive information service to many concurrent users.; It is well known that data objects hosted by Internet services often exhibit highly skewed usage probability distributions. Motivated by this, this dissertation studies the problem of exploiting data object usage popularity skewness to improve the data management of common distributed data-intensive applications. Our main results originate from the simple intuition that being biased towards popular objects may increase system performance compared with unbiased data management. However, choosing the optimal biased strategy is typically equivalent to an NP-hard problem. Furthermore, incorporating the chosen popularity-driven strategy into an existing application may demand significant system adjustment or rebuilding effort in practice.; On the algorithmic side, we present results on deriving optimal/near-optimal popularity-driven data management strategies for the applications we study. Specifically, the results include (1) problem formulation, which builds a model for each studied system to represent the relationship between its object popularities, resource constraints, and targeted performance metrics as a constrained optimization problem; (2) optimization, which derives the optimal or approximate solution to the problem by using techniques such as Lagrange multipliers, dynamic programming, linear programming relaxation, and random walks guided by the Metropolis-Hastings algorithm; and (3) analytical results on the complexity of the optimization problem, the asymptotic optimization time, and the approximation ratios of approximate solutions.; On the systems side, we explore the effectiveness and feasibility of popularity-driven data management on running traces collected from real, large systems. We also address a number of issues that are key to the implementation of popularity-driven strategies in practice, e.g., dynamic object popularity changes, offline computation cost, etc.

Keywords/Search Tags:

Data, Distributed, Object, Usage, System, Popularity-driven

Related items

1	The Research And Application Of Distributed Object Tecthnology In Real-Time Data Acquisition System
2	Data-driven Consensus Filtering In Distributed Sensor Networks
3	Research On Distributed File Placement Algorithm Without Depending On Popularity Information
4	Research Of Several Key Techniques On Distributed Data Processing
5	Collaborative Filtering Based On Item Popularity Weighting Research And Development Of Recommendation System
6	Research On The Normalisation Of Online-usage Indicators And Data Of Academic Journals
7	Research On Collaborative Caching Policy Based On Node Centrality And Content Popularity In NDN
8	Research On Popularity-Driven Video Caching Strategies
9	Research On Data-driven Object Completion Algorithm
10	User Preference Research Of Chinese Scholarly Articles From The Perspective Of Usage Metrics