Latency is an ever-increasing component of data access costs, which in turn are often the bottleneck in modern high-performance systems. The ability to predict future data accesses is essential to any attempt to address this problem, and we present a novel model for gathering and utilizing data access predictions. Prior attempts to utilize access predictions have taken the form of a single predictive engine that preemptively fetches data. We offer a more powerful model that separates access prediction from the data retrieval mechanism. Predictions are made on a per-file basis and used to provide a minimal amount of additional metadata, which in turn is used by a grouping mechanism to automatically associate related items. This approach allows truly opportunistic use of predictive information, with few of the timing restrictions of prior approaches. Our research covers access prediction, grouping based on predictions, and a discussion of predictability and its meaning in the context of I/O behavior.

We present two predictors: Noah, named for its prediction of pairs, and Recent Popularity, a majority-voting mechanism. We distinguish the goal of predicting the most events accurately (general accuracy) from the goal of offering the most accurate predictions (specific accuracy). Both predictors can trade the number of events predicted for accuracy. Trace-based evaluation demonstrates that their error rates can be reduced to less than 2% for more than 60% of all access requests. Predictions are used to provide a minimal amount of per-file additional metadata, which is then used separately by our grouping mechanism.

To demonstrate the usefulness of grouping, we present the aggregating cache, which manages distributed file system caches based on groups built from our successor predictions. Trace-driven results demonstrate that grouping can reduce LRU demand fetches by 50% to 60%.
When we consider the effects of intervening caches, we observe dramatic gains for our predictive cache. Our treatment includes information-theoretic results that justify our approach, as well as graphical explanations of the effects of caches on workload predictability (cache-frequency plots) and of relative predictor performance (rank-difference plots).
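To make the majority-voting idea concrete, the following Python is an illustrative sketch of a per-file successor predictor that votes over each file's recent successors. The class name, the parameters `j` and `k`, and the deque-based history are our assumptions for illustration, not the implementation evaluated above; raising `j` (or shrinking `k`) is one way to trade the number of events predicted for accuracy.

```python
from collections import Counter, defaultdict, deque

class RecentPopularitySketch:
    """Illustrative majority-voting successor predictor.

    For each file we keep the last k observed successors and predict a
    successor only when one candidate appears at least j times in that
    window; otherwise we decline to predict.
    """

    def __init__(self, j=3, k=4):
        self.j, self.k = j, k
        # file name -> sliding window of its last k successors
        self.history = defaultdict(lambda: deque(maxlen=self.k))
        self.last_access = None

    def access(self, name):
        """Record an access; return a successor prediction or None."""
        if self.last_access is not None:
            # name is the observed successor of the previous access
            self.history[self.last_access].append(name)
        self.last_access = name
        votes = Counter(self.history[name])
        if votes:
            candidate, count = votes.most_common(1)[0]
            if count >= self.j:  # only predict when confidence is high
                return candidate
        return None
```

With `j=3`, the predictor stays silent on the access stream `a, b, a, b, a, b` and only starts predicting `b` as the successor of `a` once `b` has followed `a` three times.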
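To illustrate how grouping can cut demand fetches, here is a minimal sketch of a group-aware LRU cache: on a demand miss it also pulls in the missed file's predicted group, so later accesses to related files hit without further demand fetches. The class, its group table, and the counting of demand fetches are hypothetical simplifications, not the aggregating cache itself.

```python
from collections import OrderedDict

class GroupLRUCacheSketch:
    """Illustrative LRU cache that prefetches a file's group on a miss."""

    def __init__(self, capacity, groups):
        self.capacity = capacity
        self.groups = groups          # name -> list of related names (assumed given)
        self.cache = OrderedDict()    # insertion/recency order: oldest first
        self.demand_fetches = 0

    def _insert(self, name):
        self.cache[name] = True
        self.cache.move_to_end(name)  # mark as most recently used
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def access(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)
            return 'hit'
        self.demand_fetches += 1      # only the missed file costs a demand fetch
        self._insert(name)
        for member in self.groups.get(name, []):  # prefetch the whole group
            if member not in self.cache:
                self._insert(member)
        return 'miss'
```

Given a group `{'a': ['b', 'c']}`, a single demand fetch of `a` brings in `b` and `c` as well, so the subsequent accesses to `b` and `c` hit in the cache.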