
Content caching, retrieval and dissemination in networks with storage

Posted on: 2012-01-24
Degree: Ph.D
Type: Thesis
University: Rutgers, The State University of New Jersey - New Brunswick
Candidate: Dong, Lijun
GTID: 2458390008992756
Subject: Engineering
Abstract/Summary:
The overwhelming use of today's network is for an endpoint to acquire a named content file. As a result, efficient content discovery and dissemination are becoming key challenges for the design of future Internet protocols.

With significant advances in data storage technology, storage capacities have increased dramatically while prices have fallen rapidly. It is therefore reasonable to assume that each router on the Internet can cache the content files that pass through it and reply to content requests with its local copies. In this thesis, we first introduce the In-Network Caching framework. Content dissemination consists of two phases. The first phase is content discovery, the service provided by Content Name Resolution (CNRS). Through CNRS, a requester discovers the location(s) of the desired content files. We present a hybrid CNRS architecture in which CNRS servers form a hierarchy, with national, regional, and institutional servers from top to bottom. At each level, a CNRS server is responsible for tracking the caching locations of a predesignated group of content files. When there is a miss at a lower level, the CNRS request is forwarded to the higher-level server, similar to hierarchical web caching.

Following content discovery, the second phase is content retrieval, in which the endpoint sends a request towards the hosting server and the requested content file is returned to the requester. Cache-n-Capture (CC) is the baseline caching scheme, in which each en-route router independently decides whether or not to cache passing content files. When a request is later routed through such a router, the router can "capture" the request and reply with its cached copy of the content instead of forwarding the request to the original hosting site.
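The hierarchical miss-forwarding behavior of the CNRS described above can be sketched as follows. This is an illustrative sketch only: the class and method names (`CNRSServer`, `resolve`, `register`) are hypothetical and not taken from the thesis.

```python
class CNRSServer:
    """Toy model of one level of the CNRS hierarchy (hypothetical names)."""

    def __init__(self, level, parent=None):
        self.level = level      # e.g. "institutional", "regional", "national"
        self.parent = parent    # next CNRS server up the hierarchy
        self.table = {}         # content name -> list of caching locations

    def register(self, name, location):
        # Record a caching location for a content file this server monitors.
        self.table.setdefault(name, []).append(location)

    def resolve(self, name):
        # Answer locally if possible; on a miss, forward the request up
        # the hierarchy, as in hierarchical web caching.
        if name in self.table:
            return self.table[name]
        if self.parent is not None:
            return self.parent.resolve(name)
        return None  # unknown even at the top (national) level

national = CNRSServer("national")
regional = CNRSServer("regional", parent=national)
institutional = CNRSServer("institutional", parent=regional)

regional.register("video/lecture-01", "router-17")
institutional.resolve("video/lecture-01")  # -> ['router-17'] via forwarding
```

A real deployment would partition content names across servers at each level and handle registration updates as caches evict files; the sketch only shows the lookup path.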
We advocate two enhancement techniques. The first is content broadcast (CB), in which each router advertises its cached content files to its immediate neighbors so that a router can direct subsequent content requests to nearby caching nodes. The second is coordinated caching, in which neighboring routers implicitly coordinate their caching decisions to improve collective cache utilization. Through detailed simulations, we show that these two caching techniques significantly outperform other approaches. This performance gain is achieved with small communication overhead by limiting content broadcast and coordination to one hop. We also demonstrate that the storage requirement of the discussed schemes is reasonable.

Next, we develop a mathematical model for CC that optimizes the average content retrieval latency under limited storage on each router. We propose the Sequential Reassignment (SR) algorithm to solve the optimization problem. To implement the proposed scheme in a distributed fashion, we use an estimator based on exponential smoothing. We compare the average content retrieval latencies and the average saved hops per en-route hit of the proposed distributed caching schemes against two common cache replacement policies. We study the impact of cache size and the locality parameter, as well as the plateau factor for workload generation, and show that our proposed scheme consistently provides significant performance improvement under various settings.

We then reconsider the optimization problem with content broadcast enabled on each router. We formulate a different mathematical model to obtain the maximum benefit of CB and propose the distributed Pseudo-Gradient (PG) algorithm. We compare the performance of the proposed caching scheme with the two replacement policies under the same simulation settings as for CC.
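The exponential-smoothing estimator mentioned above is a standard technique for maintaining a running estimate (e.g., of request rates or latencies) from noisy observations. A minimal sketch, with a hypothetical function name and a smoothing factor chosen purely for illustration:

```python
def smoothed_update(estimate, observation, alpha=0.125):
    """Blend a new observation into a running estimate.

    alpha weights recent observations; (1 - alpha) preserves history.
    The first observation initializes the estimate.
    """
    if estimate is None:
        return observation
    return alpha * observation + (1 - alpha) * estimate

# Example: feeding a stream of latency samples into the estimator.
est = None
for sample in [10.0, 12.0, 8.0]:
    est = smoothed_update(est, sample)
# est is now 9.96875: close to 10, only mildly pulled by the outliers
```

The thesis does not specify the smoothing factor; in practice it trades responsiveness to workload shifts against stability of the estimate.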
The results show that Distributed-PG achieves performance improvement while keeping the communication overhead of content broadcast much smaller than that of traditional replacement policies.

In the final part of this work, we investigate a gateway-centric method for efficient content caching and routing. In this method, a content copy can be cached within an autonomous system (AS) while it is routed towards the destination, so that subsequent requests can be satisfied faster. The caching location within an AS is determined by the gateway node through a hashing function. We discuss two alternatives of this method: one with a uniform caching level for every content file, and the other with varying caching levels. Through simulation studies, we show that the gateway-controlled caching method greatly improves content retrieval latencies over traditional solutions, such as the three levels of caches placed by Hierarchical Caching and the mirror servers deployed by Content Distribution Networks (CDNs). The gateway-controlled caching method also outperforms the baseline caching method CC. Additionally, we study and compare the detailed performance of the two alternatives of this method. Finally, we develop a mathematical model whose objective is to guide the gateway to make optimal caching decisions within an AS and to achieve the minimum average content retrieval latency. We provide a greedy caching algorithm to solve the optimization problem and show its superiority over random caching.
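The gateway's hash-based placement can be illustrated as follows. The thesis does not specify the hash function, so SHA-256 and the function name `caching_location` are assumptions for this sketch; the key property shown is that the same content name always maps to the same caching router within the AS.

```python
import hashlib

def caching_location(content_name, as_routers):
    """Deterministically map a content name to one router in the AS.

    The gateway hashes the name and uses the digest modulo the number
    of routers, so every request for the same name resolves to the
    same caching node (hypothetical sketch, not the thesis's exact scheme).
    """
    digest = hashlib.sha256(content_name.encode("utf-8")).hexdigest()
    return as_routers[int(digest, 16) % len(as_routers)]

routers = ["r1", "r2", "r3", "r4"]
loc = caching_location("news/article-42", routers)
```

Because placement is a pure function of the name, any gateway in the AS can compute where a copy should live without coordination; a production design would also handle router churn (e.g., via consistent hashing) to avoid remapping most content when the router set changes.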
Keywords/Search Tags:Content, Caching, Retrieval, Optimization problem, CNRS, Storage, Show, Each router