
Research And Implementation Of The Data Grid-based Web Caching System

Posted on: 2008-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Xiang
Full Text: PDF
GTID: 2178360212996821
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of web technologies, people depend more and more on networks to access information. The explosive growth in the number of internet users has made network congestion much more serious. With limited bandwidth, users must endure serious delays and additional communication costs when accessing network information, while the internet's capacity "bottleneck" problem keeps worsening. Web caching is an effective way to improve the web's QoS. However, in the era of information explosion, facing massive data storage and transmission, currently popular proxy cache systems have many problems: poor scalability, no general ability to work across heterogeneous platforms, limited caching capacity, difficulty caching dynamic objects, and so on.

Through high-speed internet connections, the grid integrates geographically distributed, heterogeneous resources into a virtual organization. It achieves a high degree of resource sharing and tries to resolve some significant research problems.

A data grid is an extension of grid technology to data management; it focuses on data storage and management. A data grid can manage different types of storage and data resources from different areas and sources, and it hides specific physical details such as the storage media and the data storage mode. It gives users a unified logical mapping so that they can access data easily, quickly and efficiently, while reducing the network bandwidth occupied and alleviating the internet's capacity "bottleneck" problem.

As a new technical framework, the data grid brings new ideas and methods to the current problem of massive data management.
To save network bandwidth, reduce response delay, and solve network bottleneck problems, a data grid is a preferred solution for massive data caching. On these premises, this paper combines data grid and web caching technologies and presents a data grid-based web caching system. The system has good scalability, works across heterogeneous platforms, overcomes the shortcomings of traditional cluster cache systems, and has high research and application value.

This paper focuses on the following: studying grid and proxy cache technologies; designing the framework of the data grid-based web caching system; designing a new cooperation mechanism, NDCP, suited to grid caching systems, given that the grid is distributed, heterogeneous, dynamic and collaborative; discussing dynamic object caching in a proxy server and designing a simple caching algorithm that caches dynamic objects as static pages; and designing and implementing some core modules of the data grid-based web caching system.

The main work of this paper includes the following aspects:

1. Theoretical research on grid and web caching technologies. This paper summarizes the grid structure and the data grid's features, gives a brief introduction to the Globus project and to some key proxy cache technologies, and summarizes current results that combine these two technologies.

2. Design of the organizational structure of the Data Grid-based Web Caching (DGWC) system. Each cache system located in a different place is treated as a virtual organization, and each virtual organization includes a number of separate buffer pools. Each cache pool has a catalog table that holds local data information, and a directory server provides centralized management of the whole virtual organization. For the connections between virtual organizations, this paper adopts a design that treats the organizations as equals.
Within a virtual organization, however, this paper uses the traditional grid organization model: centralized management. Combining these two methods ensures that the caching system is both scalable and manageable. It helps prevent potential security problems in the system while avoiding the traditional bottleneck caused by over-centralized management. To tolerate the failure of individual nodes and guarantee the system's robustness, this paper uses a master-slave server scheme at key nodes.

3. By analyzing the shortcomings of traditional inter-cache protocols and taking into account the structural features of the new system, this paper designs the New Distributed Cooperate Protocol (NDCP). The grid is distributed, heterogeneous, dynamic and collaborative, and traditional inter-cache protocols such as ICP and FIP have scalability problems and are not suited to grid-based systems. NDCP is a protocol suited to large-scale distributed environments. In this protocol, cooperation among caches is mostly coordinated by the top-level information servers, which select nearby nodes to cooperate within a limited time. This prevents problems such as user delay and excessive communication overhead caused by excessive collaboration. Information on each node is independent, which makes the caching system easy to maintain and easy to expand. The protocol preserves the cache system's performance, which does not decline as the system scales up. It also performs satisfactorily in terms of communication overhead and user waiting time.

4. This paper also discusses dynamic object caching in a proxy server and designs a simple caching algorithm that caches dynamic objects as static pages. A table stores important information about the dynamic objects that have been requested.
Whether an object can be cached is determined by comparing its number of requests with its update frequency over a defined period. Each cached object has a life cycle, and the proxy periodically checks whether the object's data has been updated on the web server. In this way, proxy caching systems can serve not only static content but also dynamic objects, which further shortens user delay.

5. Design and implementation of some core modules of the data grid-based web caching system. The DGWC system is divided hierarchically into the fabric layer, the grid service layer, the web cache module layer and the application layer. This paper concentrates on the design and implementation of the web cache module layer, which is built on the grid service layer. The web cache module layer includes a registration module, a cache management module, a cache communications module, and so on.

The registration module builds and manages grid virtual organizations, which are built on WSRF. A grid node can join or leave a virtual organization dynamically, which improves the system's scalability. Because SOAP is used to transfer information, the system works easily across heterogeneous platforms.

The cache management module includes data table management, a replacement algorithm and a cache consistency algorithm. The local pool catalog table manages the local pool's information, and the index information server's data table is used to cooperate with other caching servers. After an analysis of the current, relatively mature replacement algorithms, this paper adopts the GDSF algorithm in the DGWC system.
This work also analyzes the current major cache consistency algorithms and makes some improvements to TTL.

To implement communication among proxy caches, the local caching system first uses multicast to tell its group members what information it wants, then uses I/O multiplexing to wait for answers from other nodes, and finally uses reliable file transfer to replicate the cached objects.
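
The abstract names GDSF (Greedy-Dual-Size-Frequency) as the replacement algorithm but does not describe the variant used in DGWC. As a rough illustration only, the sketch below follows the commonly described form of GDSF, where each object gets the priority L + frequency * cost / size and L is raised to the priority of the most recently evicted object. The class name GDSFCache, the uniform default cost of 1.0, and the byte-based capacity are assumptions made for this example, not details from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int        # object size in bytes
    cost: float      # cost of re-fetching the object (assumed 1.0 by default)
    freq: int        # number of accesses since insertion
    priority: float  # GDSF key: L + freq * cost / size


class GDSFCache:
    """Illustrative GDSF cache; names and defaults are hypothetical."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.clock = 0.0  # the inflation value L
        self.entries: dict[str, Entry] = {}

    def _priority(self, e: Entry) -> float:
        return self.clock + e.freq * e.cost / e.size

    def get(self, key: str) -> bool:
        """Record a hit and refresh the priority; return False on a miss."""
        e = self.entries.get(key)
        if e is None:
            return False
        e.freq += 1
        e.priority = self._priority(e)
        return True

    def put(self, key: str, size: int, cost: float = 1.0) -> bool:
        """Insert an object, evicting lowest-priority objects as needed."""
        if size <= 0 or size > self.capacity:
            return False
        if key in self.entries:
            # Re-inserting replaces the old copy and resets its frequency.
            self.used -= self.entries.pop(key).size
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k].priority)
            # L becomes the priority of the evicted object, so objects
            # inserted later start above long-idle ones.
            self.clock = self.entries[victim].priority
            self.used -= self.entries.pop(victim).size
        e = Entry(size=size, cost=cost, freq=1, priority=0.0)
        e.priority = self._priority(e)
        self.entries[key] = e
        self.used += size
        return True
```

With a capacity of 100, inserting two 50-byte objects "a" and "b", reading "a" once, and then inserting a 30-byte object "c" evicts "b", since "b" has the lower priority. The linear scan for the victim keeps the sketch short; a production cache would more likely use a priority queue.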
Keywords/Search Tags: Grid, Data Grid, Proxy Cache, Cooperative Mechanism