Font Size: a A A

Web Content Such As Search And Index Cache

Posted on:2007-04-04Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y WuFull Text:PDF
GTID:1118360185954196Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Peer-to-peer(P2P) network has become an important application in Internet. P2P networkcollects the resource on the nodes at the edge of Internet to provide strong computing powerand storage capability. Because of its flexibility on query pattern and adaptability to dynamicenvironment, unstructured P2P network is widely deployed and become the mainstream of thecurrent P2P networks. However, unstructured P2P networks suffer from poor performance,therefore improving searching algorithm is one important major research area for unstructuredP2P network.This dissertation studies the content searching algorithm and the distributedindex-caching algorithm. The main contributions are as follows:1) A query-agent-based adaptive blind searching algorithm for unstructured P2P network(QAA, Query Agent Algorithm) is proposed. The objective of blind searching algorithm is tocontrol the query scale and reduce the redundancy cost. However the present algorithms areshort of the network information detecting capability and difficult to regulate the query scaleadaptively.QAA fulfills the query work by the sub-queries sent by query agent. Its basic idea is touse the network and query information detected by the previous sub-queries to determine thescale of the subsequent sub-queries and thus reduce redundancy cost effectively. Its rule is thesmaller gap to success query, the smaller query scale. Using the detecting scheme, QAAavoids the troublesome parameter optimization issue of the present algorithms.Experiments to compare between QAA and the present blind searching algorithms in thethesis research found that for uniform distribution and Zipf distribution networks, QAA canrealize redundancy cost control under query contents highly satisfied condition, and the resultredundancy ratio is about 1.1 and 1.4 respectively. Furthermore, from uniform distribution toZipf distribution, the increase of QAA in result redundancy ratio is the lowest, which impliesQAA's stronger flexibility than the present blind searching algorithms.2) A TTL-based distributed index-caching model is proposed. The basic idea of distributedindex-caching is to enlarge the indices in the P2P network in relative lower cost. The indicesdensity is a key parameter, which affects the searching performance of P2P networks.However, few work on indices density have been found so far in the literature.The thesis research analysed the driving effects of cached indices diffusing with queryarrivals and the cached indices destroying effects with cache TTL timeouts have beenanalyzed. The work found that caching indices density can maintain a certain stable level.This implies that the searching performance of P2P networks can be modeled by some simpleparameters.Two typical basic node-find algorithms----Gnutella algorithm and Random-walkeralgorithm were studied. The two have the shortest and longest caching path respectively. Howthe query arrival speed and cache TTL affect the indices density in these two algorithms wereinvestigated, and some interesting results were obtained. For Gnutella algorithm, a linearrelation exists between the indices density and these two factors, and for Random-walkeralgorithm a power-law exists.How the query arrival speed and cache TTL affect searching performance of P2Pnetworks was further investigated. The results shows, for Gnutella algorithm, a linear relationexists between the mean query cost and these two factors, and for Random-walker algorithm alogarithm relation exists.3) The drawbacks of present distributed index caching method have been studied, includingthe invalid indices problem caused by dynamic network and the load imbalance caused bymulti-to-one index mapping.In dynamic network, because of the separating of index and content, the changing of theshared content will cause the relevant indices to become invalid. The caching TTL mechanismused by the present algorithms cannot guarantee the effectiveness of index. Our analysisfound that the invalid indices can diffuse in P2P network, which disturbs the subsequentqueries. Based on this finding, checking the effectiveness of index to prevent the diffusing ofthe invalid indices is proposed, and a Check-regeneration method is also proposed, whichexecutes the reset instruction of the hit index in the cache TTL. Experiment results show thatour proposed methods can decrease the invalid indices by 4~7 times, but indices densityreduces no more than 44%.Index caching method enlarges the indices in P2P networks and does not affect the sharedcontents, but the non-uniform multi-to-one mapping can cause the load imbalance, whichdeteriorates network service performance. Shared Content Associated Algorithm (SCAA) isproposed to alleviate the issue. SCAA associates shared contents and shares the contentrequests to alleviate the load imbalance. The effectiveness of SCAA under variable networkload is further investigated. SCAA can improve the service satisfactory ratio about 30% undermoderate network load.
Keywords/Search Tags:Peer-to-Peer network, blind searching algorithm, distributed index caching, caching effectiveness, load balance
PDF Full Text Request
Related items