Font Size: a A A

Research On Greedy P2P Resource Location Technologies Based On Local Information Of Networks

Posted on:2014-09-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:W M MaFull Text:PDF
GTID:1268330401463119Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
P2P (Peer-to-Peer) networks have been widely used for resource sharing and content distribution services. They have become one of the most active research areas both in the engineering and theoretical fields. In recent years, with the increase of network size, the nodes have much higher churn rate than before. A single node then cannot maintain the information of most of the others’ones in the whole network, and the number of neighbors of each node is far smaller than the network size Therefore, efficiently locating resources using greedy search becomes a challenging research topic, if nodes only know local information of networks.The performance metrics of P2P network resource location include not only the search hit rate, but also the network overhead, storage overhead, routing table maintenance overhead and load balancing. The impact factors of performance include the search algorithms, network structure, replication mechanism, routing table structure and so on. To improve the performance, we should study the location technologies from multiple points of view.The P2P network can be divided into two types:unstructured and structured ones. When nodes only know local information of networks, locating resources in unstructured P2P networks is severely blind. Traditional greedy search algorithms of the unstructured networks suffer serious performance degradation. In this thesis, we study this problem from three aspects:search mechanism, network structure and replication mechanism. Different solutions are proposed, which can keep high resource location performance under high peers’churn rate. For structured P2P networks, we focus on routing tables design and multi-keyword search. In this thesis, we mainly study on the main stream DHT (Distributed Hash Tables) structured networks. We present methods how to design routing tables for increasing the search hit rate, and how to use the optimal Bloom Filters for decreasing the communication cost of multi-keyword search in Kademlia-like networks. The main contributions of this thesis are as follows:(1) In unstructured P2P networks, the greedy search algorithms forward the query messages depending on the historical search records stored on each node. When nodes only know local information of networks, high peers’churn rate makes the historical search records useless, which can decrease the search hit rate. To solve this problem, we formally analyze the characteristic of the random walk search, which has a high probability of forwarding queries to the high-degree nodes. We also analyze the disadvantages of only using the reverse random walk search. And then we introduce our bidirectional random walk search mechanism (BRWS). This mechanism runs in a piggybacks way. It both uses the reverse and forward random walk. The information of the resources’locations and status, requirements is embedded in the query messages, and exchanged along the search paths. The performance of this mechanism is analyzed theoretically. The experimental results show that BRWS can actually improve the search success rate with lower overhead even for the rare resources.(2) Q-learning search is a typical greedy search mechanism used in unstructured P2P networks. Although BRWS can effectively improve the search hit rate, but too much information is embedded in the query message. If the information of resources occupies many bytes, BRWS will cause excessive traffic. Therefore, we present another method to improve the performance of greedy search, which differentiate the peers to construct special network structures. This method divides the nodes into three types according to their online time and resources:service nodes, stable nodes, and ordinary nodes. Different types of nodes make up different areas:core service area, agent area and the ordinary area. Stable nodes are also called agent nodes. Each node has two kinds of preference:local preference and service preference. The different types of nodes are connected according to their service preference, and nodes belonging to the same type are connected according to their local preferences. Q-learning models are established according to the type of resources, not the resources themselves. We perform the Q-learning search and update models only in the core service area. Ordinary nodes or agent nodes need to select the appropriate service nodes, directly or indirectly, to execute Q-learning search. We analyze a variety of impact factors affecting the Q-learning search performance in such a network structure from several aspects. Simulation experimental results show that, even if the networks are disturbed frequently, the Q-learning search can still keep high hit rate.(3) When the number of copies of a specific resource and requests for this resource are relatively small, replicating this resource and the requests using flooding is a direct way to improve the search performance. However, it is difficult to control the number of the copies. To solve this problem, we first analyze the spread behavior of the P2P networks with different topologies, and then propose the distribution aware collaborative active replication mechanism. This mechanism is designed based on limited flooding active replication. Each node decides whether to execute replicating and spreading according to the number of copies in local area, which is obtained by distribution aware methods. We compare and analyze various factors affecting the replication spread behavior in a simulated network environment with multiple different topologies.(4) In DHT based structured P2P network, a resource index is usually published to a special set of nodes, the IDs of which are relatively closer to this resource’s key. When nodes only know local information of networks, the sets of target DHT nodes located by the queries or publications for the same resource have smaller intersection size using the greedy search. To enhance the search hit rate, we should maximum the intersection size, which is mainly determined by the routing table structure. The rules of building routing tables for structured P2P networks are explored. We first analyze a special situation that network size is equal to the ID space with no peers churn, and present formal descriptions for this situation. We transform the way of building routing tables in such a network into a Change-making problem. We find that canonical solutions systems are the best choice to construct routing tables. And then we extend to the actual network environment, and find that canonical solutions systems are still reasonable. Finally, we analyze the drawbacks of canonical solutions systems when the ID space is too large. An ID collision-allowed (ICA) DHT routing table structure is presented, and is compared with the typical Kademlia network from multiple aspects. The experimental results show that ICA DHT based networks can keep high search hit rate with low average degree.(5) The sets of target DHT nodes have small intersection size, when nodes only know local information of networks. To guarantee we can find the resources under high peers churn rate, we should publish the same resource on multiple different nodes. When performing multi-keyword search, we should find all the target DHT nodes responsible for each keyword, and then merge the search results. The results on the nodes responsible for the same keyword are merged by union operations, and then the union results for different keywords are merged by intersection operations. The number of the resources for a keyword, especially a common keyword, is very large. The union and intersection operations will cause huge communication cost. This thesis presents a method of performing union and intersection operations using optimal Bloom filters to reduce the Kademlia network multi-keyword search traffic. This approach optimizes the Bloom filter settings dynamically according to the communication cost and loss rate models. We can get the optimal Bloom filter settings only depending on the sizes of sets. The experimental results show that this method can greatly reduce communication cost under an acceptable loss rate.
Keywords/Search Tags:Peer-to-Peer, Local Information of Netowrk, Resource Location, Greedy, Distributed Hash Table, Search
PDF Full Text Request
Related items