Font Size: a A A

Research On Metric Space Based Similarity Search And Its Applications In Peer-to-Peer Network

Posted on:2011-07-18Degree:DoctorType:Dissertation
Country:ChinaCandidate:D F DongFull Text:PDF
GTID:1118330332472475Subject:Computer applications
Abstract/Summary:PDF Full Text Request
With the development of P2P technology, more and more other data with format of video, audio, image etc has been stored into traditional P2P network besides the simple text. The traditional search method is out of the today's requirement, so similarity search basing on fuzzy matching becomes an important research direction of P2P technology.Considering the discussion above, my thesis focuses on the Metric space based similarity search technology in P2P environment. The main contribution is listed as follows:(1)The traditional pivot based Metric space data index structure MCAN is suitable for data with even distance distribution, but it can not filter unrelated data effectively when performing similarity search.So a layer index based Metric space data index structure MCAN* is presented in chapter 3.MCAN* divides Metric space directly basing on Voronoi space division, and selects neighbors separately in Metric space and European space to construct double-layer topology, and implements fast routing algorithm basing on the double-layer topology. The experiment shows that MCAN* can reduce the resource requirement to 77.0% and 50.5% comparing with MCAN when performing range search and kNN search, as while as the hit ratio keeps about 100%.(2) Basing on the requirement of managing the data with tiny distance distribution, M-KAD, a novel data driven distributed storage structure for similarity search in Metric Space is presented in chacper 4.M-KAD uses data object in Metric space directly as its node ID, and constructs system topology according to the data distribution. Our experiment proves that the routing success rate can reach 99.7% when the system scale is about 2400.The system construction and Voronoi space division are fully driven by data insertion. In our experiment, the node joining and data inserting are totally asynchronous, and the standard deviation of data distribution is about 50 (mean value:180) after reaching stable state.M-KAD designs a heuristic algorithm for Range search and kNN search, the hit ratio of them are 99.2% and 99.3% separately. Besides, M-KAD uses parallel fault-tolerant algorithm for routing and designs periodicity running K-bucket optimize operation for repairing neighbor relationsip. So the routing success rate can be maintained above 85% under DoS attack with 2% node loss in one minute.Comparing with M-Chord, the resource requirement for similarly search of M-KAD is about 15% less. In sum, M-KAD totally overcomes the restriction that system performance relaying on the selecting of pivot which influences traditional similarity search structure such as M-Chord, MCAN,and reduces the resource requirement as while as maintains the hit ration when performing similarity search.(3) Some research is also done about application of similarity search. A M-KAD based link state prediction mechanism and P2P streaming neighbor recommendation system is presented in chacper 5.According to the experiment result, the neighbor recommendation system can reduce the average playing latency to 58.4% comparing with original system.And I also use M-KAD to enhance spam filter system DHTfilter in chacper 6.The performance of original DHTfilter is influenced by the fitting ratio between the distribution of E-mail digests and pivots. The experiment shows that the enhanced DHTfilter can work independent with E-mail digests distribution and maintain the same spam call-back ratio with the original.
Keywords/Search Tags:Metric space, P2P, Similarity Search, Voronoi space division
PDF Full Text Request
Related items