
Design And Implementation Of The Active Network Crawler System Based On EMule

Posted on: 2013-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: J Li  Full Text: PDF
GTID: 2248330362974680  Subject: Electronics and communication engineering
Abstract/Summary:
With the development of P2P technology, P2P file-sharing software uses the rich resources of the P2P network to locate shared files quickly, back up data over the network, and download at high speed, opening a new era of file sharing. As one of the most popular P2P file-sharing applications in the world, eMule supports file search in both the eDonkey2000 network and the Kad network, and gives users a new experience in downloading multimedia content. However, eMule has also become a hotbed for the spread of illegal and harmful content. It affects the network application environment, and its mode of transmission spreads private information. All of these problems pose a new challenge to the regulatory and administrative departments of every country.

To address the problem that the propagation of resources in the eMule network is difficult to discover and locate, and based on the structure of the eMule network and the characteristics of its two propagation channels, this paper describes an eMule-based active crawler system that collects eMule resources in two different ways. The paper thus provides basic experimental data for network security management and for macro-level analysis of the network security situation, and the system provides a technical platform for P2P network measurement.

The main results are as follows:

① For propagation over the Web, the system includes a web crawler module, whose key techniques are as follows:
1) It improves the link filter functions of Heritrix and optimizes the URL queue management strategy by adopting the ELFHash method. Experimental results show that the number of pages crawled and the crawling efficiency increase markedly.
2) To avoid the repeated crawling of traditional web crawlers and to improve the quality of web page collection, this paper uses an incremental update mechanism. The results show that the average precision for web pages is 95.5%, which meets the project requirements.
3) To extract eMule resources from web pages through individual topic information blocks, this paper adopts an information extraction method based on the DOM tree, combined with customized extraction rules. Experiments show that the average information extraction rate is 97.8%, considerably higher than that of similar systems.

② File search in eMule is one of the main ways resources spread. For propagation within the eMule file-sharing system, this system includes an eMule crawler module (E-Crawler), whose key techniques are as follows:
1) An active measurement method based on the two different protocols in the eMule network is used in the design of E-Crawler, with parallel processing across multiple computers. This solves the problem that traditional crawlers crawl only a single-protocol network.
2) To search for and analyze the propagation of specific popular files, and drawing on the characteristics of network video propagation, keywords are added automatically to make E-Crawler more active.
3) To increase E-Crawler's collection rate, a positive feedback crawling strategy is applied: E-Crawler communicates preferentially with larger servers and with nodes that have a high connection success rate, which saves the time otherwise lost on failed communication with servers and nodes.

Finally, the experimental data show that video and audio are the main file types, and reveal the geographic location of the ed2k servers and the distribution of client peers. The results confirm that the positive feedback crawling strategy improves E-Crawler's crawl rate.
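
The abstract names ELFHash as the basis of the URL queue optimization but does not describe the actual change made to Heritrix. As a minimal sketch only, the following Python shows the classic 32-bit ELF hash and one plausible way to use it to bucket already-seen URLs. The `UrlFrontier` class, its bucket count, and the example URLs are illustrative assumptions, not the thesis's implementation.

```python
from collections import deque


def elf_hash(key: str) -> int:
    """Classic ELF hash over the UTF-8 bytes of key (32-bit arithmetic)."""
    h = 0
    for byte in key.encode("utf-8"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
        h &= ~high & 0xFFFFFFFF  # clear the top four bits
    return h


class UrlFrontier:
    """Hypothetical FIFO URL queue that skips URLs it has already seen."""

    def __init__(self, num_buckets: int = 1024) -> None:
        # Each bucket stores full URLs, so hash collisions never cause
        # a new URL to be mistaken for a duplicate.
        self._buckets = [set() for _ in range(num_buckets)]
        self._queue = deque()

    def add(self, url: str) -> bool:
        """Enqueue url; return False if it was already seen."""
        bucket = self._buckets[elf_hash(url) % len(self._buckets)]
        if url in bucket:
            return False
        bucket.add(url)
        self._queue.append(url)
        return True

    def pop(self):
        """Return the next URL to crawl, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


frontier = UrlFrontier()
frontier.add("http://example.com/a")
frontier.add("http://example.com/a")  # duplicate, ignored
frontier.add("http://example.com/b")
```

Storing the full URL inside each bucket trades some memory for exact duplicate detection; a fingerprint-only variant would be more compact but could drop distinct URLs that collide.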
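
The positive feedback strategy is described only at a high level: prefer larger servers and nodes with a high connection success rate. The sketch below is one possible reading of that idea, not the scoring used by E-Crawler. The `Endpoint` class, the smoothed success rate, and the logarithmic size weight are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical record for an ed2k server or Kad node."""
    address: str
    users: int = 0       # reported size; 0 if unknown
    attempts: int = 0
    successes: int = 0

    def record(self, ok: bool) -> None:
        """Feed the outcome of one connection attempt back into the stats."""
        self.attempts += 1
        if ok:
            self.successes += 1

    def score(self) -> float:
        # Smoothed success rate: untried endpoints start at 0.5
        # instead of being starved or over-favoured.
        rate = (self.successes + 1) / (self.attempts + 2)
        # Assumed size weight: grows slowly with the reported user count.
        return rate * math.log(self.users + 2)


def crawl_order(endpoints):
    """Return endpoints sorted so the most promising are contacted first."""
    return sorted(endpoints, key=lambda e: e.score(), reverse=True)


reliable = Endpoint("server-a", users=1000)
flaky = Endpoint("server-b", users=1000)
for _ in range(3):
    reliable.record(True)
    flaky.record(False)

ordered = [e.address for e in crawl_order([flaky, reliable])]
```

Each successful contact raises an endpoint's score and each failure lowers it, so over time the crawler spends less time on servers and nodes that do not respond.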
Keywords/Search Tags: eDonkey network, Kad network, active crawler, web DOM tree, positive feedback