An Implementation Of Distributed Crawler System Based On Mobile Phone

Posted on: 2018-03-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X D Lv | Full Text: PDF
GTID: 2428330569998731 | Subject: Computer technology
Abstract/Summary:
With the development of network interconnection technology, the mobile phone, originally a communication device, has become the most numerous class of network terminal and the one that accesses the Internet most frequently. As phone performance improves, it becomes feasible to use the smartphone as a platform for deploying crawler programs. In this paper, we propose a distributed crawler model based on mobile phones to solve the problem of IP blocking. The model focuses on three problems: real-time communication, accurate data acquisition, and fast large-scale crawling.

The main problem of the system is real-time communication between the server and the mobile phones. The system is built in two parts: a message push server and a Web server. The mobile phone receives pushed tasks from the push server and transfers the crawled data to the Web server. The Web server sends messages to the push server according to the task list. To achieve real-time communication, the system uses XMPP, a protocol based on Extensible Markup Language (XML), for message push, and HTTP for communication with the Web server.

To crawl data accurately, the system combines a vertical crawler with data cleaning. This paper constructs two crawling strategies: a site-priority strategy and a breadth-first strategy. The crawled data are cleaned with a data fusion method, and missing data are filled in to obtain accurate data.

To improve crawling speed, this paper uses a distributed mechanism and creates a dynamic task allocation model. The control node assigns crawler tasks to nodes dynamically according to factors such as the load balance of the nodes, the overall operating performance of each phone node, and the expected time the crawler needs to complete the task. A dynamic adaptive task algorithm is implemented in code.

Based on these core mechanisms, the WebMagic framework is used to implement a distributed crawler system based on mobile phones. The system achieves efficient, fast data acquisition under continuous working conditions and solves the problem of IP blocking. To validate the model and the feasibility of the system built on it, we ran experiments with several Android smartphones. The results show that the system can break through IP restrictions and that its download efficiency is higher than that of a traditional PC-based crawler system.
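
The abstract does not give the details of the dynamic task allocation model, so the following is only a minimal sketch of one way a control node could weigh node load, phone performance, and expected completion time. The `PhoneNode` class, the `assign` function, the performance figures, and the greedy "earliest expected finish" rule are all assumptions made for illustration; they are not the thesis's algorithm. Python is used for brevity, although the system itself is built on the Java-based WebMagic framework.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneNode:
    """A crawler node (one phone). All values here are hypothetical."""
    name: str
    performance: float        # assumed throughput in pages per second
    queued_pages: int = 0     # current load: pages assigned but not yet crawled
    tasks: list = field(default_factory=list)

    def expected_finish(self, extra_pages: int = 0) -> float:
        """Estimated seconds to drain the current queue plus extra_pages."""
        return (self.queued_pages + extra_pages) / self.performance


def assign(tasks, nodes):
    """Greedily give each task to the node expected to finish it soonest.

    tasks is a list of (task_id, page_count) tuples. Larger tasks are
    placed first, a common heuristic for greedy load balancing.
    """
    for task_id, pages in sorted(tasks, key=lambda t: t[1], reverse=True):
        best = min(nodes, key=lambda n: n.expected_finish(pages))
        best.tasks.append(task_id)
        best.queued_pages += pages   # update load so later choices see it
    return {n.name: n.tasks for n in nodes}


nodes = [
    PhoneNode("phone-a", performance=2.0),
    PhoneNode("phone-b", performance=1.0),
]
plan = assign([("t1", 40), ("t2", 20), ("t3", 20), ("t4", 10)], nodes)
print(plan)
```

In this sketch the faster phone receives more pages, but not every task: once its queue grows, the slower phone becomes the earlier finisher for the next task. A real control node would also need to refresh `performance` and `queued_pages` from node reports, which is omitted here.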
Keywords/Search Tags:mobile phone, crawler mechanism, distributed system, real-time communication mechanism, IP block