An Implementation Of Distributed Crawler System Based On Mobile Phone

Posted on:2018-03-13

Degree:Master

Type:Thesis

Country:China

Candidate:X D Lv

Full Text:PDF

GTID:2428330569998731

Subject:Computer technology

Abstract/Summary:

With the development of network interconnection technology, the mobile phone, originally a communication device, has become the most numerous type of network terminal and the one that accesses the Internet most frequently. As phone performance has improved, it has become feasible to use a smartphone as a platform for deploying a crawler program. In this thesis, we propose a distributed crawler model based on mobile phones to solve the problem of IP blocking. The model addresses three problems: real-time communication, accurate data acquisition, and fast large-scale crawling.

For real-time communication, the main challenge is communication between the server and the mobile phone. The system is built in two parts: a message push server and a Web server. The mobile phone receives pushed tasks from the message push server and transfers the crawled data to the Web server. The Web server sends messages to the message push server according to the task list. To achieve real-time communication, the XMPP protocol, which is based on the Extensible Markup Language (XML), is chosen for message push, and HTTP is used for the Web server.

To crawl data accurately, we build a vertical crawler and apply data cleaning. Two crawling strategies are constructed: a site-priority strategy and a breadth-first strategy. The crawled data are cleaned using a data fusion method, and missing data are filled in to obtain accurate data.

To improve crawling speed, a distributed mechanism is used together with a dynamic task allocation model. The control node assigns crawl tasks to nodes dynamically according to factors such as the load balance of the nodes, the overall operating performance of each phone node, and the expected time the crawler needs to complete a task. A dynamic adaptive task allocation algorithm is implemented in code.

Based on these core mechanisms, the WebMagic framework is used to implement a distributed crawler system based on mobile phones. The system achieves efficient, fast data acquisition under continuous operation and solves the problem of IP blocking. To validate the model and the feasibility of the system built on it, we run experiments on several Android smartphones. The results show that the system can get around IP restrictions and that its download efficiency is higher than that of a traditional PC-based crawler system.
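The abstract does not give the allocation algorithm itself, so the following is only a minimal sketch of how a control node might weigh node load, phone performance, and expected completion time when assigning crawl tasks. It substitutes a simple greedy earliest-finish heuristic for the thesis's adaptive algorithm, and every name in it (`PhoneNode`, `performance`, `queued_pages`, `assign_tasks`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhoneNode:
    """A crawler node (hypothetical model, not taken from the thesis)."""
    name: str
    performance: float     # assumed throughput score, e.g. pages per second
    queued_pages: int = 0  # current load: pages already assigned

    def expected_finish_time(self, extra_pages: int) -> float:
        # Time to drain the current queue plus the candidate task.
        return (self.queued_pages + extra_pages) / self.performance


def assign_tasks(nodes, tasks):
    """Greedily give each task to the node expected to finish it soonest.

    `tasks` maps a task id to its estimated size in pages. Larger tasks
    are placed first so that small ones can even out the remaining load.
    """
    assignment = {node.name: [] for node in nodes}
    for task_id, pages in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = min(nodes, key=lambda n: n.expected_finish_time(pages))
        best.queued_pages += pages  # update load before the next decision
        assignment[best.name].append(task_id)
    return assignment


if __name__ == "__main__":
    phones = [PhoneNode("fast", 2.0), PhoneNode("slow", 1.0)]
    print(assign_tasks(phones, {"a": 10, "b": 4, "c": 4}))
```

Because the load is updated after every assignment, a faster phone takes more work without starving slower ones. A real system would also have to refresh `performance` from live measurements, which this sketch leaves out.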

Keywords/Search Tags:

mobile phone, crawler mechanism, distributed system, real-time communication mechanism, IP block

Related items

1	Embedded Real-time Operating Systems Support Platform To Achieve
2	Research On QoS Oriented Real-time Streaming Transmission Mechanism
3	Based Vxworks Smart Robot Software System Support Platform
4	Design And Implementation Of Large-scale Internet Information Real-time Extraction System
5	Research&Development Of One Time Password System Based Mobile Phone And Challenge-Response Mechanism
6	The Research Of Concurrency Control Mechanism For Distributed Real-Time Transactions In Mobile Broadcast Environments
7	Research On Ethernet-based Real-time Communication Of Distributed Numerical Control System
8	Design And Implementation Of Crawler Based On Real-time Distributed Network
9	The Research Of Real-time Fault-Tolerant Mechanism In Distributed Real-time System DRTAS
10	Research And Application On Distributed Real-time Database In Process Industry