
Research On Distributed Crawler Based On Crowd-sourcing

Posted on: 2018-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
Full Text: PDF
GTID: 2348330533469445
Subject: Computer technology

Abstract/Summary:
With the development of Internet technology and people's growing information needs, distributed crawlers have been widely and maturely applied in major search engines and information-retrieval systems. The crawler architecture studied in this dissertation is also a distributed system, but its task-allocation principle follows the crowd-sourcing idea that "the able person should do more work": a node with more resources should execute more tasks, so as to improve resource utilization and efficiency while saving execution time and cost. Studying optimized task-allocation methods based on the crowd-sourcing model is therefore of genuine academic significance and application value.

This dissertation divides the task-assignment problem into two parts: static task allocation and dynamic task allocation. In the static case, no task in the task sequence has yet been executed and no crawler node has yet been assigned a task. In the dynamic case, the system is constrained by the external environment and by internal resources, and both the tasks and the crawler nodes change repeatedly over time. The main research question is how to perform static and dynamic task allocation in the crowd-sourcing mode so that the overall cost is as small as possible, thereby improving the efficiency and resource utilization of the whole system.

For static task allocation, this dissertation proposes a crowd-sourcing-based static task-allocation algorithm. The algorithm establishes a multi-dimensional computer-resource model that effectively quantifies the resources of each crawler node, and uses a priority-matching heuristic to allocate tasks. By optimizing a cost objective function, the cost of static task distribution is minimized. Matlab simulations show that the algorithm achieves the minimum total cost while satisfying the demands of the system.

For dynamic task allocation, this dissertation proposes a time-based definition of confidence, used to measure the timeliness of each crawler node, designs a multi-dimensional computer-resource model weighted by this confidence, and then applies the heuristic algorithm to allocate tasks dynamically. By optimizing the cost objective function under multiple constraints, the time and cost of the whole system are minimized as far as possible. Matlab simulations show that, compared with a traditional greedy algorithm, the two crowd-sourcing-based task-allocation algorithms better match actual usage patterns, and their total cost is more reasonable and practical. The experimental results show that the distributed crawler task-allocation algorithms work well.
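The abstract does not give the concrete form of the priority-matching heuristic or the resource model, so the following is only an illustrative sketch of the general idea: each crawler node is described by a multi-dimensional resource vector, and the most capable node is matched to the next task first ("the able person should do more work"), subject to every resource dimension covering the task's demand. All names here (Node, assign_static, the sample capacities and costs) are assumptions for illustration, not definitions taken from the dissertation.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity: list   # remaining resources per dimension, e.g. [cpu, mem, bandwidth]
    unit_cost: float # cost incurred per task assigned to this node
    load: int = 0    # number of tasks assigned so far

def capability(node):
    # Overall remaining capability: here simply the sum over all dimensions.
    return sum(node.capacity)

def assign_static(tasks, nodes, demand):
    """Greedy priority matching: for each task, rank nodes by remaining
    capability (most capable first, cheapest on ties) and assign the task
    to the first node whose every resource dimension covers the demand."""
    assignment = {}
    for task in tasks:
        candidates = sorted(nodes, key=lambda n: (-capability(n), n.unit_cost))
        for node in candidates:
            if all(c >= d for c, d in zip(node.capacity, demand)):
                # Deduct the task's demand from the node's remaining resources.
                node.capacity = [c - d for c, d in zip(node.capacity, demand)]
                node.load += 1
                assignment[task] = node.name
                break
        else:
            assignment[task] = None  # no node can host this task
    return assignment

# Two hypothetical crawler nodes and six crawl tasks with a uniform demand.
nodes = [Node("A", [8, 16, 100], 1.0), Node("B", [4, 8, 50], 0.5)]
result = assign_static([f"url{i}" for i in range(6)], nodes, demand=[2, 4, 20])
print(result)
```

With the sample numbers above, the more capable node A absorbs four of the six tasks and node B the remaining two, which matches the crowd-sourcing allocation rule described in the abstract. The dissertation's dynamic variant would additionally weight each node's capability by its time-based confidence before ranking.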
Keywords/Search Tags: crawler, crowd-sourcing, task allocation, optimization theory