Design And Implementation Of Management Subsystem Of Cloud Data Collection System

Posted on: 2020-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: F Peng
Full Text: PDF
GTID: 2428330572973563
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era and the explosive growth of web information, demand for data collection from individuals and enterprises keeps rising. A cloud data collection system adopts a new cloud service model that combines web crawler technology with SaaS to provide users with low-cost, customizable, and efficient data collection services. To meet the task and resource management requirements of such a system, this thesis proposes the design of its management subsystem, which provides two functional modules: task management and resource monitoring. The task management module provides a unified control and management interface for crawler tasks, supports real-time scheduling of crawler tasks, and monitors the running state of tasks in real time. The resource monitoring module collects the resource state information of the crawler cluster in real time, evaluates the cluster's resource load, and dynamically schedules crawler clusters whose load state is abnormal, so as to improve the operating efficiency and resource utilization of crawler tasks. Based on the crawler application scenario in the cloud data collection system, this thesis analyzes key issues such as the complex load changes of crawler cluster resources and proposes a solution. First, a resource evaluation model based on the entropy weight method is established to analyze the overall resource load state of the crawler cluster running a crawler task. For resource oversupply, a greedy selection algorithm is designed to calculate which resource nodes can be removed from the crawler cluster. For resource undersupply, an improved genetic algorithm is proposed to schedule idle resource nodes. Comparison experiments then verify the advantage of the proposed algorithms in handling abnormal load. Following the requirement analysis and the discussion of key problems, the thesis gives the detailed design and implementation of the management subsystem. Test cases are designed and the main functional modules of the management subsystem are tested. The test results show that the design and implementation of the management subsystem in the cloud data collection system meet the requirements. Finally, the full text is summarized.
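The abstract names the entropy weight method as the basis of the resource evaluation model but does not give its details. The sketch below shows only the standard textbook form of the method, applied to per-node utilization samples. The indicator choice (CPU and memory utilization), the function names `entropy_weights` and `node_scores`, and the sample values are assumptions made for illustration, not the thesis's actual model.

```python
import math


def entropy_weights(samples):
    """Return one weight per indicator using the entropy weight method.

    samples: list of rows (one per crawler node), each row a list of
    non-negative indicator values such as utilization ratios.
    Indicators whose values differ more across nodes get larger weights.
    """
    n = len(samples)
    m = len(samples[0])
    if n < 2:
        raise ValueError("need at least two nodes to compute entropy")
    k = 1.0 / math.log(n)  # scales entropy into the range [0, 1]

    divergences = []
    for j in range(m):
        column = [row[j] for row in samples]
        total = sum(column)
        if total > 0:
            entropy = 0.0
            for value in column:
                p = value / total  # share of this node in indicator j
                if p > 0:
                    entropy -= k * p * math.log(p)
        else:
            entropy = 1.0  # an all-zero column carries no information
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    div_sum = sum(divergences)
    if div_sum == 0:
        return [1.0 / m] * m  # all indicators uniform: fall back to equal weights
    return [d / div_sum for d in divergences]


def node_scores(samples, weights):
    """Weighted load score per node (hypothetical helper)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in samples]


# Hypothetical samples: [cpu_utilization, memory_utilization] per node.
samples = [
    [0.9, 0.5],
    [0.1, 0.5],
    [0.5, 0.5],
]
weights = entropy_weights(samples)
scores = node_scores(samples, weights)
cluster_load = sum(scores) / len(scores)
```

In this example memory utilization is identical on every node, so it carries no discriminating information and nearly all of the weight goes to CPU utilization. How the thesis maps such a score to "oversupply" or "undersupply" thresholds is not stated in the abstract.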
Keywords/Search Tags: cloud computing, web crawler, load assessment, dynamic scheduling