
Design And Implementation Of Universal Crawler System Based On Microservice Architecture

Posted on: 2021-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: H G Yang
GTID: 2558306134463574
Subject: Software engineering
Abstract/Summary:
The Internet is booming, and massive amounts of new data are generated on it continuously. Crawler technology has become a convenient and efficient tool for collecting network data that serves business requirements and for making better use of it. Moreover, with the development of big-data technologies in recent years, demand for data keeps growing, and specific network data acquisition needs will continue to drive the development of web crawler tools. This paper designs and implements a crawler framework based on a microservice architecture, providing an effective solution for capturing network data efficiently and in a standardized way. The framework is built on Spring Boot and uses Spring Cloud as its main microservice architecture solution, with RabbitMQ as the primary means of message communication between modules. Following the business logic of data capture, the system is divided into three main business modules: the constructor module, the downloader module, and the data resolver module. The author participated in the design and implemented all three modules.

(1) Constructor module: provides functions such as running scheduled crawler tasks, monitoring data returned through the queue, reading crawl rule data, assembling download requests, and distributing constructed tasks. It focuses on the business logic of loading data and assembling it according to rules. After the module starts, the configured scheduled tasks and crawl rules are loaded from the database automatically or manually, and the constructed data is processed together with the other two business modules to complete data capture.

(2) Downloader module: decouples the download function from the rest of the system and performs all downloads of public network data. The module listens for download instructions in the message queue and can apply different downloaders to data with different characteristics. It supports automatic website login and can access websites through proxies.

(3) Resolver module: receives the downloaded data passed on by the downloader module, parses fields according to the parsing rules carried with the data, and supports tagging the parsing results; subsequent processing of each result is determined by its tags. Results can be saved to a MySQL database, sent to a RabbitMQ message queue, or saved as files.

Results from online operation show that the crawler system under the microservice architecture meets personalized business requirements and runs with strong stability. It reduces the cost of developing customized crawlers for actual business needs and lowers the barrier to using crawler technology. The project is currently deployed and running smoothly on multiple servers in real business.
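The constructor module's role of turning stored crawl rules into download requests can be illustrated with a minimal sketch. It assumes a hypothetical rule shape (`CrawlRule` with a URL template, a page range, a downloader name, and parse rules) and uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ; the thesis system itself is built on Spring Boot, and its actual rule schema is not given in the abstract.

```python
import json
import queue
from dataclasses import dataclass, field


@dataclass
class CrawlRule:
    """Hypothetical crawl rule, as it might be loaded from the rule database."""
    task_id: str
    url_template: str            # e.g. "https://example.com/list?page={page}"
    pages: range                 # page numbers to expand into requests
    downloader: str = "http"     # name of the downloader that should fetch it
    parse_rules: dict = field(default_factory=dict)  # field name -> rule


def assemble_requests(rule: CrawlRule) -> list[dict]:
    """Expand one rule into download requests; parse rules travel with each
    request so the resolver can later parse the page without a DB lookup."""
    return [
        {
            "task_id": rule.task_id,
            "url": rule.url_template.format(page=p),
            "downloader": rule.downloader,
            "parse_rules": rule.parse_rules,
        }
        for p in rule.pages
    ]


def distribute(requests: list[dict], download_queue: queue.Queue) -> int:
    """Publish each request as a JSON message (queue.Queue stands in for
    a RabbitMQ exchange/queue) and return how many were sent."""
    for req in requests:
        download_queue.put(json.dumps(req))
    return len(requests)
```

Carrying the parse rules inside each message keeps the downstream modules stateless with respect to the rule store, which fits the decoupling the abstract describes; whether the original system does exactly this is an assumption.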
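The downloader module's ability to "use different downloaders for data with different characteristics" is commonly realized with a registry keyed by a downloader name carried in each message. The sketch below assumes that design; the names `DOWNLOADERS`, `register`, `consume`, and the `"proxy"` message field are hypothetical, `queue.Queue` again stands in for RabbitMQ, and the offline `"fake"` downloader exists only so the flow can be exercised without network access. A login-capable downloader would be one more registered function (for example, one that keeps a cookie session); it is omitted here.

```python
import json
import queue
import urllib.request
from typing import Callable

# Hypothetical registry: downloader name (from the message) -> fetch function.
DOWNLOADERS: dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that adds a fetch function to the registry under `name`."""
    def deco(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        DOWNLOADERS[name] = fn
        return fn
    return deco


@register("http")
def http_download(req: dict) -> str:
    """Plain HTTP fetch, routed through a proxy if the request names one."""
    handlers = []
    proxy = req.get("proxy")
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(req["url"], timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


@register("fake")
def fake_download(req: dict) -> str:
    """Offline stand-in that returns canned HTML (for testing only)."""
    return f"<html><h1>{req['url']}</h1></html>"


def consume(download_queue: queue.Queue, result_queue: queue.Queue) -> int:
    """Drain pending download instructions, fetch each page with the named
    downloader, and forward request + body to the resolver's queue."""
    handled = 0
    while True:
        try:
            msg = download_queue.get_nowait()
        except queue.Empty:
            return handled
        req = json.loads(msg)
        fetch = DOWNLOADERS[req.get("downloader", "http")]
        result_queue.put(json.dumps({**req, "body": fetch(req)}))
        handled += 1
```

Adding a new downloader type then only requires registering another function, without touching the consumer loop.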
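The resolver module parses fields by rules carried in the message and routes each result according to its tag. A minimal sketch follows, under several stated assumptions: parse rules are regular expressions whose first capture group is the field value (the thesis may use XPath, CSS selectors, or another rule language), the tag is a single string in the message, and `sqlite3` substitutes for MySQL and `queue.Queue` for RabbitMQ. The names `parse`, `Router`, and `resolve` are hypothetical.

```python
import json
import queue
import re
import sqlite3


def parse(body: str, parse_rules: dict) -> dict:
    """Apply each field's regex (first capture group) to the page body;
    fields whose pattern does not match are set to None."""
    result = {}
    for field_name, pattern in parse_rules.items():
        m = re.search(pattern, body, re.S)
        result[field_name] = m.group(1).strip() if m else None
    return result


class Router:
    """Dispatch parsed records to a sink chosen by the record's tag."""

    def __init__(self, db: sqlite3.Connection, mq: queue.Queue, file_path: str):
        self.db, self.mq, self.file_path = db, mq, file_path
        self.db.execute("CREATE TABLE IF NOT EXISTS records (task_id TEXT, data TEXT)")

    def route(self, tag: str, task_id: str, record: dict) -> None:
        payload = json.dumps(record, ensure_ascii=False)
        if tag == "db":        # sqlite3 here stands in for MySQL
            self.db.execute("INSERT INTO records VALUES (?, ?)", (task_id, payload))
        elif tag == "mq":      # queue.Queue stands in for RabbitMQ
            self.mq.put(payload)
        elif tag == "file":    # append one JSON record per line
            with open(self.file_path, "a", encoding="utf-8") as f:
                f.write(payload + "\n")
        else:
            raise ValueError(f"unknown tag: {tag}")


def resolve(message: str, router: Router) -> dict:
    """Parse one downloaded page using its own rules, then route the result."""
    msg = json.loads(message)
    record = parse(msg["body"], msg["parse_rules"])
    router.route(msg.get("tag", "db"), msg["task_id"], record)
    return record
```

Keeping the routing decision in a single `route` method means a new output target is one more branch (or one more entry in a sink table), while parsing stays independent of where results end up.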
Keywords: Microservices, Web Crawlers, Distributed