Research And Development Of Configurable Crawler System

Posted on: 2022-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Han
Full Text: PDF
GTID: 2518306350984529
Subject: Software engineering
Abstract/Summary:
Internet technology is developing rapidly, the number of Internet users is growing quickly, and the volume of data is increasing exponentially. As a result, it is becoming harder for users to find useful information on the Internet effectively. To address this, to let users obtain useful information conveniently and efficiently, and to simplify crawler systems so that they can be applied in practical settings, this thesis studies web template structure and crawler technology and develops a crawler system that, given a URL and crawling rules, completes crawling operations through simple configuration.

First, the thesis analyzes web template structure and classifies it from the perspectives of developers and users. From the developer's perspective, starting from the definitions of the structure, presentation, and behavior layers and building on a command of the implementation languages, web template structures are both analyzed and constructed. From the user's perspective, the overall structure, hierarchical structure, and page structure of different websites are analyzed and compared. A structure classification method is then summarized, which lays the foundation for the crawling operations of the crawler system.

Second, the thesis studies and applies crawler technology. After comparing different crawler frameworks, crawling strategies, and crawler implementation methods, it selects the WebMagic framework, a depth-first crawling strategy, and a visual-configuration implementation method as the basis of the crawler system. Building on an in-depth understanding of the framework's underlying implementation, crawling rules, and crawler operating mechanism, it designs and develops a "configurable crawler system". Users of the system do not need to understand the basic principles of how crawlers work or to write code; they only need to specify a URL and crawling rules to achieve basic crawler functions through simple configuration.

The configurable crawler system splits the original crawler running steps into independent modules. Configuration information is filled in for the corresponding modules, and the modules are then combined according to rules to form the crawler. The system offers visualization, customization, and high interactivity, which greatly reduces the difficulty for users and improves the efficiency of information acquisition. It requires no changes to the source code and is suitable for non-programmers, giving users a convenient means of obtaining Internet information. The system also provides data persistence, which supports applications and services built on the crawled data.
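
The idea of driving a crawler purely from a URL plus crawling rules can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the WebMagic API; it is a standard-library Python illustration under stated assumptions. The names `CrawlConfig`, `crawl`, `fetch`, and `PAGES` are hypothetical, the crawling rules are assumed to be regular expressions, and pages come from an in-memory dictionary instead of the network so that the example runs offline. The traversal is depth-first, matching the strategy named in the abstract.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


@dataclass
class CrawlConfig:
    """Hypothetical configuration: everything the user supplies, no code."""
    start_url: str
    link_pattern: str               # regex a link must fully match to be followed
    field_patterns: Dict[str, str]  # field name -> regex with one capture group
    max_depth: int = 2


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def extract_fields(html: str, patterns: Dict[str, str]) -> Dict[str, str]:
    """Extraction module: apply each configured regex to the page."""
    record = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, html, re.S)
        if match:
            record[name] = match.group(1).strip()
    return record


def crawl(config: CrawlConfig,
          fetch: Callable[[str], Optional[str]],
          store: List[dict]) -> List[dict]:
    """Combine the fetch, extract, and store modules in a depth-first crawl."""
    visited = set()

    def visit(url: str, depth: int) -> None:
        if url in visited or depth > config.max_depth:
            return
        visited.add(url)
        html = fetch(url)
        if html is None:
            return
        record = extract_fields(html, config.field_patterns)
        if record:
            store.append({"url": url, **record})  # stand-in for persistence
        for link in extract_links(html):
            if re.fullmatch(config.link_pattern, link):
                visit(link, depth + 1)  # follow each link fully before the next

    visit(config.start_url, 0)
    return store


# In-memory stand-in for a website (assumption: no real network access).
PAGES = {
    "http://example.test/": '<a href="http://example.test/a">A</a>'
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": '<h1>Page A</h1><a href="http://example.test/c">C</a>',
    "http://example.test/b": "<h1>Page B</h1>",
    "http://example.test/c": '<h1>Page C</h1><a href="http://other.test/x">X</a>',
}

config = CrawlConfig(
    start_url="http://example.test/",
    link_pattern=r"http://example\.test/.*",
    field_patterns={"title": r"<h1>(.*?)</h1>"},
    max_depth=3,
)
results = crawl(config, PAGES.get, [])
```

In this sketch, changing what is crawled means editing only the `CrawlConfig` values; the fetch, extract, and store steps stay untouched, which is the sense in which such a crawler is "configurable". A real system would replace `PAGES.get` with an HTTP fetcher and the list with a database writer.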
Keywords/Search Tags: Configurable crawler system, Web template, Web development