The Pests And Insects Topical Search Engine Research Based On Distributed Acquisition Strategy

Posted on: 2018-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: L L Zhang
Full Text: PDF
GTID: 2348330566450399
Subject: Forestry Information Engineering
Abstract/Summary:
The rapid development of the Internet has moved Chinese agroforestry informatization from digital agroforestry into a new stage of intelligent agroforestry. Intelligent agroforestry emphasizes comprehensive coverage of all kinds of resources, deep business integration, intensive sharing, and business collaboration. Its realization will inevitably bring another profound change in agroforestry productivity, and agroforestry information will enter a new era of big data. Researchers, teachers, and farmers in agroforestry urgently need fast, accurate, and comprehensive search over agroforestry information resources so that they can make better use of them. A traditional general-purpose search engine provides one unified interface for all users. Because of its large data volume and wide range of topics, it cannot meet domain experts' requirements for accuracy, timeliness, and depth of agroforestry information, nor many other personalized needs. A search engine dedicated to agroforestry is therefore of both theoretical significance and practical value.

In this paper, we first analyze current distributed crawler system models and study in depth the URL task scheduling strategy in peer-to-peer distributed crawler systems. To address the uneven load caused by mapping server nodes randomly into the address space, we propose a node address-space allocation strategy based on the SP-cycle algorithm, under which the address space allocated to the server nodes reaches a dynamic balance. The strategy improves the load balancing of the distributed crawler system and copes with sudden changes among the service nodes without affecting the operation of the crawler system.

Second, we study the key technologies in search engine design, such as the topic representation method, the text word segmentation method, and the search strategy of the topical crawler. The topical dictionaries are built by extracting keywords from a large library of domain pages, adding terms specified manually by domain experts, and updating them regularly based on user search logs. On this basis, a pest topic vector is designed to describe the topic. Text segmentation combines IKAnalyzer's intelligent word segmentation with the topical dictionaries, which gives good segmentation results on topical text. The crawler search strategy considers both link structure and text content in order to improve the efficiency and quality of topical information collection.

Finally, we have implemented a search engine on the topic of plant pests and diseases. Compared with a general-purpose search engine, it is clearly oriented toward plant pests and diseases and achieves better precision. The search engine presented in this paper therefore has practical value.
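The load-imbalance problem that motivates the address-space allocation strategy can be illustrated with a small sketch. This is not the SP-cycle algorithm, which the abstract does not describe. It only contrasts server nodes placed at random points on a hash ring with nodes placed at evenly spaced points, as a stand-in for a balanced allocation. The ring size, the MD5-based URL hashing, and all function names are assumptions made for illustration.

```python
import bisect
import hashlib
import random
from typing import List

ADDRESS_SPACE = 2 ** 32  # assumed size of the URL hash ring (hypothetical)


def url_address(url: str) -> int:
    """Map a URL to a point on the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def random_positions(n_nodes: int, seed: int = 0) -> List[int]:
    """Place each server node at a random point on the ring."""
    rng = random.Random(seed)
    return sorted(rng.randrange(ADDRESS_SPACE) for _ in range(n_nodes))


def even_positions(n_nodes: int) -> List[int]:
    """Place server nodes at evenly spaced points (stand-in for a balanced allocation)."""
    step = ADDRESS_SPACE // n_nodes
    return [i * step for i in range(n_nodes)]


def arc_shares(positions: List[int]) -> List[float]:
    """Fraction of the ring owned by each node.

    Node i owns the arc that ends at its own position and starts just
    after the position of the preceding node on the ring.
    """
    if len(positions) == 1:
        return [1.0]
    shares = []
    for i, pos in enumerate(positions):
        prev = positions[i - 1]  # index -1 wraps around to the last node
        shares.append(((pos - prev) % ADDRESS_SPACE) / ADDRESS_SPACE)
    return shares


def imbalance(shares: List[float]) -> float:
    """Largest share relative to the ideal share 1/n (1.0 means perfectly even)."""
    return max(shares) * len(shares)


def owner(url: str, positions: List[int]) -> int:
    """Index of the node responsible for a URL: first node at or after its address."""
    idx = bisect.bisect_left(positions, url_address(url))
    return idx % len(positions)  # wrap around past the last node


if __name__ == "__main__":
    n = 8
    print("random placement imbalance:", imbalance(arc_shares(random_positions(n))))
    print("even placement imbalance:  ", imbalance(arc_shares(even_positions(n))))
```

With eight evenly spaced nodes every node owns exactly one eighth of the ring, so the imbalance is 1.0. Random placement gives some node a larger arc, and that node receives a proportionally larger share of URL tasks.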
Keywords/Search Tags:Topical search engine, Distributed crawler, Task scheduling strategy, Pests and diseases