
Design And Implementation Of Search Engine Based On Web Crawler

Posted on: 2019-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Feng
Full Text: PDF
GTID: 2428330596453486
Subject: Master of Computer Technology Engineering
Abstract/Summary:
Internet technology is developing rapidly, and people's enormous demand for information poses great challenges to the web crawlers responsible for collecting it. In many cases a single-machine crawler cannot handle such a heavy workload, so distributed web crawlers are needed to collect Web information at adequate speed and scale. Given the sheer volume of information on the network and the equally large demand for it, search engine technology is indispensable. However, Internet resources are growing geometrically, and information acquisition places ever higher demands on index size, update speed, and personalization. General-purpose search engines cannot deliver the personalized, specialized retrieval services that users need; this calls for topic-specific search engines serving specific fields. The topic crawler, the core of a topic search engine, remains a research hotspot in network data mining and needs further study.

This thesis designs and implements a distributed web crawler search engine, covering two aspects: the hardware architecture and the software module partition. On the hardware side, the control node is one PC and the crawling nodes are N PCs, all connected to a LAN. On the software side, the system is divided into control-node software and crawl-node software. The thesis also systematically expounds solutions to the key technical problems of the distributed system. Task segmentation relies mainly on a two-level hash mapping algorithm; message communication enables the nodes to work together, and non-blocking sockets transfer URLs efficiently between nodes. The resulting distributed web crawler search engine exhibits good robustness, configurability, and scalability, and the thesis analyzes it in detail.
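The abstract does not spell out the two-level hash mapping used for task segmentation. A common interpretation, sketched below, is that the first level hashes a URL's host to choose a crawl node (so all pages of one site stay on one node, which simplifies politeness rules and deduplication), and the second level hashes the full URL to pick a worker queue inside that node. The node and queue counts here are illustrative assumptions, not values from the thesis.

```python
import hashlib
from urllib.parse import urlparse

NUM_NODES = 4        # number of crawl-node PCs on the LAN (assumed)
QUEUES_PER_NODE = 8  # worker queues inside each node (assumed)

def stable_hash(s: str) -> int:
    # md5 yields the same value in every process, unlike Python's built-in
    # hash(), which is randomized per interpreter run
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def assign(url: str) -> tuple[int, int]:
    """Two-level mapping: host -> crawl node, full URL -> queue in that node."""
    host = urlparse(url).netloc
    node = stable_hash(host) % NUM_NODES        # level 1: a site maps to one node
    queue = stable_hash(url) % QUEUES_PER_NODE  # level 2: balance load within the node
    return node, queue
```

Because the first level keys on the host only, two URLs from the same site always land on the same node, while the second level spreads them across that node's queues.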
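The abstract also mentions transferring URLs between nodes over non-blocking sockets. A minimal sketch of such a sender is shown below; the newline-delimited framing is an assumption for illustration, since the thesis does not describe its wire format.

```python
import selectors


def send_urls(sock, urls):
    """Send a batch of URLs over a non-blocking socket, one URL per line."""
    data = ("\n".join(urls) + "\n").encode()
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    try:
        while data:
            sel.select()  # block only until the socket is writable again
            try:
                sent = sock.send(data)  # may accept only part of the buffer
                data = data[sent:]
            except BlockingIOError:
                continue  # kernel buffer full; wait for the next writable event
    finally:
        sel.close()
```

With a readiness loop like this, one control node can feed URLs to many crawl nodes without a slow receiver stalling the others, which is presumably why the thesis chose non-blocking sockets over plain blocking I/O.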
Keywords/Search Tags: distributed, web crawler, search engine