
Analysis And Design Of The Topic Web Crawler

Posted on: 2014-02-05 | Degree: Master | Type: Thesis
Country: China | Candidate: H W Wang | Full Text: PDF
GTID: 2248330398971934 | Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of Internet technology, the vast information resources of the Internet have had a great impact on human life, and how to use them sensibly has become a key question. People increasingly care about how to find the information they want on the Internet, which requires the support of search engines. Precisely because the amount of information on the Internet has expanded so enormously, being able to locate the target information easily among all these pages has become very important. General-purpose search engines have made finding information on the Internet far more convenient, but they have gradually exposed many drawbacks: in most cases they cannot provide personalized, specialized search, their precision is low, their content is often outdated, and so on. As a result, the fourth generation of search engines, the topic-oriented search engine, has emerged. A topic search engine is dedicated to a particular subject, can meet the needs of specific fields and user groups, and is better adapted to current demands. As the core tool of a topic search engine, the topic Web crawler will play an increasingly important role, and it has become a top research priority. Topic Web crawlers have also gradually become a hot spot in the field of information mining. This thesis studies the topic Web crawler and analyzes its design and implementation in six chapters.
The main content:
1) The first chapter presents the background of the research and outlines the current state of domestic and foreign research on topic crawlers, as well as the significance of the research;
2) The second chapter reviews the development and basic principles of search engines and from there introduces the Web crawler, then compares the two, focusing on the architecture and basic operation of both crawlers;
3) The third chapter discusses the key technologies of topic crawlers, introducing and comparing them and presenting algorithm improvements; for the widespread tunneling phenomenon, it also presents the calculations for the different algorithms;
4) The fourth chapter discusses the design and implementation of the topic crawler system, including the web crawling module, the web page analysis module, and the Chinese word management module;
5) The fifth chapter describes how the topic crawler system studied in this thesis was implemented and the experimental basis for using the system, and analyzes the experimental data to demonstrate the validity and effectiveness of the theory in the preceding chapters;
6) The last chapter summarizes the contents of the previous chapters and sets out the innovations and limitations of this thesis.
The experiments show that the detailed improvement strategies proposed here, when applied to a working topic crawler, have clear advantages: they not only ensure a higher harvest ratio but also greatly reduce storage space and crawling time. They keep pages updated in a timely manner, and the extensive analysis that eliminates redundant information gives users higher precision.
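The abstract does not give the thesis's algorithms, so the following is only a minimal illustrative sketch of the general ideas it names: a best-first topic crawler, a tolerance for the tunneling phenomenon (crossing a few irrelevant pages to reach relevant ones), and the harvest ratio (relevant pages divided by pages crawled). Everything in it is an assumption made for illustration: the in-memory PAGES and LINKS graph, the keyword-overlap relevance function, and the threshold and max_tunnel parameters are hypothetical and are not taken from the thesis.

```python
import heapq

# Hypothetical toy "web": page text and outgoing links (not from the thesis).
PAGES = {
    "a": "web crawler search",
    "b": "cooking recipes",
    "c": "sports news",
    "d": "topic crawler search engine",
    "e": "weather report",
    "f": "search crawler design",
}
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["f"]}
TOPIC = ["crawler", "search"]


def relevance(text, topic_terms):
    """Toy score: fraction of topic terms that occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(pages, links, seed, topic_terms, threshold=0.5, max_tunnel=1):
    """Best-first crawl with a bounded tunneling allowance.

    max_tunnel is the number of consecutive irrelevant pages the crawler
    may pass through before it stops expanding that path.
    Returns (pages in visit order, harvest ratio).
    """
    # Heap entries: (negated parent score, tunnel depth so far, url).
    frontier = [(-1.0, 0, seed)]
    seen = {seed}
    visited, relevant = [], 0
    while frontier:
        _, depth, url = heapq.heappop(frontier)
        score = relevance(pages[url], topic_terms)
        visited.append(url)
        if score >= threshold:
            relevant += 1
            depth = 0              # back on topic: reset the tunnel counter
        else:
            depth += 1
            if depth > max_tunnel:
                continue           # tunnel too long: do not expand this page
        for nxt in links.get(url, []):
            if nxt in pages and nxt not in seen:
                seen.add(nxt)
                # Children inherit the parent's score as their priority.
                heapq.heappush(frontier, (-score, depth, nxt))
    return visited, relevant / len(visited)


if __name__ == "__main__":
    for tunnel in (0, 1, 2):
        order, harvest = focused_crawl(PAGES, LINKS, "a", TOPIC, max_tunnel=tunnel)
        print(tunnel, order, round(harvest, 2))
```

On this toy graph, raising max_tunnel lets the crawler reach relevant pages that sit behind irrelevant ones, at the cost of fetching more off-topic pages along the way; that trade-off is what a tunneling strategy has to balance.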
Keywords/Search Tags: Search Engine, Web Crawler, Tunneling phenomena, System institutions