The Design And Implement Of A Topic-based Web Spider

Posted on: 2009-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: K Yuan
Full Text: PDF
GTID: 2178360272484583
Subject: Software engineering
Abstract/Summary:
Because web information changes constantly, it is becoming increasingly difficult for a general-purpose search engine to provide users with a high-quality, comprehensive, and promptly updated search service. The fundamental limitation is that such an engine attempts to index all web information and to serve queries on every topic. In contrast, a topic-based search engine covers only web information related to a specific topic, so its content can go deeper and its update cycle can be shorter, and it can better meet the demand for fast and accurate access to information resources. Topic-based web search engines are currently an active subject of research and development in computer science and the information industry.

The topic-based web spider is an important component of a topic-based search engine. From the perspective of design and implementation, this thesis analyzes and discusses topic-based web spiders in detail and reviews the current state and development trends of topic spider research in China and abroad. It analyzes the working principle and main functions of a topic spider and, focusing on two key issues (the spider's web search strategy and how to evaluate the relevance of a page to the topic), proposes a topic-based web spider.

The main part of the thesis first presents the key technologies for implementing a topic spider: topic search strategy, topic relevance, body-text extraction, and Chinese word segmentation. Then, following the spider's page-processing workflow, it designs a web spider that uses a content-evaluation-based search strategy to collect information for small and medium-sized specialized web sites. It describes the vector space model in detail and gives the spider's crawling algorithm. The system is implemented in Java, has a well-organized architecture, and can collect pages from the Internet that are related to a specified topic. Experiments show that the system performs well and can accurately crawl high-quality web sites.
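
The abstract names the vector space model as the basis for judging topic relevance but gives no formula. As a rough illustration only, a common way to apply that model is to represent the topic and each page as term-weight vectors and compare them with cosine similarity. The Java sketch below assumes raw term frequencies as weights, whitespace-separated tokens standing in for the output of a Chinese word segmenter, and a hypothetical threshold of 0.3; the class name, method names, and threshold are all illustrative and are not taken from the thesis.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: vector space model relevance via cosine similarity. */
public class TopicRelevance {

    /**
     * Builds a raw term-frequency vector.
     * Assumption: the text is already segmented into whitespace-separated terms.
     */
    public static Map<String, Double> termFrequencies(String segmentedText) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : segmentedText.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    /** Euclidean length of a term-weight vector. */
    public static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine of the angle between two term-weight vectors; 0 if either is empty. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Double> topic = termFrequencies("web spider crawler search engine");
        Map<String, Double> page = termFrequencies("a topic based web spider crawls the web");

        double score = cosine(topic, page);
        double threshold = 0.3; // hypothetical cut-off, not a value from the thesis

        System.out.println("relevance = " + score);
        System.out.println(score >= threshold ? "follow links on this page" : "skip this page");
    }
}
```

In a crawler built along these lines, the score could also serve as a priority for the links found on the page, so that links from highly relevant pages are fetched first; whether the thesis does this is not stated in the abstract.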
Keywords/Search Tags: Topic-based search engine, Web spider, Chinese word segmentation, Relevance calculation