Research and Implementation of a Blog-Oriented Vertical Search Engine

Posted on: 2010-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
GTID: 2178360278465911
Subject: Software engineering
Abstract/Summary:
As network information resources grow geometrically, it has become harder to find the required information accurately and quickly with a general search engine. Faced with a large daily increase in data, a general search engine struggles to update its index database in time; faced with hundreds of millions of web pages, it struggles to crawl information in depth. The vertical search engine emerged in response to these shortcomings in speed and depth.

A vertical search engine is a specialized search engine: a subdivision and extension of the general search engine, and a new service model that addresses the inaccurate and shallow query results of general search engines. It provides useful information or specific services for a particular field, a specific group of users, or a particular need. Rather than collecting and indexing all accessible web documents, a vertical spider downloads only the relevant web pages, selected by its relevance algorithm, and avoids irrelevant regions of the web. Because only related pages are crawled, vertical search engines are more accurate and more efficient.

This paper first compares and analyzes the key technical characteristics, system architecture, and working principles of general and vertical search engines. It then introduces the research and development directions of vertical search engine technology, with a focus on the index module and the retrieval module. On this basis, it describes in detail the implementation of a vertical search engine for blogs.

The innovations of this paper are: (1) Following the working principle of a web spider, the paper develops its own spider, MySpider. It is multi-threaded and configurable, and its TopicPageRank strategy crawls the web pages that are relevant to the subject; the spider decides whether to download a page according to that page's relevance. (2) To improve retrieval efficiency for users, the paper designs several index cache strategies.

The results of this project offer some help and a useful discussion for topic-based vertical search engine technology. They are intended to further develop topic-oriented information retrieval, to raise the level of information retrieval, and to make better use of massive amounts of information.
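The abstract does not spell out how TopicPageRank scores a page, so the following is only a minimal sketch of the general idea of a relevance-gated download decision. It swaps in a simple keyword-overlap score rather than TopicPageRank itself; the topic vocabulary, the function names, and the 0.2 threshold are all hypothetical.

```python
import re

# Hypothetical topic vocabulary for a blog-oriented crawler (not from the thesis).
TOPIC_TERMS = frozenset({"blog", "post", "comment", "author", "archive"})


def relevance(text, topic_terms=TOPIC_TERMS):
    """Return the fraction of topic terms that occur in the text (0.0 to 1.0).

    This is a stand-in scoring rule, not the thesis's TopicPageRank.
    """
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


def should_download(link_context, threshold=0.2):
    """Decide whether a spider should fetch a page, given text describing the link.

    The threshold is an assumed tuning parameter.
    """
    return relevance(link_context) >= threshold
```

Under these assumptions, `should_download("Read the latest blog post by this author")` passes the gate, while `should_download("Cheap flights to Paris")` does not, so the second link would never be fetched.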
Keywords/Search Tags: Vertical Search Engine, Spider, Cache Strategy, Inverted Index
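The abstract mentions index cache strategies without naming them. As one plausible illustration, the sketch below puts a small least-recently-used (LRU) result cache in front of an inverted index. The class name, the LRU policy, and the whitespace tokenizer are assumptions made for this example, not details taken from the thesis.

```python
from collections import OrderedDict, defaultdict


class CachedInvertedIndex:
    """Inverted index with an LRU cache of query results (illustrative only)."""

    def __init__(self, cache_size=2):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.cache = OrderedDict()        # normalized query -> result list, oldest first
        self.cache_size = cache_size
        self.hits = 0                     # number of queries answered from the cache

    def add(self, doc_id, text):
        """Index a document and drop cached results, which may now be stale."""
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.cache.clear()

    def search(self, query):
        """Return sorted ids of documents that contain every query term."""
        key = tuple(sorted(set(query.lower().split())))
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        term_sets = [self.postings.get(term, set()) for term in key]
        result = sorted(set.intersection(*term_sets)) if term_sets else []
        self.cache[key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result
```

Because the query is normalized to a sorted tuple of terms, "blog search" and "search blog" share one cache entry, so the second of the two is answered without touching the postings lists.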