
The Research And Design Of Vertical Search Engine

Posted on: 2011-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2178360305481890
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information available online is growing dramatically. General-purpose search engines therefore face increasingly difficult challenges in collecting and storing this information. In addition, the large volume of off-target results returned by general search sites cannot satisfy modern business users, who need more specialized and faster search. Accurate, domain-specific information search has become an urgent need, and vertical search engines, which target specific professional domains, have emerged in response. Compared with general-purpose search engines, vertical search engines can solve many of the problems that general search engines cannot, because they focus on specific fields, specific groups of users, and specific requirements.

This paper first discusses several key technologies of vertical search engines, including web spiders, web page preprocessing, Chinese word segmentation, and indexing. Building on this theory, we then design and implement a spider model.

How a vertical search engine's spider captures data from the web has become one of the most active research issues in recent years, and we studied this problem extensively. We first analyzed the algorithms used in each part of a vertical spider. For computing page relevance from web content and link structure, we focused mainly on the popular Fish-Search, Shark-Search, PageRank, and HITS algorithms, and compared their efficiency and performance. Based on this analysis, we present an improved page relevance algorithm that combines web content analysis with link analysis.
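PageRank, one of the link-analysis algorithms compared above, is usually computed by power iteration. The following is a minimal Python sketch of the standard formulation, assuming a damping factor of 0.85 and that pages without outlinks spread their rank evenly over all pages. It is not the thesis's improved algorithm, and the function name `pagerank` and the dict-of-lists graph representation are illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict mapping page -> list of outlinked pages."""
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along each outlink.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores have converged
            break
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    scores = pagerank(graph)
    print({p: round(v, 3) for p, v in scores.items()})
    # prints {'A': 0.388, 'B': 0.215, 'C': 0.397}
```

In this three-page graph, C scores highest because it receives links from both A and B. A content-aware crawler in the Fish-Search/Shark-Search family would instead score pages by textual similarity to the topic; the abstract's improved algorithm combines both kinds of evidence.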
In the improved algorithm, the relevance requirement is met by analyzing the similarity of page contents, and the authority requirement by analyzing the link structure. Building on this analysis and improvement of spider search algorithms, we built a multithreaded spider for fetching web data, named VSE-Spider, which uses the improved algorithm. We tested the VSE-Spider system, and the resulting test data confirmed that the improved algorithm is more efficient. Finally, we discussed inverted index technology and, using the open-source software Lucene, implemented index creation for text-based files.
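An inverted index maps each term to the documents that contain it, so a query only needs to look up its terms instead of scanning every document. The sketch below is a minimal in-memory illustration in Python, not Lucene's implementation; `build_inverted_index` and `search_all` are hypothetical names. It assumes documents are already segmented into space-separated terms; Chinese text would first need word segmentation, one of the key technologies listed above.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings.

    Assumes each document is pre-segmented into space-separated terms.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in text.lower().split():
            counts[term][doc_id] += 1  # count occurrences per document
    return {term: sorted(postings.items()) for term, postings in counts.items()}


def search_all(index, query):
    """Return the sorted ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = None
    for term in terms:
        ids = {doc_id for doc_id, _ in index.get(term, [])}
        result = ids if result is None else result & ids  # intersect posting sets
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "vertical search engine",
        2: "search engine spider",
        3: "focused spider crawl",
    }
    index = build_inverted_index(docs)
    print(index["search"])                     # prints [(1, 1), (2, 1)]
    print(search_all(index, "search spider"))  # prints [2]
```

Storing term frequencies in the postings is what later allows ranking (for example TF-IDF scoring); a production engine such as Lucene also compresses postings and keeps them on disk.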
Keywords/Search Tags: search engine, vertical search engine, spider, search algorithm, correlation prediction