
The Design And Implementation Of Lucene-Based Network Literature Vertical Search Engine

Posted on: 2012-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Huang
Full Text: PDF
GTID: 2218330368993354
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of information on the Internet, general-purpose search engines find it increasingly difficult to meet users' growing individual needs, and they are particularly inadequate for specialized search requests. Vertical search engines, which are industry-specific, accurate, and fine-grained, emerged to solve this problem.

First, this paper introduces the working principle of vertical search engines and describes in depth the structure of the Heritrix web crawler. On this basis, it proposes a prediction algorithm based on link marking, so that the crawler gathers only specific links, and multithreaded crawling, so that web pages are collected more efficiently. Based on the dynamic pattern features of network literature pages, the paper implements automatic extraction of web information and stores the extracted data in a relational database for searching.

Second, based on the current state of network literature, the paper determines the system's basic functional and performance requirements, then describes the function and flow chart of each module in detail. To help users better understand the system, it provides system function interaction diagrams and a use case diagram. The retrieval and index modules are then designed based on an understanding of Lucene's architecture and indexing technology. To improve the system's precision, the paper improves Lucene's original ranking algorithm by introducing the DirectHit algorithm, a content-based page relevance algorithm, and the importance of works. In addition, a system cache is introduced to speed up retrieval.

Finally, recall, precision, and retrieval time were tested. The experimental results show that the system is feasible and has practical value.
Keywords/Search Tags: Network Literature, Vertical search engine, Web crawler, Lucene, DirectHit
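
The abstract says the ranking combines Lucene's original score with a DirectHit signal, content-based relevance, and the importance of works, but it does not give the formula. The following is only a minimal sketch of how such a combination might look, not the thesis's method. It assumes a weighted sum, with the content relevance folded into the text score. The `Hit` fields, the `rerank` function, the weights, and the log-scaled click count are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    text_score: float   # relevance score from the text index (e.g. Lucene); assumed >= 0
    clicks: int         # DirectHit-style signal: past user clicks on this result
    importance: float   # importance of the work, assumed to lie in [0, 1]


def rerank(hits, w_text=0.6, w_click=0.3, w_imp=0.1):
    """Sort hits by a weighted sum of normalized signals (weights are illustrative)."""
    if not hits:
        return []
    # Normalize each signal by its maximum; fall back to 1.0 to avoid dividing by zero.
    max_text = max(h.text_score for h in hits) or 1.0
    max_click = max(math.log1p(h.clicks) for h in hits) or 1.0

    def combined(h):
        return (w_text * h.text_score / max_text
                + w_click * math.log1p(h.clicks) / max_click
                + w_imp * h.importance)

    return sorted(hits, key=combined, reverse=True)
```

With these assumed weights, a result with a slightly lower text score but many past clicks and high importance can outrank the top text match. Setting `w_click` and `w_imp` to zero reduces the order to the text score alone.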