
Research And Application Of Vertical Search Engine Key Technologies Based On Lucene

Posted on: 2010-07-08 | Degree: Master | Type: Thesis
Country: China | Candidate: P Liu
GTID: 2178360275953368 | Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, the amount of information it contains keeps increasing, which poses unprecedented challenges for general-purpose search engines. Moreover, because general-purpose search engines serve all users, their results are overly broad: thousands or even millions of irrelevant results clearly fail to meet users' need for precise search. Vertical search engines, which serve a single domain, emerged in response. Rather than collecting and indexing every accessible Web document so as to answer every possible query, a focused crawler analyzes its crawl boundary to find the links most likely to be relevant to the crawl and avoids irrelevant regions of the Web. Because only related pages are crawled, the accuracy and efficiency of vertical search engines improve markedly. At present, however, the accuracy of Chinese word segmentation and of relevance prediction still needs improvement, and the search strategy of focused crawlers must be refined further to improve search-engine coverage and efficiency.

Based on an analysis of the topic-oriented crawling strategies in common use and of the ideas behind the PageRank algorithm, a new crawling strategy is proposed that combines the merits of two approaches: the content-based heuristic strategy and the hyperlink-analysis-based strategy. Hyperlink analysis enlarges resource coverage, while content evaluation keeps the search results highly relevant to the topic.

After researching and analyzing the shortcomings of the traditional PageRank algorithm, an improved PageRank algorithm is designed on the premise that users do not click the links on a page with equal probability.
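The abstract does not state the exact formula of the improved algorithm, so the following is only a minimal Python sketch of one common way to model "unequal click probability": a surfer on page u follows the link from u to v with probability w(u, v) / sum of u's link weights, instead of 1 / outdegree(u). The edge weights, the damping factor 0.85, and the function name `weighted_pagerank` are assumptions for illustration; in a topic-focused setting the weights might, for example, reflect how relevant each target page is to the topic.

```python
from typing import Dict, List, Tuple


def weighted_pagerank(
    links: Dict[str, List[Tuple[str, float]]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Hypothetical weighted PageRank: link u->v is followed with
    probability proportional to a non-negative weight w(u, v)."""
    # Collect every page that appears as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(v for v, _ in targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(max_iter):
        # Random-jump (teleport) share for every page.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        dangling_mass = 0.0
        for u in nodes:
            out = [(v, w) for v, w in links.get(u, []) if w > 0]
            total = sum(w for _, w in out)
            if total == 0:
                # No usable out-links: spread this page's rank uniformly.
                dangling_mass += rank[u]
                continue
            for v, w in out:
                # Unequal click probability: w / total instead of 1 / len(out).
                new_rank[v] += damping * rank[u] * w / total
        for p in nodes:
            new_rank[p] += damping * dangling_mass / n
        diff = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if diff < tol:
            break
    return rank
```

With all weights equal this reduces to standard PageRank (with dangling pages spread uniformly); raising the weight of a link shifts rank toward its target, which is how a topic-aware weighting could keep scores from drifting toward off-topic pages.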
Experiments show that the improved PageRank algorithm can be applied to topic-specific search engines and effectively avoids topic drift. Finally, using an improved Heritrix framework together with the Lucene framework, a personalized vertical search engine was designed and implemented, and a prototype vertical search engine for electronic product information was built.
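The hybrid crawling strategy summarized in the abstract, which combines a content-based heuristic with hyperlink analysis, can be illustrated with a best-first frontier whose priority blends the two signals. This is a hypothetical Python sketch, not the thesis's implementation: the in-memory `web` dictionary stands in for real fetching (which Heritrix would perform), relevance is plain term-frequency cosine similarity rather than Chinese word segmentation, the link score is a normalized in-link count, and the weight `lam` and the name `focused_crawl` are invented for illustration.

```python
import heapq
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical in-memory web: url -> (page text, [(target url, anchor text), ...]).
Web = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def tokens(text: str) -> Counter:
    """Lower-cased alphanumeric term frequencies (no Chinese segmentation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def focused_crawl(web: Web, seeds: List[str], topic: str,
                  budget: int, lam: float = 0.7) -> List[str]:
    """Best-first crawl; priority = lam * content score + (1 - lam) * link score."""
    topic_vec = tokens(topic)
    inlinks: Counter = Counter()            # in-links discovered so far per URL
    content: Dict[str, float] = {}          # best content score seen per URL
    frontier: List[Tuple[float, str]] = [(-1.0, s) for s in seeds]
    heapq.heapify(frontier)
    seen, order = set(), []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)    # highest priority first (negated key)
        if url in seen or url not in web:   # skip duplicates and unknown URLs
            continue
        seen.add(url)
        order.append(url)
        text, out_links = web[url]
        page_rel = cosine(tokens(text), topic_vec)
        for target, anchor in out_links:
            if target in seen:
                continue
            inlinks[target] += 1
            # Content heuristic: anchor-text relevance blended with parent-page relevance.
            rel = 0.5 * cosine(tokens(anchor), topic_vec) + 0.5 * page_rel
            content[target] = max(content.get(target, 0.0), rel)
            # Link score: in-link count normalized by the current maximum.
            link = inlinks[target] / max(inlinks.values())
            priority = lam * content[target] + (1 - lam) * link
            # Lazy update: push a new entry; whichever entry pops first visits
            # the URL and later entries for it are skipped.
            heapq.heappush(frontier, (-priority, target))
    return order
```

For example, seeded from a hub page about electronic products with a small page budget, the frontier visits pages reached through anchors such as "digital camera review" and "laptop prices" before a page reached through "celebrity gossip". The link term lets well-linked pages gain priority even when their anchor text is weak, which is the coverage benefit the abstract attributes to hyperlink analysis.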
Keywords/Search Tags: Vertical Search Engine, Topic-focused crawler, Crawling strategy