
The Design And Implementation Of Vertical Search Engine Based On Lucene

Posted on: 2015-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Luo
Full Text: PDF
GTID: 2298330434460927
Subject: Software engineering

Abstract/Summary:
With the rapid development of Internet technology, the volume of information resources on the network has grown quickly. Traditional search engines cannot keep up with information that is updated in real time, and they are inadequate when users issue real-time queries or in-depth, professional search requests. Vertical search engines for specific domains emerged in response. A vertical search engine focuses on one field, for example real estate, travel, automobiles, or education, and provides in-depth information within that field, so it can solve most of the problems that general-purpose search engines cannot. Compared with a general search engine, its query results are tied to a specific industry or domain. Because a single domain contains far less information than the whole Internet, a vertical search engine can update its index more easily, and its search results are more timely and more accurate.

First, this thesis introduces the research background, the significance, and the development of vertical search engines. It then introduces the key technologies of a vertical search engine: the web crawler's fetching process, crawl strategies, the algorithm for extracting topic-related web information, and the algorithm for ranking search results.

Second, it studies the classical page ranking algorithms, including the PageRank algorithm and the HITS algorithm. Starting from the standard PageRank algorithm, the thesis adds a page similarity factor and a time factor that reflects how old or new a page is. The improved algorithm mitigates two problems of standard PageRank: topic drift, and the tendency of query results to favor old pages. The ranking results improve accordingly.

Finally, building on the study of these key technologies, the thesis analyzes and designs the architecture and diagrams of a vertical search engine, and implements it on the Heritrix and Lucene frameworks. Lucene's standard ranking algorithm is replaced by the improved PageRank algorithm. In system tests, the improved PageRank algorithm produced good ranking results.
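The abstract does not give the formula for the improved PageRank, so the following is only an illustrative sketch of how a similarity factor and a time factor might be folded into the standard iteration. The function name `topic_time_pagerank`, the per-page `similarity` scores, the exponential freshness decay, and the `half_life_days` parameter are all assumptions made for this example, not the thesis's actual method.

```python
def topic_time_pagerank(links, similarity, age_days, damping=0.85,
                        half_life_days=180.0, iterations=50):
    """PageRank variant that biases rank flow toward on-topic, fresh pages.

    links:      dict mapping each page to the list of pages it links to
    similarity: dict mapping each page to a topic similarity in [0, 1]
                (hypothetical input, e.g. from a text similarity measure)
    age_days:   dict mapping each page to its age in days
    """
    pages = sorted(links)
    n = len(pages)
    # Assumed weighting: topic similarity times an exponential freshness decay.
    weight = {p: similarity[p] * 0.5 ** (age_days[p] / half_life_days)
              for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = [t for t in links[src] if t in links]
            total = sum(weight[t] for t in targets)
            if not targets or total == 0:
                # Dangling page (or all targets weighted zero): spread evenly.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            for t in targets:
                # Standard PageRank would split rank[src] equally; here the
                # split is proportional to each target's weight.
                new_rank[t] += damping * rank[src] * weight[t] / total
        rank = new_rank
    return rank


# Toy graph: one hub linking to an equally on-topic fresh page and stale page.
links = {"hub": ["fresh", "stale"], "fresh": ["hub"], "stale": ["hub"]}
similarity = {"hub": 1.0, "fresh": 1.0, "stale": 1.0}
age_days = {"hub": 0, "fresh": 10, "stale": 1000}
ranks = topic_time_pagerank(links, similarity, age_days)
```

In this toy graph the fresh page ends up ranked above the stale one even though both receive the same single inbound link, which is the kind of correction for old-page bias the abstract describes. With all weights equal, the split reduces to standard PageRank.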
Keywords/Search Tags: Topic Crawler, Lucene, PageRank, Web similarity, Crawling algorithm