
Research And Implementation Of Agricultural Vertical Search Engine Based On Nutch

Posted on: 2015-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Wang
GTID: 2298330434965394
Subject: Agricultural informatization
Abstract/Summary:
With the development of agricultural informatization, more and more agricultural users want to find the agricultural information they need quickly and efficiently. However, general-purpose search engines return results that are too numerous, poorly specialized, and inaccurate, so they cannot meet agricultural users' need to search for agricultural information efficiently. Meanwhile, because existing agricultural search engines are at an early stage of development and their technology is immature, they suffer from deficiencies such as low utilization and poor timeliness, which need to be addressed. To solve these problems, this study investigated an agricultural vertical search engine built on the open-source search engine Nutch. The main work is as follows:

(1) Research on an agricultural vertical search engine based on Nutch. To help agricultural users retrieve agricultural information more efficiently, this study first analyzed in depth the basic working principles of the open-source search engine Nutch, and then carried out secondary development on Nutch to implement an agricultural vertical search engine.
First, the initial URL seed list is obtained by combining manual sorting and judgment with meta-search. Second, Nutch's Chinese word segmentation is improved by adopting the JE segmenter, which uses a lexicon-based forward maximum matching algorithm. Then, information is collected with web crawler technology, and pages are parsed and filtered with an improved topic-judgment method that combines a keyword-based vector space model with the agricultural domain ontology, so that pages unrelated to agriculture are filtered out; an inverted index is then built for the agriculture-related pages. Finally, Nutch's original Lucene-based page ranking algorithm is improved by combining the PageRank algorithm with each page's degree of relevance to agriculture, measured with the agricultural domain ontology. In addition, Nutch's auxiliary functions are extended: when a user submits a query, the system recommends related words drawn from the agricultural domain ontology, as well as keywords ranked by access count, to help the user find the information needed.

(2) Design and implementation of a system management platform. To make the Nutch-based agricultural vertical search engine easier to use and the agricultural ontology library easier to manage, a system management platform was designed and implemented in a combined C/S + B/S (client/server plus browser/server) mode. It provides web crawler configuration management for the Nutch-based agricultural vertical search engine, management of the agricultural domain ontology library, agricultural site navigation, a user retrieval interface, and other functions. Finally, experimental results show that the search engine system and management platform described in this paper are feasible and effective.
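The segmentation step in (1) relies on forward maximum matching (FMM). The sketch below is a generic Python illustration of lexicon-based FMM, not the JE segmenter's actual code; the toy lexicon and the function name `forward_max_match` are hypothetical. At each position it tries the longest lexicon word first and falls back to a single character when nothing matches.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest lexicon word."""
    if max_len is None:
        # Longest candidate worth trying is the longest word in the lexicon.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a lexicon hit or a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"农业", "垂直", "搜索", "搜索引擎", "引擎"}
print(forward_max_match("农业垂直搜索引擎", lexicon))
```

With this lexicon the string 农业垂直搜索引擎 splits into 农业 / 垂直 / 搜索引擎, because the longer entry 搜索引擎 is tried before 搜索. Greedy matching is fast but can mis-split ambiguous strings, so segmentation quality depends heavily on the lexicon.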
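The topic filter in (1) can be pictured as a cosine-similarity test in a vector space model. In this hedged sketch, a page is a term-frequency vector over its segmented tokens and the topic vector holds weighted agricultural ontology concepts. The concept weights, the 0.3 threshold, and the names `TOPIC`, `cosine` and `is_agricultural` are illustrative assumptions; the abstract does not give the thesis's actual weighting scheme or cutoff.

```python
import math
from collections import Counter

# Hypothetical topic vector: agricultural ontology concepts with illustrative weights.
TOPIC = {"wheat": 1.0, "fertilizer": 0.8, "pest control": 0.8, "irrigation": 0.6}


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_agricultural(tokens, topic=TOPIC, threshold=0.3):
    """Keep a page if its term-frequency vector is close enough to the topic vector."""
    page_vector = Counter(tokens)  # term -> raw frequency in the page
    return cosine(page_vector, topic) >= threshold
```

A page tokenized as wheat, fertilizer, wheat, price scores about 0.70 and is kept, while a page about stock prices shares no terms with the topic vector, scores 0, and is filtered out.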
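The agriculture-related pages kept by the filter are then indexed with an inverted index. Nutch delegates this to Lucene; the plain-Python sketch below only illustrates the underlying idea, mapping each term to the documents (with term frequencies) that contain it and answering a boolean AND query by intersecting posting lists. The function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: doc_id -> list of tokens. Returns term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return index


def search_all(index, terms):
    """Return sorted doc ids containing every query term (boolean AND)."""
    # index.get avoids inserting empty entries into the defaultdict for unknown terms.
    postings = [set(index.get(t, {})) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Storing term frequencies in each posting, rather than bare document ids, leaves room for a scoring function to weight matches later.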
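The ranking change in (1) combines link analysis with topical relevance. The abstract does not state how the scores are combined, so this sketch assumes a simple weighted sum of a text-match score (standing in for Lucene's score), a max-normalized PageRank, and an ontology-based relevance degree in [0, 1]. The weights (0.5, 0.2, 0.3), the damping factor 0.85, and the names `pagerank` and `rerank` are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links: page -> list of pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank


def rerank(text_scores, links, relevance, weights=(0.5, 0.2, 0.3)):
    """Order pages by a weighted sum of text score, normalized PageRank and relevance."""
    pr = pagerank(links)
    top = max(pr.values(), default=0.0) or 1.0  # normalize PageRank to [0, 1]
    w_text, w_pr, w_rel = weights
    combined = {
        page: w_text * score
        + w_pr * pr.get(page, 0.0) / top
        + w_rel * relevance.get(page, 0.0)
        for page, score in text_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

For three pages with equal text scores where `a` and `b` both link to `c`, page `c` has the highest PageRank, yet a strongly agriculture-relevant `a` can still rank first: `rerank({"a": 0.5, "b": 0.5, "c": 0.5}, {"a": ["c"], "b": ["c"], "c": []}, {"a": 0.9, "b": 0.1, "c": 0.1})` returns `["a", "c", "b"]`.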
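For query assistance, the abstract mentions related-word recommendations from the ontology and keywords ranked by access count. A minimal sketch, assuming the ontology is reduced to a hypothetical concept-to-related-terms mapping (`ONTOLOGY`) and that "access count" means how often a past query appears in the query log: ontology neighbors are listed first, followed by the most frequent logged queries that contain the user's input. The function name `suggest` is also hypothetical.

```python
from collections import Counter

# Hypothetical ontology fragment: concept -> related concepts.
ONTOLOGY = {
    "wheat": ["winter wheat", "wheat rust", "wheat fertilization"],
    "rice": ["paddy rice", "rice blast"],
}


def suggest(query, ontology, query_log, limit=5):
    """Ontology neighbors of the query first, then frequent past queries containing it."""
    q = query.strip().lower()
    if not q:
        return []
    suggestions = list(ontology.get(q, []))
    # Access count (assumption): how many times each normalized query appears in the log.
    counts = Counter(entry.strip().lower() for entry in query_log)
    for past, _count in counts.most_common():
        if q in past and past != q and past not in suggestions:
            suggestions.append(past)
    return suggestions[:limit]
```

Putting ontology terms before log-derived terms favors domain-vetted vocabulary; the opposite order would favor what users actually search for.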
Keywords/Search Tags: vertical search engines, Nutch, information filtering, page rank, query expansion