Implementation And Optimization Of Nutch-based Search Engine For Agricultural Information

Posted on: 2012-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2178330332499493
Subject: Computer software and theory
Abstract/Summary:
With the continuous development of Internet technology, the amount of knowledge and resources on the Internet is growing explosively. How to share and manage these resources effectively and in an orderly way is a key issue facing the Internet and one of the main research directions for the next generation of the Web. Knowledge grid technology came into being in response to this situation. The goal of knowledge grid research is to build, on top of the next-generation Web, an effective platform for sharing and managing knowledge, information, and resources.

In this thesis, we implement an agricultural information search engine based on the open source search engine Nutch, and we improve and optimize the system where it falls short. This work is part of the State 863 Project "Digital Agricultural Knowledge Grid Research and Application". It collects and retrieves agricultural information from the Internet and provides rich resources for the construction and expansion of the local knowledge base.

The specific content of this thesis is as follows:

(1) It describes the research background, purpose, and significance, and summarizes existing results on search engine optimization.

(2) It presents the technical background. The working principles and architecture of search engines are described in detail, and the overall architecture of the open source search engine Nutch is explored.

(3) It implements the agricultural information search engine. Building on an in-depth study of search engine technology, we develop an agricultural information search engine on top of Nutch.

(4) It improves and optimizes the system to address several shortcomings. First, the web page analysis module is improved. We adopt a topic-page information extraction method based on the STU-DOM tree: after the page is analyzed, nodes that carry non-topic information are filtered out according to their semantic attribute values. Second, the summary extraction module is improved. We add text-feature judgments to statistics-based automatic summary extraction, so that each sentence receives a more refined weight derived from word frequency, sentence pattern, cue words, and other features. Third, a query expansion module is implemented. We build an agricultural domain ontology and, on that basis, use the Jena inference engine to look up the subclasses, synonyms, and instances of the search keywords in the ontology; these terms are then offered as related search words.

The resulting agricultural information search engine serves as a main functional module of the quiz system in the "Digital Agriculture Knowledge Grid". It collects and retrieves agricultural information from the Internet and provides rich resources for the construction and expansion of the local knowledge base.

We also compare the improved system with the unimproved one. The comparison shows that many portal pages and pages consisting mostly of links are filtered out, so more of the search results are text-based pages from which the user can obtain information directly. Summaries produced by the improved summary extraction module are more substantial than before and match the topic of the page more closely. The related search words supplied by the query expansion module are semantically related to the query and give the user a way to search more precisely.
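The sentence-weighting idea behind the improved summary extraction module can be illustrated with a minimal sketch. This is not the thesis's implementation: the cue phrases, stop words, weights, and the rule that questions are penalized as a "sentence pattern" feature are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical cue phrases and stop words; the abstract does not list the ones used.
CUE_WORDS = ("in summary", "in conclusion", "results show")
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "that", "are", "on"}


def split_sentences(text):
    """Split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def score_sentences(text, w_freq=1.0, w_pattern=0.5, w_cue=0.5):
    """Return (score, position, sentence) tuples; the weights are assumed values."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    top = max(freq.values(), default=1)
    scored = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        # Word frequency: average document frequency of the sentence's words, scaled to [0, 1].
        freq_score = sum(freq[w] for w in words) / (top * len(words)) if words else 0.0
        # Sentence pattern (assumed rule): declarative sentences score higher than questions.
        pattern_score = 0.0 if s.endswith("?") else 1.0
        # Cue words: reward sentences that contain a summarizing phrase.
        cue_score = 1.0 if any(c in s.lower() for c in CUE_WORDS) else 0.0
        scored.append((w_freq * freq_score + w_pattern * pattern_score + w_cue * cue_score, i, s))
    return scored


def summarize(text, k=2):
    """Pick the k highest-scoring sentences and return them in document order."""
    best = sorted(score_sentences(text), key=lambda t: -t[0])[:k]
    return " ".join(s for _, _, s in sorted(best, key=lambda t: t[1]))


if __name__ == "__main__":
    page = ("Rice needs water. Is it sunny? "
            "In summary, rice yield depends on water and rice variety. Click here.")
    print(summarize(page, k=2))
```

Combining several features in one weighted sum lets a short navigational sentence such as "Click here." lose to content sentences even when plain word frequency alone would not separate them clearly.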
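The query expansion step can be sketched in the same spirit. The thesis performs this lookup with the Jena inference engine over an agricultural domain ontology; the snippet below deliberately does not use Jena and replaces the ontology with small hand-written dictionaries, so every term and function name in it is a hypothetical stand-in.

```python
# Toy stand-in for the agricultural domain ontology; all terms are illustrative.
SUBCLASSES = {"crop": ["cereal", "vegetable"], "cereal": ["rice", "wheat"]}
SYNONYMS = {"rice": ["paddy"], "maize": ["corn"]}
INSTANCES = {"rice": ["japonica rice"]}


def all_subclasses(term):
    """Collect subclasses transitively, breadth first, as a subclass reasoner would."""
    found, queue = [], list(SUBCLASSES.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(SUBCLASSES.get(current, []))
    return found


def expand_query(keyword):
    """Return related search words: subclasses, then synonyms, then instances."""
    related = all_subclasses(keyword) + SYNONYMS.get(keyword, []) + INSTANCES.get(keyword, [])
    seen, result = set(), []
    for term in related:
        if term != keyword and term not in seen:
            seen.add(term)
            result.append(term)
    return result


if __name__ == "__main__":
    print(expand_query("crop"))
    print(expand_query("rice"))
```

Following subclass links transitively is what distinguishes an inference-based lookup from a flat synonym table: a query for a broad term also surfaces narrower terms several levels down.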
Keywords/Search Tags: Knowledge Grid, Search Engine, Nutch