
Expansion Of Agricultural Information Retrieval Based On Solr

Posted on: 2014-06-27 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Yu | Full Text: PDF
GTID: 2268330425452572 | Subject: Computer application technology
Abstract/Summary:
With the progress of science and the development of information technology, network retrieval technology has advanced steadily. The rapid growth of information, however, makes it hard for users to find what they need. Most of today's search engines are keyword-based: they find the records in the index that match the query keywords and return them to the user. For agricultural information, this retrieval method has disadvantages. Because of differences in region, ethnic group, and idiom, people use different words to express the same agricultural concept, and some of these words belong to agricultural dialect vocabulary. Such words are generally synonyms. At the same time, users often cannot state accurate keywords and can only issue a fuzzy query, so the retrieval results are often unsatisfactory.

In this thesis, the basic principles of full-text search and the basic theory of automatic Chinese word segmentation are studied in depth. Building on the strengths of the full-text search server Solr in query performance, configurability, and extensibility, we propose an extended retrieval system for agricultural information based on Solr. We build a specialized agricultural dictionary based on a hash mechanism and design a forward matching algorithm, also based on the hash mechanism, that matches text against the dictionary to perform dictionary-based segmentation. The word segmentation module is embedded into Solr. After studying synonym storage structures, we design a two-way chained synonym storage structure and embed the synonym dictionary into the word segmentation dictionary, so that the thesaurus is consulted when the index is built. Using Solr, the position increment of each synonym is set to zero, so synonyms are written to the index at the same position as the original term. Once indexing is complete, the system has extended the scope of retrieval, which addresses the problems caused by differing expression habits and fuzzy queries. We also improve Solr's result ranking algorithm and propose a ranking algorithm based on the vector space model.
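The segmentation and synonym-indexing steps described above can be illustrated with a small sketch. It is not the thesis's implementation: the lexicon, the synonym group, and the function names `build_synonym_map`, `forward_max_match`, and `expand_with_synonyms` are all hypothetical, a Python set stands in for the hash-based dictionary, and a plain symmetric map stands in for the two-way chained synonym structure. Positions are modeled as integers rather than through Solr's own token stream.

```python
# Hypothetical toy lexicon; a real agricultural dictionary would be far larger.
# A Python set is hash-based, so each membership test is O(1) on average.
DICTIONARY = {"马铃薯", "土豆", "洋芋", "种植", "技术"}

# Hypothetical synonym group (standard and dialect names for "potato").
SYNONYM_GROUPS = [["马铃薯", "土豆", "洋芋"]]


def build_synonym_map(groups):
    """Map every word to the other members of its group (symmetric lookup)."""
    mapping = {}
    for group in groups:
        for word in group:
            mapping[word] = [other for other in group if other != word]
    return mapping


def forward_max_match(text, dictionary):
    """Segment text left to right, always taking the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    start = 0
    while start < len(text):
        longest = min(max_len, len(text) - start)
        for size in range(longest, 0, -1):
            candidate = text[start:start + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                start += size
                break
    return tokens


def expand_with_synonyms(tokens, synonym_map):
    """Return (term, position) pairs; synonyms reuse the original term's position."""
    postings = []
    for position, token in enumerate(tokens):
        postings.append((token, position))
        for synonym in synonym_map.get(token, []):
            postings.append((synonym, position))  # position increment of zero
    return postings


tokens = forward_max_match("土豆种植技术", DICTIONARY)
postings = expand_with_synonyms(tokens, build_synonym_map(SYNONYM_GROUPS))
```

Because all three names for the crop share position 0 in `postings`, a query using any one of them would match the same place in the document, which is the effect the zero position increment is meant to achieve.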
The results are sorted by document similarity. In this thesis, we carry out secondary development on the Solr 1.4 platform and improve the algorithms of the segmentation, expansion, and ranking modules to compensate for the platform's deficiencies in processing agricultural information. We design and implement an experimental platform for extended agricultural information retrieval and test the system. The test results show significantly improved coverage of search results for extended agricultural information retrieval: the recall rate is greatly improved, while retrieval time does not increase significantly. The system achieves its intended purpose and brings convenience to users.
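Ranking by document similarity under a vector space model can be sketched as follows. This is a generic TF-IDF and cosine-similarity example, not the thesis's improved algorithm or Solr's scoring formula; the weighting scheme, the sample documents, and the function names `tfidf_vectors`, `cosine`, and `rank` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector per tokenized document, plus the IDF table."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: log(N / df) + 1 keeps every weight positive.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_tokens, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    scored = [(cosine(query, vec), index) for index, vec in enumerate(vectors)]
    # Sort by descending score; ties are broken by original document order.
    return [index for _, index in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical, already segmented documents.
docs = [
    ["玉米", "种植"],
    ["马铃薯", "种植", "技术"],
    ["马铃薯", "马铃薯", "病害"],
]
order = rank(["马铃薯"], docs)
```

Here the third document ranks first because the query term makes up most of its vector, the second follows, and the first, which does not contain the term, comes last.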
Keywords/Search Tags: Solr, agricultural information, extended retrieval, Chinese word segmentation, VSM ranking