The Research Of Nutch-Based Mobile Search Engine For Rural Information Service

Posted on: 2016-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhao
Full Text: PDF
GTID: 2348330482482105
Subject: Computer application technology
Abstract/Summary:
With the arrival of 4G networks and big data, the volume of network resources is growing explosively. Given the rapid development of rural information services and the advantages of smart mobile devices, agriculture-related websites should offer a mobile search engine with topical relevance, local proximity, and high accuracy. Such an engine lets farmers find the information they need quickly and accurately, improves their search experience, promotes rural informatization, and strengthens the agricultural service system. Mobile search engine technology for rural information services is therefore key to improving the accuracy with which rural service information is retrieved, and it is a current research focus in the field.

This thesis studies mobile search technology centered on rural information services. Building on the Nutch search engine, it filters web pages by rural information service topic using the vector space model (VSM) and combines this with a national gazetteer to extract spatial location information from web pages. Based on a study of inverted file indexing for set-oriented text search and R-tree indexing for two-dimensional spatial data, it builds a hybrid index model that applies the inverted file first and the R-tree second (IR), which provides the indexing capability for rural information service mobile search. Based on a study of the Lucene ranking algorithm, it improves the ranking algorithm by jointly considering geospatial location factors and web content relevance factors, so that the optimized search results are ordered by both geographic proximity and topical relevance.
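The VSM-based topic filtering mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the topic terms, their weights, the use of raw term counts rather than any particular weighting scheme, and the threshold value are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Build a raw term-frequency vector from a list of tokens."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(page_tokens, topic_vector, threshold=0.3):
    """Keep a page only if it is similar enough to the topic vector.

    The threshold is an assumed value, not one taken from the thesis.
    """
    return cosine_similarity(term_vector(page_tokens), topic_vector) >= threshold


# Hypothetical topic vector standing in for the thesaurus-derived one.
TOPIC = Counter({"crop": 3, "fertilizer": 2, "rural": 2, "market": 1})

print(is_on_topic(["crop", "fertilizer", "rural"], TOPIC))
print(is_on_topic(["football", "score"], TOPIC))
```

In a crawler, a check of this kind would run on the segmented text of each fetched page, and only pages that pass would be kept and have their outlinks followed.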
Within the rural information service domain, users can then perform combined location-and-keyword retrieval on mobile devices more conveniently, quickly, and efficiently. The main contents of the thesis are as follows.

First, the thesis constructs the overall system framework of a Nutch-based mobile search engine for rural information services. Based on a study of how traditional search engines work and of their key technologies, it uses the open-source search engine Nutch to propose an improved design and outlines the design and optimization scheme of each module.

Second, the thesis designs the page acquisition module. This module centers on the rural information service topic filtering model and the algorithm for acquiring web page location information. The topic filtering model obtains initial, topic-relevant seed URLs by manual selection, builds a thesaurus using a Chinese word segmentation system, and judges the topical relevance of crawled pages against the thesaurus using the vector space model (VSM) algorithm, thereby crawling and filtering web pages around the rural information service theme. Web page geographic location information is obtained with the help of the national gazetteer in three steps: recognizing place names, disambiguating place names, and determining the page's geographic focus point.

Third, the thesis designs the indexing module for rural information service mobile search.
So that the Nutch-based rural information service mobile search engine supports both text retrieval and spatial location retrieval, the thesis builds a hybrid index model that applies the inverted file first and the R-tree second (IR), based on a study of inverted file indexing for set-oriented text search and R-tree indexing for two-dimensional spatial data. This provides technical support for efficient retrieval in rural information service mobile search.

Finally, the thesis designs the ranking module. Considering both the text relevance and the distance proximity of information in a mobile search environment, and building on the Nutch scoring algorithm, it proposes a location-aware top-k text search (LkT) ranking method. The method separately normalizes the text relevance factor between the search keywords and the crawled pages and the distance proximity factor between the query location and the page's geographic focus point, then combines the two linearly according to weights, so that locally important information is ranked first.

The experimental results show that, within the field of rural information services, the Nutch-based mobile search engine achieves higher retrieval quality and can meet farmers' mobile retrieval needs.
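The weighted linear combination used for ranking can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis's algorithm: the normalization choices (dividing by the maximum text score, capping distance at `max_distance_km`), the weight `alpha`, the haversine distance, and all function names are assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def combined_score(text_score, max_text_score, distance_km, max_distance_km, alpha=0.5):
    """Normalize both factors to [0, 1] and combine them linearly.

    alpha weights text relevance; (1 - alpha) weights spatial proximity.
    """
    text_rel = text_score / max_text_score if max_text_score > 0 else 0.0
    proximity = 1.0 - min(distance_km, max_distance_km) / max_distance_km
    return alpha * text_rel + (1 - alpha) * proximity


def rank_top_k(query_loc, candidates, k, alpha=0.5, max_distance_km=100.0):
    """Return the ids of the k best candidates.

    candidates: list of (doc_id, text_score, (lat, lon)), where (lat, lon)
    stands for the page's geographic focus point.
    """
    if not candidates:
        return []
    max_text = max(c[1] for c in candidates)
    scored = []
    for doc_id, text_score, (lat, lon) in candidates:
        dist = haversine_km(query_loc[0], query_loc[1], lat, lon)
        scored.append((combined_score(text_score, max_text, dist, max_distance_km, alpha), doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical pages with equal text scores; the nearer one should rank first.
pages = [("far", 1.0, (40.0, 120.0)), ("near", 1.0, (30.0, 120.0))]
print(rank_top_k((30.0, 120.0), pages, k=2))
```

Setting `alpha` closer to 1 favors keyword relevance, while setting it closer to 0 favors pages whose focus point lies near the query location.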
Keywords/Search Tags:rural information service, mobile search, geotag, mix index, Nutch