
Research and Implementation of a Tourism Information Vertical Search Engine Based on Nutch and Solr

Posted on: 2017-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: G X Chen
Full Text: PDF
GTID: 2348330482492412
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet, the World Wide Web has become the carrier of a vast amount of information. Search engines are an important tool for obtaining and using this information, and they have become the entry point through which users reach the web. A traditional general-purpose search engine collects data from across the whole web without discrimination. Its coverage is comprehensive, but its results are undifferentiated, which raises the cost of screening for users with specific needs. A vertical search engine collects only pages relevant to a particular field, so users can reach information in their field of interest more accurately and quickly. A vertical search engine for tourism lets tourists, tourism practitioners, and other interested parties find travel information quickly.

Nutch is Apache's open-source web crawler written in Java. It is used mainly to collect web data and to parse the crawled pages, and in combination with the open-source full-text indexing framework Solr it can serve as the basis of a prototype search engine. Building on this, the thesis modifies the relevant functional modules and improves the algorithms to implement a vertical search engine for the tourism domain. The main contents are as follows:

(1) The thesis first presents the background and significance of the study and introduces the working principles, classification, and development history of search engines. It then describes the system structure of general and vertical search engines, the limitations of general search engines, and the advantages of vertical ones. Next, based on an analysis of the key points of vertical search engines, it proposes a topic crawler model for the tourism field.

(2) The biggest difference between a vertical search engine and a general search engine is that content acquisition is focused on a topic. From a selected set of sample documents, a tourism topic thesaurus is built using document frequency (DF) combined with manual selection. During crawling, a topic relevance judgment algorithm uses the thesaurus to determine how relevant each web page is to the topic, and pages are filtered by their relevance to the tourism topic.

(3) In the indexing process, IK-Analyzer is introduced to improve the search engine's support for Chinese word segmentation. Its lexicon is extended with the thesaurus content, and its stop word list is extended as well. The quality of the page ranking algorithm is closely tied to the user's query experience. For search ranking, the page score is based on the PageRank algorithm and improved by combining it with topic relevance, so that both the authority of a page and its relevance to the topic are taken into account.

(4) Drawing on the UI design of major search engines such as Baidu and Google, a user-friendly interface is implemented to improve the user experience.

(5) After an in-depth study of the principles and source code of Nutch and Solr, the thesis proposes its own ideas and solutions for topic-focused collection in the tourism field and carries out secondary development to implement a tourism information vertical search engine system based on Nutch and Solr. A Hadoop distributed platform is built on the servers, and the system is deployed, run, and tested.

Finally, the thesis summarizes the work done and outlines directions for future research.
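The thesaurus-based filtering in item (2) and the relevance-aware ranking in item (3) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the `min_df` cutoff, the relevance measure (share of page tokens found in the thesaurus), the `threshold`, and the linear blend with weight `alpha`. The manual selection step and the actual Nutch/Solr integration are not modeled.

```python
from collections import Counter


def build_thesaurus(sample_docs, min_df=2):
    """Keep terms that occur in at least `min_df` sample documents.

    `sample_docs` is a list of token lists. The manual selection step
    mentioned in the abstract is not modeled here.
    """
    df = Counter()
    for tokens in sample_docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, count in df.items() if count >= min_df}


def topic_relevance(page_tokens, thesaurus):
    """Assumed measure: fraction of page tokens that are thesaurus terms."""
    if not page_tokens:
        return 0.0
    hits = sum(1 for token in page_tokens if token in thesaurus)
    return hits / len(page_tokens)


def is_on_topic(page_tokens, thesaurus, threshold=0.3):
    """Crawl-time filter: keep a page only if it is relevant enough."""
    return topic_relevance(page_tokens, thesaurus) >= threshold


def combined_score(pagerank, relevance, alpha=0.5):
    """Assumed linear blend of link authority and topic relevance."""
    return alpha * pagerank + (1 - alpha) * relevance


# Hypothetical, already-tokenized sample documents.
samples = [
    ["hotel", "ticket", "scenic", "spot"],
    ["hotel", "flight", "ticket"],
    ["scenic", "route", "hotel"],
]
thesaurus = build_thesaurus(samples, min_df=2)

page = ["cheap", "hotel", "ticket", "news"]
relevance = topic_relevance(page, thesaurus)
score = combined_score(pagerank=0.2, relevance=relevance, alpha=0.5)
```

With a blend of this kind, `alpha` controls how much link authority outweighs topic relevance; the weighting actually used in the thesis is not stated in the abstract.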
Keywords/Search Tags: Vertical Search Engine, Tourism Information, Nutch, Solr, Subject Crawler