
Design And Implementation Of Distributed Logistics Vertical Search Engine Based On ElasticSearch

Posted on: 2019-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Zhang
GTID: 2428330545969965
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, people have become accustomed to relying on it to obtain information, and search engines have emerged as a bridge between people and the massive amount of information on the network. However, traditional centralized search engines cannot cope with the exponentially increasing amount of information in the logistics industry. Distributed computing technology can alleviate this problem to some extent. In addition, the results returned by general-purpose search engines are neither comprehensive nor specialized enough. The vertical search engine proposed in this paper addresses these problems by providing more specific, more effective, and more in-depth search services for the logistics field. In this context, this paper designs and implements a distributed logistics vertical search engine system that combines distributed technology with vertical search. The main contents and results of this paper are as follows:

(1) A distributed logistics vertical search engine system integrating Nutch and ElasticSearch is designed. Nutch is mainly responsible for topic-specific data collection and data cleaning; ElasticSearch serves as the full-text search server that builds the index and provides logistics information retrieval. The system is divided into three major modules: a logistics topic collection module, a distributed index module, and a logistics information search module.

(2) In the logistics topic collection module, initial seed and topic filtering strategies are adopted to greatly reduce the amount of collected data, filter out information unrelated to logistics, and improve the relevance of the collected pages to logistics topics. In the distributed index module, a hashing strategy is adopted to solve the distributed indexing problem and improve distributed indexing efficiency. At the same time, IKAnalyzer is introduced to enhance the system's Chinese search capability. In the logistics information search module, the final search results are presented in the sort order chosen by the user, and highlighting is used to improve the user experience.

(3) To address the problem that a traditional search engine returns the same results for the same query regardless of the user, this paper proposes a result reordering scheme based on user interest. Reordering the relevance-ranked results returned by the search engine does not change the set of results; it only moves the results a user is more likely to be interested in toward the top, meeting the personalized search requirements of different users.

(4) Finally, experimental tests show that the system can quickly complete topic-focused web crawling, achieves high-quality search, and supports Chinese search well.
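The reordering idea in item (3) can be illustrated with a minimal sketch. The abstract does not specify the scoring formula, so everything here is an assumption made for illustration: the `Hit` record, the per-user `interest` dictionary of topic weights, and the linear blend controlled by `weight` are hypothetical and are not taken from the thesis. The sketch only shows the stated property that reordering leaves the result set unchanged and alters its order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    relevance: float      # relevance score returned by the search engine
    topics: frozenset     # hypothetical topic labels attached to the document


def rerank(hits, interest, weight=0.5):
    """Return the same hits, reordered by a blend of relevance and user interest.

    interest: hypothetical mapping from topic label to a weight in [0, 1].
    weight:   share of the final score given to user interest (assumed blend).
    """
    if not hits:
        return []
    # Normalize relevance so it is comparable with interest weights.
    max_rel = max(h.relevance for h in hits) or 1.0

    def blended(h):
        rel = h.relevance / max_rel
        if h.topics:
            # Mean interest weight over the document's topics.
            aff = sum(interest.get(t, 0.0) for t in h.topics) / len(h.topics)
        else:
            aff = 0.0
        return (1 - weight) * rel + weight * aff

    # sorted() is stable, so ties keep the engine's original order.
    return sorted(hits, key=blended, reverse=True)


hits = [
    Hit("a", 2.0, frozenset({"warehousing"})),
    Hit("b", 1.8, frozenset({"freight"})),
    Hit("c", 1.0, frozenset({"customs"})),
]
user_interest = {"freight": 1.0}
reordered = rerank(hits, user_interest)
```

With an empty interest profile the function falls back to the engine's relevance order, so two users issuing the same query see different orderings only to the extent that their profiles differ.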
Keywords/Search Tags:Nutch, ElasticSearch, distributed, vertical search, personalized sorting