
Design And Implementation Of Website Internal Information Retrieval System Based On ElasticSearch

Posted on: 2021-02-02 | Degree: Master | Type: Thesis
Country: China | Candidate: W Fang | Full Text: PDF
GTID: 2518306104995349 | Subject: Software engineering
Abstract/Summary:
As part of the laboratory's work, an online publishing platform is being developed that combines the Internet with the traditional publishing industry. The platform generates a large amount of information, such as book, order, and logistics records. If this information is retrieved with traditional SQL statements against a database, the retrieval results are not accurate enough, and retrieval performance is poor when the data volume is massive. Taking MySQL as an example, a fuzzy query performs a full table scan, which is very time-consuming when the amount of data is particularly large. For these reasons, it was decided to develop an on-site information retrieval system based on ElasticSearch to handle retrieval over the website's massive data, in place of simple database-based on-site retrieval.

This thesis first analyzes the current state of research on retrieval systems in China and abroad, and then introduces the theory and techniques used in the system's design and implementation, including Chinese word segmentation, inverted indexing, and ElasticSearch's document relevance scoring mechanism. The requirements analysis establishes that the on-site retrieval system serves the website's ordinary users and system administrators, and provides them with functions such as book information retrieval, order information retrieval, and logistics information retrieval. The system is divided into a four-layer architecture: a data layer, a distributed index layer, a search application layer, and a user layer. The data layer is responsible for data storage, the distributed index layer is responsible for index construction, and the search application layer builds a search application service by accessing files in the data layer and the index layer to provide users with retrieval functions.

The three major functional modules of the system are then designed in detail: the data module, the index module, and the search application module. The data module uses MySQL for data storage, and the index module uses ElasticSearch to build a distributed index cluster, in which primary and replica shards ensure high availability of the index data. The search application module is built with Spring Boot. The detailed design of the system's database tables is also given. Finally, the system's functions were tested using the black-box testing method, and the performance of the retrieval system was tested using the open-source framework JMeter. With the ElasticSearch-based information retrieval system in place, site-wide information retrieval for the online publishing platform was completed. This addresses both the low accuracy of SQL-based retrieval results and the long retrieval times for site-wide search over massive data.
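The contrast the abstract draws between a full-scan fuzzy query and an inverted index can be sketched in a few lines. The following is only a toy illustration, not the thesis's implementation: the sample book titles, the function names, and the whitespace tokenizer are all assumptions made for this example (a real deployment for Chinese text would rely on a word segmenter and on ElasticSearch itself).

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, standing in for
    # the Chinese word segmentation step mentioned in the abstract.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def full_scan(docs, needle):
    # Substring match over every document, analogous to a fuzzy query
    # that must inspect each row of a table.
    needle = needle.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}


# Hypothetical sample data, invented for illustration.
BOOKS = {
    1: "Distributed Systems Concepts",
    2: "Search Engines in Practice",
    3: "Distributed Search with Inverted Indexes",
}
INDEX = build_inverted_index(BOOKS)
```

With this sketch, `search(INDEX, "distributed search")` looks up two posting sets and intersects them, while `full_scan(BOOKS, "search")` has to read every title; the cost of the scan grows with the number of documents, which is the bottleneck the abstract attributes to database fuzzy queries.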
Keywords/Search Tags: Retrieval system, Massive data, Distributed cluster