Research On Data Storage Strategy Of Elasticsearch

Posted on: 2020-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: C G Li
GTID: 2428330590971705
Subject: Computer Science and Technology

Abstract:
Computers and the Internet are developing rapidly. Every industry generates hundreds of millions of data records every day, and retrieving the relevant data quickly from such massive volumes has become an urgent need for users. To this end, search engine frameworks such as Elasticsearch, Solr, and IndexTank have come into being. Among them, Elasticsearch is favored by major Internet companies and many scientific research institutions for its near-real-time performance, high availability, ease of deployment, and high retrieval efficiency. It is widely used in production across many industries and has become the mainstream full-text search engine framework.

The thesis first conducts an in-depth investigation of the application status, research status, related literature, and source code of the Elasticsearch distributed search engine framework, and analyzes its architecture, operating mechanism, and implementation principles. It then summarizes the problems in Elasticsearch's index segment merging mechanism, routing mechanism, and data placement strategy, proposes corresponding improvement strategies, and verifies them through experiments. The main work is as follows.

For index segment merging, the existing strategy does not consider the load of the node at merge time. When the cluster is under high load, thread context switching and resource contention reduce both the speed of index segment merging and the throughput of the cluster. In addition, the existing merging strategy does not optimize the data distribution, which hurts the efficiency of data retrieval. Therefore, an index segment merging strategy based on a similarity evaluation model is proposed. The strategy reduces the number of index segment merges by selecting among different merging methods, which improves cluster throughput. Further, the similarity evaluation model selects the most suitable index segments to merge, which improves data retrieval speed.

For document routing and data placement, the default routing formula distributes all data evenly across the index shards, and a query then traverses all shards, so queries for small amounts of data are not efficient enough. The existing placement strategy also suffers from uneven distribution of data across subject categories and low data import efficiency. To this end, a data placement strategy based on shard binding is proposed. Building on data shard binding, the data selection method chooses among different data storage methods according to the size of the data and how hot or cold it is, and selects the optimal one through the shard binding model. Shards are then bound to nodes, which optimizes the storage distribution of data and improves the efficiency of reading and writing data.

The work shows that studying Elasticsearch's data storage strategy and further optimizing its index segment merging and data placement strategies can improve the retrieval speed and throughput of Elasticsearch, and is significant for the application and promotion of the Elasticsearch search engine framework.
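
As a rough illustration of the default routing behaviour the abstract refers to: in its simplest documented form, Elasticsearch picks a shard as hash(_routing) mod number_of_primary_shards, where _routing defaults to the document ID. The sketch below is not the thesis's method and not Elasticsearch's implementation. It assumes CRC32 as a stand-in for the Murmur3 hash Elasticsearch uses, and the function names are hypothetical. It shows why a query without a routing value has to visit every shard, while a query with one can go to a single shard.

```python
import zlib
from typing import List, Optional


def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing_value) mod num_primary_shards."""
    # Assumption: CRC32 stands in for Elasticsearch's Murmur3 hash.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_primary_shards


def shards_to_query(num_primary_shards: int,
                    routing_value: Optional[str] = None) -> List[int]:
    """Return the shard numbers a search has to visit."""
    if routing_value is None:
        # No routing hint: matching documents may live on any shard.
        return list(range(num_primary_shards))
    # With a routing hint, only the shard it hashes to is searched.
    return [route_to_shard(routing_value, num_primary_shards)]


if __name__ == "__main__":
    print(shards_to_query(5))             # all five shards
    print(shards_to_query(5, "user-42"))  # a single shard
```

Because the hash spreads IDs roughly evenly, each shard receives a similar share of the documents, which is the even distribution (and the all-shard traversal for small queries) described above.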
Keywords: Elasticsearch, similarity evaluation model, index segment merging, shard binding, data placement strategy