Font Size: a A A

The Research And Implementation Of Indexing And Query Techniques Based On Range Query

Posted on:2018-07-14Degree:MasterType:Thesis
Country:ChinaCandidate:M ChenFull Text:PDF
GTID:2428330512997991Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the development of the Internet,global data is growing at an explosive rate.As the scale of data expands,the types of data become more diversified.Traditional databases can support complex data queries and ensure data consistency,but there are limitations on scalability and data types.In order to store and manage massive amounts of data effectively,a NoSQL database appeared.Different from traditional databases,the NoSQL database supports flexible data model and can increase the capacity of the database by adding nodes.HBase is a typical distributed,column-oriented NoSQL database that provides real-time read/write access to a large number of structured,semi-structured,and even unstructured data.According to the byte order of the primary key,HBase can store each record and support fast retrieval based on primary keys.However,in some application scenarios,you may need to retrieve other non-primary key fields.Because HBase does not have an indexing mechanism for non-primary keys at present,data retrieval based on non-primary key need to scan HBase tables and are very inefficient.With the widespread use of HBase by domestic or foreign enterprises and scientific research institutions,the optimization of query performance for HBase has become a hot research problem in large data storage management research.In order to further accelerate the range query of non-primary key data,this paper proposes a method of dividing non-primary key index data into slices.The data management granularity from one record to a data block,thereby reducing the space overhead of building the index and the time overhead of searching the index.In addition,this paper proposes a method of adaptively adjusting the data slice so that the partition of the data slice is close to the distribution of the range query.Based on the above technical methods,this paper builds a non-primary key data index for HBase,and constructs a hierarchical index management system based on data slice,which further improves the performance of HBase non-primary key range query.The major contributes and works in this paper are as follows:First of all,for the range query,this paper proposes a method to divide the index data into slices and manage the meta information of the data slices using the skiplist structure.In order to make the division of data slices more reasonable,this paper proposes an adaptive data slices partitioning method based on access frequency.Then,a hierarchical index storage model based on data slices is introduced.Experiments show that the index management based on adaptive data slices can reduce the time cost of index query.Secondly,this paper presents a range query method based on skip list and hierarchical index.At the same time,for the shortcomings of various common replacement strategies,this paper proposes a caching replacement strategy AdaHotscore based on adaptive slices,in order to obtain a higher hit rate in the cache layer.The experimental results show that the AdaHotscore replacement algorithm can predict the hot and cold data blocks well and cache the hot data blocks into the memory to improve the overall performance of the system when the cache space is small.Thirdly,based on the above proposed technical methods,this paper implements a hierarchical index query system based on data slices.Experiments show that compared with the HBase system,our system has improved the performance of the non-primary key range query by several hundred times.Compared with the HiBase system,our system improves the performance of the non-primary key range query by 5 to 6 times.The performance of our system under different size data sets and different node sizes is tested,which shows that the system has good scalability and usability.
Keywords/Search Tags:range query, adaptive partitioning, secondary index, hierarchical storage, cache replacement policy
PDF Full Text Request
Related items