Font Size: a A A

Research On Adaptive Learned Index Supporting Efficient Writes

Posted on:2022-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:Z ZhangFull Text:PDF
GTID:2518306572982979Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In the era of big data,index technology is very important for efficient data access.The learned index pioneeringly provides a new direction for the application of machine learning in the field of index optimization.Compared with B+Tree,the learned index has better read performance and lower memory usage.Because the leaf node of the learned index manages the entire ordered array,so the scalability of the learned index is poor.The learned index does not support efficient write operations and has no feasible persistence scheme.The adaptive learned index supporting efficient writes scheme called EWALI includes such modules as dynamic data partition,data aware recursive-model indexes(DARMI),and incremental cache.Dynamic data partition module,based on efficient Shrinking Cone data partition algorithm,enables dynamic data partition according to data distribution,and further ensures that the data distribution within each data segment is linear,even a very simple linear model to fit data distribution can also achieve high precision.DARMI splits nodes based on the dynamic data partition module.According to the data range managed by the node and the record density of the data segment,the leaf nodes are automatically split horizontally and vertically.The index structure can be adaptively adjusted according to the data distribution.Each leaf node only manages the corresponding data segment,instead of operating all the data.Therefore,each write affects only some nodes in DARMI,which greatly improves the scalability and maintenance efficiency of DARMI.To support more efficient writes,the EWALI designs an incremental cache module to handle the incremental data,and let write operations asynchronously.When the amount of data in the incremental cache reaches a certain threshold,the incremental data is merged with the data segments managed by DARMI through the background thread.In addition,the EWALI implements index persistence through technologies such as WAL log and LRU cache management.The experimental results show that the EWALI has good read-write performance.Compared with B+Tree index,EWALI has up to 13.5% read performance improvement and53.4% write performance improvement.Compared with FITing-Tree,EWALI has up to17.4% read performance improvement and 60.9% write performance improvement.Compared with the learned index that only supports read operations,EWALI increases the read performance by 94% under the Lognormal dataset,while reduces the read performance by 73.1% on average under the NYCT and OSM datasets,but EWALI has higher write performance.Compared with the XIndex,EWALI has up to 37.2% read performance improvement and 22.5% write performance improvement.
Keywords/Search Tags:Learned Index, Dynamic Partition Algorithm, Data Aware Recursive-Model Indexes, Asynchronous Write, Persistence
PDF Full Text Request
Related items