Modern applications must efficiently process both highly concurrent transactional queries and analytical queries, driven by the diversification of Internet applications, the massive growth of application data, and the rising importance of data analysis. In the traditional enterprise architecture, On-line Transaction Processing (OLTP) and On-line Analytical Processing (OLAP) are usually handled by two separate data systems, with ETL (Extract-Transform-Load) tools acting as the intermediary. This architecture introduces lag into data analysis and is therefore unable to support fast analysis and decision-making. Traditional databases use a single row-store or column-store scheme, which makes it difficult to serve both OLTP and OLAP well. To support analytical processing on the latest data, it is necessary to build a storage structure that handles both transaction processing and analytical processing efficiently. To address these problems, this paper carries out systematic research on storage structure design and query optimization techniques. The main contributions are as follows:

(1) To handle hybrid workloads efficiently within a single storage structure, we propose a hybrid Log-Structured Merge Tree (HLSM-tree) storage structure. Building on features of the Log-Structured Merge Tree (LSM-tree) such as leveled data storage and time-related data distribution, HLSM-tree divides the whole structure into an OLTP part and an OLAP part, forming a hybrid storage scheme that can adapt to different workloads. We extend the compaction operation of the LSM-tree so that the storage layout is transformed as data moves through its life cycle. In addition, we design a local update strategy to address the write amplification problem in HLSM-tree; it can update the values of specified attribute columns and reduces space consumption during updates. Benchmark results show that HLSM-tree performs well in both
OLTP and OLAP. The experiments also verify HLSM-tree’s capability to process hybrid workloads.

(2) To address the problem that the query efficiency of existing LSM-tree secondary index schemes decreases as the number of data segments increases, we carry out systematic experiments and theoretical analysis. We present the Approximate Membership Query Index (AMQ Index), a highly efficient secondary index that exploits the cache efficiency of the Vacuum filter and the ability of bitmap indexes to support ad hoc queries. AMQ Index provides a global index over all data segments and avoids multi-filter probing during query processing, which effectively decreases query latency. Compared with traditional indexes, AMQ Index produces complete and correct results for lookups of existing items. Besides lookups with equality predicates, we support range queries on non-primary-key columns by designing and implementing a logical chain-based range query scheme. Our experiments with real and synthetic datasets show that AMQ Index significantly improves query performance compared with baseline methods.
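
To illustrate the idea of a global secondary index over data segments, the following minimal Python sketch maps each secondary-key value to a bitmap of the segments that contain it, so that a lookup consults one structure instead of probing a filter per segment. This is not the AMQ Index implementation: an exact dictionary stands in for the Vacuum filter (so there are no false positives here), and the names `GlobalSegmentIndex`, `segments_for`, and `lookup` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Row = Tuple[int, str]  # (primary key, secondary-column value); assumed layout


class GlobalSegmentIndex:
    """Hypothetical global index: secondary value -> bitmap of segment ids.

    An exact dict replaces the Vacuum filter to keep the sketch short; a real
    approximate-membership structure would store fingerprints instead of keys.
    """

    def __init__(self) -> None:
        self._bitmaps: Dict[Hashable, int] = {}

    def add(self, value: Hashable, segment_id: int) -> None:
        # Set bit `segment_id` in the bitmap kept for this value.
        self._bitmaps[value] = self._bitmaps.get(value, 0) | (1 << segment_id)

    def segments_for(self, value: Hashable) -> List[int]:
        # A single probe returns every candidate segment for the value.
        bitmap = self._bitmaps.get(value, 0)
        return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def lookup(segments: List[List[Row]], index: GlobalSegmentIndex, value: str) -> List[int]:
    """Return primary keys whose secondary column equals `value`."""
    hits: List[int] = []
    for seg_id in index.segments_for(value):  # only candidate segments are scanned
        for primary_key, column_value in segments[seg_id]:
            if column_value == value:
                hits.append(primary_key)
    return hits


# Three toy data segments, as might be produced by successive flushes.
segments: List[List[Row]] = [
    [(1, "red"), (2, "blue")],
    [(3, "green")],
    [(4, "red")],
]

index = GlobalSegmentIndex()
for seg_id, rows in enumerate(segments):
    for _, column_value in rows:
        index.add(column_value, seg_id)

red_segments = index.segments_for("red")
red_keys = lookup(segments, index, "red")
```

In this toy setup, a lookup for "red" scans only segments 0 and 2 and skips segment 1 entirely; with per-segment filters, each of the three segments would have to be probed separately.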