| With the large-scale popularization of mobile Internet and Internet of Things devices, the world has entered a post-information society, and the 21st century has become the century of big data. The ever-increasing demand for massive data storage, processing, and analysis means that traditional databases can no longer meet requirements, and distributed databases have emerged in response. After years of development, distributed database research currently follows three major directions. The first is NewSQL, which uses consensus algorithms such as Paxos or Raft to provide high data availability and strongly consistent distributed transactions, meeting users' consistency requirements in distributed scenarios. The second is sharding, which builds on years of MySQL engineering experience to provide users with stable database services. The third is the cloud-native database, which combines an architecture that separates storage from computing with cloud virtualization technology, treating storage and computing resources as resource pools so that a distributed database can scale out and in quickly, reduce usage costs, and achieve higher performance. A distributed in-memory database with columnar storage stores data by column. Because the values in each column share a single, known data type, the system can compress them to a high degree. When accessing data, the system can read only the columns involved in a query, which reduces system I/O, and it can also exploit the parallel processing capability of the processor to improve computational efficiency. This thesis is based on GoldFish, an in-memory database developed in-house by our teaching and research office. Targeting OLAP (online analytical processing) scenarios, it designs and implements a column storage engine module based on the Raft protocol, which both strengthens the reliability of the system and implements SDO (Slice Data Outline) to improve query performance. The main contributions of this thesis are as follows: 1. A multi-replica mechanism strengthens data reliability, so that the system can recover data from other nodes when a physical node crashes or persistent memory is damaged. This solves the single-point-of-failure problem and makes the system's data highly reliable. 2. A data index over column slices is implemented to improve query performance, so that the system returns results faster when running multi-condition queries, aggregate queries, and queries containing joins. |
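
The abstract does not describe how SDO is built, so the following is only a minimal sketch of one common way a per-slice summary can speed up scans over columnar data. It assumes each column is split into fixed-size slices and that the summary records the minimum and maximum value of each slice. The names `SliceOutline`, `build_outlines`, and `scan_greater_than` are hypothetical and are not taken from GoldFish.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SliceOutline:
    """Hypothetical per-slice summary: the value range of one column slice."""
    start: int       # index of the first row in the slice
    end: int         # one past the last row in the slice
    min_value: int
    max_value: int


def build_outlines(column: List[int], slice_size: int) -> List[SliceOutline]:
    """Split a column into fixed-size slices and record min/max for each."""
    outlines = []
    for start in range(0, len(column), slice_size):
        chunk = column[start:start + slice_size]
        outlines.append(
            SliceOutline(start, start + len(chunk), min(chunk), max(chunk))
        )
    return outlines


def scan_greater_than(
    column: List[int], outlines: List[SliceOutline], threshold: int
) -> Tuple[List[int], int]:
    """Return (matching row ids, number of slices actually scanned)."""
    rows: List[int] = []
    scanned = 0
    for o in outlines:
        if o.max_value <= threshold:
            continue  # no value in this slice can match, so skip it entirely
        scanned += 1
        rows.extend(i for i in range(o.start, o.end) if column[i] > threshold)
    return rows, scanned


# Column-oriented layout: each column is stored as its own contiguous list,
# so a query on "amount" never touches "order_id".
table: Dict[str, List[int]] = {
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "amount":   [10, 12, 11, 15, 90, 95, 20, 22],
}
amount_outlines = build_outlines(table["amount"], slice_size=4)
matching_rows, slices_scanned = scan_greater_than(
    table["amount"], amount_outlines, threshold=50
)
```

With this toy data, the first slice of `amount` has a maximum of 15, so the predicate `amount > 50` skips it without reading its rows and only the second slice is scanned. Whether GoldFish's SDO stores min/max values, other statistics, or a different structure altogether is not stated in the abstract.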