The Parallel Loading Technology Of Column Data Index In Distributed In-Memory Database

Posted on:2017-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:L Liu

Full Text:PDF

GTID:2308330485984993

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

With the rapid development of the Internet industry and the spread of concepts such as "big data" and "cloud computing", the explosive growth of data in volume, diversity, and growth rate is challenging the data processing capacity of modern enterprises. The rapid development of computer network technology and the expansion of industry application requirements confront mature traditional database technology with new challenges in many situations. Against this background, distributed databases and in-memory databases have emerged. Compared with a traditional centralized database, a distributed database offers good flexibility and scalability, which gives it advantages in performance and reliability when handling huge amounts of data. An in-memory database stores data in RAM (random access memory) rather than on disk, so reads and writes are much faster than disk access, which greatly improves performance.

Compared with disk, RAM is a scarce resource, so in most production scenarios a distributed in-memory database serves mainly as a computing platform rather than as the primary data store. The bulk of the data is still stored in traditional disk-based databases. How to quickly load massive structured data from a traditional disk-based database into a distributed in-memory database is therefore the first problem to solve.

To solve this problem, this thesis presents a solution for quickly loading massive structured data stored on disk into a distributed in-memory database. First, for the original structured data, we propose a new fast index model that achieves efficient data storage and fast queries. Then, a distributed system converts the structured data into in-memory data indexes and loads them into the distributed in-memory storage engine. In addition, the solution supports filtering data according to user preferences and provides incremental data updates.

The main work and innovations are as follows:

1. We design a data index model for in-memory databases that provides efficient data storage and fast queries. The model is based on column-oriented data storage; it compresses the original data while keeping queries fast.
2. Based on this index model, we design a fast parallel data loading scheme for distributed in-memory databases, which loads structured data stored in an external traditional database into the distributed in-memory database engine. The scheme supports filtering data according to user preferences before creating the data index, and then loads the data index into the engine according to a certain strategy.
3. We provide incremental data updates that follow a certain strategy to synchronize data between the distributed in-memory database system and the external database, which addresses the data consistency problem.
4. We adopt a distributed system to load large amounts of structured data in parallel, which improves data processing speed, relieves the load on individual nodes, and improves the safety of the nodes. The distributed system also improves the speed and stability of data loading.
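
The abstract does not give the details of the index model or the loading strategy, so the following is only a minimal sketch of the general idea, not the thesis's actual design. It assumes dictionary encoding as the column compression, an inverted list per value for fast lookups, and a local thread pool as a stand-in for distributed worker nodes. The names `build_column_index`, `parallel_load`, `lookup`, and the `keep` filter parameter are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_column_index(rows, columns, keep=lambda row: True):
    """Build a dictionary-encoded column index for one partition of rows.

    Assumed layout (not from the thesis): per column, 'dictionary' lists the
    distinct values, 'codes' holds one small integer per kept row, and
    'postings' maps each value to the row positions that contain it.
    """
    kept = [row for row in rows if keep(row)]  # user-preference filtering
    index = {}
    for col in columns:
        dictionary, code_of, codes = [], {}, []
        postings = defaultdict(list)
        for pos, row in enumerate(kept):
            value = row[col]
            if value not in code_of:
                code_of[value] = len(dictionary)
                dictionary.append(value)
            codes.append(code_of[value])
            postings[value].append(pos)
        index[col] = {
            "dictionary": dictionary,
            "codes": codes,
            "postings": dict(postings),
        }
    return {"row_count": len(kept), "columns": index}


def parallel_load(partitions, columns, keep=lambda row: True, workers=4):
    """Index every partition concurrently; results keep partition order.

    Threads stand in for the worker nodes of a distributed system.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda part: build_column_index(part, columns, keep), partitions)
        )


def lookup(partition_indexes, column, value):
    """Return (partition number, row position) pairs where column == value."""
    hits = []
    for part_no, idx in enumerate(partition_indexes):
        for pos in idx["columns"][column]["postings"].get(value, []):
            hits.append((part_no, pos))
    return hits


if __name__ == "__main__":
    partitions = [
        [{"city": "Chengdu", "amount": 10}, {"city": "Beijing", "amount": 5}],
        [{"city": "Chengdu", "amount": 0}, {"city": "Chengdu", "amount": 7}],
    ]
    indexes = parallel_load(
        partitions, ["city", "amount"], keep=lambda r: r["amount"] > 0
    )
    print(lookup(indexes, "city", "Chengdu"))
```

In this sketch each partition is indexed independently, so the work can be spread across workers without coordination. An incremental update could, under the same assumptions, rebuild only the partitions whose source rows changed.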

Keywords/Search Tags:

Distributed in-memory database, Column data index model, Parallel loading, Incremental updates

Related items

1	Research On Distributed Memory Column Store Engine
2	Research And Implementation On Parallel Bulk-loading Algorithm For Spatial Index
3	Design And Implementation Of The Loading Technology Of Massive Text Data
4	A Research Of Spatio-Temporal Object Query Processing Technology Oriented To Column Storage Model
5	Study Of Distributed And Parallel Index
6	Design And Implementation Of Transaction System Based On Distributed Columnar In-memory Database
7	The Enterprise Data Real-Time Analysis System Based On In-Memory Database
8	Implementation And Research On Design Of Main Memory Database
9	Research For Intermediate Result’s Management And Data Access Technology In Column-oriented Database
10	Design And Implementation Of Query Optimization Module For Distributed Column Database Based On Memory