Design And Implementation Of Parallel Loading Technology For Massive Financial Data

Posted on:2016-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Li

Full Text:PDF

GTID:2308330461975794

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet, the demand for information technology in finance, transportation, telecommunications, and other industries continues to increase. In China, the financial industry has been at the forefront of information technology. As the user base grows and business requirements change, the amount of data in financial databases keeps increasing and can reach hundreds of terabytes or even petabytes. Financial companies therefore generally need a large database system to store and manage massive financial data. Meanwhile, business needs require sharing and transferring huge volumes of information between different financial systems, which means loading data from one database into another. Storing and loading massive financial data has thus become a prominent challenge in financial systems.

This thesis focuses on loading technology for massive financial data. Taking a real financial system as the research object, and considering its underlying data storage architecture and data loading characteristics, we design and implement a massive data loading method suited to that system. The major contributions are as follows:

1. Based on the history database system at the Bank of Communications, we analyze its underlying data storage architecture, which uses the distributed database OceanBase to solve the problem of massive data storage. By analyzing its loading characteristics, we find that the history database system faces a massive data loading problem. To address it, we propose two solutions.

2. To load data into OceanBase, we design and implement two data loading methods: loading based on SQL INSERT, and direct update Memtable (DUMT). The former is a common loading technique implemented in SQL execution mode. The latter is based on OceanBase and is suitable only for OceanBase. Compared with the former, DUMT reduces the volume of network transmission and of transactions, and improves loading efficiency. Empirical studies show that DUMT loads data more efficiently than SQL INSERT.

3. Based on the characteristics of data loading in the history database system, we propose a multi-task parallel loading method, which splits the loading work across multiple loading servers. This method makes full use of the computing resources of the database and the loading servers by running loading tasks in parallel on different loading servers.

4. To improve parallel loading efficiency, we propose two task scheduling strategies: scheduling based on table loading tasks, and two-phase scheduling based on fine-grained loading tasks. The two strategies work at different task granularities, and both aim to run tasks in parallel as much as possible. Empirical studies show that two-phase scheduling achieves better loading efficiency because it makes better use of the computing resources of the loading servers.
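The effect of task granularity described in contribution 4 can be illustrated with a small sketch. This is not the thesis's two-phase algorithm, whose details the abstract does not give. It assumes a simple greedy longest-task-first assignment, a hypothetical chunk size, and made-up table names and row counts, and it only shows why splitting tables into fine-grained tasks tends to balance load across loading servers better than assigning whole tables.

```python
import heapq

def schedule(task_costs, n_servers):
    """Assign tasks greedily, longest first, to the least-loaded server.

    Returns the total load per server. This greedy rule is an assumption
    made for illustration, not the scheduler described in the thesis.
    """
    heap = [(0, s) for s in range(n_servers)]  # (current load, server id)
    heapq.heapify(heap)
    totals = [0] * n_servers
    for cost in sorted(task_costs, reverse=True):
        load, server = heapq.heappop(heap)
        totals[server] = load + cost
        heapq.heappush(heap, (load + cost, server))
    return totals

def split_table(rows, chunk_rows):
    """Split a table of `rows` rows into chunks of at most `chunk_rows` rows."""
    full, rest = divmod(rows, chunk_rows)
    return [chunk_rows] * full + ([rest] if rest else [])

# Hypothetical tables, sizes in thousands of rows.
tables = {"trade_history": 900, "accounts": 200, "audit_log": 100}
n_servers = 3

# Table-level scheduling: one task per table.
table_level = schedule(tables.values(), n_servers)

# Fine-grained scheduling: each table is split into 100k-row chunks first.
chunks = [c for rows in tables.values() for c in split_table(rows, 100)]
fine_grained = schedule(chunks, n_servers)

print("table-level makespan:", max(table_level))
print("fine-grained makespan:", max(fine_grained))
```

With table-level tasks, the server that receives the largest table determines the total loading time while the other servers sit idle. With fine-grained tasks the same work is spread almost evenly, which is consistent with the abstract's claim that finer granularity makes better use of the loading servers.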

Keywords/Search Tags:

Massive Financial Data, Data Loading, Parallel Loading, Task Scheduling

Related items

1	Research And Implementation Of The Multi-Task-Parallel Scheduling Loading Technology Of Massive Text Data
2	Design And Implementation Of The Loading Technology Of Massive Text Data
3	Parallel The Distribution Of Data In The Digital Library System, Loading And Maintenance
4	Design And Implementation Of Graph Data Loading Tool
5	Research And Implementation Of Distribute Massive Text Data Index And Retrieval System
6	Design And Implementation Of Loading Tool Software System For Marine Environmental Data
7	The Parallel Loading Technology Of Column Data Index In Distributed In-Memory Database
8	Regression Based Task Off-Loading And Optimal Resource Allocation For Cloudlet Embedded MCC
9	The Mechanical Behavior Of BGA Solder Joints Under Different Loading Modes
10	High-speed Data Generator Data To Produce The Hardware Design