
Design And Implementation Of Parallel Loading Technology For Massive Financial Data

Posted on: 2016-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2308330461975794
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the demand for information technology in finance, transportation, telecommunications, and other industries continues to increase. In China, the financial industry has been at the forefront of information technology. As the user base grows and business requirements change, the amount of data in financial databases keeps increasing and can reach hundreds of terabytes or even petabytes. Financial companies generally need a large database system to store and manage this massive financial data. Meanwhile, business needs require sharing and transferring large volumes of information between different financial systems, which in turn requires loading data from one database into another. Storing and loading massive financial data has therefore become a prominent challenge in financial systems.

This thesis focuses on loading technology for massive financial data. Taking a real financial system as the research object, and considering its underlying data storage architecture and data loading characteristics, we design and implement a massive data loading method suited to that system. The major contributions are as follows:

1. Based on the history database system of the Bank of Communications, we analyze its underlying data storage architecture, which uses the distributed database OceanBase to solve the problem of massive data storage. By analyzing its loading characteristics, we find that the history database system faces a massive data loading problem. To address it, we propose two solutions.

2. To load data into OceanBase, we design and implement two data loading methods: loading based on SQL INSERT, and Direct Update Memtable (DUMT). The former is a common loading technique that runs in SQL execution mode. The latter is built on OceanBase internals and is suitable only for OceanBase. Compared with the former, DUMT reduces the volume of network transmission and of transactions, and improves loading efficiency. Empirical studies show that DUMT loads data more efficiently than the SQL INSERT method.

3. Based on the characteristics of data loading in the history database system, we propose a multi-task parallel loading method, which splits the loading task across multiple loading servers. This method makes full use of the computing resources of both the database and the loading servers by running loading tasks in parallel on different loading servers.

4. To improve parallel loading efficiency, we propose two task scheduling strategies: scheduling based on table loading tasks, and two-phase scheduling based on fine-grained loading tasks. The two strategies work at different task granularities, and both aim to run tasks in parallel as much as possible. Empirical studies show that two-phase scheduling achieves better loading efficiency and makes better use of the computing resources of the loading servers.
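The difference in task granularity between the two scheduling strategies can be illustrated with a small sketch. The abstract does not specify the thesis's actual scheduling algorithms, so everything below is an assumption made for illustration: the table names and row counts are hypothetical, task cost is taken to equal row count, the "two phases" are modeled as (1) splitting each table into fixed-size chunks and (2) assigning chunks to servers, and assignment uses a simple greedy least-loaded policy rather than the author's method.

```python
import heapq


def schedule(tasks, n_servers):
    """Assign (name, cost) tasks to servers, largest first, each to the
    currently least-loaded server. Returns (makespan, assignment).
    This greedy policy is an illustrative assumption, not the thesis's algorithm."""
    heap = [(0, i) for i in range(n_servers)]  # (current load, server index)
    heapq.heapify(heap)
    assignment = {i: [] for i in range(n_servers)}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        load, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (load + cost, idx))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


def split_table(name, rows, chunk_rows):
    """Hypothetical phase 1: cut one table-level task into fixed-size chunks."""
    chunks = []
    start, part = 0, 0
    while start < rows:
        size = min(chunk_rows, rows - start)
        chunks.append((f"{name}#{part}", size))
        start += size
        part += 1
    return chunks


# Hypothetical workload: (table name, row count); cost is assumed equal to rows.
tables = [("accounts", 900), ("transactions", 300),
          ("customers", 200), ("branches", 100)]

# Strategy A: one task per table.
table_makespan, _ = schedule(tables, n_servers=3)

# Strategy B: split tables into chunks first, then schedule the chunks.
chunks = [c for name, rows in tables for c in split_table(name, rows, 100)]
chunk_makespan, _ = schedule(chunks, n_servers=3)

print(table_makespan, chunk_makespan)
```

In this toy workload, table-level scheduling is bounded by the largest table, which keeps one server busy while the others sit idle. Chunk-level scheduling spreads the same rows evenly across the three servers. This is only meant to show why finer task granularity can use loading servers more fully; it says nothing about the thesis's measured results.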
Keywords/Search Tags: Massive Financial Data, Data Loading, Parallel Loading, Task Scheduling