Design and Implementation of a Big Data Platform for the Financial Industry Based on an MPP Database

Posted on: 2020-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: J C Han
Full Text: PDF
GTID: 2518306551952269
Subject: Master of Engineering
Abstract/Summary:
With the development of big data technology, more and more internet finance companies and large state-owned banks have built their own big data platforms and completed their digital transformation. For domestic small and medium-sized agricultural and commercial banks, which have less experienced technical staff and limited budgets, applying big data technology to build their own big data platform has become a pressing problem. This thesis systematically summarizes the technologies and tools involved in a big data platform: 1) data persistence technology based on distributed architecture, including NoSQL non-relational databases represented by HBase and MongoDB, and distributed relational databases based on the MPP (Massively Parallel Processing) architecture, represented by Greenplum; 2) data processing and batch processing technology and tools, including off-database processing tools represented by DataStage and PowerCenter, and in-database processing tools represented by ETL Automation; 3) scheduling technology and tools for ETL task scheduling and execution sequence optimization, such as Control-M and TaskCTL. Based on a comparative test of a NoSQL database and an MPP relational database in data query, analysis, and processing performance, and taking into account the characteristics of domestic small and medium-sized agricultural and commercial banks, the thesis integrates existing big data processing technologies to design and implement a big data platform with high performance and scalability. Compared with a data platform based on the IOE (IBM, Oracle, EMC) architecture, the system designed in this thesis uses an open-source MPP database as its data persistence layer, which solves the horizontal scalability problem of traditional IOE-based data platforms. It also uses a self-developed ETL tool, a self-developed task scheduling tool, and a unified management and control platform to address the usability and maintainability problems of existing big data platforms. As a result, it reduces hardware and software procurement costs and greatly reduces maintenance and management costs.
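The abstract mentions ETL task scheduling and execution sequence optimization without giving details. As a rough illustration only, and not the thesis's own scheduler, the sketch below shows one common way to derive an execution order from task dependencies: a topological sort that groups tasks into batches whose members could run in parallel. The task names and the `plan_batches` helper are hypothetical; the code assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies):
    """Group tasks into ordered batches; tasks in one batch have no
    unmet dependencies and could be dispatched in parallel.

    `dependencies` maps each task name to the set of tasks it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # get_ready() returns every task whose predecessors are all done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical ETL job graph, invented for this example.
deps = {
    "extract_accounts": set(),
    "extract_transactions": set(),
    "load_staging": {"extract_accounts", "extract_transactions"},
    "build_report": {"load_staging"},
}

batches = plan_batches(deps)
for step, tasks in enumerate(batches, start=1):
    print(step, tasks)
```

Here the two extract tasks land in the first batch, the staging load in the second, and the report in the third. A production scheduler such as those named in the abstract would add retries, calendars, and resource limits on top of this ordering step.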
Keywords/Search Tags: MPP (Massively Parallel Processing) database, ETL (Extract, Transform, Load) tool, task scheduling tool, NoSQL database