
Research And Implementation Of Bank OLAP System Based On Data Lake

Posted on: 2022-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: F B Wang
GTID: 2518306773497594
Subject: FINANCE

Abstract/Summary:
With the development of the Internet, big data processing technologies have emerged wave after wave and play an increasingly important role in all walks of life. While enterprises have become increasingly mature in large-scale Online Analytical Processing (OLAP), new problems continue to challenge the field. Existing bank OLAP systems face data silos ("data islands"), chimney-style (siloed) development, difficulty turning data into assets, costly changes to master data and metadata, and heterogeneous processing of streaming and batch data. These problems are intertwined and cannot be fundamentally solved by improving individual technologies in isolation. This thesis therefore addresses them at the architecture level: drawing on the current state of cutting-edge open-source technology and on banking application scenarios, it discusses a new generation of data-lake-based OLAP architecture. It first builds a bank data lake platform and then discusses the characteristics of a bank OLAP system built on that platform. The main contributions are as follows:

1. A mechanism for turning enterprise data into assets. At the bank data lake platform level, modules for data content management and data lineage tracing are built on Apache Atlas, and a policy-based data access control mechanism is built on Apache Ranger, strengthening the bank's data asset capabilities.

2. A lake-warehouse integration architecture. Building on the bank data lake platform, the bank inherits many of the platform's functions and features at low or even zero cost, and data processing work is standardized and made reusable at the enterprise level, further reducing chimney-style development at the architecture level.

3. A stream-batch integration architecture. In the development of the bank data lake platform and OLAP system, a unified stream-batch architecture is built on Apache Flink and Apache Iceberg. It addresses the data inconsistency, data silos, and maintenance difficulties that arise under the Lambda architecture, where real-time stream computing and batch processing run on heterogeneous pipelines.

4. Support for iterative changes to master data and metadata. In the development of the bank data lake platform and OLAP system, flexible support for changing master data and metadata is built on Apache Iceberg features such as Schema Evolution and Time Travel. This reduces, to a certain extent, the high cost of such changes.

Each of these contributions is discussed in detail in the context of bank OLAP applications. In practice, compared with traditional bank OLAP systems, the emerging data-lake-based OLAP architecture provides better solutions for the scenarios described above.
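To make the lineage tracing in the first contribution concrete, the sketch below walks a lineage graph upstream from an OLAP table to every source it depends on. It is a minimal illustration under stated assumptions: the graph is a hypothetical in-memory dict and all table names are invented. It does not use Apache Atlas, which the thesis uses to manage lineage metadata.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
# Names are illustrative only; they come neither from the thesis nor from Apache Atlas.
LINEAGE = {
    "olap.customer_daily_summary": ["dw.customer_dim", "dw.txn_fact"],
    "dw.txn_fact": ["ods.core_txn", "ods.card_txn"],
    "dw.customer_dim": ["ods.core_customer"],
    "ods.core_txn": [],
    "ods.card_txn": [],
    "ods.core_customer": [],
}


def upstream(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every ancestor of `dataset`, nearest first (breadth-first search)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors (and accidental cycles) are visited once
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


if __name__ == "__main__":
    # Direct inputs come first, then the operational (ods.*) sources behind them.
    print(upstream("olap.customer_daily_summary", LINEAGE))
```

Breadth-first order lists direct inputs before deeper sources, which is a convenient order when tracing a suspect figure back toward its origin.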
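The policy-based access control mechanism, built on Apache Ranger in the thesis, can be illustrated with a toy evaluator: a request is allowed only if some allow policy matches the table, action, and user group, and no matching deny policy exists. This is a hypothetical model chosen for illustration. The `Policy` class, the glob-style resource patterns, and the deny-overrides-allow rule are assumptions, not Ranger's actual policy format or evaluation engine.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which groups may (or may not) do what on which tables."""
    resource: str              # glob over "database.table", e.g. "dw.*"
    groups: frozenset[str]     # user groups the policy applies to
    actions: frozenset[str]    # e.g. {"select"}
    deny: bool = False         # assumption: deny policies override allow policies


def is_allowed(user_groups: set[str], action: str, table: str,
               policies: list[Policy]) -> bool:
    """Default-deny evaluation: some allow must match and no deny may match."""
    matching = [
        p for p in policies
        if fnmatchcase(table, p.resource) and action in p.actions and p.groups & user_groups
    ]
    if any(p.deny for p in matching):
        return False
    return any(not p.deny for p in matching)


# Illustrative policies; group and table names are invented.
POLICIES = [
    Policy("dw.*", frozenset({"analysts"}), frozenset({"select"})),
    Policy("dw.customer_pii", frozenset({"analysts"}), frozenset({"select"}), deny=True),
    Policy("*", frozenset({"admins"}), frozenset({"select", "alter"})),
]

if __name__ == "__main__":
    print(is_allowed({"analysts"}, "select", "dw.txn_fact", POLICIES))
    print(is_allowed({"analysts"}, "select", "dw.customer_pii", POLICIES))
```

Keeping rules in data rather than in application code is what makes the mechanism "policy-based": access can be tightened by editing a policy, without redeploying the OLAP queries that depend on it.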
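The aim of the stream-batch integration in the third contribution is that one piece of business logic serves both real-time and historical processing, instead of the two separately maintained pipelines of a Lambda architecture. The sketch below shows that idea in plain Python under stated assumptions: the transaction format and all function names are hypothetical, and nothing here uses Apache Flink or Apache Iceberg, which the thesis uses as the compute engine and table format respectively.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical transaction record: (account_id, amount).
Txn = tuple[str, float]


def apply_txn(state: defaultdict, txn: Txn) -> None:
    """The single piece of business logic, shared by batch and streaming modes."""
    account, amount = txn
    state[account] += amount  # running balance per account


def run_batch(txns: Iterable[Txn]) -> dict[str, float]:
    """Bounded input: consume everything, then return the final result once."""
    state: defaultdict = defaultdict(float)
    for txn in txns:
        apply_txn(state, txn)
    return dict(state)


def run_stream(txns: Iterable[Txn]) -> Iterator[dict[str, float]]:
    """Unbounded input: emit an updated result after every event."""
    state: defaultdict = defaultdict(float)
    for txn in txns:
        apply_txn(state, txn)
        yield dict(state)


if __name__ == "__main__":
    events = [("A", 100.0), ("B", 50.0), ("A", -30.0)]
    for snapshot in run_stream(events):
        print("stream:", snapshot)
    print("batch: ", run_batch(events))
```

Because `run_stream` and `run_batch` share `apply_txn`, the last streaming result equals the batch result by construction; under Lambda, the real-time and batch code paths would have to be kept consistent by hand.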
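The fourth contribution relies on Apache Iceberg's Schema Evolution and Time Travel. A toy in-memory table shows both ideas together: column values are stored under stable field IDs, so renaming or adding a column changes only the schema, and every commit produces an immutable snapshot that can be read later. Everything here (`ToyTable`, `Snapshot`, the method names, the sample data) is hypothetical and far simpler than Iceberg's actual metadata layout and API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema: dict[int, str]                 # field id -> column name at this snapshot
    rows: tuple[dict[int, object], ...]    # row values keyed by field id


class ToyTable:
    """Hypothetical append-only table: every change commits a new snapshot."""

    def __init__(self, columns: list[str]):
        schema = {i + 1: name for i, name in enumerate(columns)}
        self.snapshots = [Snapshot(1, schema, ())]
        self._next_field_id = len(columns) + 1

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def _commit(self, schema: dict[int, str], rows: tuple) -> None:
        self.snapshots.append(Snapshot(self.current.snapshot_id + 1, schema, rows))

    def append(self, row: dict[str, object]) -> None:
        ids = {name: fid for fid, name in self.current.schema.items()}
        new_row = {ids[name]: value for name, value in row.items()}
        self._commit(self.current.schema, self.current.rows + (new_row,))

    def rename_column(self, old: str, new: str) -> None:
        # Metadata-only change: existing rows are untouched because they use field ids.
        schema = {fid: (new if name == old else name)
                  for fid, name in self.current.schema.items()}
        self._commit(schema, self.current.rows)

    def add_column(self, name: str) -> None:
        # New field id; older rows simply have no value for it (read as None).
        schema = {**self.current.schema, self._next_field_id: name}
        self._next_field_id += 1
        self._commit(schema, self.current.rows)

    def scan(self, snapshot_id: Optional[int] = None) -> list[dict[str, object]]:
        """Read rows as of a given snapshot (time travel); defaults to the latest."""
        if snapshot_id is None:
            snap = self.current
        else:
            matches = [s for s in self.snapshots if s.snapshot_id == snapshot_id]
            if not matches:
                raise KeyError(f"unknown snapshot {snapshot_id}")
            snap = matches[0]
        return [{name: row.get(fid) for fid, name in snap.schema.items()}
                for row in snap.rows]


if __name__ == "__main__":
    t = ToyTable(["customer_id", "branch"])                    # snapshot 1
    t.append({"customer_id": "C001", "branch": "Shanghai"})    # snapshot 2
    t.rename_column("branch", "branch_name")                   # snapshot 3
    t.add_column("risk_level")                                 # snapshot 4
    print(t.scan())    # latest schema; risk_level is None for the old row
    print(t.scan(2))   # the same row as it looked under the original schema
```

Keying data by field ID rather than by column name is what keeps a rename cheap: no data files are rewritten, and old snapshots remain readable under the schema they were written with.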
Keywords/Search Tags: data lake, OLAP, lake-warehouse integration, stream-batch integration, master data and metadata iteration, data asset transformation