
Design And Implementation Of Data Warehouse And Complex Ad-hoc Query For Commercial Banks

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q G Yang
Full Text: PDF
GTID: 2428330605462322
Subject: Industrial engineering
Abstract/Summary:
As the financial industry becomes increasingly information-driven, a commercial bank faces difficulties such as insufficient capacity for innovation, rising industry standards, rising non-performing assets, tighter supervision, diversified customer demand, and fierce competition. Data analysis is the fastest and most effective way out of this dilemma and a crucial step in the bank's information reform. However, the bank's data is currently scattered and inconsistently formatted, and its data integration and data analysis are slow, which makes analysis very difficult. This thesis studies a data warehouse for a commercial bank and complex ad hoc queries on it, in order to solve the bank's data analysis problem and promote its transition to a data-driven model.

First, a multidimensional data model for the commercial bank's data warehouse is designed, and extract-transform-load (ETL) technology is used to build the warehouse. The bank's existing data is analyzed, the extension classification method is used to select its high-value data, and the bank's subject areas and the corresponding fact tables and dimension tables are designed to establish the multidimensional data model. A distributed ETL process is then designed for the warehouse. ETL task scheduling is carried out by a greedy algorithm combined with an optimized genetic algorithm and an ant colony algorithm, so that data is integrated into the warehouse efficiently and stably.

Second, on the basis of the bank's data warehouse, a calculation engine for ad hoc queries is designed to make complex ad hoc queries fast. Existing calculation engines were screened with the superiority evaluation method. Two of them, Hive and Presto, were selected and integrated, drawing on the advantages of each, to form a new integrated calculation engine for the bank. To improve calculation speed and computing power, a complex ad hoc query statement is split into multiple simple queries. A directed query graph is constructed and traversed depth-first to form a directed spanning tree. The spanning tree is then traversed breadth-first to generate intermediate result tables, which are stored in Alluxio, a memory-based virtual file system. The final query result is obtained from the intermediate result tables, which completes the complex ad hoc query.

Finally, the bank's complex ad hoc query platform was implemented to validate the research results on the data warehouse and its complex ad hoc queries. ETL import performance tests and ad hoc query speed tests on this system indicate that the data warehouse and the ad hoc query calculation engine designed in this thesis solve the bank's problems of scattered data, inconsistent data formats, and slow data analysis. This resolves the difficulty of bank data analysis and helps banks carry out their information reform.
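The traversal described in the abstract (a depth-first pass over a directed query graph to obtain a spanning tree, followed by a breadth-first pass over that tree) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the thesis: the sub-query names, the choice to point edges from a query to the sub-queries it reads from, and the adjacency-list representation. The abstract does not say in which order the intermediate result tables are materialized or how they are written to Alluxio, so the sketch stops at producing the two traversals.

```python
from collections import deque


def dfs_spanning_tree(graph, root):
    """Return {node: [children]} for the spanning tree found by an
    iterative depth-first search of `graph` starting at `root`."""
    tree = {root: []}
    # Each stack entry keeps the node and an iterator over its
    # remaining, not yet examined, successors.
    stack = [(root, iter(graph.get(root, [])))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if nxt not in tree:          # first visit: this is a tree edge
                tree[nxt] = []
                tree[node].append(nxt)
                stack.append((nxt, iter(graph.get(nxt, []))))
                break
        else:
            stack.pop()                  # all successors examined
    return tree


def bfs_order(tree, root):
    """Return the nodes of `tree` in breadth-first (level) order."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


# Hypothetical query graph: an edge q -> s means that query q reads
# the result of sub-query s. All names are made up for illustration.
query_graph = {
    "final": ["join_ab", "agg_c"],
    "join_ab": ["scan_a", "scan_b"],
    "agg_c": ["scan_b"],
    "scan_a": [],
    "scan_b": [],
}

tree = dfs_spanning_tree(query_graph, "final")
order = bfs_order(tree, "final")
print(order)
```

In this example the edge from `agg_c` to `scan_b` is not part of the spanning tree, because the depth-first pass has already reached `scan_b` through `join_ab`. Each sub-query therefore appears exactly once in the breadth-first order, which suggests how a shared intermediate result could be stored once and reused.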
Keywords/Search Tags: data warehouse, ad hoc query, multidimensional data model, ETL task scheduling, calculation engine