
Design And Implementation Of Data Warehouse For Insurance Platform Based On Big Data

Posted on: 2022-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Chen
GTID: 2518306605970849
Subject: Master of Engineering
Abstract/Summary:
With the rapid growth of the insurance industry, traditional insurers urgently need a more effective way to handle an ever-increasing mass of data. In the traditional domestic insurance industry, however, the efficiency with which such large volumes of data are used remains very limited. Advances in big data and visualization technology have brought unprecedented changes to large-scale data processing. Starting from the actual business of insurance company B, this thesis develops a real-time big data Kanban (dashboard) by combining Hadoop ecosystem technologies with Spark Streaming, designs a data model specifically for the ETL processing program of company B's business, and designs and implements a complete real-time Kanban system that integrates data storage, data processing and data application.

First, after reviewing the current development of the insurance industry and of big data technologies at home and abroad, the thesis selects the Spark computing framework as a mature, stable technology among several common data processing frameworks. Next, it studies the MapReduce processing mechanism of the Hadoop platform, the HDFS distributed file system and the Hive data warehouse, as well as the Kafka message queue, Spark Streaming and HBase data storage, which lays the theoretical foundation for implementing the specific system functions. It then defines the functional and non-functional requirements for the specific insurance scenarios the system involves, and on this basis completes the system's logical structure and technical design. Building on that design, the thesis specifies the functions of the real-time Kanban processing module, including the ETL tasks of the data flow, a version control method for managing real-time and offline data at the same time, an abnormal data processing method, and a data aggregation and summary method. In the application module, it designs and explains the system's most representative function, the monitoring and visualization of the insurance claims process.

For the implementation, the thesis gives sequence diagrams and flowcharts for all four modules, explains how each is implemented, and describes the use of the functions in detail. The testing part mainly covers the data accuracy and reliability of the system. It also tests the Kanban and monitoring functions of the data application module, together with their usability and real-time data processing capability. A demonstration shows that the system meets the requirements of company B, whose data processing is at the scale of tens of millions of daily active users (DAU), and that it can basically satisfy the business scenarios and needs of daily use. Finally, the thesis summarizes the full text, reviews the system design and its shortcomings in practical use, and looks ahead to possible improvements to the system and to other future applications of big data in insurance.
Keywords/Search Tags:Big Data, Insurance, Real-time Kanban, Distributed System