Implementation And Application Of E-Commerce Real-Time Data Warehouse System Based On Flink

Posted on: 2023-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2568307145468014
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, the generation of large volumes of data has created a demand for real-time business processing. A traditional offline data warehouse processes incremental data in scheduled daily batches and can therefore only deliver results with a T+1 delay. Its real-time performance is relatively poor and cannot meet the needs of enterprises that want data within minutes or seconds. To capture changing data in time and fully mine its value, the data processing time must be shortened and real-time performance improved. For these reasons, this thesis designs and implements a real-time data warehouse system based on Flink. The main work is as follows.

Firstly, based on the Kappa architecture, a real-time data warehouse platform is built using the dimensional modeling method. It consists of an original data layer, a data detail layer, a data summary layer and a data statistics layer. Log data and business data are collected into the original data layer. After cleaning and processing, they are transmitted through Kafka to the data detail layer, where the data streams are split (diverted) using a scheme designed and developed on Flink CDC. In the data summary layer, wide tables are built and the relevant indicators are summarized. In the data statistics layer, statistics on users, commodities and RFM are computed to support multi-dimensional analysis and data mining, and the final statistical results are written into the OLAP database ClickHouse. A data interface is set up to query the data in the OLAP database, and core e-commerce indicators such as GMV, PV and UV are handed over to Sugar for real-time display.

Secondly, to improve the reliability of the system, a monitoring and alarm module is implemented. By integrating Prometheus and Grafana, Flink jobs and cluster node status are monitored, and abnormal tasks are reported to the responsible person by email and other alarm methods. In addition, high availability of the Hadoop cluster is implemented, which solves the single point of failure of the NameNode and protects the data stored on HDFS.

Finally, on top of the real-time data warehouse platform, a user value classification method based on an improved RFM model, called the RFMC model, is proposed to address the problem of user value segmentation. The model's index values are calculated in real time by the platform, so the latest index values of each current user are obtained dynamically. The entropy weight method is used to determine the weights of the four RFMC indicators. Given the shortcomings of K-means, a comparative experiment is designed, and on its basis the K-means++ clustering algorithm is selected to cluster e-commerce users. The users are divided into three categories, the user value of each category is analyzed according to the classification results, and marketing suggestions are put forward. The comparative experimental results show that the proposed user value classification method performs better and is more suitable for customer value segmentation.

System testing shows that the e-commerce real-time data warehouse system designed in this thesis delivers the expected functions and runs reliably. It allows enterprises to view core indicators in real time, and it can also be used to mine user value and improve enterprise profits. It can serve as a reference for enterprises building real-time data warehouses and has good application and promotion value.
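
The entropy weight step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract gives neither the formulas used nor the definition of the C indicator, so the code follows the common textbook form of the method (min-max normalization, column-wise proportions, information entropy, weights proportional to 1 minus entropy). The function name `entropy_weights`, the sample values, and the treatment of every column as a "larger is better" indicator are all assumptions made for illustration.

```python
import math


def entropy_weights(rows):
    """Return one weight per column using the entropy weight method.

    rows: list of equal-length lists, one row per user and one column per
    indicator. Assumption: every column is a benefit-type indicator; a
    cost-type indicator would need its normalization reversed first.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows")
    m = len(rows[0])

    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column cannot distinguish users, so it gets no weight.
            divergences.append(0.0)
            continue
        # Min-max normalization to [0, 1].
        norm = [(v - lo) / (hi - lo) for v in col]
        total = sum(norm)
        # Share of each user in the column total.
        probs = [v / total for v in norm]
        # Information entropy scaled by ln(n); 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)

    total_div = sum(divergences)
    if total_div == 0:
        # No column carries information: fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_div for d in divergences]


# Hypothetical indicator values for four users (four columns, as in RFMC).
sample = [
    [5.0, 10.0, 200.0, 3.0],
    [30.0, 2.0, 50.0, 1.0],
    [12.0, 6.0, 120.0, 4.0],
    [60.0, 1.0, 20.0, 2.0],
]
weights = entropy_weights(sample)
print(weights)
```

The resulting weights sum to 1 and could be used to scale the indicators before clustering. Columns whose values vary more unevenly across users have lower entropy and therefore receive larger weights.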
Keywords/Search Tags:Flink, Realtime Data Warehouse, RFM, Monitor And Alert