Smartphones are now in widespread use, and people of all ages can buy goods online whenever they like. Storing and analyzing the resulting huge volume of e-commerce data has therefore become a topic of general concern. In mass-data computing scenarios, offline data processing, which has low timeliness, is the primary choice for most businesses. However, when an enterprise must make quick decisions based on recent events, offline data processing cannot meet the requirement. For example, in a financial system, the stock prices closest to the prediction point often drive investors' decisions. In an e-commerce system, coupons can be issued to stimulate consumption according to current product sales, and advertisements can be pushed to similar users according to the high-frequency keywords that users search for. Such decisions must be adjusted dynamically according to the current sales of commodities, so they are time-sensitive. The offline data warehouses of major Internet companies have matured, and building real-time data warehouses can further promote user consumption and increase revenue. Companies across industries are adopting real-time data warehouses to compete more quickly for a share of a huge market. The real-time data warehouses that Internet companies have deployed, and the substantial benefits they have gained, also show that building an e-commerce data warehouse to analyze data and discover its latent value is both necessary and feasible. It is therefore worthwhile to develop a real-time data processing system. On the one hand, when an abnormal event occurs, developers can handle it quickly and reduce the enterprise's losses. On the other hand, the real-time system can be combined with the offline data warehouse to troubleshoot problems.

This article analyzes how the processing engines used in data warehouse technology have evolved and compares MapReduce, Spark Streaming and Flink in several respects. Finally, Flink is chosen as the processing engine for building the real-time data warehouse. The main contents are as follows:

(1) Following the layering strategy of data warehouses, the real-time data warehouse is divided, from bottom to top, into the ODS, DWD, DWM and DWS layers, with data granularity becoming coarser toward the top. Developing common middle-tier data greatly reduces repeated computation: top-level calculations reuse the underlying data, which improves computational efficiency, and the layering makes it easy to locate the right layer when searching for and using data.

(2) A real-time data processing system based on Flink is designed. It consumes user-behavior tracking (buried-point) data and business transaction data from Kafka message queues, then cleans and processes the e-commerce data through real-time ETL. Finally, the aggregated results are written to external storage to provide data support for data analysts.

(3) The key indicators of the e-commerce real-time data warehouse are calculated. For e-commerce companies entering the market, the number of active users, the retention rate and the GMV of the website are important indicators of the site's popularity and operating condition. This article queries the DWS-layer data stored in ClickHouse through a Spring Boot interface and renders it as charts with a visualization tool. This makes it convenient for data analysts to check how the e-commerce system is operating and to use the indicators to troubleshoot anomalies and analyze competing products.

When a new business line is added to the e-commerce system, or an existing one is changed, the existing data processing logic no longer matches the business. The business logic in the underlying code then has to be modified, which makes development costly. To update the data warehouse's processing strategy for each business line in real time, and to update its underlying data processing logic by tracking changes to the business tables after a business line changes, this article uses Flink CDC to monitor the binlog of the business tables, processes the parsed data according to the data items in the business configuration table, and saves the results to external storage.
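The layering in (1) can be illustrated with a small in-memory sketch. It is not the thesis's Flink code: plain Python lists stand in for Kafka topics, and the field names `mid` (device id), `ts` (epoch milliseconds) and `page` are hypothetical. The point is the reuse described above: the DWS summary takes its unique-visitor counts from the DWM layer instead of recomputing them from raw logs.

```python
from __future__ import annotations

import json
from collections import defaultdict

DAY_MS = 86_400_000  # one day in milliseconds; days are counted in UTC for simplicity


def ods_to_dwd(raw_lines: list[str]) -> list[dict]:
    """DWD: parse raw ODS log lines, drop dirty records, keep a flat set of fields."""
    cleaned = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSON; a Flink job would typically send it to a side output
        if not isinstance(record, dict) or "mid" not in record or "ts" not in record:
            continue  # not an object, or missing device id / timestamp
        cleaned.append({"mid": record["mid"], "ts": int(record["ts"]), "page": record.get("page", "")})
    return cleaned


def dwd_to_dwm_unique_visitors(events: list[dict]) -> list[dict]:
    """DWM: keep only the first event seen for each device on each day (unique visitors)."""
    seen: set[tuple[str, int]] = set()
    visitors = []
    for event in events:
        key = (event["mid"], event["ts"] // DAY_MS)
        if key not in seen:
            seen.add(key)
            visitors.append(event)
    return visitors


def to_dws_daily(dwd: list[dict], dwm: list[dict]) -> dict[int, dict[str, int]]:
    """DWS: daily page views (from DWD) and unique visitors (reused from DWM)."""
    summary: dict[int, dict[str, int]] = defaultdict(lambda: {"pv": 0, "uv": 0})
    for event in dwd:
        summary[event["ts"] // DAY_MS]["pv"] += 1
    for event in dwm:
        summary[event["ts"] // DAY_MS]["uv"] += 1
    return dict(summary)
```

Each function reads only from the layer directly below it, so a change to the cleaning rules in DWD propagates upward without touching the DWS code.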
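For the aggregation step in (2), DWS tables in systems like this are commonly filled by windowed aggregation. The sketch below assumes 10-second tumbling event-time windows and hypothetical order tuples `(ts_ms, sku_id, amount_cents)`; it ignores watermarks and late data, which a real Flink job must handle. Each output row has the shape one might insert into a ClickHouse DWS table, but the column names `stt`, `edt`, `sku_id`, `order_ct` and `order_amount` are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict

WINDOW_MS = 10_000  # assumed window size: 10-second tumbling windows


def tumbling_window_sum(orders: list[tuple[int, str, int]], window_ms: int = WINDOW_MS) -> list[dict]:
    """Aggregate (ts_ms, sku_id, amount_cents) orders per SKU per tumbling window.

    Every event belongs to exactly one window [start, start + window_ms).
    Watermarks and late events are deliberately not modeled here.
    """
    # (window_start, sku_id) -> [order count, amount sum in cents]
    acc: dict[tuple[int, str], list[int]] = defaultdict(lambda: [0, 0])
    for ts, sku_id, amount in orders:
        start = ts - ts % window_ms  # align the timestamp down to its window start
        acc[(start, sku_id)][0] += 1
        acc[(start, sku_id)][1] += amount
    # Emit one row per (window, SKU), ordered by window start and then SKU.
    return [
        {"stt": start, "edt": start + window_ms, "sku_id": sku_id, "order_ct": count, "order_amount": total}
        for (start, sku_id), (count, total) in sorted(acc.items())
    ]
```

Tumbling windows suit per-interval summary tables because windows never overlap, so each order is counted exactly once; sliding windows would instead count an order in several rows.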
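The indicators in (3) can be stated precisely. The definitions below are common ones and are assumptions rather than the thesis's exact formulas: DAU is the number of distinct users active on a day, next-day retention is the share of a day's new users who return on the following day, and GMV is the total amount of orders placed on a day. Days are integer indexes, amounts are integer cents, and "new" is judged only against the visits passed in; a production warehouse would check against historical state instead.

```python
from __future__ import annotations

from collections import defaultdict


def daily_active_users(visits: list[tuple[str, int]], day: int) -> int:
    """Count distinct users with at least one visit on `day`."""
    return len({uid for uid, d in visits if d == day})


def new_users_by_day(visits: list[tuple[str, int]]) -> dict[int, set[str]]:
    """Group users by the first day on which they appear in `visits`."""
    first_seen: dict[str, int] = {}
    for uid, d in visits:
        if uid not in first_seen or d < first_seen[uid]:
            first_seen[uid] = d
    cohorts: dict[int, set[str]] = defaultdict(set)
    for uid, d in first_seen.items():
        cohorts[d].add(uid)
    return cohorts


def next_day_retention(visits: list[tuple[str, int]], day: int) -> float:
    """Fraction of users first seen on `day` who are active again on `day + 1`."""
    cohort = new_users_by_day(visits).get(day, set())
    if not cohort:
        return 0.0  # no new users that day, so retention is reported as zero
    active_next_day = {uid for uid, d in visits if d == day + 1}
    return len(cohort & active_next_day) / len(cohort)


def gmv(orders: list[tuple[int, int]], day: int) -> int:
    """Sum of order amounts (in cents) placed on `day`."""
    return sum(amount for d, amount in orders if d == day)
```

Keeping amounts in integer cents avoids floating-point rounding in GMV totals.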
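The configuration-driven processing described in the last paragraph can be sketched without Flink. The code assumes that both the configuration table and the business tables arrive as change events already parsed from Debezium-style JSON (`op`, `before`, `after`, `source`), a format Flink CDC can emit; the configuration columns `source_table`, `sink_table` and `sink_columns` and the class name `ConfigDrivenRouter` are hypothetical. In a Flink job the rule map would usually be kept in broadcast state so that every parallel task sees configuration changes; here it is a plain dictionary.

```python
from __future__ import annotations


class ConfigDrivenRouter:
    """Hypothetical router: processes parsed binlog events according to a configuration table."""

    def __init__(self) -> None:
        # source_table -> (sink_table, columns to keep)
        self.rules: dict[str, tuple[str, list[str]]] = {}

    def on_config_change(self, event: dict) -> None:
        """Apply one change event ('c' insert, 'r' snapshot read, 'u' update, 'd' delete)."""
        op = event["op"]
        if op in ("u", "d") and event.get("before"):
            # Remove the old rule first so an update that renames source_table leaves no stale entry.
            self.rules.pop(event["before"]["source_table"], None)
        if op in ("c", "r", "u"):
            row = event["after"]
            self.rules[row["source_table"]] = (row["sink_table"], row["sink_columns"].split(","))

    def route(self, event: dict) -> tuple[str, dict] | None:
        """Return (sink_table, projected_row) for a business change event, or None to skip it."""
        if event["op"] == "d":
            return None  # this sketch does not propagate deletes from business tables
        rule = self.rules.get(event["source"]["table"])
        if rule is None:
            return None  # no rule configured for this table yet
        sink_table, columns = rule
        row = event["after"]
        # Keep only the configured columns that are present in the row.
        return sink_table, {c: row[c] for c in columns if c in row}
```

Adding a row for a new business table to the configuration table changes the routing at runtime, without modifying or redeploying the job code, which is the development-cost saving the abstract describes.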