As the carrier of information, data has inherent value. With the ever-growing amount of data on the Internet, uncovering the value hidden in large volumes of data has become one of the most important ways of putting data to use. As the name implies, real-time macro user profiling takes a broad view of the business as it currently stands, capturing a real-time snapshot of it and providing a scientific basis and guidance for business development. This paper uses Flink and a real-time data warehouse to compute the system's metric statistics in real time. The main work of this paper is as follows:

(1) This paper designs a real-time data warehouse system. The system relies mainly on Flink's stream computing, together with related optimizations, to keep real-time statistics over large volumes of data timely and high-throughput, and it finally outputs the metric statistics of the profiling system.

(2) In real-time computing, a data stream often has to be joined with an external data source to look up dimension data that supplements the records in the stream. The time spent on these interactions often becomes the bottleneck for processing timeliness. This paper therefore adopts a cache-aside (bypass cache) plus asynchronous I/O optimization when building the intermediate data layer. Because Flink's asynchronous I/O is typically used for time-consuming logic, it is more likely than other commonly used Flink operators to cause back pressure along the whole stream processing pipeline. Based on a detailed analysis of asynchronous processing and of the underlying implementation of Flink's asynchronous I/O, this paper adds emergency threads to the thread pool used by the asynchronous I/O and combines it with a thread-safe database connection pool that expands and shrinks dynamically. This lets the asynchronous I/O cope with surges in data volume and with back pressure, relieving the interaction bottleneck between dimension association and the external data source and preserving the timeliness and throughput of real-time stream processing.

(3) For collecting data from the database systems, this paper introduces a newly emerging connector component (Flink CDC) that reads both full data and incremental change data directly from MySQL, meeting the paper's various data collection requirements. For splitting the detail data, this paper designs a dynamic splitting scheme that combines Flink CDC, a broadcast stream, and a configuration table. The routing of the split streams can then be changed arbitrarily: only the configuration table in MySQL needs to be maintained, and the destination of data in the stream changes without any modification to the splitting program.

(4) Based on Flink, this paper builds a real-time data warehouse for the real-time metric statistics of the profiling system. Throughout the warehouse's data processing, from data acquisition through data splitting and the intermediate data layer to the wide-table layer, every stage is optimized as far as possible. The results show that the real-time data warehouse performs well in stability, throughput, and timeliness.
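The cache-aside plus asynchronous I/O idea in item (2) can be illustrated with a minimal Python asyncio sketch. It is not Flink's `AsyncFunction` API, and every name in it is hypothetical: `DimLookup` and `enrich` are illustrative, an in-memory dict stands in for the bypass cache, and `fetch_from_db` simulates the external dimension store. The semaphore caps the number of in-flight queries, loosely analogous to the `capacity` argument of Flink's `AsyncDataStream.unorderedWait`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


class DimLookup:
    """Cache-aside dimension lookup with a cap on concurrent async queries (illustrative)."""

    def __init__(self, fetch: Callable[[str], Awaitable[Optional[dict]]], max_in_flight: int = 4):
        self._fetch = fetch                   # hypothetical coroutine querying the external store
        self._cache: Dict[str, dict] = {}     # in-memory stand-in for the bypass cache
        self._sem = asyncio.Semaphore(max_in_flight)  # bounds in-flight queries
        self.db_hits = 0

    async def get(self, key: str) -> Optional[dict]:
        # 1. Try the cache first.
        if key in self._cache:
            return self._cache[key]
        # 2. On a miss, query the store without blocking other lookups.
        #    Concurrent misses on the same key may each reach the store.
        async with self._sem:
            row = await self._fetch(key)
            self.db_hits += 1
        # 3. Fill the cache so later records with this key skip the store.
        if row is not None:
            self._cache[key] = row
        return row

    def invalidate(self, key: str) -> None:
        # Cache-aside writes: update the store, then drop the cached copy.
        self._cache.pop(key, None)


async def enrich(records: List[dict], lookup: DimLookup) -> List[dict]:
    # Issue all lookups concurrently; gather() keeps results in input order.
    dims = await asyncio.gather(*(lookup.get(r["user_id"]) for r in records))
    return [{**r, **(d or {})} for r, d in zip(records, dims)]


def demo() -> Tuple[List[dict], List[dict], int]:
    table = {"u1": {"city": "Beijing"}, "u2": {"city": "Shanghai"}}  # made-up dimension rows

    async def fetch_from_db(key: str) -> Optional[dict]:
        await asyncio.sleep(0.01)  # simulated network latency
        return table.get(key)

    async def run():
        lookup = DimLookup(fetch_from_db, max_in_flight=2)  # created inside the running loop
        first = await enrich([{"user_id": "u1"}, {"user_id": "u2"}], lookup)
        second = await enrich([{"user_id": "u1", "amount": 5}], lookup)  # served from cache
        return first, second, lookup.db_hits

    return asyncio.run(run())


if __name__ == "__main__":
    print(demo())
```

The cache is read first and filled only after a miss, so repeated keys avoid the round trip to the store; a write path would update the store and then call `invalidate`.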
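The "emergency thread" behaviour in item (2) most likely refers to the standard core/maximum thread-pool model (in Java, a `ThreadPoolExecutor` whose `corePoolSize` is smaller than `maximumPoolSize` and whose work queue is bounded). Assuming that reading, the Python sketch below reproduces the policy: core threads run first, further tasks are queued, extra threads are started only when the queue is full, those extra threads retire after `keep_alive` idle seconds, and anything beyond that is rejected. `EmergencyThreadPool` is a hypothetical class, not the thesis's implementation.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable, Optional, Tuple

Task = Tuple[Future, Callable[..., Any], tuple]


class EmergencyThreadPool:
    """Core threads plus temporary 'emergency' threads, with a bounded queue (illustrative)."""

    def __init__(self, core: int, maximum: int, queue_size: int, keep_alive: float):
        if not 0 < core <= maximum or queue_size < 1:
            raise ValueError("need 0 < core <= maximum and queue_size >= 1")
        self._core, self._max, self._keep_alive = core, maximum, keep_alive
        self._queue: "queue.Queue[Task]" = queue.Queue(maxsize=queue_size)
        self._lock = threading.Lock()
        self._threads = 0
        self.peak_threads = 0

    @property
    def thread_count(self) -> int:
        with self._lock:
            return self._threads

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        fut: Future = Future()
        task = (fut, fn, args)
        with self._lock:
            if self._threads < self._core:        # 1. below core size: start a core thread
                self._spawn(task, emergency=False)
                return fut
            try:                                  # 2. otherwise queue the task if there is room
                self._queue.put_nowait(task)
                return fut
            except queue.Full:
                pass
            if self._threads < self._max:         # 3. queue full: start an emergency thread
                self._spawn(task, emergency=True)
                return fut
        raise RuntimeError("pool saturated: task rejected")  # 4. reject

    def _spawn(self, task: Task, emergency: bool) -> None:
        # Caller holds self._lock.
        self._threads += 1
        self.peak_threads = max(self.peak_threads, self._threads)
        threading.Thread(target=self._work, args=(task, emergency), daemon=True).start()

    def _work(self, task: Optional[Task], emergency: bool) -> None:
        while task is not None:
            fut, fn, args = task
            if fut.set_running_or_notify_cancel():
                try:
                    fut.set_result(fn(*args))
                except Exception as exc:
                    fut.set_exception(exc)
            try:
                # Core threads wait indefinitely; emergency threads retire after keep_alive idle seconds.
                task = self._queue.get(timeout=self._keep_alive if emergency else None)
            except queue.Empty:
                task = None
        with self._lock:
            self._threads -= 1
```

Rejection surfaces saturation to the caller instead of letting the queue grow without bound, which is one way to turn overload into back pressure rather than unbounded memory growth.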
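The thread-safe, dynamically resizing connection pool described in item (2) can be sketched as follows, assuming a generic `factory` callable that opens one database connection (no specific driver is implied). `ElasticConnectionPool` is hypothetical: it grows one connection at a time up to `max_size` when no idle connection is available, blocks or times out at the cap, and `shrink` closes connections that have been idle longer than `idle_timeout` without dropping below `min_size`.

```python
import threading
import time
from typing import Any, Callable, List, Optional, Tuple


class ElasticConnectionPool:
    """Thread-safe pool that grows to max_size under load and shrinks to min_size when idle (illustrative)."""

    def __init__(self, factory: Callable[[], Any], min_size: int = 2, max_size: int = 8,
                 idle_timeout: float = 30.0):
        if not 0 < min_size <= max_size:
            raise ValueError("need 0 < min_size <= max_size")
        self._factory = factory                   # hypothetical: opens one DB connection
        self._min, self._max = min_size, max_size
        self._idle_timeout = idle_timeout
        self._cond = threading.Condition()
        self._idle: List[Tuple[float, Any]] = []  # (time returned, connection)
        self._total = 0                           # idle + checked-out connections
        for _ in range(min_size):
            self._idle.append((time.monotonic(), factory()))
            self._total += 1

    @property
    def size(self) -> int:
        with self._cond:
            return self._total

    def acquire(self, timeout: Optional[float] = None) -> Any:
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while True:
                if self._idle:
                    return self._idle.pop()[1]    # LIFO: reuse the most recently returned connection
                if self._total < self._max:
                    self._total += 1              # reserve a slot, then expand outside the lock
                    break
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    raise TimeoutError("no connection available")
                self._cond.wait(remaining)
        try:
            return self._factory()
        except Exception:
            with self._cond:                      # give the reserved slot back on failure
                self._total -= 1
                self._cond.notify()
            raise

    def release(self, conn: Any) -> None:
        with self._cond:
            self._idle.append((time.monotonic(), conn))
            self._cond.notify()

    def shrink(self, close: Callable[[Any], None]) -> int:
        """Close connections idle longer than idle_timeout, never going below min_size."""
        now = time.monotonic()
        with self._cond:
            keep, drop = [], []
            for returned_at, conn in self._idle:
                if self._total - len(drop) > self._min and now - returned_at > self._idle_timeout:
                    drop.append(conn)
                else:
                    keep.append((returned_at, conn))
            self._idle = keep
            self._total -= len(drop)
        for conn in drop:
            close(conn)                           # close outside the lock
        return len(drop)
```

Handing out the most recently returned connection first (LIFO) lets surplus connections sit idle long enough to be reclaimed; a real deployment would call `shrink` periodically, for example from a maintenance thread.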
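The configuration-driven splitting in item (3) follows the broadcast state pattern, which in Flink's Java API is implemented with a `BroadcastProcessFunction` (`processBroadcastElement` for the config stream, `processElement` for the data stream). The Python sketch below simulates that logic without Flink. The change-event shape (`op`, `before`, `after`, modelled loosely on Debezium-style CDC events) and the config columns `source_table`, `sink_table`, and `sink_columns` are assumptions for illustration, not the thesis's actual schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class DynamicSplitter:
    """Routes detail records by a config table held in (simulated) broadcast state."""

    def __init__(self) -> None:
        self._state: Dict[str, dict] = {}  # stand-in for broadcast state: source table -> config row
        self.outputs: Dict[str, List[dict]] = defaultdict(list)  # stand-in for the sinks

    def process_broadcast(self, change: dict) -> None:
        """Apply one CDC change event from the config table (op: r/c/u/d)."""
        op = change["op"]
        if op in ("u", "d") and change.get("before"):
            self._state.pop(change["before"]["source_table"], None)
        if op in ("r", "c", "u"):
            row = change["after"]
            self._state[row["source_table"]] = row

    def process_element(self, record: dict) -> Optional[str]:
        """Route one detail record; returns the sink it went to, or None if unconfigured."""
        cfg = self._state.get(record["table"])
        if cfg is None:
            return None  # not configured: dropped here (a real job might use a side output)
        columns = cfg["sink_columns"].split(",")
        row = {c: record["data"][c] for c in columns if c in record["data"]}
        self.outputs[cfg["sink_table"]].append(row)
        return cfg["sink_table"]
```

Because the config and data streams are not synchronized, a record that arrives before its table's config row is dropped in this sketch; a production job would need to handle that case, for example by buffering such records or relying on the initial snapshot of the config table.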