
Design And Implementation Of Real Time Computing Platform For E-commerce Transaction Data

Posted on: 2021-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Chen
Full Text: PDF
GTID: 2428330611965911
Subject: Computer technology
Abstract/Summary:
Real-time computing, also known as stream processing, is the practice of collecting and processing data as it is generated, meeting the need for faster computation and analysis. The emergence and development of big data technology has provided new solutions for processing massive volumes of data. Offline batch processing systems originally built on Hadoop use HDFS for storage and MapReduce and Hive as computing engines; this pattern is still commonly used to build large data warehouses. In the traditional warehouse-based approach to query statistics, data from each business system is synchronized into the data warehouse to form the ODS (operational data store) layer, and statistics are developed on that basis. Such warehouses usually store T-1 data, which cannot satisfy real-time processing needs. Although the scheduling interval can be adjusted to between 30 minutes and several hours, this approach, characterized by high throughput and long processing times, only meets the requirements of offline analysis reports with low timeliness demands. More and more application scenarios can no longer be served by offline batch processing alone: because the value of data is time-sensitive, enterprises expect to process streaming data online and respond to it instantly. The emergence and development of real-time computing technology has accelerated data processing and made data more valuable.

Based on Blue Moon's requirements for real-time processing of e-commerce transaction data, tracking (buried-point) data, and manufacturing data, this thesis discusses the design and implementation of a unified big data real-time computing platform. The data processing flow of the platform is as follows: first, Canal parses the MySQL binlog and writes it to local files; second, Flume collects the data and sends it to Kafka, a distributed message queue; finally, Storm, a distributed real-time computing framework, pulls the data from Kafka for processing and writes the results to Redis, a high-performance cache component. In addition, the platform incorporates Elasticsearch, a distributed index service, to meet real-time indexing needs; Elasticsearch also stores and supports analysis of the dirty data encountered during real-time computation.
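The staged flow described above (binlog event → message queue → stream processing → cache) can be sketched as a minimal in-process simulation. This is an illustrative sketch only, using the Python standard library rather than the real Canal, Flume, Kafka, Storm, or Redis APIs: a `queue.Queue` stands in for the Kafka topic, a plain `dict` stands in for the Redis cache, and the event fields `shop_id` and `amount` are hypothetical examples of e-commerce transaction data.

```python
import json
import queue

# Stand-in for the Kafka topic: a simple in-process queue of JSON messages.
kafka_topic = queue.Queue()

def publish_binlog_event(event: dict) -> None:
    """Simulates Canal parsing a binlog row change and Flume forwarding it
    to Kafka: the change event is serialized and placed on the 'topic'."""
    kafka_topic.put(json.dumps(event))

def process_events(cache: dict) -> None:
    """Simulates a Storm bolt: pull events off the topic, aggregate
    per-shop transaction totals, and write them to a Redis-like cache."""
    while not kafka_topic.empty():
        event = json.loads(kafka_topic.get())
        key = f"shop:{event['shop_id']}:sales"
        cache[key] = cache.get(key, 0) + event["amount"]

cache = {}
publish_binlog_event({"shop_id": 1, "amount": 100})
publish_binlog_event({"shop_id": 1, "amount": 50})
publish_binlog_event({"shop_id": 2, "amount": 30})
process_events(cache)
print(cache)  # {'shop:1:sales': 150, 'shop:2:sales': 30}
```

In the real platform each stage is a separate distributed service, so the queue provides durability and replay and the cache survives consumer restarts; the in-process version only illustrates the direction of data flow.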
Keywords/Search Tags:Real-time computing, Storm, Binlog, Canal, Kafka