Research And Implementation Of Real-time Stream Processing Platform For Massive Log

Posted on: 2022-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: F W Liang
Full Text: PDF
GTID: 2518306734957649
Subject: Master of Engineering
Abstract/Summary:
As the era of big data advances, data volumes are growing exponentially. Log data in particular is valuable in many fields, such as system performance optimization, user behavior analysis, and real-time recommendation. Collecting log data in real time, mining the information it contains, and responding in real time are therefore of considerable research significance and practical value. However, current real-time stream processing of log data suffers from bottlenecks such as high computational latency and load scheduling that is insensitive to load changes. To address these problems, this thesis designs a real-time stream processing platform for massive log data. The main research contents are as follows:

(1) To address the excessive data accumulation and latency caused by the original resource scheduling strategy of the Flink real-time stream processing platform, a resource scheduling strategy based on load prediction is proposed. A machine-learning-based load prediction algorithm is incorporated into the formulation of the scheduling strategy, and both resource shortage and resource surplus in data processing are considered. Asynchronous state data is used to formulate the resource scheduling strategy online, which addresses the data accumulation and high latency of offline resource scheduling strategies.

(2) A real-time stream processing platform for massive log data was built for the massive-log scenario of the China Service cloud platform. The platform first collects logs reliably and in real time through Flume and performs simple cleaning and classification. The classified logs are then fed into an optimized Kafka cluster for efficient and reliable transport. Flink then performs real-time ETL, real-time deduplication, and real-time anomaly alerting as required. Finally, the results are stored efficiently in HBase and displayed in Kibana.

(3) The real-time log stream processing platform designed in this thesis was deployed and applied on the China Service cloud platform. A comparison with the existing log processing system of the China Service cloud shows that the platform meets the cloud platform's log data processing requirements and offers advantages over the existing system.
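
The load-prediction-based scheduling idea in item (1) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not name the machine learning model or the scheduling rule, so a least-squares linear trend stands in for the predictor, and the function names, the utilization band (0.5 to 0.8), and the per-task capacity are all hypothetical. The sketch only shows how a predicted load can drive both scale-up (resource shortage) and scale-down (resource surplus) decisions.

```python
import math


def predict_next_load(history):
    """Extrapolate the next load sample from a least-squares line.

    Stand-in predictor (assumption): the thesis uses a machine learning
    model that the abstract does not specify.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line one step past the last sample; load is never negative.
    return max(0.0, intercept + slope * n)


def target_parallelism(predicted_rate, per_task_capacity, current,
                       low=0.5, high=0.8, min_p=1, max_p=64):
    """Choose a parallelism for the predicted load.

    All thresholds are hypothetical. Predicted utilization above `high`
    means a resource shortage, below `low` a resource surplus; inside the
    band the current parallelism is kept to avoid needless rescaling.
    """
    utilization = predicted_rate / (per_task_capacity * current)
    if low <= utilization <= high:
        return current
    # Aim for the middle of the band so the next sample is unlikely to
    # trigger another rescale immediately.
    target_utilization = (low + high) / 2
    needed = math.ceil(predicted_rate / (per_task_capacity * target_utilization))
    return max(min_p, min(max_p, needed))


if __name__ == "__main__":
    recent_rates = [1000, 2000, 3000, 4000]  # records per second, made-up samples
    predicted = predict_next_load(recent_rates)
    print(predicted, target_parallelism(predicted, per_task_capacity=1000, current=4))
```

Deciding on the predicted load rather than the observed load is what lets a scheduler add resources before a backlog builds up. Handling the surplus case in the same rule lets it release resources when the load falls.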
Keywords/Search Tags:Massive log data, Flink real-time stream processing, Load prediction, Resource scheduling strategy