
Research And Implementation Of Efficient Log Information Extraction Platform Based On Flink

Posted on: 2021-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
GTID: 2518306308972989
Subject: Computer Science and Technology
Abstract/Summary:
Systems across the Internet continuously generate massive volumes of log data at an unprecedented speed. Extracting information from massive textual log data has become an increasingly serious problem in the field of log processing. As web applications grow more complex, they are usually split into multiple sub-services by function, so the log content is spread across multiple log files. Extracting log information therefore usually requires stitching together the data in multiple log files to obtain the complete log record. Distributed computing engines scale well and are better suited to extracting information from massive textual log data than traditional information extraction technologies. As a new-generation distributed computing engine, Flink provides a unified programming model and execution engine for real-time streaming data analysis and batch data processing. However, Flink has shortcomings when executing multiple joins: it cannot optimize them effectively, so jobs that contain multiple joins perform poorly. In the field of distributed computing, some methods have been used to optimize multiple joins, but most of them are based on MapReduce and cannot be applied directly to Flink. It is therefore necessary to optimize the performance of multiple joins in Flink. Against this research background, this thesis optimizes Flink's multiple joins so that the platform can efficiently extract information from massive logs. The specific work of this thesis is as follows:

1. This thesis describes in detail the key technologies involved in the efficient Flink-based log information extraction platform, including Flink's distributed computing engine, existing distributed table join algorithms, and existing multiple-join optimization algorithms.
2. Based on existing join order optimization algorithms, this thesis proposes a Multi Bushy Tree algorithm that improves the execution efficiency of Flink's multiple joins, and a Semi Join algorithm that optimizes Flink's star joins.
3. Based on the preceding research, this thesis designs and implements an efficient log information extraction platform based on Flink.
Keywords/Search Tags:Flink, multiple join, join parallelism, join order, information extraction
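
The abstract names a semi-join optimization for star joins but does not give its details. As a rough illustration of the general idea only, and not the thesis's Semi Join algorithm or any Flink API, the plain-Python sketch below first filters a central "fact" log table down to rows that have a match in every "dimension" table, then performs the full joins on the reduced input. All function names, table layouts, and field names here are hypothetical.

```python
from collections import defaultdict


def semi_join(fact_rows, fact_key, dim_rows, dim_key):
    """Keep only the fact rows whose key value appears in the dimension table."""
    keys = {row[dim_key] for row in dim_rows}
    return [row for row in fact_rows if row[fact_key] in keys]


def hash_join(left_rows, left_key, right_rows, right_key):
    """Inner equi-join: build a hash index on the right side, probe with the left."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[right_key]].append(right)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[left_key], [])
    ]


def star_join(fact_rows, dims):
    """Join one fact table with several dimension tables.

    dims is a list of (dim_rows, fact_key, dim_key) tuples (a hypothetical layout).
    Phase 1 shrinks the fact table with cheap semi-joins; phase 2 runs the
    full joins on the reduced fact table.
    """
    reduced = fact_rows
    for dim_rows, fact_key, dim_key in dims:
        reduced = semi_join(reduced, fact_key, dim_rows, dim_key)

    result = reduced
    for dim_rows, fact_key, dim_key in dims:
        result = hash_join(result, fact_key, dim_rows, dim_key)
    return result


# Hypothetical log records split across three sub-service log files.
requests = [
    {"req_id": 1, "user_id": "u1", "svc_id": "s1"},
    {"req_id": 2, "user_id": "u2", "svc_id": "s9"},  # no matching service
    {"req_id": 3, "user_id": "u7", "svc_id": "s1"},  # no matching user
]
users = [{"uid": "u1", "name": "alice"}, {"uid": "u2", "name": "bob"}]
services = [{"sid": "s1", "svc": "auth"}]

stitched = star_join(
    requests,
    [(users, "user_id", "uid"), (services, "svc_id", "sid")],
)
print(stitched)
```

In a distributed setting, the potential benefit of the first phase is that only the dimension keys, rather than whole rows, need to be compared against the fact table, so fewer fact rows take part in the later full joins. Whether that pays off depends on how selective the dimension tables are.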