Research And Implementation Of Efficient Log Information Extraction Platform Based On Flink

Posted on:2021-08-31

Degree:Master

Type:Thesis

Country:China

Candidate:W Li

Full Text:PDF

GTID:2518306308972989

Subject:Computer Science and Technology

Abstract/Summary:

Systems across the Internet continuously generate massive volumes of log data at an unprecedented speed. How to extract information from massive textual log data has become an increasingly serious issue in the field of log processing. As web applications grow more complex, they are usually split into multiple sub-services by function, so the log content is spread across multiple log files. Extracting log information therefore usually requires stitching together the data in multiple log files to obtain the complete log information. Distributed computing engines have excellent scalability and are better suited to extracting information from massive textual log data than traditional information extraction technologies. As a new-generation distributed computing engine, Flink provides a unified programming model and execution engine for real-time streaming data analysis and batch data processing. However, Flink has shortcomings when executing multiple joins: it cannot optimize them effectively, so jobs that contain multiple joins perform poorly. In the field of distributed computing, some methods have been used to optimize multiple joins, but most of them are based on MapReduce and cannot be applied directly to Flink. It is therefore necessary to optimize the performance of multiple joins in Flink. Against this research background, this thesis optimizes Flink's multiple joins so that the platform can extract massive log information efficiently. The specific work of this thesis is as follows:

1. This thesis describes in detail the key technologies involved in an efficient log information extraction platform based on Flink, including Flink's distributed computing engine, existing distributed table join algorithms, and existing multiple-join optimization algorithms.
2. Based on existing join order optimization algorithms, this thesis proposes a Multi Bushy Tree algorithm that improves the execution efficiency of Flink's multiple joins and a Semi Join algorithm that optimizes Flink's star join.
3. Based on the preceding research, this thesis designs and implements an efficient log information extraction platform based on Flink.
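
To make the star-join idea concrete, the following is a minimal sketch of generic semi-join reduction in plain Python. It is not the thesis's Semi Join algorithm, whose details the abstract does not give, and it does not use the Flink API. The record layout (`requests`, `users`, `services`) and every function name are hypothetical, chosen only to mimic log content split across several files. The central "fact" log is first filtered against the keys of each smaller log, so fewer rows reach the full joins.

```python
from collections import defaultdict

# Hypothetical log records; field names are assumptions for illustration only.
requests = [
    {"request_id": 1, "user_id": "u1", "service_id": "s1"},
    {"request_id": 2, "user_id": "u2", "service_id": "s9"},
    {"request_id": 3, "user_id": "u7", "service_id": "s1"},
    {"request_id": 4, "user_id": "u2", "service_id": "s2"},
]
users = [
    {"user_id": "u1", "region": "north"},
    {"user_id": "u2", "region": "south"},
]
services = [
    {"service_id": "s1", "name": "auth"},
    {"service_id": "s2", "name": "billing"},
]


def semi_join(fact_rows, fact_key, dim_rows, dim_key):
    """Keep only the fact rows whose key value occurs in dim_rows."""
    keys = {row[dim_key] for row in dim_rows}
    return [row for row in fact_rows if row[fact_key] in keys]


def hash_join(left, left_key, right, right_key):
    """Inner equi-join: build a hash index on the right side, probe with the left."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def star_join(fact_rows, dimensions):
    """Join a fact table with several dimension tables.

    dimensions is a list of (fact_key, dim_rows, dim_key) tuples.
    Returns the joined rows and the number of fact rows that survived
    the semi-join reduction phase.
    """
    # Phase 1: shrink the fact table using only the dimension keys.
    reduced = fact_rows
    for fact_key, dim_rows, dim_key in dimensions:
        reduced = semi_join(reduced, fact_key, dim_rows, dim_key)
    # Phase 2: run the full joins on the reduced fact table.
    result = reduced
    for fact_key, dim_rows, dim_key in dimensions:
        result = hash_join(result, fact_key, dim_rows, dim_key)
    return result, len(reduced)
```

Calling `star_join(requests, [("user_id", users, "user_id"), ("service_id", services, "service_id")])` on this sample data joins only the requests that match both smaller logs. In a distributed engine the benefit of this pattern would come from shipping only the key sets, rather than whole rows, before the expensive joins; this single-process sketch shows the logic only.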

Keywords/Search Tags:

Flink, multiple join, join parallelism, join order, information extraction

Related items

1	Research Of Interval Join Method Base On MapReduce
2	Optimization Of Database Join Algorithms On DRAM/NVM-Based Hybrid Memory
3	Design And Optimizations Of Multi-table Join Operations For NVM-based In-memory Databases
4	Optimization And Implemetation Of Parallel Join Algorithm
5	Hadoop Based Efficient Join Algorithm Research On GPU
6	Research On Hash Join Algorithm In DM Database
7	Research On Optimization For Multi-way Join In A Map-Reduce Environment
8	Research Of High-Dimensional Space Join And Query Algorithms Based On Main-Memory
9	The Distinct Element Problem In Equi-join For Multiple Data Streams
10	Research On SSD-based Join Query Optimization Technology In Array Database