Research On Key Technologies Of Graph-Based Large-Scale Log Processing System

Posted on: 2019-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Ai
GTID: 1368330623461892
Subject: Computer Science and Technology
Abstract/Summary:
Data centers and distributed systems produce large volumes of detailed, relational logs. Using this information, system analysts can quickly perform fault diagnosis, performance analysis, and value mining. Modeling logs as graphs translates log processing problems into graph computing or graph matching problems. Meanwhile, as the variety and intensity of services grow, the size of graph-based log data keeps increasing, and processing this data more efficiently has become a research hotspot in academia and industry. Beyond the basic features of a graph, log data also exhibits temporal relationships, structural diversity, and other characteristics, which bring new challenges to graph-based log processing. An analysis of existing systems shows that their performance bottlenecks lie mainly in disk I/O, in-memory computation, and incremental computation. This dissertation therefore presents a series of optimization strategies and improved methods that further raise the performance of log processing systems. The main contributions are:

(1) To address the excessive number of iterations and the huge amount of disk I/O in log graph computing, this dissertation proposes a disk I/O reduction strategy. It explores a fundamentally different tradeoff: less total I/O rather than better locality. By squeezing all the value out of loaded data, the system drastically reduces the number of iterations that log analysis applications need, which in turn significantly reduces the total amount of disk I/O. To demonstrate these ideas, we build CLIP, a new out-of-core graph processing system. Experiments show that algorithms that can be implemented in CLIP run much faster than the original disk-locality-optimized algorithms in many real-world cases (speedups of tens or even thousands of times).

(2) To address poor parallelism in in-memory computing and the serious waste of CPU resources in graph computing, this dissertation proposes a series of optimization strategies, including selective scheduling mechanisms, parallelization of sequential algorithms, and diagonal-based graph partitioning and scheduling strategies. These further improve the system's computing performance so that it meets the performance requirements of fully in-memory operation and fast storage media. Experiments show that these strategies achieve significant speedups over existing systems (up to 43.3× in in-memory mode and up to 1.5× in external mode).

(3) To address the excessive waste of CPU resources and the huge volume of intermediate results in log graph matching, this dissertation proposes a graph matching model, TimeWindow, which confines the search space to a time window. This mechanism reduces both the heavy disk I/O caused by too many intermediate matching states and the search space of the algorithm, thereby improving computing performance. Experiments show that TimeWindow achieves significant speedups over existing methods (1-2 orders of magnitude in in-memory mode and at least 2 orders of magnitude in external mode).

(4) To address slow dynamic log graph reconstruction and poor incremental computing performance in online log graph processing, this dissertation proposes three performance optimization strategies: a parallel dynamic graph reconstruction method, a dynamic incremental graph matching method, and a dynamic incremental graph computing method. Together they further reduce the latency of the system. This dissertation also builds Pisces, a new graph-based online log processing system. According to the evaluation, Pisces meets a graph reconstruction speed requirement of 10 million edges per second. Moreover, the newly proposed incremental algorithms achieve significant performance improvements over existing methods (up to 10.77× for graph matching with 16 threads and 45% for graph computing).
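
The time-window idea in contribution (3) can be illustrated with a small sketch. This is not the TimeWindow model from the dissertation, whose details the abstract does not give. It is a hypothetical toy matcher that assumes a log graph is a list of timestamped, labeled edges and that a pattern is a chain of edge labels. The names `Edge` and `match_in_window` are invented for illustration. It shows only the general principle: partial matches whose first edge has fallen out of the window are discarded, so the set of intermediate states stays bounded.

```python
from collections import namedtuple

# Hypothetical log-graph edge: source node, destination node, event label, timestamp.
Edge = namedtuple("Edge", "src dst label ts")


def match_in_window(edges, pattern, window):
    """Return chains of edges whose labels follow `pattern` in order, where each
    edge starts at the node the previous one ended on, and the time between the
    first and last edge of a chain is at most `window`."""
    complete = []
    partial = []  # live partial matches, each a tuple of edges
    for e in sorted(edges, key=lambda edge: edge.ts):
        # Prune: a partial match that started more than `window` ago can never
        # complete inside the window, so it is dropped here.
        partial = [p for p in partial if e.ts - p[0].ts <= window]

        # Try to extend every surviving partial match with the new edge.
        extended = [
            p + (e,)
            for p in partial
            if pattern[len(p)] == e.label and p[-1].dst == e.src
        ]
        # The new edge may also start a fresh match.
        if e.label == pattern[0]:
            extended.append((e,))

        for p in extended:
            if len(p) == len(pattern):
                complete.append(p)
            else:
                partial.append(p)
    return complete


if __name__ == "__main__":
    sample = [
        Edge("a", "b", "login", 1),
        Edge("b", "c", "read", 3),
        Edge("c", "d", "write", 4),
        Edge("x", "y", "login", 10),
        Edge("y", "z", "read", 50),
        Edge("z", "w", "write", 51),
    ]
    for chain in match_in_window(sample, ["login", "read", "write"], window=10):
        print([(edge.src, edge.dst, edge.label, edge.ts) for edge in chain])
```

With a window of 10 time units, the second chain in the sample is never completed, because its partial state is pruned once the gap to the next event exceeds the window. A larger window keeps more partial states alive and finds more matches at a higher cost.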
Keywords/Search Tags:Log Processing, Graph Processing, Disk I/O, Time Window, Dynamic Graph