Font Size: a A A

Algorithm And System For Large-scale Program Static Analysis Based On Distributed Graph Computation

Posted on:2020-10-16Degree:MasterType:Thesis
Country:ChinaCandidate:Q JiangFull Text:PDF
GTID:2428330575952507Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Static program analyses are widely applied along the whole process of program development for bug detection,code optimization,testing and etc.Interprocedural static analysis based on CFL(Context-Free Language)is a kind of analysis method with high generality.This method is essentially a kind of graph reachability problem.Although the research on static program analysis has made significant achievements,due to the limitations of hardware resources and traditional methods,analyses on single machine have been difficult to support the efficient processing of growing large-scale graph data,resulting in low efficiency and lack of scalability of static analysis methods Therefore,complex interprocedural analysis of large-scale modern software is still challenging.Aiming at the problem of large-scale program static analysis performance inefficiency and low scalability,this paper proposed an efficient and scalable parallel large-scale program static analysis method based on distributed graph computing model,and designed and implemented an efficient and scalable distributed large-scale program static analysis system.The main research work and contributions of this paper can be summarized as follows(1)We abstract the static analysis problem based on CFL reachability as a large-scale graph reachability problem.Based on the data-parallel framework,a novel parallelized transitive closure algorithm and a Join-Process-Filter computation model are designed.Based on the algorithm and the computation model,we designed and implemented a distributed offline full-quantity static program analysis system,And we proposed algorithm-level and system-level optimization respectively,and designed a data structure with compression effect(2)Based on the offline full-quantity analysis system,the online incremental static program analysis system supporting small batches of code update is further designed and implemented.The research proposed a triangle counting method to support the continuous updating of the graph topology,and also improved the data structure of the offline system for online analysis,and for the large-scale computation that may be triggered by a small batch update,the design realizes the mechanism of the prediction and automatic switching of computation modes(3)Based on the big data processing platform spark,distributed file system HDFS and distributed in-memory database Redis,a set of large-scale distributed static program analysis system and framework integrating computation,storage and query is designed.The implemented static analysis program system has high generality and can support all static analysis tasks that can be expressed as CFL reachability problems.(4)The extensive experiments on real large-scale software datasets show that the offline full-quantity static program analysis system implemented in this paper has good performance and scalability,and can exceed an order of magnitude compared with the most advanced analysis tools available on performance.Experiments with incremental analysis on the same data sets show that for small batch updates,the online analysis system can achieve a second-level response with a certain real-time performance.
Keywords/Search Tags:interprocedural static program analysis, distributed system, data-parallel processing, graph reachability, dynamic transitive closure
PDF Full Text Request
Related items