Defect Detection Techniques For Complex Programs

Posted on:2022-05-21

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:1488306725470554

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of computer technology,the scale and complexity of software are increasing rapidly.Programs contain complex data structures such as pointers,arrays,structures,etc.Moreover,they also introduce runtime mechanisms like concurrency,interrupts,loops,method calls,etc.Programs with such complex data structures and runtime mechanisms are prone to software defects,which may lead to software crashes and result in serious consequences.Therefore,detecting software defects is a challenging but imperative task.Static analysis and dynamic testing are important ways for software defects detection.However,due to the lack of the ability in analyzing deep software semantics,static methods have a high false positive rate,which requires developers to manually validate.Dynamic testing is unable to fully test a program due to the path explosion problem caused by branches and loops.With the development of artificial intelligence,machine learning based defect prediction techniques are becoming more attractive.This kind of method trains a model according to existing defect datasets and predicts defects by the model.However,due to the lack of programs' structure information in existing pro-gram representation methods,such techniques do not perform as well as static analysis.This thesis focuses on improving the efficiency,reducing the cost and enhancing the scalability of detecting data race in complex programs such as concurrent programs and interrupt-driven embedded programs.Then,this thesis aims at validating warnings from static analysis methods,avoiding solving complex constraints in symbolic execution,and modeling program structures in machine learning-based defect detection methods.The main works are as follows:1.Data race is among the most severe and prevalent defects in concurrent software.Due to the limitation that data race is hard to reproduce dynamically because of the complicated thread interleaving,static analysis techniques have attracted growing interest and attention.However,state-of-the-art static analysis methods are unable to analyze the whole program path,resulting in high false positive rate.Moreover,they also cannot balance the complexity of analyzing the whole program state,resulting in low efficiency.To tackle this problem,this thesis proposes a multi-stage detection method based on sliced program paths.It first identifies data race subpaths instead of the whole program paths.Second,it employs May-Happen-in-Parallel(MHP)analysis to identify whether two data accesses in a program may execute concurrently.Finally,for each subpath whose two data accesses are MHP,it computes whether the data race subpath is feasible.The evaluation demonstrates that it is faster than state-of-the-art techniques with the average speedup 6.08 X and significantly fewer false positives(16.0%).2.Static methods usually report numerous warnings,which require developers to manually validate,such validation is costly and low efficiency.To overcome this problem,this thesis proposes an automated framework that detects and validates data races in interrupt-driven embedded programs.It is based on static warnings and guides symbolic execution to generate input data for exercising the potential races.Then,it employs virtual platforms to dynamically classify and validate these race warnings.The experiment results show that this method can validate all data races reports from 11 interrupt-driven embedded programs.3.Symbolic execution is an effective way to explore program paths and detect defects,but it wastes a lot of computation resources in hard-to-solve constraints.To overcome this problem,this thesis proposes a constraint prediction method based on convolutional neural networks(CNN).It uses CNN to build a prediction model and leverages the model to predict the satisfiability of a constraint.With the help of the method,the symbolic execution skips hard-to-solve constraints and spends more time on other paths,which improves the efficiency of symbolic execution.Our experiment shows that it improves statement coverage by 7.5%.4.Training a model based on source code and using the model to predict defects is still a new research field.However,existing methods are hard to perform well because they lack the program structure information when representing programs.This thesis proposes a defect prediction method based on graph interval neural networks.This method represents programs by control flow graph and trains a prediction model that focuses on program local structures by constructing sub-graphs according to loops.The experiment shows that it improves F1 score by 27.8% comparing with the baseline model.

Keywords/Search Tags:

Defect Detection, Program Analysis and Testing, Warning Validation, Data Race Interrupt-driven programs, Symbolic Execution, Deep Learning, Simulation

PDF Full Text Request

Related items

1	Detecting Data Races For Interrupt-driven Embedded Systems
2	Symbolic Execution Of The DSP Programs
3	The Data Race Detection And Replay Based On Instrumentation And Dynamic Symbolic Execution
4	The Research Of Partitioned Symbolic Execution Model And Its Environment Interaction Problem
5	Symbolic Execution Of Complex Programs Driven By Black-box Abstraction
6	Research And Design Of Symbolic Execution Based Software Testing
7	Symbolic Execution For C++ Programs
8	Towards Modeling And Verification Of Interrupt-driven Programs
9	Research Of Data Race Detection Method Based On Deep Learning
10	Guided Symbolic Execution Technology And Its Application