Font Size: a A A

Defect Detection Techniques For Complex Programs

Posted on:2022-05-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y WangFull Text:PDF
GTID:1488306725470554Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer technology,the scale and complexity of software are increasing rapidly.Programs contain complex data structures such as pointers,arrays,structures,etc.Moreover,they also introduce runtime mechanisms like concurrency,interrupts,loops,method calls,etc.Programs with such complex data structures and runtime mechanisms are prone to software defects,which may lead to software crashes and result in serious consequences.Therefore,detecting software defects is a challenging but imperative task.Static analysis and dynamic testing are important ways for software defects detection.However,due to the lack of the ability in analyzing deep software semantics,static methods have a high false positive rate,which requires developers to manually validate.Dynamic testing is unable to fully test a program due to the path explosion problem caused by branches and loops.With the development of artificial intelligence,machine learning based defect prediction techniques are becoming more attractive.This kind of method trains a model according to existing defect datasets and predicts defects by the model.However,due to the lack of programs' structure information in existing pro-gram representation methods,such techniques do not perform as well as static analysis.This thesis focuses on improving the efficiency,reducing the cost and enhancing the scalability of detecting data race in complex programs such as concurrent programs and interrupt-driven embedded programs.Then,this thesis aims at validating warnings from static analysis methods,avoiding solving complex constraints in symbolic execution,and modeling program structures in machine learning-based defect detection methods.The main works are as follows:1.Data race is among the most severe and prevalent defects in concurrent software.Due to the limitation that data race is hard to reproduce dynamically because of the complicated thread interleaving,static analysis techniques have attracted growing interest and attention.However,state-of-the-art static analysis methods are unable to analyze the whole program path,resulting in high false positive rate.Moreover,they also cannot balance the complexity of analyzing the whole program state,resulting in low efficiency.To tackle this problem,this thesis proposes a multi-stage detection method based on sliced program paths.It first identifies data race subpaths instead of the whole program paths.Second,it employs May-Happen-in-Parallel(MHP)analysis to identify whether two data accesses in a program may execute concurrently.Finally,for each subpath whose two data accesses are MHP,it computes whether the data race subpath is feasible.The evaluation demonstrates that it is faster than state-of-the-art techniques with the average speedup 6.08 X and significantly fewer false positives(16.0%).2.Static methods usually report numerous warnings,which require developers to manually validate,such validation is costly and low efficiency.To overcome this problem,this thesis proposes an automated framework that detects and validates data races in interrupt-driven embedded programs.It is based on static warnings and guides symbolic execution to generate input data for exercising the potential races.Then,it employs virtual platforms to dynamically classify and validate these race warnings.The experiment results show that this method can validate all data races reports from 11 interrupt-driven embedded programs.3.Symbolic execution is an effective way to explore program paths and detect defects,but it wastes a lot of computation resources in hard-to-solve constraints.To overcome this problem,this thesis proposes a constraint prediction method based on convolutional neural networks(CNN).It uses CNN to build a prediction model and leverages the model to predict the satisfiability of a constraint.With the help of the method,the symbolic execution skips hard-to-solve constraints and spends more time on other paths,which improves the efficiency of symbolic execution.Our experiment shows that it improves statement coverage by 7.5%.4.Training a model based on source code and using the model to predict defects is still a new research field.However,existing methods are hard to perform well because they lack the program structure information when representing programs.This thesis proposes a defect prediction method based on graph interval neural networks.This method represents programs by control flow graph and trains a prediction model that focuses on program local structures by constructing sub-graphs according to loops.The experiment shows that it improves F1 score by 27.8% comparing with the baseline model.
Keywords/Search Tags:Defect Detection, Program Analysis and Testing, Warning Validation, Data Race Interrupt-driven programs, Symbolic Execution, Deep Learning, Simulation
PDF Full Text Request
Related items