Automated Detection And Localization Techniques For Python Program Defects

Posted on: 2018-03-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Xu
Full Text: PDF
GTID: 1318330542974297
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technologies and the advent of the Big Data era, dynamic programming languages have been drawing more and more attention. Python, as a typical dynamic programming language, has become one of the most popular and widely used programming languages because of its simplicity, flexibility, and extensive library support. Like programs in other languages (e.g., C and Java), Python programs face a variety of challenges in terms of correctness and reliability. Program analysis is an important foundation for guaranteeing the correctness and reliability of software. Because static programming languages have long dominated industrial applications, most research on program analysis techniques has focused on static languages such as C/C++ and Java. Program analysis techniques for dynamic languages such as Python are still in their infancy. The flexible features of Python bring many new challenges to program analysis, which make traditional analysis techniques unsuitable for Python programs. It is therefore urgent and important to develop analysis techniques for Python programs to ensure their correctness and reliability.

This thesis focuses on how to automate the detection and localization of Python program defects. The first research problem is Python type inference and checking, as type defects are among the most common defects in Python programs. Detecting type defects requires precise and efficient type inference. Many type inference methods have been proposed for Python and other dynamically typed languages, and most of them are based on the program's data flow. However, Python programs contain a large number of external function calls, which make it difficult to obtain the complete data flow during analysis. As a result, most existing methods are inaccurate and fail easily. Our first work therefore proposes a new Python type inference method to improve the performance.

Next, we extend the problem and study general techniques for automating the detection of Python program defects. Symbolic analysis is one of the key technologies for this problem. However, most existing symbolic analysis techniques target static programming languages and do not support the variation of types and attribute sets in Python programs, so it is very difficult to apply them directly to Python programs. Our second work therefore proposes a new symbolic analysis for Python that detects potential defects and generates their triggering inputs.

Finally, the third problem we concentrate on is how to automatically localize defects during debugging. Most existing fault localization and automated debugging techniques require many passing/failing runs as the oracle. However, high-quality oracle executions may not be available in practice. Current techniques for debugging with a single run lack human-like reasoning capabilities and therefore often require a lot of manual feedback to assist defect localization. Our final work therefore combines human-like intelligence with traditional machine reasoning to improve localization performance, thereby reducing the manual effort during debugging.

The main contributions of this thesis are summarized as follows:

(1) For Python type inference and checking, we propose a probability-analysis-based type inference. This method combines the type hints collected from both program semantics and natural language through probabilistic constraints, and models type inference as a probabilistic inference process. It effectively addresses the problem that existing methods often fail to infer many types because of the incomplete data flow in Python programs. We implemented a prototype and evaluated the method in an experiment on 18 well-known open source Python projects. The results show that our method can type 79.09% of the variables that cannot be typed by traditional type inference, with 82.86% precision.

(2) For the automation of general bug detection and test case generation, we propose a symbolic predictive analysis for detecting bugs in Python programs. This method first collects the execution trace of a passing run and then encodes the collected trace and some neighbouring unexecuted branches into symbolic constraints. Solving the encoded constraints identifies bugs as well as their triggering inputs. It effectively addresses the difficulty existing analyses have in handling the many dynamic features and the large number of external function calls in Python. We implemented a prototype and evaluated our method on 11 highly popular Python projects. Our evaluation shows that the technique detects 46 bugs, 16 of which were previously unreported. All of them are true positives.

(3) For bug localization from a single execution, we propose a probabilistic-inference-based bug localization technique. This method combines human knowledge, human-like reasoning rules, and program semantics through probabilistic constraints, and models the entire debugging process as a continuous series of probabilistic inference processes. It greatly reduces the amount of manual feedback required by existing single-trace-based automated bug localization methods. We implemented a prototype and evaluated our method on a set of real-world bugs. The results show that the technique is highly effective: it precisely identifies root causes for a set of real-world bugs in only 3 to 5 interactions with developers on average, far fewer than a recent proposal that does not encode human intelligence. Our user study also confirms that it substantially improves human productivity.

We believe that the approaches presented in this thesis will help improve the automation of detecting and localizing Python program defects and provide effective ways to ensure the correctness and reliability of Python programs.
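
As a rough illustration of the idea behind contribution (1), the sketch below combines type hints from two sources, program semantics (how a variable is used) and natural language (how it is named), into one probability distribution over candidate types. It is a minimal sketch under stated assumptions and not the thesis's actual model: the candidate types, the hint names, the likelihood values, and the naive-Bayes-style combination rule are all hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical candidate types with a uniform prior (an assumption for this sketch).
CANDIDATE_TYPES = ("int", "str", "list")
PRIOR = {t: 1.0 / len(CANDIDATE_TYPES) for t in CANDIDATE_TYPES}

# Hypothetical likelihoods P(hint | type); the numbers are invented for illustration.
HINT_LIKELIHOODS = {
    # Program-semantics hint: the variable is passed to len(...).
    "used_in_len": {"int": 0.01, "str": 0.45, "list": 0.54},
    # Natural-language hint: the identifier ends with "name".
    "name_suffix_name": {"int": 0.05, "str": 0.90, "list": 0.05},
    # Program-semantics hint: the variable is added to an integer literal.
    "added_to_int": {"int": 0.95, "str": 0.01, "list": 0.04},
}


def infer_type_distribution(hints):
    """Combine hints, treated as independent, into a normalized distribution."""
    scores = {
        t: PRIOR[t] * prod(HINT_LIKELIHOODS[h][t] for h in hints)
        for t in CANDIDATE_TYPES
    }
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def most_likely_type(hints):
    """Return the candidate type with the highest combined probability."""
    dist = infer_type_distribution(hints)
    return max(dist, key=dist.get)


if __name__ == "__main__":
    observed = ["used_in_len", "name_suffix_name"]
    print(infer_type_distribution(observed))
    print(most_likely_type(observed))
```

In this toy setting, a naming hint can tip the balance when the data-flow hint alone is ambiguous, which is the intuition behind using natural-language evidence where the data flow is incomplete.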
Keywords/Search Tags: Python Program Analysis, Type Inference, Bug Detection, Fault Localization, Symbolic Analysis, Probabilistic Inference