
Research on Automatic Software Defect Patches by Exploring Large-Scale Open Source Repositories

Posted on: 2022-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: Z Q Fan
GTID: 2518306521464264  Subject: Computer application technology
Abstract/Summary:
Software defects hinder the development of the software industry and seriously degrade software quality. Finding efficient, automatic methods for repairing software defects is therefore an important topic in software engineering. Advances in deep learning and the emergence of large-scale open source repositories make it possible to improve on traditional defect repair methods based on the generate-and-validate (G&V) technique, and this has become an active research direction. Current deep-learning-based repair methods rely mainly on datasets mined from open source repositories. Compared with traditional methods that depend on manually extracted repair patterns, they can generate valid patches that pass the matching test cases, but the generated patches often deviate from the semantics of the original program, and their correctness still needs improvement. The main reasons are as follows:
(1) These methods collect data from open source repositories with simple rules. Because open source repositories are highly diverse, this single screening method yields low-quality training data, which limits what the model can learn.
(2) Existing methods ignore the control flow and data flow information of the program under repair.
(3) Existing methods tokenize source code at a single granularity and replace user-defined identifiers with abstract placeholders, which discards part of the program's semantic information and weakens the patches the model generates.
To address these problems, this thesis proposes a defect repair approach based on deep learning. The specific research contributions are as follows:
(1) To address the uneven quality of data in open source repositories, this thesis proposes a data optimization method for large-scale open source repositories. The data is purified through a filtering process, and, to capture the program context features that matter for defect repair, a program slicing method based on control flow and data flow is used to preprocess it, improving the quality of the data obtained from open source repositories.
(2) This thesis proposes a defect repair model with an encoder-decoder architecture. To address the single tokenization granularity and the loss of program information in existing methods, source code is tokenized with a subword-based method that retains as many user-defined identifiers from the program context as possible. An encoder-decoder model with a local attention mechanism then learns defect repair patterns and generates patches automatically. Because similar defects tend to call for similar repair operations, this thesis also builds a defect pre-classification model based on structural features of the program's abstract syntax tree (AST), which selects the best-matching patch generation model for each defect to be repaired.
(3) This thesis designs and implements VulRepair, a GitHub-based prototype system for repairing defects in Java programs, together with a set of evaluation and comparison experiments that assess the proposed repair model and system from multiple angles.
Experimental results on a benchmark dataset and on a dataset collected from open source repositories show that, compared with existing defect repair methods, VulRepair generates more effective patches and can repair multi-line defects.
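The abstract does not state which rules the data filtering process applies. As a minimal sketch, assuming a heuristic filter over commit metadata, a commit could be kept only when its message looks like a bug fix and its change is small and confined to a single Java file. The keyword pattern, the `Commit` fields, and the 10-line threshold are hypothetical choices for illustration, not the thesis's actual criteria.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword pattern for fix-related commit messages (assumption).
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|patch|repair)\b", re.IGNORECASE)


@dataclass
class Commit:
    """Hypothetical commit summary; the field names are illustrative only."""
    message: str
    files_changed: int
    lines_changed: int
    only_java_files: bool


def keep_commit(commit, max_lines=10):
    """Keep small, single-file Java commits whose message looks like a bug fix.

    The thresholds here are assumed values, not the thesis's filtering rules.
    """
    return (
        FIX_PATTERN.search(commit.message) is not None
        and commit.files_changed == 1
        and 0 < commit.lines_changed <= max_lines
        and commit.only_java_files
    )


commits = [
    Commit("Fix NPE when config is missing", 1, 3, True),
    Commit("Update README", 1, 2, False),
    Commit("Add prefix option to CLI", 1, 4, True),      # "prefix" is not a fix keyword
    Commit("Fixed several bugs in parser", 4, 120, True),  # too large to keep
]
kept = [c.message for c in commits if keep_commit(c)]
print(kept)  # ['Fix NPE when config is missing']
```

The word boundaries in the pattern keep words such as `prefix` from matching `fix`. A real pipeline would likely combine rules like these with deduplication and build checks, but those steps are not described in the abstract.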
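Program slicing based on control flow and data flow keeps only the statements that can affect a chosen point in the program, which trims irrelevant context before training. The sketch below computes a backward slice over a toy, loop-free statement list. The `Stmt` structure, the straight-line approximation of reaching definitions, and the Java-like example program are assumptions for illustration, not the thesis's slicer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stmt:
    """A statement with the variables it defines and uses (hypothetical toy IR)."""
    text: str
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)
    ctrl_parent: Optional[int] = None  # index of the branch this statement depends on


def data_dependences(stmts):
    """Map each statement to the statements whose definitions it uses.

    Simplification (assumption): the code is treated as straight-line, so the most
    recent earlier definition of a variable is taken as the one that reaches it.
    A real slicer would compute reaching definitions over the control flow graph.
    """
    deps = {i: set() for i in range(len(stmts))}
    last_def = {}
    for i, s in enumerate(stmts):
        for var in s.uses:
            if var in last_def:
                deps[i].add(last_def[var])
        for var in s.defs:
            last_def[var] = i
    return deps


def backward_slice(stmts, criterion):
    """Return sorted indices of statements that may affect statement `criterion`."""
    deps = data_dependences(stmts)
    worklist, in_slice = [criterion], set()
    while worklist:
        i = worklist.pop()
        if i in in_slice:
            continue
        in_slice.add(i)
        worklist.extend(deps[i])        # follow data dependences
        parent = stmts[i].ctrl_parent
        if parent is not None:
            worklist.append(parent)     # follow control dependences
    return sorted(in_slice)


program = [
    Stmt("int a = read();", defs={"a"}),
    Stmt("int b = read();", defs={"b"}),
    Stmt("int unused = 42;", defs={"unused"}),
    Stmt("if (a > 0)", uses={"a"}),
    Stmt("    b = b / a;", defs={"b"}, uses={"a", "b"}, ctrl_parent=3),
    Stmt("log(unused);", uses={"unused"}),
    Stmt("return b;", uses={"b"}),
]

for i in backward_slice(program, criterion=6):
    print(program[i].text)
```

For the criterion `return b;` the slice keeps statements 0, 1, 3, 4 and 6 and drops the two statements involving `unused`, since neither can influence the returned value.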
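The abstract names a subword-based tokenization method but not the specific algorithm. Byte-pair encoding (BPE) is one common choice and is used here only as an assumed stand-in: it learns frequent character merges from training identifiers, so an unseen identifier such as `userlist` is split into known pieces instead of being replaced by an abstract placeholder. The training identifiers and merge count are hypothetical.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, as in common BPE implementations


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to `num_merges` merge rules from a list of code tokens."""
    vocab = Counter(tuple(tok) + (END,) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(token, merges):
    """Split a token into subwords by applying the learned merges in order."""
    symbols = tuple(token) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy training identifiers (hypothetical).
training = ["user"] * 5 + ["userId"] * 3 + ["username"] * 2
merges = learn_bpe(training, num_merges=4)
print(encode("user", merges))      # ['user</w>']
print(encode("userlist", merges))  # ['user', 'l', 'i', 's', 't', '</w>']
```

Because the identifier is kept as a sequence of subwords rather than mapped to a placeholder such as `VAR_1`, the decoder can still copy or reproduce the original name in the patch.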
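Local attention restricts each decoding step to a window of encoder positions instead of the whole input sequence. The sketch below assumes the monotonic (local-m) variant described by Luong et al. (2015): a window centred on a given source position, dot-product scores, and a softmax inside the window. The thesis may use a different variant, window size, or scoring function, and the toy encoder states are hypothetical.

```python
import math


def local_attention(query, keys, values, center, window):
    """Monotonic local attention (assumed variant): attend only to source positions
    in [center - window, center + window], clipped to the sequence bounds."""
    lo = max(0, center - window)
    hi = min(len(keys), center + window + 1)
    # Dot-product alignment scores for positions inside the window only.
    scores = [sum(q * k for q, k in zip(query, keys[j])) for j in range(lo, hi)]
    # Numerically stable softmax over the window.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the windowed values.
    dim = len(values[0])
    context = [sum(w * values[lo + i][d] for i, w in enumerate(weights)) for d in range(dim)]
    return context, weights


# Toy encoder states (hypothetical): 5 source positions, 2-dimensional.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[float(i), 1.0] for i in range(5)]

# Window of positions 1..3; position 2 has the key best aligned with the query.
context, weights = local_attention([2.0, 0.0], keys, values, center=2, window=1)
print(weights)
print(context)
```

Positions outside the window receive no weight at all, so the cost per decoding step depends on the window size rather than on the length of the buggy code fragment.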
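The pre-classification step routes each defect to the patch generation model whose category it most resembles structurally. As an assumed illustration, the sketch below uses a bag of AST node types as the structural feature vector and picks the hypothetical category prototype with the highest cosine similarity. It parses Python with the standard `ast` module only so that the example is self-contained; the thesis works on Java ASTs, and its actual features and classifier are not described in the abstract.

```python
import ast
import math
from collections import Counter


def ast_features(source):
    """Structural feature vector (assumed): counts of each AST node type in `source`."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_model(buggy_source, prototypes):
    """Return the name of the prototype category most similar to the buggy code."""
    features = ast_features(buggy_source)
    return max(prototypes, key=lambda name: cosine(features, prototypes[name]))


# Hypothetical categories, each standing for one patch generation model.
prototypes = {
    "condition_fix": ast_features("if x > 0:\n    y = 1\nelif x < 0:\n    y = -1"),
    "call_fix": ast_features("foo(bar(x), baz(y))"),
}

print(pick_model("if a >= b:\n    c = 2", prototypes))  # condition_fix
print(pick_model("print(len(items))", prototypes))      # call_fix
```

In practice each prototype would be averaged over many training defects of one category rather than built from a single snippet; the single-snippet prototypes here only keep the example short.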
Keywords: Automatic program repair, Deep learning, Open source repositories