Research On Defect Code Localization Based On Software Bug Report

Posted on: 2021-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G L Liu
GTID: 1368330614959936
Subject: Computer application technology
Abstract/Summary:
In software development, developers often receive bug reports that describe abnormal behavior of software products. When a new bug report arrives, developers must reproduce the described defect and review the code to find its cause, which can be tedious and time-consuming. To reduce this workload, this dissertation proposes methods for defect code localization. Given a bug report, these methods rank the source code files by their relevance to it, helping developers find the buggy files. Two types of methods are studied. The first is based on the vector space model and word embedding and is suitable for small datasets. The second is based on deep learning and is generally used for large-scale datasets: convolutional neural networks extract the surface lexical and semantic correlation features of bug reports and source code files, and multi-scale convolutional neural networks extract their semantic and structural features. The main research work of the dissertation includes:

(1) The dissertation describes the research background, motivation, and current state of research on defect code localization. It introduces the related techniques in software engineering, including the structure of bug reports, the basic idea of defect code localization, and the related models and algorithms, and it explains the relevant text retrieval and natural language processing techniques and their application to defect code localization. The abstract syntax tree is introduced to describe the structure of source code, and the metrics used to evaluate defect code localization models are presented.

(2) Based on the vector space model and word embedding, a defect localization method that combines surface lexical similarity and semantic similarity is studied. First, different parts of speech differ in importance. A part-of-speech tagger is used to tag the summary of each bug report, and because nouns are especially important, their weight is increased when similarity is calculated. Second, different parts of a source code file differ in importance, so the abstract syntax tree is used to extract important words from each source code file, namely method names and class names, and their weights are increased when similarity is calculated. On this basis, surface lexical similarity is calculated with the vector space model and semantic similarity is calculated with word embedding, and the two are combined for defect code localization. Finally, a comparison with other methods verifies that this method improves the performance of defect code localization.

(3) A method that uses convolutional neural networks to extract correlation features between bug reports and source code files for defect localization is studied. First, the surface lexical correlation features between bug reports and source code files are extracted. Second, the semantic correlation features between bug reports and source code files are extracted based on word embedding and sentence embedding. Then, the joint features obtained from the surface lexical and semantic correlation features are used to locate defect code. In addition, each bug report is related to only a small number of buggy source code files, which causes data imbalance. The dissertation uses the focal loss function to address this imbalance. The experimental results show that the model that accounts for data imbalance achieves better defect code localization performance.

(4) A method that uses multi-scale convolutional neural networks to extract semantic and structural features of bug reports and source code files for defect localization is studied. First, the semantic features of bug reports and source code files are extracted by multi-scale convolutional neural networks from natural-language word embedding vectors. Second, source code files are converted into abstract syntax trees, and Word2Vec is used to obtain code vectors that carry the structural features of the source code. From these code vectors, the structural features of source code files are extracted by a multi-scale convolutional neural network. Then, the semantic and structural features of bug reports and source code files are used to learn unified features for locating defect code. The experimental results show that taking structural features into account helps improve defect code localization performance.
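
As an illustration of the focal loss mentioned in item (3), the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to (bug report, source file) pairs. The abstract does not give the dissertation's actual formulation or hyperparameters, so the function names, the sample probabilities, and the values gamma = 2.0 and alpha = 0.25 (commonly used defaults) are assumptions made for illustration only.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single (bug report, source file) pair.

    p: predicted probability that the file is buggy for this report.
    y: 1 if the file is actually buggy, 0 otherwise.
    gamma, alpha: assumed hyperparameters, not taken from the dissertation.
    """
    p = min(max(p, eps), 1.0 - eps)               # avoid log(0)
    p_t = p if y == 1 else 1.0 - p                # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha    # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of pairs."""
    losses = [binary_focal_loss(p, y, gamma, alpha)
              for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical batch: one buggy file among many clean ones.
    probs = [0.30, 0.05, 0.10, 0.02, 0.08]
    labels = [1, 0, 0, 0, 0]
    print(mean_focal_loss(probs, labels))
```

With gamma = 0 the modulating factor disappears and the function reduces to alpha-weighted cross-entropy. Larger gamma values shift the training signal away from the many easily classified clean files and toward the few hard examples, which is why this loss is a common choice when positive pairs are rare.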
Keywords/Search Tags: bug report, defect code localization, vector space model, word embedding, convolutional neural network, abstract syntax tree, code vector