| With the development of information science, a large number of intelligent software systems have appeared in various fields of society. Intelligent software has dramatically improved people’s daily lives, but software bugs can also harm users, causing poor user experience, economic losses, and even casualties. To deal with these bugs more efficiently, identifying potentially buggy source code files in a complex software system is one of the core steps, and the file-level bug localization task was born for this purpose. This paper mainly analyzes the file-level bug localization task based on bug reports, which aims to identify potentially buggy source code files in a software system according to the descriptions in the bug reports. File-level bug localization methods based on bug reports are generally appropriate for scenarios with sufficient bug-fix records; that is, the target project has numerous bug reports that have already been located and fixed. In such a scenario, traditional methods are mainly supervised classification models. However, such models generally suffer from two core problems. The first is the difference between natural language-based bug reports and programming language-based source files. The second is the insufficient ability to learn “bug report-source code” representations. In addition, the file-level bug localization task also needs to consider the scenario of insufficient bug-fix records. Newly released or immature software projects often have a large number of unfixed bug reports and only a small number of fixed records. In such a scenario, the performance of traditional supervised models degrades greatly. How to efficiently utilize the bug-fix records of mature projects to assist bug localization on the target project, and how to make full use of unfixed bug reports to reduce the demand for bug-fix records in the training phase of the model, are both challenging problems that remain to be solved. Based on the above discussion, this paper proposes several bug localization models for the two scenarios of sufficient and insufficient bug-fix records, and explores the advantages and disadvantages of the proposed models. Specifically, the main contributions of this paper are as follows. 1) In response to the difference between natural language and programming language, this paper proposes a deep multimodal-based software bug localization model. The model treats bug reports and source code files as data of different modalities, and uses multimodal representation learning to project the representations of bug reports and source code files from their respective independent language spaces into a coordination space. In the coordination space, a correlation distance constraint encourages the representations of related source code files to be as close as possible to the representation of a given bug report. The proposed method is simple in structure, suitable for large-scale data, and significantly outperforms multiple baseline models on four projects. 2) In response to the insufficient ability to learn “bug report-source code” representations, this paper proposes a software bug localization model based on bug report decomposition and intermediate representation. Specifically, this paper adopts a bug report decomposition strategy to learn the diverse characteristics of bug reports, and designs a graphical intermediate representation with a hierarchical structure to efficiently capture the multi-behavior characteristics of source code. Experimental results on multiple projects demonstrate the effectiveness of the model in the bug localization scenario with sufficient bug-fix records. 3) In order to assist the bug localization task on the target project with the help of mature projects, this paper proposes a cross-project software bug localization model based on adversarial transfer learning. To relax the strict requirements on the source project during knowledge transfer, the proposed model leverages adversarial learning to extract from the source project only the public features applicable to the target project, i.e., it filters out the private features of the source project that might harm the target project. Meanwhile, the model learns the bug reports of the source and target projects in a shared manner to capture the semantic features of natural language. Experiments on multiple source-target project pairs demonstrate the effectiveness of the proposed model. 4) In order to utilize the large quantity of unfixed bug reports in target projects, this paper proposes a semi-supervised bug localization model based on generative adversarial networks. The core idea of this model is to employ a generative adversarial network to capture the potential distribution of correlations between bug reports and their corresponding source files, and to generate more simulated samples from unfixed bug reports in a semi-supervised manner. Experiments on multiple self-constructed datasets demonstrate the superiority of the proposed model. |
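
The idea behind the first contribution, mapping bug reports and source files into one shared space and ranking files by closeness to the report, can be illustrated with a toy sketch. This is not the model proposed in the paper: the learned deep projections are replaced here by small hand-picked matrices, and the correlation distance constraint is approximated by plain cosine similarity. All names (`project`, `cosine`, `rank_files`, `W_REPORT`, `W_CODE`, the file names, and the vectors) are hypothetical and chosen only for illustration.

```python
import math


def project(vec, matrix):
    """Linearly map a vector into the shared space (matrix given as a list of rows)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_files(report_vec, file_vecs, w_report, w_code):
    """Rank source files by similarity to the bug report in the shared space."""
    r = project(report_vec, w_report)
    scored = [(name, cosine(r, project(v, w_code))) for name, v in file_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumption: fixed toy projections standing in for learned, modality-specific encoders.
W_REPORT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d report features -> 2-d shared space
W_CODE = [[0.0, 1.0], [1.0, 0.0]]              # 2-d code features   -> 2-d shared space

report = [1.0, 0.0, 5.0]  # hypothetical bug report features
files = {                 # hypothetical source file features
    "Parser.java": [0.0, 1.0],
    "Logger.java": [1.0, 0.0],
    "Cache.java": [1.0, 1.0],
}

ranking = rank_files(report, files, W_REPORT, W_CODE)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a trained model the two projections would be learned jointly so that files actually fixed for a report end up near it; the ranking step itself stays the same.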