Revisiting the experimental design choices for approaches for the automated retrieval of duplicate issue reports

Posted on: 2018-09-14

Degree: Ph.D.

Type: Thesis

University: Queen's University (Canada)

Candidate: Rakha, Mohamed Sami


GTID: 2448390002498141

Subject: Computer Science

Abstract/Summary:

Issue tracking systems (ITSs), such as Bugzilla, are commonly used to track reported bugs and change requests. Duplicate reports have been considered a hindrance to developers and a drain on their resources. To avoid wasting developer resources on previously reported (i.e., duplicate) issues, such duplicates should be identified as soon as they are reported. In recent years, several approaches have been proposed for the automated retrieval of duplicate reports. These approaches leverage the textual, categorical, and contextual information in previously reported issues to determine whether a newly reported issue has already been reported. In general, the studies designed to evaluate these approaches treat all duplicate issue reports equally, use data chunks that span a relatively short period of time, and ignore the impact of newly activated features in recent ITSs (e.g., just-in-time lightweight retrieval of duplicates at filing time).

This thesis revisits the experimental design choices of such prior studies from three perspectives: 1) the performance measures used, 2) the evaluation process, and 3) the choice of experimental data. For the performance measures, we highlight the need for an effort-aware evaluation of such approaches, since identifying a considerable share of duplicate reports (over 50%) appears to be a relatively trivial task.

For the evaluation process, we show that the previously reported performance of such approaches is significantly overestimated.

Finally, recent versions of ITSs perform just-in-time (JIT) lightweight retrieval of duplicate issue reports at the time an issue report is filed. The aim of such JIT retrieval is to prevent duplicates from being filed in the first place. We show that future studies of the automated retrieval of duplicate reports have to focus on after-JIT duplicates (i.e., duplicates filed after such JIT retrieval was activated), as these duplicates are more representative of the issue reports seen in practice nowadays.

The results of this thesis highlight the current state of progress in the automated retrieval of duplicate reports while charting directions for future research efforts.
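The abstract does not name the specific retrieval techniques or performance measures involved. As a rough illustration only, the following is a minimal sketch assuming a plain TF-IDF cosine ranking over report text (a simplification of the textual component such approaches use, ignoring categorical and contextual fields) and the recall rate@k measure, i.e., the fraction of duplicates whose master report appears among the top-k retrieved candidates. The function names, report ids, and report texts are all hypothetical. The third hypothetical duplicate shares no words with its master report, which hints at why some duplicates are trivial to retrieve while others are not.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf(tokens, idf):
    """Weight raw term counts by inverse document frequency."""
    return {term: count * idf.get(term, 0.0) for term, count in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(new_report, existing, k):
    """Return the ids of the k existing reports most textually similar to new_report.

    `existing` maps a hypothetical report id to its text (e.g., summary).
    """
    docs = {rid: tokenize(text) for rid, text in existing.items()}
    query = tokenize(new_report)
    all_docs = list(docs.values()) + [query]
    n = len(all_docs)
    df = Counter(term for tokens in all_docs for term in set(tokens))
    # Smoothed IDF: terms that appear everywhere still get a small positive weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    q_vec = tfidf(query, idf)
    scores = {rid: cosine(q_vec, tfidf(tokens, idf)) for rid, tokens in docs.items()}
    # Sort by descending similarity; break ties by id so the ranking is deterministic.
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return ranked[:k]


def recall_rate_at_k(duplicates, existing, k):
    """Fraction of (text, master_id) duplicates whose master is in the top-k candidates."""
    hits = sum(1 for text, master in duplicates if master in rank_candidates(text, existing, k))
    return hits / len(duplicates)


# Hypothetical previously filed reports (id -> text).
existing = {
    1: "Crash when opening large PDF file",
    2: "Toolbar icons render blurry on high DPI screens",
    3: "Add dark mode to the settings page",
}

# Hypothetical newly filed duplicates, each paired with the id of its master report.
duplicates = [
    ("Application crashes opening a large PDF", 1),
    ("Blurry toolbar icons on HiDPI display", 2),
    ("Night theme option", 3),  # no shared words with its master report
]

print(recall_rate_at_k(duplicates, existing, 1))  # 2 of the 3 duplicates are retrieved at k = 1
print(recall_rate_at_k(duplicates, existing, 3))  # all 3 are retrieved at k = 3
```

A single aggregate number such as recall rate@k counts the textually obvious duplicates and the hard one in the same way, which is the kind of equal treatment of duplicates that the abstract questions.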

Keywords/Search Tags:

Duplicate, Automated retrieval, Issue, Approaches

Related items

1	Research On The Effectiveness Of Duplicate Bug Report Detection Based On Deep Learning
2	Research Of Automated Duplicate Bug Report Detection
3	Research And Implement Of Near-Duplicate Video Retrieval Based On Toeplitz PLS
4	Research On Near-duplicate Document Image Retrieval Based On Deep Learning
5	Research On Partial-duplicate Image Retrieval Algorithms Based On The Multi-contextual Clues
6	Research Of Near-Duplicate Video Retrieval Based On Hash Learning
7	Near Duplicate Image Retrieval Based On Geometric Information
8	Based Key Frame Near-duplicate Video Retrieval
9	Research Of Improved Near-Duplicate Removal
10	Research On Near-duplicate Video Detection Based On Correlation Analysis