
An empirical study of the impact of experimental settings on defect classification model

Posted on: 2018-03-05
Degree: M.S
Type: Thesis
University: Queen's University (Canada)
Candidate: Ghotra, Baljinder
Full Text: PDF
GTID: 2478390020457712
Subject: Computer Science
Abstract/Summary:
Software quality plays a vital role in the success of a software project. The probability of having defective modules in large software systems remains high, and a disproportionate amount of the cost of developing software is spent on maintenance. Maintaining large and complex software systems is a major challenge for the software industry. Fixing defects is a central software maintenance activity that continuously improves software quality, and Software Quality Assurance (SQA) teams are dedicated to this task of defect detection (e.g., through software testing and code review) during the software development process. Since testing or reviewing an entire software system is time- and resource-intensive, knowing which software modules are likely to be defect-prone before a system is deployed helps in effectively allocating SQA effort.

Defect classification models help SQA teams to identify defect-prone modules in a software system before it is released to users. Defect prediction models can be divided into two categories: (1) classification models, which classify a software module as defective or not defective; and (2) regression models, which predict the number of defects in a software module. Our work focuses on training defect classification models; such models are trained using software metrics (e.g., size and complexity metrics) to predict whether software modules will be defective in the future. However, defect classification models may yield different results when the experimental settings (e.g., the choice of classification technique, features, and dataset preprocessing) are changed.

In this thesis, we investigate the impact of different experimental settings on the performance of defect classification models. More specifically, we study the impact of three experimental settings (i.e., the choice of classification technique, dataset preprocessing using feature selection techniques, and the application of meta-learners to classification techniques) on the performance of defect classification models through an analysis of software systems from both proprietary and open-source domains. Our results show that: (1) the choice of classification technique has an impact on the performance of defect classification models; we therefore recommend that software engineering researchers experiment with the various available techniques instead of relying on specific techniques under the assumption that other techniques are not likely to lead to statistically significant improvements in their reported results; (2) applying feature selection techniques does have a significant impact on the performance of defect classification models; a correlation-based filter-subset feature selection technique with a BestFirst search method outperforms the other feature selection techniques across the studied datasets and classification techniques, and we therefore recommend applying such a feature selection technique when training defect classification models; and (3) meta-learners help to improve the performance of defect classification models; however, we recommend that future studies employ concrete meta-learners (e.g., Random Forest), which train classifiers that perform statistically similarly to classifiers trained using abstract meta-learners (e.g., Bagging and Boosting) while producing less complex models.
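To make the experimental settings discussed above concrete, the sketch below shows one way such a pipeline could be assembled in Python with scikit-learn: module-level software metrics are used to train defect classifiers, a feature selection step precedes classification, and a concrete meta-learner (Random Forest) is compared against an abstract meta-learner (Bagging over decision trees). This is a minimal illustrative sketch, not the thesis's actual experimental setup: the file name defect_dataset.csv, the "defective" label column, and the use of SelectKBest as a rough stand-in for correlation-based filter-subset selection with BestFirst search are all assumptions.

```python
# Illustrative sketch only (assumed dataset and column names); it is not the
# thesis's actual setup. It trains defect classifiers on module-level metrics,
# applies a simple feature selection filter, and compares a concrete
# meta-learner (Random Forest) with an abstract one (Bagging over trees).
import pandas as pd
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical dataset: one row per module, metric columns (size, complexity,
# etc.) plus a binary "defective" label.
data = pd.read_csv("defect_dataset.csv")      # assumed file name
X = data.drop(columns=["defective"])          # software metrics
y = data["defective"]                         # defect label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

candidates = {
    "random_forest (concrete meta-learner)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    "bagged_trees (abstract meta-learner)": BaggingClassifier(
        n_estimators=100, random_state=42  # default base estimator: decision tree
    ),
}

for name, clf in candidates.items():
    # Feature selection before classification; SelectKBest with an ANOVA
    # F-score is only a rough stand-in for a correlation-based subset filter.
    model = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=min(10, X.shape[1]))),
        ("classify", clf),
    ])
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

In such a comparison, the thesis's observation would correspond to the two ensembles reaching statistically similar performance, with the Random Forest yielding the less complex model.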
Keywords/Search Tags: Defect, Software, Experimental settings, Impact, Feature selection techniques, Meta-learners, Modules