Improving predictive models of software quality using search-based metric selection and decision trees

Posted on:2011-11-26

Degree:Ph.D

Type:Dissertation

University:University of Manitoba (Canada)

Candidate:Vivanco, Rodrigo


GTID:1448390002968286

Subject:Computer Science

Abstract/Summary:

Software engineering is a human-centric endeavour in which most of the effort is spent understanding and modifying source code. The ability to automatically identify potentially problematic components would help developers and project managers make the best use of limited resources when taking mitigating actions. Such actions include detailed code inspections, more exhaustive testing, refactoring, or reassignment to more experienced developers. Predictive models can be used to discover poor-quality components from structural information in the design and/or source code.

In machine learning, high-dimensional feature spaces may contain inputs that are irrelevant or redundant. Feature selection is the process of identifying a subset of features that improves a classifier's discriminatory performance. In the analysis of software systems, the features are source code metrics. In this work, an analysis tool has been developed that implements a parallel genetic algorithm (GA) as a search-based metric selection strategy. A comparative study has been carried out on different datasets between three metric selection strategies: GA, the Chidamber and Kemerer (CK) metrics suite (for an object-oriented dataset), and principal component analysis (PCA).

Program comprehension is important for programmers. The first dataset evaluated uses source code inspections as a subjective measure of the cognitive complexity that degrades program understanding. Predicting the likely location of system failures is important for improving a system's reliability. The second dataset uses an objective measure, the faults found in system modules, to predict fault-prone components.

The aim of this research has been to advance the current state of the art in predictive models of software quality. It does so by exploring the efficacy of a search-based approach in selecting appropriate metric subsets for various predictive objectives.
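
The search-based metric selection summarized above can be illustrated with a minimal, serial sketch of GA feature selection. It makes several assumptions that are not from the dissertation. Each candidate is a bit mask over the available metrics, and the GA uses binary tournament selection, one-point crossover, bit-flip mutation and elitism. The fitness function is pluggable. In the study it would presumably be the discriminatory performance of a classifier such as LDA trained on the selected metrics. Here a hypothetical toy fitness stands in for it. The names `ga_select`, `toy_fitness` and `RELEVANT` are illustrative only. The tool described in the abstract runs its GA in parallel, and this sketch does not.

```python
import random
from typing import Callable, List, Tuple

Mask = List[bool]  # mask[i] is True when metric i is selected


def _tournament(scored: List[Tuple[float, Mask]], rng: random.Random) -> Mask:
    # Binary tournament: pick two scored individuals and keep the fitter one.
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]


def ga_select(n_metrics: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 40,
              crossover_rate: float = 0.9, mutation_rate: float = 0.05,
              seed: int = 0) -> Mask:
    """Return the fittest metric mask found (assumes n_metrics >= 2, pop_size >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_metrics)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        next_pop = [best[:]]  # elitism: the best mask so far always survives
        while len(next_pop) < pop_size:
            p1, p2 = _tournament(scored, rng), _tournament(scored, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_metrics)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


# Hypothetical stand-in fitness: metrics 0, 3 and 5 are "relevant",
# and every extra metric selected costs 0.5.
RELEVANT = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & RELEVANT) - 0.5 * len(chosen - RELEVANT)


if __name__ == "__main__":
    best = ga_select(8, toy_fitness)
    print([i for i, keep in enumerate(best) if keep], toy_fitness(best))
```

The size penalty in the toy fitness mirrors the motivation given above. Irrelevant or redundant metrics should not survive selection. Swapping in a classifier-based score changes only the `fitness` argument.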
Results show that a search-based strategy such as GA performs well for metric selection when used with a linear discriminant analysis (LDA) classifier. When predicting cognitively complex classes, GA achieved an F-value of 0.845. By comparison, PCA achieved 0.740 and the CK metrics suite alone achieved 0.750.

Many traditional source code metrics exist to capture the size, algorithmic complexity, cohesion, and coupling of modules. Object-oriented systems introduce additional structural concepts such as encapsulation and inheritance. These provide even more ways to measure different aspects of coupling, cohesion, complexity, and size. An important question is therefore: which metrics should be used with a model for a particular predictive objective?

Examining the GA-chosen metrics with a white-box predictive model (a decision tree classifier) yielded additional insights into the structural properties of a system that degrade product quality. Source code metrics have been designed for human understanding and program comprehension, and predictive models of cognitive complexity perform well with source code metrics alone. Models of fault-prone modules do not perform as well when using only source code metrics. They need additional non-source-code information, such as module modification history or testing history.
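
The abstract reports results as an F-value without defining it. It most likely denotes the usual F-measure, with the complex (or fault-prone) components as the positive class, but that reading is an assumption. On this reading the F-value is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The sketch below computes it from confusion-matrix counts. The function name `f_measure` and the example counts are hypothetical and are not taken from the dissertation's data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from confusion-matrix counts for the positive class.

    With beta = 1 this is the harmonic mean 2PR / (P + R) of
    precision P = tp / (tp + fp) and recall R = tp / (tp + fn).
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 complex classes flagged correctly,
# 2 false alarms and 2 missed complex classes (P = R = 0.8).
print(f_measure(tp=8, fp=2, fn=2))
```

Because it is a harmonic mean, the F-measure stays low unless both precision and recall are high. A model cannot score well just by flagging every class as complex.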

Keywords/Search Tags:

Source code, Predictive models, Metric selection, Using, Software, Search-based, Quality

Related items

1	Research On Relationship Between Code Quality And Software Defects For Open Source Software
2	Change-History-based Automatically Fixing Of Code Internal Quality Issues
3	Research And Implementation Of Source Code Based Software Maintainable Measurement System
4	Research And Implementation Of The Source Code Structure Quality Assessment Subsystem
5	The Application In Hitachi Project For Code Complexity Metric And Quality Control
6	Research And Application Of The Quality Measurement Based On Software Testing
7	Research On Code Segment Search Method For Open Source Ecology
8	Support vector parameter selection using experimental design based generating set search (SVEG) with application to predictive software data modeling
9	Research And Implementation Of Automatic Code Summarization And Retrieval Technology For Open Source Reuse
10	A comparative study of attribute selection techniques for CBR-based software quality classification models