Aggregation in Statistical Models of Software Quality

Posted on: 2013-02-10

Degree: Ph.D.

Type: Dissertation

University: University of California, Davis

Candidate: Posnett, Daryl P

GTID: 1458390008474738

Subject: Computer Science

Abstract/Summary:

The hierarchical decomposition of software, e.g., into modules, packages, and files, has a profound influence on evolvability, maintainability, and work assignment; consequently, it is of central concern for researchers. However, it also poses a quandary with respect to the measures we can gather at each level of decomposition: How does the level of study affect the truth, meaning, and relevance of the findings?

Aggregation is simply the process by which values of a variable are combined to yield a new value that is then used in a modeling context. If we aggregate files to packages, modules, or even entire systems, then we may increase the variation in our outcome of interest while simultaneously decreasing our sample size. Although this may result in data that is more easily modeled, does it, in fact, answer the same question that we originally asked? Aggregation can also be viewed at a more fine-grained level. Some measures, e.g., file size, can be easily captured directly, while others, e.g., developer focus, are impractical, if not impossible, to measure directly and require identification of a suitable proxy. Such complex constructs often require the composition or aggregation of existing measures across different entities. Choosing appropriate aggregated measures is important to software engineering researchers, as such measures risk losing explanatory power if highly correlated with other measures of interest.

In this work we present a study of aggregation in software quality models with respect to reliability, flexibility, and readability. We investigate the general question of aggregating a dataset to a different hierarchical level and present a framework for understanding the issues and concerns of aggregation in software quality models. We present a tool and associated methods to more easily and accurately measure object-oriented flexibility as inferred through the use of design patterns. We use this tool to aggregate design pattern instances and study the change proneness of design patterns. Using a disaggregated measure of developer contributions, we challenge existing perceptions of the role of new features and code improvements on defects. We observe that information entropy can be viewed as a general aggregation method, allowing us to leverage results across several disciplines. With this in mind, we present a model of readability, based on entropy and established software science, that is more parsimonious than previous results and drives a more theoretical understanding of the construct of source code readability. We also present two theoretically driven, entropy-based measures of developer focus. Finally, we use a measure of association based on mutual information to yield new insights that other measures of association are unable to reveal.
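To make the idea of entropy as an aggregation method concrete, the following is a minimal sketch, not the dissertation's actual measure. It assumes commit records given as hypothetical (developer, module) pairs and collapses each developer's activity into a single Shannon entropy value: low entropy suggests activity concentrated in few modules, high entropy suggests activity spread across many. The function names `shannon_entropy` and `developer_focus` and the sample data are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def developer_focus(commits):
    """Aggregate (developer, module) records into one entropy value per developer.

    Hypothetical proxy for focus: this is an illustrative assumption,
    not the definition used in the dissertation.
    """
    per_dev = {}
    for dev, module in commits:
        per_dev.setdefault(dev, Counter())[module] += 1
    return {dev: shannon_entropy(list(c.values())) for dev, c in per_dev.items()}


# Made-up sample data: alice works in one module, bob in four.
commits = [
    ("alice", "net"), ("alice", "net"), ("alice", "net"), ("alice", "net"),
    ("bob", "net"), ("bob", "ui"), ("bob", "db"), ("bob", "io"),
]
focus = developer_focus(commits)
```

In this toy data, the developer who touches a single module gets the minimum entropy, while the one spread evenly over four modules gets the maximum for four categories. Many per-commit observations are reduced to one number per developer, which is the sense in which entropy acts as an aggregation.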

Keywords/Search Tags:

Software, Aggregation, Measures, Models

Related items

1	Macroscopical Quantity Balance Of TCP Packets
2	Network state models for analysis and aggregation in large-scale quality of service-aware multiclass networks
3	Impact of time-varying and time-invariant measures of adherence to secondary prevention therapies post-acute myocardial infarction: An application of marginal structural models (MSMs)
4	Design And Implementation Of OPC UA Multi-Server Aggregation Software
5	Experimental evaluation of textual and information theoretic measures of software development
6	Influence analysis of some complicated latent variable models
7	Research Of Networks Routing Protocols Based On Dynamic Aggregation Tree Models
8	Research Of Aggregation Design Tool Based On Dameng OLAP
9	Decomposition and aggregation of multimachine power system models
10	On the nature of relationships between measures and reliability