Aggregation in Statistical Models of Software Quality

Posted on: 2013-02-10
Degree: Ph.D.
Type: Dissertation
University: University of California, Davis
Candidate: Posnett, Daryl P
Full Text: PDF
GTID: 1458390008474738
Subject: Computer Science
Abstract/Summary:
The hierarchical decomposition of software, e.g., into modules, packages, and files, has a profound influence on evolvability, maintainability, and work assignment; consequently, it is of central concern for researchers. However, it also poses a quandary with respect to the measures we can gather at each level of decomposition: How does the level of study affect the truth, meaning, and relevance of the findings?

Aggregation is simply the process by which values of a variable are combined to yield a new value that is then used in a modeling context. If we aggregate files to packages, modules, or even entire systems, then we may increase the variation in our outcome of interest while simultaneously decreasing our sample size. Although this may result in data that are more easily modeled, does it, in fact, answer the same question that we originally asked? Aggregation can also be viewed at a more fine-grained level. Some measures, e.g., file size, can be easily captured directly, while others, e.g., developer focus, are impractical, if not impossible, to measure directly and require identification of a suitable proxy. Such complex constructs often require the composition or aggregation of existing measures across different entities. Choosing appropriate aggregated measures is important to software engineering researchers, as such measures risk losing explanatory power if highly correlated with other measures of interest.

In this work we present a study of aggregation in software quality models with respect to reliability, flexibility, and readability. We investigate the general question of aggregation of a dataset to a different hierarchical level and present a framework for understanding issues and concerns of aggregation in software quality models. We present a tool and associated methods to more easily and accurately measure object-oriented flexibility as inferred through the use of Design Patterns. We use this tool to aggregate design pattern instances and study the change proneness of design patterns. Using a disaggregated measure of developer contributions, we challenge existing perceptions of the role of new features and code improvements on defects. We observe that information entropy can be viewed as a general aggregation method, allowing us to leverage results across several disciplines. With this in mind, we present a model of readability, based on entropy and established software science, that is more parsimonious than previous results and drives a more theoretical understanding of the construct of source code readability. We also present two theoretically driven, entropy-based measures of developer focus. Finally, we use a measure of association based on mutual information to yield new insights that other measures of association are unable to reveal.
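
To make the idea of entropy as an aggregation method concrete, the following is a minimal sketch and not the dissertation's actual measure, which the abstract does not define. It assumes plain Shannon entropy over a developer's commits per module; the function names, the module names, and the sample commit records are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Return the Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def developer_focus_entropy(commits):
    """Aggregate (developer, module) commit records into one entropy value per developer.

    Assumption for illustration: lower entropy means the developer's work is
    concentrated in fewer modules, i.e., the developer is more focused.
    """
    per_developer = {}
    for developer, module in commits:
        per_developer.setdefault(developer, Counter())[module] += 1
    return {
        developer: shannon_entropy(list(module_counts.values()))
        for developer, module_counts in per_developer.items()
    }


# Hypothetical commit log: alice works only on "core", bob spreads evenly over four modules.
commits = [("alice", "core")] * 4 + [
    ("bob", "core"),
    ("bob", "ui"),
    ("bob", "net"),
    ("bob", "docs"),
]
focus = developer_focus_entropy(commits)
print(focus)  # alice has 0 bits of entropy; bob has 2 bits (uniform over four modules)
```

In this sketch many per-module counts collapse into a single number per developer, which is the sense in which entropy acts as an aggregation: the value is zero when all activity falls in one module and grows as activity spreads more evenly.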
Keywords/Search Tags:Software, Aggregation, Measures, Models