Improved methods for mining software repositories to detect evolutionary couplings

Posted on: 2015-10-17
Degree: Ph.D.
Type: Dissertation
University: Kent State University
Candidate: Alali, Abdulkareem
Full Text: PDF
GTID: 1478390017993456
Subject: Computer Science
Abstract/Summary:
The dissertation investigates techniques to improve the results of mining large software repositories for evolutionary couplings. Evolutionary couplings are dependencies between software artifacts that affect how systems are maintained and evolved. They appear as co-changing artifacts in the version history of software systems. Detection of evolutionary couplings has traditionally been done using data mining techniques (e.g., frequent pattern mining). The focus of this research is twofold.

First, new techniques are developed for detecting evolutionary couplings by employing orthogonal information derived using program analysis techniques (e.g., metrics). The goal is to reduce the number of false positives and thereby improve the quality of detection. These new hybrid approaches include the use of statically derived metrics, repository meta-data (i.e., age and distance), and different transaction sizes. Of the four approaches examined, three produce fewer false positives and higher-quality patterns than the traditional approach to computing evolutionary coupling. The best approach had a precision of 90% in some cases. Slight improvements to recall were also observed. Distance appears to have no clear effect on filtering out false patterns.

Second, an empirical investigation is undertaken of the impact that the parameters of the data mining techniques have on the detection of evolutionary couplings. This provides fundamental evidence for the selection of data mining parameters in the context of software repositories. The parameters studied include a comparison of different transaction sizes. Additionally, a regression prediction model on minimum support, confidence, duration, and training size was constructed to uncover the effects on the generated patterns and association rules. Different transaction sizes have an effect on the quality of the generated association rules. It was found that larger time windows have better prediction accuracy and completeness. The week time window gave the best results and the time window of an individual commit gave the worst results. Additionally, a large-scale study of frequent pattern mining parameters showed that the regression models are able to accurately predict outcomes. The confidence parameter has the most dominant effect on the final results.
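
As a rough illustration of the terms used in the abstract, the sketch below groups commits into transactions (either one per commit or one per fixed time window) and derives pairwise co-change rules filtered by minimum support and confidence. It is not the dissertation's implementation. The function names, the sample commits, the use of an absolute count for support, and the choice to anchor windows at the earliest commit are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def group_into_transactions(commits, window=None):
    """Turn (timestamp, files) commits into transactions (sets of files).

    window=None: every commit is its own transaction.
    window=timedelta: commits falling in the same fixed-size window,
    counted from the earliest commit, are merged (an assumed scheme).
    """
    if not commits:
        return []
    if window is None:
        return [frozenset(files) for _, files in commits]
    start = min(ts for ts, _ in commits)
    buckets = defaultdict(set)
    for ts, files in commits:
        buckets[(ts - start) // window].update(files)
    return [frozenset(buckets[k]) for k in sorted(buckets)]


def pair_rules(transactions, min_support, min_confidence):
    """Return {(lhs, rhs): (support, confidence)} for file pairs.

    support    = number of transactions containing both files
    confidence = support / number of transactions containing lhs
    """
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        for f in t:
            item_count[f] += 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] += 1

    rules = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / item_count[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = (n, confidence)
    return rules


# Hypothetical commit history, invented for illustration only.
commits = [
    (datetime(2015, 1, 5, 10), ["a.c", "a.h"]),
    (datetime(2015, 1, 6, 9), ["a.c"]),
    (datetime(2015, 1, 7, 9), ["a.h", "b.c"]),
    (datetime(2015, 1, 14, 9), ["a.c", "a.h"]),
    (datetime(2015, 1, 20, 9), ["b.c"]),
]

per_commit = group_into_transactions(commits)
weekly = group_into_transactions(commits, timedelta(weeks=1))

rules_per_commit = pair_rules(per_commit, min_support=2, min_confidence=0.6)
rules_weekly = pair_rules(weekly, min_support=2, min_confidence=0.6)
```

On this toy history the same rule, a.c implies a.h, has a different confidence under the two transaction sizes, which is the kind of sensitivity to transaction size, minimum support, and confidence that the second part of the study examines. The toy data says nothing about which setting is better in practice.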
Keywords/Search Tags:Evolutionary couplings, Mining, Software repositories, Results, Techniques