Improving Software Productivity and Quality via Mining Source Code

Posted on:2012-01-04

Degree:Ph.D

Type:Dissertation

University:North Carolina State University

Candidate:Thummalapenta, Suresh

GTID:1458390011457262

Subject:Computer Science

Abstract/Summary:

The major goal of software development is to deliver high-quality software efficiently. To achieve this goal, programmers often reuse existing frameworks or libraries, hereafter referred to as libraries, instead of developing similar code artifacts from scratch. However, programmers often face challenges in reusing existing libraries due to two major factors. First, many existing libraries are not well documented, and even when documentation exists, it is often outdated. Second, many existing libraries expose a large number of application programming interfaces (APIs), the interfaces through which libraries expose their functionality. For example, the .NET base library provides nearly 10,000 API classes. These two factors lead to three major problems that affect both software productivity and quality. First, programmers spend more time reusing existing libraries, which reduces software productivity. Second, programmers introduce defects while using APIs because they lack proper knowledge of how to reuse those APIs. Third, existing white-box test generation techniques face challenges in effectively generating test inputs for client code that reuses libraries.

To address these three problems, in this dissertation we propose a general framework, called WebMiner, that uses existing open source code available on the web by leveraging a code search engine. In particular, WebMiner infers usage specifications for the API methods under analysis by automatically collecting relevant code examples from open source code available on the web. WebMiner next applies data mining techniques to the collected code examples to identify common patterns, which represent likely usage of APIs and are referred to as API usage specifications. The rationale for identifying common patterns is the observation that the majority of programmers correctly adhere to API usage specifications, so common patterns are likely to represent the correct usage of APIs.

We further propose six approaches based on our general framework, where each approach focuses on a specific software engineering (SE) task, such as detecting defects in an application under analysis. In particular, the first two approaches assist programmers in effectively reusing APIs provided by existing libraries. The next two approaches use mined API usage specifications as programming rules and detect defects in applications under analysis as deviations from the mined specifications. Finally, the last two approaches mine static and dynamic traces, respectively, to effectively generate test inputs that achieve high structural coverage of the code under test. We also propose another approach that addresses a major limitation of mining-based approaches: they are not effective when usage information is unavailable for the API methods under analysis or is insufficient to achieve the SE task at hand.

Our empirical results show that the approaches developed based on our WebMiner framework effectively address their respective SE tasks. In particular, our empirical results demonstrate the effectiveness of expanding the data scope of mining-based approaches to the large body of open source code available on the web. Our results also show that our approaches address queries posted in developer forums and detect new defects that are not detected by existing related approaches, thereby improving both software productivity and quality.
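The idea of mining common patterns from collected code examples and flagging deviations as likely defects can be illustrated with a small sketch. This is not WebMiner's actual algorithm, which the abstract does not specify. It assumes each code example has already been reduced to a list of API call names, and it uses a simple "call A is later followed by call B" pattern with a support threshold. The call names (`open`, `read`, `write`, `close`), the function names, and the `min_count` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_ordered_pairs(examples, min_count):
    """Return ordered pairs (a, b), with a occurring before b, that appear
    in at least min_count of the given call sequences."""
    counts = Counter()
    for calls in examples:
        pairs = set()  # count each pair at most once per example
        for i, j in combinations(range(len(calls)), 2):
            if calls[i] != calls[j]:
                pairs.add((calls[i], calls[j]))
        counts.update(pairs)
    return {pair for pair, n in counts.items() if n >= min_count}

def find_deviations(calls, patterns):
    """Return mined pairs (a, b) where a occurs in calls but b never
    occurs after the first occurrence of a."""
    violations = []
    for first, second in sorted(patterns):
        if first in calls:
            idx = calls.index(first)
            if second not in calls[idx + 1:]:
                violations.append((first, second))
    return violations

# Hypothetical call sequences, standing in for examples gathered by a code search.
examples = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
    ["open", "close"],
    ["open", "read"],  # minority usage that omits close
]

patterns = mine_ordered_pairs(examples, min_count=4)
print(patterns)                                     # {('open', 'close')}
print(find_deviations(["open", "read"], patterns))  # [('open', 'close')]
```

The threshold reflects the observation stated in the abstract: if most programmers use an API correctly, a pattern shared by most examples is likely a valid usage rule, and code that breaks it deserves a closer look.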

Keywords/Search Tags:

Software, Code, API usage specifications, Approaches, Existing, Major, Defects

Related items

1	Design-time software quality modeling and analysis of distributed software-intensive systems
2	Automatic transformation of high-level logic specifications into high-performance target code
3	Object file program recombination of existing software programs using genetic algorithms
4	Research On Detection Of Redundancies And Related Software Defects Of C Programs
5	Mining API specifications from source code for improving software reliability
6	Research On Location Of Software Defects Based On Frequent API Usage Pattern Mining
7	Technology Research And Application Of Software Defect Prediction Based On Bayesian Networks
8	Dynamic Code Instrumentation And Its Application In Windows
9	A framework for merging Object-Oriented formal specifications
10	A framework for verification of SDL software specifications