
Data Driven Attribute Construction For Mining Software Repositories

Posted on: 2016-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: X C Li
GTID: 2348330461478564
Subject: Software engineering
Abstract/Summary:
Mining Software Repositories (MSR) has become an important field of software engineering in recent years. In MSR, software tasks are usually transformed into data mining problems. Domain-specific attributes heavily affect how well these tasks can be solved, since they are the key link between software tasks and data mining algorithms. However, no systematic investigation has been conducted on how to construct attributes for specific software tasks.

In this study, we summarize the attribute construction approaches used in MSR through a simple survey. Based on the survey results, we propose the Data Driven Attribute Construction (DDAC) approach for MSR, a new attribute construction approach that relies on the help of volunteers. For a given software task, DDAC extracts a set of software data (e.g., source code or bug reports) and asks volunteers to complete the task manually using that data. During the process, the volunteers are asked to submit the reasons behind their decisions. From these submitted reasons, researchers can construct domain-specific attributes for the software task. Experimental results on a typical MSR task, bug report summarization, demonstrate that DDAC can find effective features and achieve better predictive results than the state-of-the-art algorithm in the literature. We also draw some interesting conclusions. First, the number of volunteers has a positive influence on the approach: as the number of volunteers increases, DDAC can help researchers construct attributes covering more aspects. Second, domain knowledge is not the only criterion for recruiting volunteers, and a large number of junior volunteers may mitigate a shortage of senior volunteers. Finally, we analyze the threats to internal and external validity and propose a solution to each threat.
Keywords/Search Tags: Mining Software Repositories, Data Driven Approach, Attribute Construction, Bug Report Summarization