Software defect prediction on unlabeled dataset

Posted on: 2016-10-19
Degree: Ph.D.
Type: Dissertation
University: Hong Kong University of Science and Technology (Hong Kong)
Candidate: Nam, Jaechang
GTID: 1478390017980604
Subject: Computer Science
Abstract/Summary:
Defect prediction for new software projects, or for projects with limited historical data, is an interesting problem in software engineering. For such projects, it is difficult to collect the defect information needed to label a dataset for training a prediction model. This is known as the problem of defect prediction on unlabeled datasets. Cross-project defect prediction (CPDP) tries to address this problem by reusing prediction models built from other projects that have enough historical data. However, CPDP does not always yield a strong prediction model because the distributions of the datasets involved can differ. In addition, existing approaches to defect prediction that use only unlabeled datasets share one major limitation: they require manual effort.

To address these limitations, namely the different distributions among datasets and the need for manual effort, we propose three techniques for building prediction models on unlabeled datasets. First, we propose TCA+, which improves the prediction performance of CPDP by adopting transfer component analysis (TCA). TCA+ extends TCA by suggesting the most appropriate normalization technique before TCA is applied for CPDP. Second, we propose heterogeneous defect prediction (HDP), which enables cross-project defect prediction between projects with different metric sets. HDP matches metrics that have similar distributions across the datasets used in CPDP. Lastly, we propose CLAMI, which enables defect prediction using unlabeled datasets. The key idea of CLAMI is to label an unlabeled dataset by using the magnitude of its metric values.

Our proposed techniques, TCA+, HDP, and CLAMI, address these limitations of defect prediction on unlabeled datasets. However, the three techniques still leave challenging issues open, which we discuss as future work.
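
To make the idea of labeling by metric magnitude concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes that "magnitude" means comparing each metric value against that metric's median, and that instances with more above-median metrics than the median count are treated as defect-prone. The function name `label_by_magnitude`, the median-based cutoff, and the sample data are all hypothetical.

```python
from statistics import median


def label_by_magnitude(rows):
    """Assign a tentative defect-prone label to each unlabeled instance.

    rows: list of equal-length lists of numeric metric values.
    Returns a list of booleans (True = tentatively defect-prone).

    Assumption (not confirmed by the abstract): a metric value counts as
    "high" when it exceeds the median of that metric across all instances.
    """
    n_metrics = len(rows[0])
    # Per-metric median over all instances.
    medians = [median(row[j] for row in rows) for j in range(n_metrics)]
    # For each instance, count how many of its metrics are above the median.
    scores = [
        sum(1 for j in range(n_metrics) if row[j] > medians[j])
        for row in rows
    ]
    # Instances with more "high" metrics than the median count get the label.
    cutoff = median(scores)
    return [score > cutoff for score in scores]


# Hypothetical metric vectors (e.g. size, complexity, churn) for five files.
sample = [[1, 1, 1], [2, 2, 2], [10, 20, 30], [3, 3, 3], [50, 60, 70]]
labels = label_by_magnitude(sample)
```

In this sketch no human labeling is involved; the tentative labels could then serve as training data for an ordinary classifier, which is one plausible reading of how the manual-effort limitation is avoided.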
Keywords/Search Tags:Defect prediction, Unlabeled, Software, TCA, CPDP, Projects