Correlation Analysis Based Cross-project Software Defect Prediction

Posted on: 2019-05-31; Degree: Master; Type: Thesis
Country: China; Candidate: F P Lou
GTID: 2428330566999365; Subject: Computer technology
Abstract/Summary:
Software defect prediction (SDP) predicts the labels of unlabeled software modules from modules whose labels are known. However, new projects usually lack enough historical defect data to build a prediction model, so many researchers have proposed cross-project software defect prediction (CPDP), which builds the model from the data of other projects. In CPDP the source and target samples share the same features, but their distributions differ, which leads to poor prediction performance. This thesis therefore proposes a series of methods to improve CPDP performance. First, a CPDP method based on locality preserving canonical correlation analysis (LP-CCA) is proposed. LP-CCA first uses the k-nearest neighbor (KNN) algorithm to learn one neighbor matrix for the training samples and another for the test samples, and then learns two new datasets by canonical correlation analysis. Because the neighbor matrices are incorporated, the neighbor relationships between samples are preserved in the new datasets. Second, a CPDP method based on orthogonality-constrained locality preserving canonical correlation analysis (OCLP-CCA) is proposed. The projection matrix obtained by LP-CCA is usually non-orthogonal, so orthogonality constraints are added to LP-CCA; they make the projection directions of the subspace mutually orthogonal and further eliminate redundancy among the projected samples. Finally, because SDP datasets are usually class-imbalanced, a new cluster-based under-sampling method (CU) is proposed. Two different strategies for the class-imbalance problem are proposed: the datasets are first processed by the two methods above, and CU is then applied to obtain balanced datasets. Experiments are conducted on three public datasets, NASA, AEEEM, and ReLink. Comparisons with several existing feature extraction and CPDP methods demonstrate the effectiveness of the proposed methods.
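The first step of LP-CCA, building a KNN neighbor matrix separately for the training (source) samples and the test (target) samples, can be sketched as follows. This is a minimal illustration, not the thesis's code: the abstract does not give the distance measure, the value of k, or any weighting scheme, so the sketch assumes Euclidean distance, a binary 0/1 matrix, and optional symmetrization. The function `knn_neighbor_matrix` and the random data are hypothetical.

```python
import numpy as np


def knn_neighbor_matrix(X, k=5, symmetric=True):
    """Hypothetical 0/1 neighbor matrix: W[i, j] = 1 if sample j is one of
    the k nearest neighbors of sample i under Euclidean distance."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :k]     # k closest samples per row
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nn_idx] = 1.0
    if symmetric:
        W = np.maximum(W, W.T)  # i and j are neighbors if either lists the other
    return W


# Hypothetical source (training) and target (test) project metrics.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 4))
X_test = rng.normal(size=(15, 4))
W_train = knn_neighbor_matrix(X_train, k=3)  # one matrix for the training samples
W_test = knn_neighbor_matrix(X_test, k=3)    # and a separate one for the test samples
```

Symmetrizing is a common choice for graph-based locality terms because it gives an undirected neighborhood graph; whether LP-CCA does this is not stated in the abstract.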
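Cluster-based under-sampling can be sketched in one common form: cluster the majority (usually non-defective) class into as many groups as there are minority (defective) samples, keep one representative majority sample per cluster, and keep every minority sample. The abstract does not describe CU's clustering algorithm or its two strategies, so this sketch assumes plain k-means, with each representative chosen as the nearest not-yet-selected majority sample to a centroid. `kmeans_centroids`, `cluster_undersample`, and the label convention (0 = majority) are hypothetical.

```python
import numpy as np


def kmeans_centroids(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the k final centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = centroids.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def cluster_undersample(X, y, majority=0, seed=0):
    """Hypothetical cluster-based under-sampling: split the majority class into
    as many clusters as there are minority samples, keep the not-yet-selected
    majority sample closest to each centroid, and keep all minority samples."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    maj_idx = np.flatnonzero(y == majority)
    min_idx = np.flatnonzero(y != majority)
    k = len(min_idx)
    if k == 0 or k >= len(maj_idx):
        return X, y  # no minority samples, or "majority" is not larger: nothing to remove
    centroids = kmeans_centroids(X[maj_idx], k, seed=seed)
    available = np.ones(len(maj_idx), dtype=bool)
    chosen = []
    for c in centroids:
        d = ((X[maj_idx] - c) ** 2).sum(axis=1)
        d[~available] = np.inf  # never pick the same module twice
        j = int(d.argmin())
        available[j] = False
        chosen.append(maj_idx[j])
    keep = np.concatenate([np.array(chosen, dtype=int), min_idx])
    return X[keep], y[keep]


# Hypothetical imbalanced data: 90 clean modules (label 0), 10 defective (label 1).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = cluster_undersample(X, y)
print(np.bincount(yb))  # → [10 10]
```

Keeping the real samples nearest to the centroids, rather than the centroids themselves, means every retained row is an actual module from the project.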
Keywords: software defect prediction, locality preserving, canonical correlation analysis, class imbalance, clustering, under-sampling