
Distance Metric Learning Based Software Defect Prediction

Posted on: 2017-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2308330488997113
Subject: Information security
Abstract/Summary:
With the development of computer technology, precisely predicting potential defects in software systems has become an important task. Analyzing historical data from the software development process can help predict software defects, which not only improves the quality of the developed software but also greatly reduces the software testing workload. Recently, researchers have tried to introduce machine learning methods into software defect prediction (SDP). However, these methods usually rely on the traditional Euclidean distance in the training phase, and the Euclidean distance cannot adequately reveal the discriminability among samples.

First, this thesis proposes an SDP approach called cluster-specific large margin nearest neighbor (CS-LMNN). CS-LMNN first employs the large margin nearest neighbor (LMNN) algorithm to learn a global discriminant metric. It then partitions the training set into K clusters using K-means clustering, and a cluster-specific discriminant metric is learned for each cluster. In the prediction phase, a test sample finds its cluster using the global discriminant metric and is then labeled as defective or defect-free using the cluster-specific discriminant metric.

Second, to address the noise problem in the training set, this thesis further proposes an SDP approach called cluster-specific local sparse reconstruction distance metric learning (CS-LSRML). It incorporates sparse reconstruction information and local weights, and designs inter-class and intra-class local sparse reconstruction terms. CS-LSRML not only learns discriminant metrics but is also robust to noise.

Lastly, to address the class-imbalance problem in SDP, this thesis further proposes two SDP approaches called over-sampling CS-LSRML (OCS-LSRML) and under-sampling CS-LSRML (UCS-LSRML). These two approaches first apply a sampling technique, over-sampling or under-sampling respectively, to transform the original training set into a relatively balanced training set. Then the CS-LSRML approach above is employed to learn the global and cluster-specific discriminant matrices.

This thesis conducts experiments on the NASA, AEEEM, and ReLink projects. The experimental results demonstrate that the proposed approaches are effective for SDP and can outperform related methods to some extent.
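
The two-stage prediction phase described for CS-LMNN can be illustrated with a minimal sketch. This is not the thesis implementation: the metric matrices below are hand-picked stand-ins rather than metrics learned by LMNN, the cluster lookup is assumed to be a nearest-centroid search, the final labeling is assumed to be a k-nearest-neighbor vote, and all names (`mahalanobis`, `predict`, the toy data) are hypothetical.

```python
import math
from collections import Counter


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m * v for m, v in zip(row, d)) for row in M]
    return math.sqrt(max(0.0, sum(a * b for a, b in zip(d, Md))))


def predict(x, centroids, global_M, cluster_Ms, cluster_data, k=3):
    """Return (cluster index, label) for test sample x."""
    # Stage 1 (assumed nearest-centroid rule): find the cluster of x
    # under the global metric.
    c = min(range(len(centroids)),
            key=lambda i: mahalanobis(x, centroids[i], global_M))
    # Stage 2 (assumed k-NN vote): label x from its k nearest training
    # samples in that cluster, measured with the cluster-specific metric.
    ranked = sorted(cluster_data[c],
                    key=lambda s: mahalanobis(x, s[0], cluster_Ms[c]))
    votes = Counter(label for _, label in ranked[:k])
    return c, votes.most_common(1)[0][0]


# Toy data with two features. All values are invented for illustration.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]        # identity matrix = Euclidean distance
centroids = [[0.0, 0.0], [10.0, 10.0]]
global_M = IDENTITY                         # stand-in for the learned global metric
cluster_Ms = [
    [[1.0, 0.0], [0.0, 0.01]],              # stand-in: cluster 0 down-weights feature 2
    IDENTITY,
]
cluster_data = [
    [([1.0, 0.0], "defective"), ([1.2, 3.0], "defective"),
     ([0.8, -3.0], "defective"), ([-1.0, 0.2], "defect-free"),
     ([-1.1, 0.1], "defect-free"), ([-0.9, -0.2], "defect-free")],
    [([10.0, 10.0], "defect-free"), ([9.0, 10.5], "defect-free"),
     ([11.0, 9.0], "defective")],
]
x_test = [0.4, 0.0]

print(predict(x_test, centroids, global_M, cluster_Ms, cluster_data))
print(predict(x_test, centroids, global_M, [IDENTITY, IDENTITY], cluster_data))
```

With these toy values, the test sample falls in cluster 0 and is labeled defective under the cluster-specific metric, whereas plain Euclidean distance inside the same cluster labels it defect-free. That difference is the kind of effect a learned metric is meant to produce.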
Keywords/Search Tags:Software defect prediction, distance metric learning, sparse representation, cluster-specific, sampling, class-imbalance learning