Distance Metric Learning Based Software Defect Prediction

Posted on:2017-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

GTID:2308330488997113

Subject:Information security

Abstract/Summary:

As computer technology develops, precisely predicting potential defects in software systems has become an important task. Analyzing historical data from the software development period can help predict software defects. Doing so not only improves the quality of the developed software but also greatly reduces the software testing workload. Recently, researchers have introduced machine learning methods into software defect prediction (SDP). However, these methods usually rely on the traditional Euclidean distance in the training phase, and the Euclidean distance does not reveal the discriminability among samples well.

Firstly, this thesis proposes an SDP approach called cluster-specific large margin nearest neighbor (CS-LMNN). CS-LMNN first employs the large margin nearest neighbor (LMNN) algorithm to learn a global discriminant metric. It then partitions the training set into K clusters using K-means clustering, and a cluster-specific discriminant metric is learned for each cluster. In the prediction phase, a testing sample finds its cluster using the global discriminant metric and is then labeled as defective or defect-free using the cluster-specific discriminant metric.

Secondly, to address the noise problem in the training set, this thesis further proposes an SDP approach called cluster-specific local sparse reconstruction distance metric learning (CS-LSRML). It incorporates sparse reconstruction information and local weights, and designs inter-class and intra-class local sparse reconstruction terms. CS-LSRML not only learns discriminant metrics but is also robust to noise.

Lastly, to address the class-imbalance problem in SDP, this thesis further proposes two SDP approaches called over-sampling CS-LSRML (OCS-LSRML) and under-sampling CS-LSRML (UCS-LSRML). These two approaches first apply a sampling technique, over-sampling or under-sampling respectively, to transform the original training set into a relatively balanced training set. The CS-LSRML approach is then employed to learn the global and cluster-specific discriminant metrics.

This thesis conducts experiments on the NASA, AEEEM, and ReLink projects. The experimental results demonstrate that the proposed approaches are effective for SDP and can outperform related methods to some extent.
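
The two-stage prediction phase described for CS-LMNN can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the k-nearest-neighbor vote, the toy data, and the hand-written metric matrices are all assumptions made for illustration. In practice the global and cluster-specific matrices would be learned (for example by LMNN) and the centroids would come from K-means.

```python
# Hypothetical sketch of a cluster-specific prediction phase.
# Labels: 1 = defective, 0 = defect-free. Metrics are assumed to be
# positive semidefinite matrices given as lists of rows.

def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) under metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def nearest_cluster(x, centroids, M_global):
    """Index of the centroid closest to x under the global metric."""
    return min(range(len(centroids)),
               key=lambda k: mahalanobis_sq(x, centroids[k], M_global))

def knn_label(x, samples, labels, M, k=1):
    """Majority vote among the k nearest samples under metric M (use odd k)."""
    order = sorted(range(len(samples)),
                   key=lambda i: mahalanobis_sq(x, samples[i], M))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def predict(x, centroids, M_global, clusters, k=1):
    """clusters[c] = (samples, labels, M_c) for cluster c."""
    c = nearest_cluster(x, centroids, M_global)
    samples, labels, M_c = clusters[c]
    return knn_label(x, samples, labels, M_c, k)

# Toy data (invented): two clusters, each with its own metric.
identity = [[1.0, 0.0], [0.0, 1.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
clusters = [
    ([[2.0, 0.0], [0.0, 1.0]], [1, 0], [[0.1, 0.0], [0.0, 10.0]]),
    ([[10.0, 10.0], [11.0, 10.0], [10.0, 11.0]], [1, 1, 0], identity),
]
label = predict([0.0, 0.0], centroids, identity, clusters)
```

In the toy data, the first cluster's metric down-weights the first feature, so the query's nearest neighbor differs from the one the Euclidean distance would select. This is the kind of effect a learned discriminant metric is meant to produce.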

Keywords/Search Tags:

Software defect prediction, distance metric learning, sparse representation, cluster-specific, sampling, class-imbalance learning

Related items

1	Research On Class Imbalance Problem In Distance Metric Learning
2	Research And Implementation Of Software Defect Prediction Model Construction And Sharing Methods
3	Research On Software Defect Prediction Based On Learning Mechanism
4	Researches And Applies On Software Defect Prediction Method Based On Ensemble Learning
5	Research On Heterogeneous Software Defect Prediction Based On Transfer Learning
6	Correlation Analysis Based Cross-project Software Defect Prediction
7	Research On Software Defect Prediction Model Based On Active Ensemble Learning
8	Research On Software Defect Prediction Based On Extreme Learning Machine
9	Research On Software Defect Prediction Based On Active Learning
10	Wide Research Of Data Mining With Machine Learning On Software Defect Prediction