
Research On The Methods Of Cross Project Software Defect Prediction Based On Learning To Rank Approach

Posted on: 2021-05-28    Degree: Master    Type: Thesis
Country: China    Candidate: F Wang    Full Text: PDF
GTID: 2518306194475854    Subject: Computer software and theory
Abstract/Summary:
In recent years, cross-project defect prediction (CPDP) has attracted increasing attention in the field of software defect prediction (SDP). Most previous studies treated it as a binary classification problem or a regression problem. In practice, developers usually do not need a specific prediction for every software entity. They only need automatic defect prediction tools to find the entities that are likely to be high-risk so that they can fix them. Inspired by the classic learning-to-rank algorithms used in recommendation systems, we designed a top-k learning-to-rank framework for cross-project software defect prediction. This framework can adapt to defect prediction datasets and perform preprocessing, such as label transformation and resampling. It can then select an appropriate ranking algorithm and apply optimization methods to the ranked list to obtain better results. To verify the validity of this framework, we design experiments from four aspects:
(1) For the problem of data structure mismatch, we first set up a single query scenario: "Is the given software entity severe?". To address it, we compute statistics on the number of defects in the dataset and fit the defect distribution with a folded-Gaussian distribution. We then apply the 3σ rule to divide the dataset into groups with different relevance levels.
(2) For the data imbalance problem, we design a hybrid sampling method called SMOTEPENN (Partial Edit Nearest Neighbors). The experiment compares it with six classic resampling methods, such as Random Over-Sampling, and the proposed algorithm achieves the highest score on every evaluation metric. Moreover, this paper discusses seven classic distance functions that resampling can use to calculate distance. The experiment shows that the standardized Euclidean distance performs best in the software defect prediction scenario.
(3) For the selection of learning-to-rank methods, this paper selects five classic algorithms, such as RankNet, and evaluates the performance of each approach with the NDCG evaluation metric. Experiments show that ListNet and RankNet perform better than the other methods under our framework. Furthermore, we discuss how NDCG changes with the value of k in the ranking scenario.
(4) For the optimization of ranking results, we design an elite learning-to-rank model, independent of the rough ranking scenario, which improves the ranking results. Experiments show that this method can significantly improve the performance of the model when the number of recommended samples is large enough.
In the future, we will expand the dataset to verify the effect of the model. Next, we will select more learning-to-rank algorithms or design new ones, and formulate multiple query scenarios to improve the robustness of the model. Furthermore, we will use parameter search methods to improve the effectiveness of the model.
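The abstract evaluates ranking quality with NDCG at a cutoff k. As an illustration only, the following is a minimal sketch of the commonly used NDCG@k definition. The exponential gain (2^rel - 1), the log2 discount, and the function names `dcg_at_k` and `ndcg_at_k` are assumptions made for this example; the thesis does not state which variant or implementation it uses.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items of a ranked list.

    Assumes exponential gain (2^rel - 1) and a log2 position discount,
    which is one common convention, not necessarily the thesis's choice.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so +2
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(
    true_relevance: Sequence[float],
    predicted_scores: Sequence[float],
    k: int,
) -> float:
    """NDCG@k for one query: DCG of the predicted order over the ideal DCG."""
    # Order entities by predicted score, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_relevance[i] for i in order]
    # The ideal ranking sorts entities by their true relevance.
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant entity at all; define NDCG as 0
    return dcg_at_k(ranked, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical relevance levels (e.g. derived from defect counts)
    # and hypothetical model scores for four software entities.
    relevance = [3, 0, 2, 1]
    scores = [0.2, 0.9, 0.8, 0.1]
    print(ndcg_at_k(relevance, scores, k=2))
```

Because only the first k positions contribute, a small k rewards models that place the most defect-prone entities at the very top of the list, while a larger k also credits correct ordering further down.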
Keywords/Search Tags:Defect Prediction, Relevance, Resampling, Learning To Rank, Elite reranking approach