Research On The Methods Of Cross Project Software Defect Prediction Based On Learning To Rank Approach

Posted on:2021-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:F Wang

GTID:2518306194475854

Subject:Computer software and theory

Abstract/Summary:

In recent years, cross-project defect prediction (CPDP) has attracted increasing attention in the field of software defect prediction (SDP). Most previous studies treat it as a binary classification problem or a regression problem. In practice, however, developers rarely need a specific prediction for every software entity. What they need is an automated defect prediction tool that points them to the highest-risk entities so those can be fixed first. Inspired by the classic learning-to-rank algorithms used in recommender systems, we designed a top-k learning-to-rank framework for cross-project software defect prediction. The framework adapts defect prediction datasets to the ranking setting and preprocesses them through steps such as label transformation and resampling. It then selects an appropriate ranking algorithm and applies an optimization step to the resulting ranking list to obtain better results. To verify the validity of this framework, we designed experiments addressing four aspects:

(1) Data structure mismatch. We first set up a single-query scenario, "Is the given software entity severe?". For this query, we compute statistics on the number of defects per entity in the datasets and fit the defect distribution with a folded Gaussian distribution. We then apply the 3σ rule to divide the entities into different relevance levels.

(2) Data imbalance. We designed a hybrid sampling method called SMOTEPENN (SMOTE combined with Partial Edited Nearest Neighbors). Compared with six classic resampling methods, such as Random Over-Sampling, the proposed method achieves the highest score on every evaluation metric. We also examine seven classic distance functions that the resampling step can use to measure distances between samples; the experiments show that the standardized Euclidean distance works best in the software defect prediction scenario.

(3) Selection of learning-to-rank methods. We selected five classic algorithms, such as RankNet, and evaluated each of them in terms of NDCG. The experiments show that ListNet and RankNet perform best among the five methods under our framework. We also discuss how the results change with the cutoff k of NDCG@k in the ranking scenario.

(4) Optimization of ranking results. We designed an elite learning-to-rank model that runs independently of the rough (coarse) ranking stage and reranks its output to improve the ranking results. Experiments show that this method significantly improves model performance when the number of recommended samples is large enough.

In future work, we will expand the datasets to further verify the model's effectiveness, adopt more learning-to-rank algorithms or design new ones, and formulate multiple query scenarios to improve the model's robustness. We will also use parameter search to improve the model's effectiveness.
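The label transformation in aspect (1) can be illustrated with a small sketch. The abstract does not give the exact fitting procedure or the grade boundaries, so the following Python code is a hypothetical illustration under two stated assumptions: the folded Gaussian is simplified to a half-normal distribution (mean 0), whose maximum-likelihood scale estimate is sqrt(mean(x^2)), and relevance grades are cut at 0, σ, 2σ and 3σ. The function names `fit_half_normal_sigma` and `relevance_grades` are illustrative and do not come from the thesis.

```python
import math
from typing import List, Sequence


def fit_half_normal_sigma(defect_counts: Sequence[float]) -> float:
    """Maximum-likelihood scale of a half-normal (folded Gaussian with mean 0).

    Assumption: the folded Gaussian is simplified to mean 0, where the MLE
    has the closed form sigma = sqrt(mean(x ** 2)).
    """
    if len(defect_counts) == 0:
        raise ValueError("need at least one software entity")
    return math.sqrt(sum(x * x for x in defect_counts) / len(defect_counts))


def relevance_grades(defect_counts: Sequence[float]) -> List[int]:
    """Map raw defect counts to graded relevance labels (hypothetical scheme).

    A count's grade is the number of boundaries 0, sigma, 2*sigma, 3*sigma it
    strictly exceeds: defect-free entities get 0, counts beyond 3*sigma get 4.
    """
    sigma = fit_half_normal_sigma(defect_counts)
    boundaries = [k * sigma for k in range(4)]  # 0, sigma, 2*sigma, 3*sigma
    return [sum(x > b for b in boundaries) for x in defect_counts]


if __name__ == "__main__":
    counts = [0, 0, 0, 1, 1, 2, 3, 5, 8]
    print(relevance_grades(counts))  # -> [0, 0, 0, 1, 1, 1, 1, 2, 3]
```

Under this scheme, defect-free entities receive grade 0 and only extreme outliers beyond 3σ reach the top grade, so a ranking model is rewarded most for placing the most defect-prone entities first.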
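The distance computation behind the resampling step in aspect (2) can be sketched as follows. This is plain SMOTE-style interpolation using a standardized Euclidean distance, not the thesis's SMOTEPENN implementation: the PENN cleaning step is omitted because the abstract does not describe it, and the names `feature_variances`, `standardized_euclidean` and `smote_oversample`, the neighbour count `k=3`, and the fixed random seed are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def feature_variances(rows: Sequence[Sequence[float]]) -> Vector:
    """Per-feature sample variance (ddof = 1) of a list of feature vectors."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1) for j in range(dims)]


def standardized_euclidean(a: Sequence[float], b: Sequence[float],
                           variances: Sequence[float]) -> float:
    """Euclidean distance with each squared difference divided by the feature variance.

    Zero-variance features carry no information and are skipped to avoid
    division by zero.
    """
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances) if v > 0))


def smote_oversample(minority: Sequence[Vector], n_new: int, k: int = 3,
                     variances: Optional[Sequence[float]] = None,
                     seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    if variances is None:
        variances = feature_variances(minority)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by standardized distance to base.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: standardized_euclidean(base, row, variances))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic
```

Dividing each squared difference by the feature's variance keeps features with large numeric ranges (for example, lines of code) from dominating the neighbour search over small-range metrics. In practice the variances would typically be computed on the full training set and passed in through `variances`.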
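NDCG@k, the metric used in aspect (3), can be computed as below. The sketch assumes the common exponential-gain form of DCG, the sum over the top k positions i of (2^rel_i - 1) / log2(i + 1), and returns 0 when a list contains no relevant entities; the abstract does not say which gain variant or zero-relevance convention the thesis uses, and the function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k positions (exponential gain).

    Position i (1-based) contributes (2 ** rel_i - 1) / log2(i + 1).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@k for relevance grades listed in the order the model ranked them.

    Returns 0.0 when no entity is relevant (ideal DCG is zero); this is one of
    several conventions in use.
    """
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Grades of five entities in the order a hypothetical model ranked them.
    ranked = [2, 3, 0, 1, 0]
    for k in (1, 3, 5):
        print(k, round(ndcg_at_k(ranked, k), 4))
```

Evaluating the same ranking at several cutoffs, as in the loop above, shows how the score varies with k, which is the kind of comparison the abstract describes.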

Keywords/Search Tags:

Defect Prediction, Relevance, Resampling, Learning To Rank, Elite Reranking Approach

Related items

1	Research On Methods Of Ranking-Oriented Software Defect Prediction
2	Research On Just-in-time Software Defect Prediction Method Based On Learning To Rank
3	Research On Automatic Annotation, TAG Processing And Reranking In Image Retrieval
4	Research On Content-based Image Search Reranking
5	Research On Defect Prediction Model Of Internet Of Things System Based On Machine Learning
6	Software Defect Prediction Based On Machine Learning
7	Research And Application Of Music Retrieval On Learning To Rank
8	Research On Some Key Technologies Of Software Defect Prediction
9	The Reranking Method Based On Image Content
10	Research On Software Defect Prediction Based On Machine Learning Algorithm