
Research on a Cross-Project Software Defect Prediction Method Based on Active Learning

Posted on: 2022-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: W B Mi
GTID: 2518306746981329
Subject: Automation Technology
Abstract/Summary:
With the development of information technology, software has become widely used and an integral part of daily life. As software has grown more complex and diverse, the demands on software quality assurance have risen. In software engineering, software defect prediction identifies suspicious (likely defective) software modules so that testing resources can be allocated effectively, which improves testing efficiency and helps ensure software quality. In practice, however, a newly developed project has too little historical data to train a well-performing model, so cross-project software defect prediction has become a research hotspot in academia and industry. Cross-project software defect prediction mines data from other software projects (the source projects) and uses it to build a defect prediction model for the target project. However, cross-project software defect prediction does not consider prior knowledge of the target project, and cross-project data still suffers from class imbalance; both make it difficult to match defect patterns between the source project and the target project. To address this, this thesis proposes an active learning-based cross-project software defect prediction method, which uses active learning to construct prior knowledge of the target project and to address the class imbalance problem. The specific contributions are as follows.

(1) Considering that prior knowledge of the target project can help match the defect patterns of the source and target projects, a data selection algorithm based on active learning is proposed. First, the target project instances are grouped into clusters, and the centroid of each cluster is manually labeled. Then, prior knowledge of the target project is constructed from the clustering results of the defect data, combined with active learning. Finally, source project instances are selected based on this prior knowledge and combined with it to form an augmented dataset, on which the software defect prediction model is built. Experiments on the NASA public dataset show that the data selection algorithm makes full use of the target project's prior knowledge and effectively improves model performance.

(2) To address class imbalance in cross-project data, an active balancing method based on uncertainty sampling is proposed. First, the cross-project data are ranked by contribution through active learning based on the target project's prior knowledge. Then, the balance weights are adjusted according to this ranking to construct balanced cross-project data. Finally, the defect prediction model is built on the balanced cross-project data. Experiments on the NASA public dataset compare the method with classical class imbalance methods, and the results show that it effectively addresses the imbalance problem and improves model performance.

This thesis offers an active learning-based research approach to cross-project software defect prediction and proposes future research directions for active learning-based cross-project defect prediction models.
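The abstract does not give algorithmic details, so the following is only a minimal sketch of how the two contributions could look in code, under stated assumptions. It is not the thesis's implementation. The assumptions are: plain k-means with farthest-first initialization stands in for the clustering step; the real instance nearest each centroid is the one sent for manual labeling (the `oracle` callable is a hypothetical stand-in for the human annotator); source instances are selected as the nearest same-label neighbors of each labeled representative; and the balance weight is an inverse class frequency scaled by an uncertainty score. All function names and the `per_rep` parameter are illustrative.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def pick_representatives(points, centroids):
    """For each centroid, the actual target instance closest to it."""
    return [min(points, key=lambda p: dist(p, c)) for c in centroids]


def label_representatives(reps, oracle):
    """Ask the (hypothetical) oracle for a label per representative."""
    return [(r, oracle(r)) for r in reps]


def select_source(source, labeled_reps, per_rep=2):
    """Pick, per labeled representative, its nearest same-label source instances.

    source and labeled_reps are lists of (features, label) pairs.
    """
    chosen = []
    for rep_x, rep_y in labeled_reps:
        same = [s for s in source if s[1] == rep_y and s not in chosen]
        same.sort(key=lambda s: dist(s[0], rep_x))
        chosen.extend(same[:per_rep])
    return chosen


def uncertainty(p_defective):
    """Uncertainty score: 1.0 when p = 0.5, 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p_defective - 1.0)


def balance_weights(labels, probs):
    """Inverse class frequency scaled by (1 + uncertainty); an assumed weighting."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    weights = []
    for y, p in zip(labels, probs):
        class_factor = n / (len(counts) * counts[y])
        weights.append(class_factor * (1.0 + uncertainty(p)))
    return weights
```

In this sketch, the augmented dataset would be the output of `select_source` plus the labeled representatives, and the values returned by `balance_weights` would be passed as per-instance weights to whichever classifier is trained on it. The predicted probabilities given to `balance_weights` are assumed to come from a model fitted on the labeled target representatives.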
Keywords: Cross-Project Software Defect Prediction, Active Learning, Transfer Learning, Class Imbalance