| With the development of smart tools, software is updated in increasingly rapid iterations. Finding software defects and providing solutions in advance can substantially reduce labor and time costs. A machine-learning-based software defect prediction model can find defects quickly and help testers allocate resources rationally and test defect-prone modules first, making it an efficient way to reduce losses and ensure software quality. However, machine-learning-based software defect prediction usually faces two problems: class imbalance and cost sensitivity. In practice, defects exist in only a small number of software modules, so defective modules are far fewer than normal ones. Software defect prediction is therefore an imbalanced-data problem, and imbalanced data often degrades the classification accuracy of traditional machine learning methods. The second problem is cost sensitivity. Traditional classification algorithms assume that all types of classifier error incur the same cost. In practical applications, however, different errors incur different costs. For instance, the cost of identifying a defective software module as a flawless one is far greater than the cost of identifying a flawless module as a defective one. The latter only wastes the labor, material resources, and time spent testing non-defective modules that were misclassified as defective, whereas the former can directly lead to software errors or even software paralysis. Based on these characteristics of software defect datasets, this article makes the following contributions. 1. We use IMMFIA (which has low time and space complexity) to mine frequent itemsets, generate association rules that satisfy the confidence and support thresholds, and prioritize the minority class (defective software modules) based on relevance and new rules to obtain a classifier. For rule mismatch (test cases that no rule in the classifier satisfies) and rule-matching overflow (test cases that multiple rules satisfy), we use EDSVM to classify. Experiments show that, compared with current software defect prediction methods, FREDAVM achieves high precision. 2. Taking cost sensitivity into account, we construct a new loss function and propose CostXGBoost, a software defect prediction model based on XGBoost. Experiments on the NASA datasets show that CostXGBoost achieves higher precision and recall than traditional software defect prediction models. |
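
The abstract does not state the loss function used by CostXGBoost, so the following is only a minimal sketch of one common way to make a boosting objective cost sensitive: a logistic loss in which missed defects are weighted more heavily than false alarms. The cost values `COST_FN` and `COST_FP` and all function names are hypothetical and chosen purely for illustration. The sketch returns the first and second derivatives of the loss with respect to the raw score, which is the pair of quantities a gradient-boosting objective needs.

```python
import numpy as np

# Hypothetical misclassification costs (illustrative, not from the source).
COST_FN = 5.0  # defective module predicted as flawless (missed defect)
COST_FP = 1.0  # flawless module predicted as defective (wasted testing)


def sigmoid(z):
    """Map raw scores (log-odds) to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_sensitive_loss(labels, raw_scores, cost_fn=COST_FN, cost_fp=COST_FP):
    """Cost-weighted logistic loss per sample; label 1 means defective."""
    p = np.clip(sigmoid(raw_scores), 1e-12, 1.0 - 1e-12)
    return -(cost_fn * labels * np.log(p)
             + cost_fp * (1.0 - labels) * np.log(1.0 - p))


def cost_sensitive_grad_hess(labels, raw_scores,
                             cost_fn=COST_FN, cost_fp=COST_FP):
    """First and second derivatives of the loss w.r.t. the raw score."""
    p = sigmoid(raw_scores)
    grad = cost_fn * labels * (p - 1.0) + cost_fp * (1.0 - labels) * p
    hess = (cost_fn * labels + cost_fp * (1.0 - labels)) * p * (1.0 - p)
    return grad, hess


# One defective and one flawless module, both scored at 0 (probability 0.5).
labels = np.array([1.0, 0.0])
raw_scores = np.array([0.0, 0.0])
grad, hess = cost_sensitive_grad_hess(labels, raw_scores)
print("gradients:", grad)
print("hessians:", hess)
```

With these assumed costs, an undecided prediction on a defective module produces a gradient five times larger in magnitude than the same prediction on a flawless module, so each boosting round is pushed harder toward catching defects. A function of this shape could be adapted to XGBoost's native training API, which accepts a custom objective through the `obj` argument of `xgboost.train` and exposes labels via `DMatrix.get_label()`.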