A Self-Adaptive Data Imbalance Processing Method Based on Ensemble Sampling in the Defect Prediction Context

Posted on: 2021-04-12 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Zhang | Full Text: PDF
GTID: 2518306197456624 | Subject: Domain software engineering
Abstract/Summary:
Software defects inevitably arise during software development, and the later a defect is discovered, the higher the cost of fixing it. Finding defects usually demands a huge amount of work from software engineers and takes up a lot of project time. Software defect prediction is needed to determine the order in which code should be tested, to allocate resources, and to locate defective code quickly and accurately. Once the testing requirements are clear, data must be collected to describe the software, and software metrics provide such quantitative data. Owing to the characteristics of software development and production management, these data exhibit class imbalance. In addition, as software measurement has developed over the long term, software metrics covering more angles and more dimensions have emerged. Software defect prediction therefore needs both to eliminate the impact of imbalanced data on the prediction results and to consider feature selection to improve prediction accuracy. Building on previous work, this thesis constructs Gen-Stacking to address data imbalance and feature selection, and gives a detailed analysis of its design ideas and workflow. To verify the method, the thesis poses three research questions: whether Gen-Stacking is effective; whether Gen-Stacking has advantages over direct modeling; and whether Gen-Stacking has advantages over other (sampling) methods. The experimental results show, first, that Gen-Stacking completes the task it was designed for: it handles class imbalance, performs feature selection, and produces model predictions. Second, to compare Gen-Stacking fairly with direct modeling, a boosted version of the base classifier was constructed; in this setting, Gen-Stacking outperforms direct modeling. Finally, compared with under-sampling, over-sampling, and SMOTE, Gen-Stacking performs better on data sets that the other methods handle poorly. Gen-Stacking therefore has obvious advantages.
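For readers unfamiliar with the sampling baselines named in the abstract, the following is a minimal sketch of random under-sampling and random over-sampling on a toy defect data set. It is not Gen-Stacking and not the thesis's experimental code; the function names, the two metric columns, and the toy values are all assumptions made for illustration, and SMOTE (which synthesizes new minority samples by interpolation) is not shown.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_over_sample(rows, labels, seed=0):
    """Randomly duplicate rows until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
# Label 1 = defective, label 0 = clean; 8 clean vs 2 defective modules.
rows = [(120, 4), (80, 2), (200, 6), (95, 3), (60, 1),
        (150, 5), (110, 4), (70, 2), (900, 31), (750, 27)]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

under_rows, under_labels = random_under_sample(rows, labels)
over_rows, over_labels = random_over_sample(rows, labels)

print("original:     ", Counter(labels))
print("under-sampled:", Counter(under_labels))
print("over-sampled: ", Counter(over_labels))
```

Under-sampling discards information from the majority class, while over-sampling repeats minority rows and can encourage overfitting; these trade-offs are the usual motivation for comparing against alternatives such as the method proposed in the thesis.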
Keywords/Search Tags: Gen-Stacking, class imbalance data, feature selection