A Self-Adaptive Method For Handling Data Imbalance Based On Ensemble Sampling In The Context Of Defect Prediction

Posted on:2021-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Zhang

Full Text:PDF

GTID:2518306197456624

Subject:Domain software engineering

Abstract/Summary:

Software defects inevitably arise during software development, and the later a defect is discovered, the higher the cost of fixing it. Finding defects usually demands a great deal of work from software engineers and takes up a large share of project time. Software defect prediction helps determine the order in which code should be tested, allocate resources, and locate defective code quickly and accurately. Once the testing requirements are clear, data must be collected to describe the software, and software metrics provide such quantitative data. Owing to the characteristics of software development and production management, these data are class-imbalanced. In addition, over the long-term development of software measurement, software metrics covering more perspectives and more dimensions have emerged. Software defect prediction therefore needs both to eliminate the impact of imbalanced data on modeling and prediction results and to use feature selection to improve prediction accuracy. Building on previous work, this thesis constructs Gen-Stacking to address class imbalance and feature selection, and analyzes its design ideas and workflow in detail. To verify the method, the thesis poses three research questions: whether Gen-Stacking is effective; whether Gen-Stacking has advantages over direct modeling; and whether Gen-Stacking has advantages over other methods (sampling). The experimental results show, first, that Gen-Stacking accomplishes the tasks it was designed for: it handles class imbalance, performs feature selection, and produces model predictions. Second, to compare Gen-Stacking fairly with direct modeling, we constructed a boosted version of the basic classifier; in this setting, Gen-Stacking outperforms direct modeling. Finally, compared with under-sampling, over-sampling, and SMOTE, Gen-Stacking performs better on data sets that the other methods handle poorly. Gen-Stacking thus shows clear advantages.
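
To make the sampling baselines named above concrete, here is a minimal sketch of random under-sampling and random over-sampling using only the Python standard library. It is not Gen-Stacking and not the thesis's code: the function names, the toy module data, and the two metric columns (lines of code, cyclomatic complexity) are assumptions chosen for illustration. SMOTE is omitted because it synthesizes new minority samples by interpolation rather than copying existing ones.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect feature rows under their class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of each class until it matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 clean modules (label 0) and 2 defective ones (label 1).
# Each row is [lines_of_code, cyclomatic_complexity]; the values are made up.
modules = [[120, 4], [80, 2], [200, 6], [95, 3], [150, 5],
           [60, 1], [110, 4], [130, 3], [900, 25], [750, 19]]
defective = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

_, under_labels = random_undersample(modules, defective)
_, over_labels = random_oversample(modules, defective)
print("original:     ", dict(Counter(defective)))
print("under-sampled:", dict(Counter(under_labels)))
print("over-sampled: ", dict(Counter(over_labels)))
```

Both functions balance the class counts, but under-sampling discards majority-class rows while over-sampling repeats minority-class rows. Those are the usual trade-offs of these two baselines.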

Keywords/Search Tags:

Gen-Stacking, class imbalance data, feature selection

Related items

1	Relationships Between Evaluation Criteria Of Feature Selection And Analysis On Class Imbalance Problem Over Vhr Remote Sensing Imagery
2	Online Streaming Feature Selection Algorithms Of High-dimension And Class-imbalanced Data
3	Studying Class Imbalance Characteristics And Classification Methods On Internet Traffic Flows
4	Combating the class imbalance problem in small sample data sets
5	Feature Selection Method For Label Hierarchical Data
6	Ensemble Particle Swarm Feature Selection Algorithm For Large-scale Data
7	Research On Software Defect Prediction Method Based On Feature Selection And Oversampling
8	Prediction Of Students' Academic Level Based On Feature Selection And Stacking Framework
9	Research On Data Imbalance In Visual Tracking
10	Research On Key Techniques For Class Imbalanced Data Classification