Research And System Construction Of Data Preprocessing Mechanism In Software Defect Prediction

Posted on:2020-11-12

Degree:Master

Type:Thesis

Country:China

Candidate:W Chen

Full Text:PDF

GTID:2428330590996024

Subject:Computer technology

Abstract/Summary:

With the rapid development of computer technology, software is now widely used in all walks of life. Defects are inevitably introduced during software development and maintenance. By analyzing historical defect data and building a defect model, software defect prediction identifies the software entities that are likely to contain latent defects. Several problems in software defect prediction remain to be solved, such as imbalanced data and the curse of dimensionality. Current algorithms for handling imbalanced data are generally based on K-Nearest Neighbor, which is computationally intensive and susceptible to noisy data, and traditional feature selection algorithms cannot effectively remove most irrelevant and redundant features. This thesis studies imbalanced data processing and feature selection methods for software defect prediction in depth. The main research work is as follows:

(1) The current imbalanced data processing algorithms in software defect prediction are summarized, and their advantages and disadvantages are analyzed. Because density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when handling samples near cluster borders, we optimize DBSCAN to make the clustering more reasonable. Combining the optimized DBSCAN with the Synthetic Minority Over-sampling Technique (SMOTE), this thesis proposes a synthetic minority over-sampling technique based on density clustering. First, the optimized DBSCAN divides the minority-class samples into three groups: noise samples, core samples, and borderline samples. Next, the noise samples of the minority class are removed. Finally, the algorithm over-samples the core samples and the borderline samples using different strategies. In an empirical study, the algorithm is compared with several classical oversampling algorithms on the NASA software defect datasets. The results show that the algorithm can effectively address the data imbalance problem in software defect prediction.

(2) The current feature selection algorithms in software defect prediction are summarized, and their advantages and disadvantages are analyzed. Because traditional feature selection algorithms cannot effectively remove most irrelevant and redundant features, a cluster-based feature selection algorithm is proposed. First, the algorithm uses ReliefF to calculate the relevance between each feature and the target class, then ranks the features and removes the irrelevant ones. Next, it clusters the remaining features according to the correlation between them. Finally, it selects a representative feature from each cluster. Because the algorithm considers both the correlation between features and the relevance of each feature to the target class, it can effectively remove redundant and irrelevant features. In an empirical study, the method is compared with classical feature selection algorithms on the NASA software defect prediction datasets. The results show that the algorithm can effectively address the curse of dimensionality in software defect prediction.

(3) Based on the synthetic minority over-sampling algorithm and the cluster-based feature selection algorithm above, this thesis designs and builds a data preprocessing system for software defect prediction. The client includes an upload module, an oversampling module, a feature selection module, and an algorithm comparison module. The server includes a data analysis module, a system algorithm module, and an algorithm comparison module. The system can oversample software defect prediction datasets and select their features, compare different algorithms, and display the prediction results clearly and accurately, which helps reduce the time and cost of software development and testing.
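
The oversampling procedure in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not describe how DBSCAN is optimized or which strategies are applied to core and borderline samples, so the sketch uses the standard DBSCAN core/border/noise rules and two assumed strategies (a core sample interpolates toward a random core sample within eps, a borderline sample toward its nearest core sample). The function names and the parameters eps, min_pts, n_new, and seed are hypothetical.

```python
import math
import random


def label_by_density(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (standard DBSCAN rules)."""
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


def oversample_minority(minority, n_new, eps, min_pts, seed=0):
    """Drop noise samples, then create n_new SMOTE-style synthetic samples."""
    rng = random.Random(seed)
    labels = label_by_density(minority, eps, min_pts)
    kept = [(p, lab) for p, lab in zip(minority, labels) if lab != "noise"]
    core = [p for p, lab in kept if lab == "core"]
    synthetic = []
    while core and len(synthetic) < n_new:
        point, lab = rng.choice(kept)
        if lab == "border":
            # Assumed strategy: pull borderline samples toward the dense region.
            partner = min(core, key=lambda c: math.dist(point, c))
        else:
            # Assumed strategy: pair a core sample with a nearby core sample.
            near = [c for c in core
                    if c is not point and math.dist(point, c) <= eps]
            partner = rng.choice(near) if near else point
        gap = rng.random()  # SMOTE interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(point, partner))
        )
    return [p for p, _ in kept], synthetic
```

Calling oversample_minority on the minority-class rows of a defect dataset returns the retained samples and the synthetic ones separately, so the caller decides how many synthetic samples to append to the training set.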

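The feature selection procedure in contribution (2) can be sketched in the same hedged way. The sketch below makes several assumptions that the abstract does not confirm: it substitutes the basic Relief update (one nearest hit and one nearest miss per sample) for ReliefF, assumes features are scaled to [0, 1], measures feature-to-feature correlation with the absolute Pearson coefficient, and clusters greedily so that the most relevant feature of each cluster becomes its representative. The function names and the thresholds min_relevance and corr_threshold are hypothetical.

```python
import math


def relief_weights(X, y):
    """Basic Relief relevance weights (a simplified stand-in for ReliefF)."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    for i in range(n):
        hit = miss = None
        hit_dist = miss_dist = math.inf
        for j in range(n):
            if j == i:
                continue
            dist = sum(abs(a - b) for a, b in zip(X[i], X[j]))
            if y[j] == y[i] and dist < hit_dist:
                hit, hit_dist = j, dist
            elif y[j] != y[i] and dist < miss_dist:
                miss, miss_dist = j, dist
        if hit is None or miss is None:
            continue
        for f in range(d):
            # Reward features that differ across classes, penalize those
            # that differ within a class.
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights


def pearson(u, v):
    """Pearson correlation of two equal-length columns (0.0 if constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def select_features(X, y, min_relevance=0.0, corr_threshold=0.9):
    """Return indices of relevant, mutually non-redundant features."""
    weights = relief_weights(X, y)
    columns = list(zip(*X))
    # Step 1: rank by relevance and drop irrelevant features.
    ranked = sorted(
        (f for f in range(len(weights)) if weights[f] > min_relevance),
        key=lambda f: -weights[f],
    )
    # Step 2: greedy clustering. A feature highly correlated with an existing
    # representative joins that cluster; otherwise it starts a new cluster
    # and becomes its representative.
    representatives = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[r])) < corr_threshold
               for r in representatives):
            representatives.append(f)
    return sorted(representatives)
```

On a toy dataset where one column duplicates another and a third is unrelated to the label, select_features keeps a single column: the unrelated one is dropped for low relevance and the duplicate is absorbed into the first column's cluster.
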
Keywords/Search Tags:

software defect prediction, data preprocessing, imbalanced data, oversampling, feature selection

Related items

1	Research On Data Preprocessing Technologies For Software Defect Prediction
2	Research Of The Software Defect Prediction Method For Imbalanced Data
3	Software Defect Prediction Model Driven By Imbalanced Datasets
4	Improved Methods Of Oversampling And Feature Selection Based On Imbalanced Data
5	Research On Imbalanced Data Processing In Software Defect Prediction
6	Research On Class Imbalanced Data Generation Method For Software Defect Prediction
7	Research On Data Preprocessing Technology In Cross Project Software Defect Prediction
8	Wide Research Of Data Mining With Machine Learning On Software Defect Prediction
9	Software Defect Prediction Strategy Design For Imbalanced Data
10	Research And Application Of Feature Selection For Software Defect Data