Research On Software Defect Prediction Method Based On Feature Selection

Posted on:2021-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Ren

Full Text:PDF

GTID:2428330611464314

Subject:Software engineering

Abstract/Summary:

With the rapid development of information technology, software products have become integrated into many areas of society and daily life, and the scale and complexity of software development have grown rapidly. Software defects, however, cannot be avoided during development. Because software products are increasingly bound up with people's work and daily lives, the consequences of software defects can be severe and are often hard to estimate, so software quality has received more and more attention. Software testing is an important measure for ensuring software quality: it finds defects so that they can be repaired in a timely manner. For large and complex software products, however, it is impossible to test every path, and finding as many defects as possible requires a great deal of labor and time.

Software defect prediction uses the historical defect data of a software project to build a model that predicts defects in software modules. Modules that are likely to contain defects can thus be identified in advance, and sufficient testing resources can be invested in high-risk modules, which helps ensure both the quality of the software product and the efficiency of testing. Software defect prediction is a hot topic in the field of software engineering. The data in software products is usually massive, and its features usually have very high dimensionality, so feature selection has become an important method for ensuring the execution efficiency and classification accuracy of defect prediction models. This paper proposes solutions to the main problems of software defect prediction from the perspective of feature selection.

At present, the main problems faced by software defect prediction are:

(1) Redundant and irrelevant features. Many features in historical defect data not only fail to help prediction but also reduce the efficiency of the model and even the accuracy of its results.

(2) Classification performance. In the research process, the feature selection algorithm and the classification algorithm are not coupled, so it is impossible to determine which feature subset achieves the best prediction accuracy for a particular classification algorithm.

(3) Class imbalance. Software defect prediction is a binary classification problem: software modules are divided into defect-prone and non-defect-prone modules. Defect-prone modules are far fewer than non-defect-prone modules, which interferes with the accuracy of the prediction results.

In view of the above problems, this paper proposes the following defect prediction methods:

(1) For the problem of redundant and irrelevant features in software defect data, a two-stage filter feature selection algorithm is proposed. The defect data set is processed in two stages: the first stage calculates the correlation of the features, and the second stage calculates their redundancy and combines it with the correlation from the first stage to filter out the final features. This reduces dimensionality and improves the efficiency and accuracy of the prediction model. Experiments demonstrate the effectiveness of the method.

(2) For the classification performance problem, a hybrid feature selection algorithm based on a filter algorithm and a wrapper algorithm is proposed, combining the advantages of both. Features are first selected according to correlation and redundancy, and a sequential forward floating search strategy is then applied to select among them, with classification accuracy as the evaluation index for choosing the optimal feature subset. This method improves classification accuracy while reducing the dimensionality of the features. Its validity is verified by comparative experiments.

(3) For class imbalance, a cost-sensitive hybrid feature selection algorithm is proposed. Cost-sensitive information is first introduced into the filter feature selection algorithm to address the class imbalance problem. The combination of filter and wrapper feature selection algorithms is then used to obtain the optimal feature subset, which improves both the overall prediction performance and the prediction accuracy. Experiments demonstrate the effectiveness of the method.

To sum up, this paper applies the theory and methods of feature selection to practical problems in software defect prediction. On the one hand, research based on feature selection broadens the application field of machine learning and provides a new idea for the research direction of software defect prediction; on the other hand, it is of great practical value for guaranteeing the quality of software products and improving the efficiency of testing.
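
As an illustration of the two-stage filter idea described in the abstract, the following is a minimal sketch and not the thesis's actual algorithm. The abstract does not say which correlation or redundancy measures are used, so this sketch assumes absolute Pearson correlation for both, and assumes that the two are combined by subtracting the mean redundancy from the relevance (in the spirit of minimum-redundancy, maximum-relevance selection). The function names `abs_pearson` and `two_stage_filter` and the parameter `k` are hypothetical.

```python
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def two_stage_filter(features, labels, k):
    """Select up to k feature names from `features` (dict: name -> column).

    Stage 1 scores each feature by its correlation with the defect label.
    Stage 2 greedily adds the feature whose stage-1 score, minus its mean
    correlation with the already selected features, is largest.
    Both measures are assumptions of this sketch, not taken from the thesis.
    """
    # Stage 1: relevance of every feature to the class label.
    relevance = {name: abs_pearson(col, labels) for name, col in features.items()}

    selected = []
    remaining = sorted(features)  # sorted only to make tie-breaking deterministic

    # Stage 2: penalize redundancy with the features chosen so far.
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs_pearson(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Called with a dictionary of module metrics and a list of 0/1 defect labels, `two_stage_filter(features, labels, k)` returns the chosen metric names in selection order. A metric that nearly duplicates one already selected receives a high redundancy penalty in the second stage, so a less correlated but complementary metric can be preferred over it.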

Keywords/Search Tags:

Defect Prediction, Feature Selection, Class Imbalance, Software Engineering

Related items

1	Research On High-dimensional Data Processing In Software Defect Prediction
2	Research On Software Defect Prediction Method Based On Fusion Feature Selection And Ensemble Learning
3	Researches And Applies On Software Defect Prediction Method Based On Ensemble Learning
4	Research On Data Preprocessing Technologies For Software Defect Prediction
5	Research On Software Defect Prediction Method Based On Semi-supervised Integration
6	Research On Software Defect Prediction Method Based On Cost Sensitive Learning Adacost
7	Research On Data Preprocessing Technology In Cross Project Software Defect Prediction
8	Research On Software Defect Prediction Based On Feature Selection And Instance Transfer
9	Research And Application Of Feature Selection For Software Defect Data
10	Research And Implementation Of Software Defect Prediction Model Construction And Sharing Methods