Research On Software Modules Defect Prediction Methods Based On Clustering

Posted on:2015-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

GTID:2298330422471823

Subject:Applied Mathematics

Abstract/Summary:

With the rapid development of software engineering and the growing complexity of software systems, software reliability has become a central concern. Software defects threaten that reliability, so predicting defects before a release has become an important problem.

Traditional research on software defect prediction focuses mainly on supervised methods, which need a large number of labeled instances as a training set. In engineering practice, however, labeled instances are difficult to obtain.

The thesis considers three kinds of metrics: source code metrics, McCabe metrics, and Halstead metrics. It applies Principal Component Analysis (PCA) to these metrics to reduce their dimensionality and improve precision. It also proposes a series of pre-processing steps for software defect data sets, including filling in missing values, removing incorrect data, and standardizing the data by z-score.

Finally, the thesis proposes two improved clustering methods to predict defects:

① An improved fuzzy c-means algorithm that combines simulated annealing and a genetic algorithm to overcome the sensitivity of fuzzy c-means to the initial cluster centers. Experiments on public data sets from the NASA PROMISE repository, compared against the results of traditional methods, show that the proposed method is effective.

② An improved DBSCAN algorithm that addresses a limitation of the improved fuzzy c-means algorithm, which can only discover circular clusters. Using the improved DBSCAN algorithm, the thesis runs experiments on six software defect data sets with the same pre-processing method.

A comparison of the two methods shows that on low-dimensional data sets the improved DBSCAN is more accurate than the improved fuzzy c-means, while on high-dimensional data sets it is less accurate. Compared with traditional methods, the two improved clustering algorithms have several advantages: they do not require class labels, are better suited to real engineering practice, and achieve higher accuracy and robustness.
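The pre-processing and clustering steps summarized above can be illustrated with a minimal sketch. The abstract does not give the details of the improved algorithm, so the code below shows only z-score standardization followed by standard fuzzy c-means, with random initial memberships instead of the simulated annealing and genetic algorithm initialization the thesis proposes. The function names, the toy metric values, and the parameters `c`, `m`, `iters`, `tol`, and `seed` are illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def z_score(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
            for col, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (not the improved variant); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        raw = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(raw)
        u.append([r / total for r in raw])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum
                            for d in range(dim)])
        # Memberships are recomputed from the distances to the new centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            new_row = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                 for k in range(c))
                       for j in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u


# Hypothetical module metrics: [lines of code, cyclomatic complexity, Halstead volume].
metrics = [
    [12, 1, 40.0], [15, 2, 55.0], [10, 1, 35.0], [14, 2, 50.0],
    [220, 25, 2100.0], [250, 30, 2500.0], [210, 22, 1900.0],
]
centers, u = fuzzy_c_means(z_score(metrics), c=2)
# Harden the fuzzy memberships: each module goes to its highest-membership cluster.
labels = [row.index(max(row)) for row in u]
```

Because the method is unsupervised, the cluster indices carry no meaning by themselves. Deciding which cluster is treated as defect-prone is a separate step that the abstract does not describe.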

Keywords/Search Tags:

software defect prediction, clustering, fuzzy c-means, DBSCAN, metrics

Related items

1	Metrics-Based Software Defect Prediction
2	Research On Some Key Technologies Of Software Defect Prediction
3	Research On Software Defect Prediction Method Based On Integrated Learning
4	The Data Mining Techniques In The Software Defect Management
5	Software Defect Prediction Using Fuzzy Support Vector Regression
6	Based On The Clustering Of The Program Software Defect Prediction Method Study
7	Research On Software Defect Prediction Technology Based On Data Mining
8	Research On Software Defect Prediction For Evolving Projects
9	Incomplete Supervision Of Software Defect Prediction Technology Research
10	Static Metrics Based Cross-Project Software Defect Prediction