
Research On Software Modules Defect Prediction Methods Based On Clustering

Posted on: 2015-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2298330422471823
Subject: Applied Mathematics

Abstract/Summary:
Software engineering has developed rapidly and software systems have grown more complex, so software reliability has become a central concern. Software defects threaten that reliability, which makes predicting defects before a software release an important problem.

Traditional research on software defect prediction has focused mainly on supervised methods. These methods need a large number of labeled instances as a training set, but labeled instances are difficult to obtain in engineering practice.

Three kinds of metrics are considered: source-code metrics, McCabe metrics and Halstead metrics. The thesis applies Principal Component Analysis (PCA) to these metrics to reduce their dimensionality and improve precision.

The thesis proposes a series of pre-processing methods for software defect data sets. These are filling missing values, removing incorrect data, and standardizing the data by z-score.

Finally, the thesis proposes two improved clustering methods for defect prediction:

① An improved fuzzy c-means algorithm that combines simulated annealing and a genetic algorithm. It addresses the weakness that fuzzy c-means is easily affected by the choice of initial cluster centers. Experiments were run on public data sets from the NASA PROMISE repository, and the results were compared with those of traditional methods. The comparison shows that the proposed method is effective.

② An improved DBSCAN algorithm that addresses the limitation that the improved fuzzy c-means algorithm can only discover spherical clusters. The thesis runs the improved DBSCAN algorithm on six software defect data sets, using the same pre-processing method.
Comparing the results of improved DBSCAN with those of improved fuzzy c-means shows a dependence on dimensionality. On low-dimensional data sets, improved DBSCAN is more accurate than improved fuzzy c-means. On high-dimensional data sets, it is less accurate.

Compared with traditional methods, the two improved clustering algorithms have several advantages. They do not require class labels, they are better suited to actual engineering practice, and they achieve higher accuracy and robustness.
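The pre-processing steps described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the thesis's implementation:

- Missing values are filled with the column mean.
- "Incorrect data" is assumed to mean any row containing a negative metric value, because size and complexity counts cannot be negative.
- The name `preprocess` is hypothetical.

The sketch removes invalid rows before imputing so that invalid values do not skew the column means. The abstract does not state the order the thesis uses.

```python
import numpy as np


def preprocess(X):
    """Hypothetical pre-processing sketch for a module-by-metric matrix X."""
    X = np.asarray(X, dtype=float)
    # 1. Remove "incorrect" rows. Assumption: any negative metric value is
    #    invalid, since counts such as LOC or cyclomatic complexity are >= 0.
    #    NaN < 0 evaluates to False, so rows with missing values are kept.
    #    Boolean indexing returns a copy, so the caller's array is untouched.
    X = X[~(X < 0).any(axis=1)]
    # 2. Fill missing values (NaN) with the mean of the observed values
    #    in the same column.
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    # 3. z-score standardization per column: (x - mean) / std.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns: avoid division by zero
    return (X - mu) / sigma


X = [[1.0, 2.0], [np.nan, 4.0], [3.0, 6.0], [-1.0, 5.0]]
Z = preprocess(X)  # the row with -1.0 is dropped; the NaN becomes 2.0
print(Z.shape)
```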
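PCA-based dimensionality reduction of the metric matrix can be illustrated with a singular value decomposition. The 95% cumulative-variance threshold and the function name `pca_reduce` are assumptions for illustration. The abstract does not say how many principal components the thesis keeps.

```python
import numpy as np


def pca_reduce(Z, var_ratio=0.95):
    """Project Z (n_modules x n_metrics) onto the fewest principal components
    whose cumulative explained variance reaches var_ratio (assumed 0.95)."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)  # centering; already ~0 after z-scoring
    # Thin SVD: Zc = U diag(s) Vt. Rows of Vt are the principal axes and
    # s**2 is proportional to the variance along each axis.
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    k = min(k, len(s))  # guard against rounding when var_ratio is 1.0
    # Scores: coordinates of each module in the k-dimensional subspace.
    return Zc @ Vt[:k].T, explained[:k]


rng = np.random.default_rng(0)
t = rng.normal(size=50)
Z = np.column_stack([t, 2 * t, -t])  # three perfectly correlated metrics
P, ev = pca_reduce(Z)
print(P.shape)
```

Strongly correlated metrics, such as the overlapping Halstead measures, collapse onto few components. This is where the dimensionality reduction comes from.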
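For reference, standard fuzzy c-means minimizes the objective below. The symbols are:

- memberships $u_{ij}$ and centers $v_i$;
- a fuzzifier $m > 1$;
- $c$ clusters and $n$ modules $x_j$.

Alternating the two update rules converges only to a local minimum, and that minimum depends on the initial centers. This is the sensitivity the improved algorithm targets.

```latex
J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - v_i \rVert^2,
\qquad \text{s.t. } \sum_{i=1}^{c} u_{ij} = 1,\; u_{ij} \in [0, 1]

v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} \, x_j}{\sum_{j=1}^{n} u_{ij}^{m}},
\qquad
u_{ij} = \left( \sum_{k=1}^{c}
  \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{2/(m-1)}
\right)^{-1}
```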
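A plain fuzzy c-means loop can be sketched as follows. It does not reproduce the thesis's simulated-annealing and genetic-algorithm initialization. As a simpler stand-in, `best_of_restarts` (a hypothetical helper) runs several random initializations and keeps the one with the lowest objective. This shows why the choice of initial centers matters.

```python
import numpy as np


def _memberships(X, V, m):
    """Return fuzzy memberships U (c x n) and squared distances d2 (c x n)."""
    # d2[i, j] = ||x_j - v_i||^2, floored so a point sitting exactly on a
    # center does not cause division by zero.
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, 1e-12)
    # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)); with squared distances the
    # exponent becomes 1/(m-1). ratio[i, k, j] = d2[i, j] / d2[k, j].
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1))
    return 1.0 / ratio.sum(axis=1), d2


def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=None):
    """Plain fuzzy c-means: returns centers V, memberships U, objective J."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial centers drawn from the data points. This is the step
    # the thesis replaces with a simulated-annealing / genetic search.
    V = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(max_iter):
        U = _memberships(X, V, m)[0]
        W = U ** m
        # Each center is the u^m-weighted mean of all points.
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift < tol:
            break
    U, d2 = _memberships(X, V, m)  # memberships for the final centers
    J = float(((U ** m) * d2).sum())
    return V, U, J


def best_of_restarts(X, c, n_restarts=10, m=2.0, seed=None):
    """Hypothetical helper: run fcm from several random starts, keep lowest J."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        result = fcm(X, c, m=m, seed=int(rng.integers(2**31)))
        if best is None or result[2] < best[2]:
            best = result
    return best


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(40, 2)), rng.normal(5, 0.3, size=(40, 2))])
V, U, J = best_of_restarts(X, c=2, n_restarts=5, seed=0)
labels = U.argmax(axis=0)  # hard assignment: highest membership wins
```

The abstract does not describe how clusters are mapped to defect-prone or defect-free modules, so that step is left open here.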
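The baseline DBSCAN algorithm, which the improved version builds on, can be sketched as below. Clusters grow from core points, meaning points with at least `min_pts` neighbors within radius `eps`. As a result, clusters can take arbitrary shapes instead of being limited to spherical ones.

The thesis's specific improvement is not described in the abstract and is not reproduced here. `eps` and `min_pts` are parameters that must be chosen per data set.

```python
import numpy as np

NOISE = -1


def dbscan(X, eps, min_pts):
    """Standard DBSCAN. Returns one label per point; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory; fine for small data sets).
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # eps-neighborhoods; each neighborhood includes the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, NOISE)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point: noise unless a core point reaches it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
        cluster += 1
    return labels


rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0, 0.1, size=(30, 2)),
    rng.normal(3, 0.1, size=(30, 2)),
    [[10.0, 10.0]],  # an isolated outlier
])
labels = dbscan(X, eps=0.5, min_pts=4)
```

Unlike fuzzy c-means, DBSCAN does not take the number of clusters as input, and it leaves isolated points unassigned as noise.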
Keywords/Search Tags: software defect prediction, clustering, fuzzy c-means, DBSCAN, metrics