Applying Learning Algorithm To Predict Software Defect

Posted on: 2016-06-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Gabriel Kofi Armah
GTID: 1108330473956088
Subject: Computer Science and Technology
Abstract/Summary:
The growing use of software in virtually every sphere of life, including education, aviation, military installations, manufacturing, banking and agriculture, has drawn much attention to software quality and reliability. Software defect prediction is an important area within software quality and software reliability. Defects are expensive to fix, and defective software is of poor quality and highly unreliable, which can ultimately cause the software to fail. Software engineering researchers have done, and continue to do, a great deal of work on software defect prediction in order to minimize the number of defects and their severe impact on software projects. Researchers in this area have applied machine learning models and data mining techniques to software repositories to identify defects in software products. Common classification techniques include Bayesian classifiers, Bayesian belief networks, backpropagation (used in neural networks), support vector machines and k-nearest neighbor classifiers; common prediction methods include linear regression, nonlinear regression and other regression-based models. In this dissertation, we propose a classifier/predictor model that takes into account the highly imbalanced class distribution of defect datasets, together with proposed variant formulas for precision, recall and F-measure.
In our first solution, we propose using Weka for multi-level data pre-processing by filtering. We compared the performance of four k-nearest neighbor (KNN) classifiers (LWL, KStar, IBk and IB1) with non-nested generalized exemplars (NNGE), random tree and random forest. The multi-level data pre-processing consists of double attribute selection and tripartite instance filtering, and it resulted in excellent prediction.
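As a rough illustration of why resampling matters for a neighbor-based classifier, the sketch below pairs simple random oversampling of the minority (defective) class with a plain k-nearest neighbor vote. It uses only the Python standard library; the toy data, the function names `oversample_minority` and `knn_predict`, and the use of Euclidean distance are hypothetical choices for illustration. It is not Weka's Resample filter or its IBk classifier, and it leaves out the attribute-selection stage of the multi-level pre-processing.

```python
import math
import random
from collections import Counter


def oversample_minority(X, y, seed=0):
    """Randomly duplicate rows of each smaller class until it matches the largest class.

    Illustrative stand-in for resampling-based class balancing (hypothetical,
    not Weka's Resample filter).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    majority_count = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # The largest class gets range(0), i.e. no duplicates.
        for _ in range(majority_count - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Predict by majority vote among the k training rows nearest to `query` (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (lines_of_code, cyclomatic_complexity) per module,
# label 1 = defective (minority class), 0 = clean.
X = [(10, 1), (12, 2), (15, 1), (11, 2), (14, 2), (13, 1), (80, 9), (95, 11)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y)

print(knn_predict(X, y, (90, 10), k=5))          # imbalanced training set → 0
print(knn_predict(X_bal, y_bal, (90, 10), k=5))  # after oversampling → 1
```

On this toy set, a 5-neighbor vote on the imbalanced data is outvoted by the clean majority, while the same vote on the oversampled data labels the query module defective. Random oversampling only duplicates existing rows, so in practice it can encourage overfitting to those rows.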
The excellent performance was attributed to the removal of irrelevant attributes (dimension reduction) and to resampling, which handles the class imbalance problem.
The second solution centers on our proposed variant formula for precision under an imbalanced class distribution; the formula gave values closer to the accuracy for both balanced and imbalanced class distributions. A NASA dataset was used to provide well-documented examples of how to obtain higher accuracy with correspondingly higher precision, and subsequently higher recall and F-measure values, which reflect the high performance of our classifier/predictor. We used data in which the minority class makes up between 5% and 10% of the data points, inclusive. For the proposed solution, we fixed the true positive rate (TPR) at one (1), while the false positive rate (FPR) was varied from 0.01 to 0.05, inclusive, in steps of 0.01.
The third solution applies variable regularized logistic regression, which was extended to a modified variable regularized logistic regression. This solution presents two novel algorithms for modeling software defect prediction using non-linear logistic regression. The first algorithm, variant variable regularized logistic regression (VVRLR), computes the hypothesis functions. The second algorithm, Evaluator's Computation (P), computes the overall accuracy of the prediction model, the mean absolute error (MAE), and the variant formulas for precision, recall and F-measure proposed theoretically in the second solution; together these yield the modified variant variable regularized logistic regression (MVVRLR).
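The base model behind the third solution is regularized logistic regression. The Python sketch below shows a standard L2-regularized logistic regression trained by batch gradient descent on a hypothetical quadratic feature map (one simple way to make the decision boundary non-linear), followed by an evaluation step that reports accuracy and mean absolute error. It is not the dissertation's VVRLR, MVVRLR or Evaluator's Computation (P) algorithm, whose details are not given in this abstract; the function names, feature map, regularization strength `lam`, learning rate, epoch count and toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_features(x):
    """Hypothetical non-linear feature map: (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return [x1, x2, x1 * x1, x1 * x2, x2 * x2]


def hypothesis(theta, phi):
    """h_theta(phi) = sigmoid(theta_0 + sum_j theta_j * phi_j)."""
    z = theta[0] + sum(t * f for t, f in zip(theta[1:], phi))
    return sigmoid(z)


def train_l2_logistic(Phi, y, lam=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent on the standard L2-regularized logistic cost
    (a generic sketch, not the dissertation's VVRLR):

    J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)] + (lam/(2m)) * sum_{j>=1} theta_j^2
    """
    m, d = len(Phi), len(Phi[0])
    theta = [0.0] * (d + 1)
    for _ in range(epochs):
        errors = [hypothesis(theta, phi) - yi for phi, yi in zip(Phi, y)]
        grad = [sum(errors) / m]  # bias theta_0 is not regularized
        for j in range(d):
            g = sum(e * phi[j] for e, phi in zip(errors, Phi)) / m
            grad.append(g + lam * theta[j + 1] / m)  # derivative of the L2 penalty
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def evaluate(theta, Phi, y, threshold=0.5):
    """Return (accuracy of thresholded predictions, mean absolute error of probabilities)."""
    probs = [hypothesis(theta, phi) for phi in Phi]
    preds = [1 if p >= threshold else 0 for p in probs]
    accuracy = sum(p == yi for p, yi in zip(preds, y)) / len(y)
    mae = sum(abs(p - yi) for p, yi in zip(probs, y)) / len(y)
    return accuracy, mae


# Hypothetical toy data: two module metrics scaled to [0, 1], label 1 = defective.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2), (0.8, 0.9), (0.9, 0.7), (0.85, 0.95)]
y = [0, 0, 0, 0, 1, 1, 1]
Phi = [quadratic_features(x) for x in X]
theta = train_l2_logistic(Phi, y)
accuracy, mae = evaluate(theta, Phi, y)
print(f"training accuracy={accuracy:.3f}  MAE={mae:.3f}")
```

Leaving the bias term θ₀ out of the penalty is the usual convention, so that regularization shrinks only the feature weights. Computing MAE on predicted probabilities rather than on hard labels is likewise an assumption of this sketch.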
Our algorithms improve defect prediction on two cross-company datasets, from NASA and SOFTLAB, compared with some popular and related Weka algorithms.
Finally, experiments were carried out to test the efficiency and performance of the proposed algorithms. In the first experiment, multi-level data pre-processing by filtering gave enhanced defect prediction results. Prediction performance also improved when attribute selection and resampling filtering were each applied separately. In the second and third experiments, the proposed variant precision formula and algorithms gave better prediction performance than the related Weka classification algorithms.
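The variant precision formula itself is not stated in this abstract, so it is not reproduced here. To show the gap between precision and accuracy that such a formula is meant to address, the Python sketch below computes the standard precision, recall, F-measure and accuracy from class rates over the analysis grid described in the second solution: a minority class of 5% and 10% (the endpoints of the stated range), TPR fixed at 1, and FPR from 0.01 to 0.05 in steps of 0.01. The function name `standard_metrics` and the rate-based formulation are assumptions for illustration.

```python
def standard_metrics(minority_fraction, tpr, fpr):
    """Standard (not the dissertation's variant) metrics computed from class rates.

    With p = minority_fraction and n = 1 - p (fractions of the dataset):
      TP = tpr * p,  FP = fpr * n,  TN = (1 - fpr) * n
      precision = TP / (TP + FP),  recall = tpr
      F-measure = 2 * precision * recall / (precision + recall)
      accuracy  = TP + TN
    """
    p = minority_fraction
    n = 1.0 - p
    tp = tpr * p
    fp = fpr * n
    tn = (1.0 - fpr) * n
    precision = tp / (tp + fp)
    recall = tpr
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = tp + tn
    return precision, recall, f_measure, accuracy


# Grid from the abstract: minority class 5% and 10%, TPR = 1, FPR = 0.01 .. 0.05.
for pct in (5, 10):
    for step in range(1, 6):
        fpr = step / 100
        prec, rec, f1, acc = standard_metrics(pct / 100, 1.0, fpr)
        print(f"minority={pct:2d}%  FPR={fpr:.2f}  precision={prec:.3f}  "
              f"recall={rec:.3f}  F={f1:.3f}  accuracy={acc:.3f}")
```

With a 5% minority class and FPR = 0.05, for example, the standard precision is about 0.51 while accuracy is about 0.95, because even a small FPR applied to the large majority class yields nearly as many false positives as there are true positives.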
Keywords/Search Tags: Multi-Level, Data Pre-processing, F-measure, Regularized logistic regression