Software Defect Prediction Based On Machine Learning

Posted on: 2013-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: W W Tu
GTID: 2298330434475666
Subject: Computer application technology
Abstract/Summary:
Software has become an indispensable part of modern life, so software quality is particularly important, and software defects seriously degrade it. Software defect prediction approaches identify the modules that are likely to contain defects, which helps developers direct limited resources toward those modules and thereby improve software quality. Many machine learning approaches have been used to build software defect prediction models, and machine learning based software defect prediction has become one of the hottest topics in the software engineering research community.

Two crucial problems in building a software defect prediction model are how to build an accurate model and how to help developers find the key factors that affect software defects. In practice, because labeling is costly, only a limited number of modules can be thoroughly tested to obtain labeled data. At the same time, traditional software defect prediction models are insensitive to software defects. In addition, traditional approaches cannot help developers find the key factors that affect software defects. This thesis studies these problems and makes the following contributions:

1. The limited-data issue and the lack of sensitivity to software defects always arise together. The cost-sensitive semi-supervised learning algorithm CS4VM is introduced to address both issues. Empirical studies show that CS4VM is effective at addressing them jointly. However, CS4VM relies on an embedded cost-sensitive learning approach, which is not general, and it is a low-density-separation algorithm, so it suffers from heavy computational cost.

2. To obtain a more general and efficient algorithm, a cost-sensitive semi-supervised learning algorithm named CoForest-CS is proposed to address both issues. CoForest-CS relies on a general cost-sensitive learning approach, which makes it more general than CS4VM, and it is based on disagreement-based semi-supervised learning, which makes it efficient. Empirical studies show that CoForest-CS is effective at addressing the two issues jointly.

3. To mine the key factors that affect software defects, factors from the whole software development process are considered, unlike in most previous work. A rank learning algorithm is introduced to combine these factors into a rank prediction model for software defect density. Analyzing the rank prediction model reveals the key factors that affect software defects. Empirical studies show that the ranking model is suitable for mining these key factors.
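
The abstract does not say how sensitivity to defects is achieved in code. As a rough illustration only, one generic way to make any probabilistic classifier cost-sensitive is threshold moving: a module is flagged as defective when the expected cost of missing a defect outweighs the expected cost of a false alarm. The sketch below is not CS4VM or CoForest-CS, and the function names and cost values are hypothetical, chosen purely for illustration.

```python
def cost_sensitive_threshold(cost_fn: float, cost_fp: float) -> float:
    """Return the probability threshold that minimizes expected cost.

    cost_fn: assumed cost of missing a defective module (false negative).
    cost_fp: assumed cost of flagging a clean module (false positive).
    Flagging is cheaper when p * cost_fn >= (1 - p) * cost_fp,
    i.e. when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_defective(prob_defective: float, cost_fn: float, cost_fp: float) -> bool:
    """Flag a module as defective using the cost-adjusted threshold."""
    return prob_defective >= cost_sensitive_threshold(cost_fn, cost_fp)


if __name__ == "__main__":
    # Hypothetical module with an estimated 30% chance of being defective.
    p = 0.3
    print(predict_defective(p, cost_fn=1.0, cost_fp=1.0))  # equal costs
    print(predict_defective(p, cost_fn=4.0, cost_fp=1.0))  # misses cost 4x more
```

With equal costs the threshold is 0.5 and the module is not flagged; when a missed defect is assumed to cost four times as much as a false alarm, the threshold drops to 0.2 and the same module is flagged. This is the sense in which a cost-insensitive model can be called insensitive to defects.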
Keywords/Search Tags: Machine learning, software mining, software defect prediction, cost-sensitive semi-supervised learning, rank learning