Research On Some Key Technologies Of Software Defect Prediction

Posted on: 2017-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Cheng
Full Text: PDF
GTID: 1368330512454964
Subject: Computer software and theory
Abstract/Summary:
Software defect prediction is an active research topic in software engineering and machine learning. It builds a prediction model by mining historical software data and extracting metric features related to software defects. In the early stages of system development, a defect prediction model is used to identify defect-prone modules in a forthcoming version of a software system and to help allocate effort to those modules, reducing the cost of software development and maintenance. This thesis explores how to build more efficient and accurate prediction models using recent machine learning methods.

Current software defect prediction techniques face three main problems in practical application: (1) Building a defect prediction model usually requires sufficient labeled data, but manually labeling defective modules is time-consuming, and current methods that learn from a small labeled training set may not perform well. (2) In cross-project defect prediction, data distributions differ between projects, and traditional defect prediction methods adapt poorly to the cross-project task. (3) Most previous cross-project defect prediction work assumes that the data from different projects share the same metric set. In real scenarios this assumption may not hold, so existing methods cannot handle the heterogeneous cross-project defect prediction problem. This thesis proposes a solution to each of these three problems. The detailed work is as follows:

(1) Dictionary learning for semi-supervised software defect prediction
We address the problems of class imbalance and limited labeled data in software defect prediction by proposing a novel semi-supervised task-driven dictionary learning approach. Unlike previous data-driven dictionary learning methods, which only try to reconstruct the training samples well and do not consider the classification task, we jointly optimize the classifier parameters and the dictionary, ensuring that the learned sparse code features are optimal for the classifier. Moreover, defect data is inherently imbalanced, and traditional classification methods favor overall classification accuracy while ignoring the minority class. During dictionary learning, we take the different misclassification costs into account and emphasize the risk cost so that the classifier is inclined to classify a module as defective, alleviating the class-imbalance issue.

(2) Research on cross-project defect prediction based on transfer learning
Traditional cross-project defect prediction methods usually select the source data most similar to the target as training data for the prediction model. The discarded dissimilar data may still contain class-discrimination information that is useful for training. We therefore use all of the training data without discarding any samples, and propose a weighted Bayesian transfer learning algorithm. The algorithm first constructs attribute feature vectors for the training set and the target set, and then calculates the difference between each training sample and the test set. A data gravity method converts these differences into weights on the training data, and the defect prediction model is built on the weighted data. We provide a theoretical analysis of the compared methods and conduct experiments on open-source data sets. The experiments validate the effectiveness of our method.

(3) Research on cross-project defect prediction based on heterogeneous metrics
We propose a novel cost-sensitive correlation transfer support vector machine approach to deal with the heterogeneous cross-project defect prediction problem. The approach first uses a Unified Metric Representation (UMR) to reconstruct the source and target data. Based on the obtained UMR, we apply canonical correlation analysis to derive a joint feature subspace that associates the cross-project data, so that correlation transfer information can be exploited to train the prediction model and improve its class-discrimination ability. Moreover, we use cost-sensitive learning so that the classifier is inclined to classify a module as defective, alleviating the impact of imbalanced data.
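
The weighting step described in (2) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the similarity of a training sample is taken to be the number of its attributes that fall inside the target set's per-attribute [min, max] range, and the gravity-style weight is taken to be s / (k - s + 1)^2 for k attributes. The function names `feature_ranges`, `gravity_weights`, and `weighted_prior` are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def feature_ranges(target: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-attribute (min, max) over the unlabeled target project."""
    columns = list(zip(*target))
    return [(min(col), max(col)) for col in columns]


def gravity_weights(source: Sequence[Sequence[float]],
                    target: Sequence[Sequence[float]]) -> List[float]:
    """Assumed gravity-style weight for each source (training) sample.

    s = number of attributes inside the target range (similarity),
    k = number of attributes, weight = s / (k - s + 1) ** 2.
    """
    ranges = feature_ranges(target)
    k = len(ranges)
    weights = []
    for row in source:
        s = sum(1 for value, (lo, hi) in zip(row, ranges) if lo <= value <= hi)
        weights.append(s / (k - s + 1) ** 2)
    return weights


def weighted_prior(weights: Sequence[float], labels: Sequence[int],
                   n_classes: int = 2) -> List[float]:
    """Laplace-smoothed class priors computed from weighted counts."""
    total = sum(weights)
    priors = []
    for c in range(n_classes):
        class_weight = sum(w for w, y in zip(weights, labels) if y == c)
        priors.append((class_weight + 1) / (total + n_classes))
    return priors


if __name__ == "__main__":
    # Toy data: two metrics per module; label 1 = defective, 0 = clean.
    target = [[1.0, 10.0], [3.0, 20.0]]
    source = [[2.0, 15.0], [5.0, 15.0], [0.0, 30.0]]
    labels = [1, 0, 0]
    w = gravity_weights(source, target)
    print(w)
    print(weighted_prior(w, labels))
```

In this sketch no training sample is discarded outright. A sample that matches the target range on every attribute receives the largest weight, while a sample that matches on none receives weight zero and contributes nothing to the weighted counts. The same weights could be passed to any classifier that accepts per-sample weights.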
Keywords/Search Tags: Software Defect Prediction, Cross-project Defect Prediction, Dictionary Learning, Transfer Learning, Heterogeneous Metrics, Canonical Correlation Analysis