
Research On Software Defect Prediction Based On Machine Learning

Posted on: 2018-04-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Yu
GTID: 1318330539975101
Subject: Computer software and theory
Abstract/Summary:
As software grows in scale and complexity, software maintenance is becoming increasingly difficult. Defects of various kinds are inevitably introduced during software development and maintenance, and they are the main factor affecting software quality. Defects in a software product may cause failures at run time and can even crash the software. Software testing can uncover these defects, but excessive testing may delay development and increase its cost. Software defect prediction plays an important role in software testing: it aims to locate potential defects from historical data, so that testing resources can be allocated sensibly and testing efficiency improved. In recent years, software defect prediction has drawn considerable attention from software engineering researchers, and machine-learning-based approaches have become the focus of research in this area.

This dissertation addresses several key problems in supervised, classification-based defect prediction: feature selection, the impact of class imbalance, and the underuse of evolution information in within-project defect prediction; irrelevant or redundant features in cross-project defect prediction; and heterogeneous features in cross-company defect prediction. It presents new machine-learning-based techniques and approaches to further improve prediction performance. The main contributions are as follows.

(1) To measure the correlation between each feature and the class (defective or non-defective), we present a feature selection approach based on a similarity measure. Feature weights are updated according to the similarity and the feature differences of samples from different classes, and a feature ranking list is generated by sorting the weights in descending order. Feature subsets are then taken from the ranking list in sequence, and the classification performance of each subset is evaluated. In comparisons with four other feature selection approaches, the proposed approach performs better than or comparably to them.

(2) To explore how class imbalance affects the performance of defect prediction models, we present an approach for analyzing its impact. We design an algorithm that converts an original imbalanced dataset into a series of new datasets whose imbalance ratio increases step by step. Typical prediction models are applied to these constructed datasets to evaluate how stable each model's performance remains as the imbalance grows. We also evaluate the performance stability of cost-sensitive models and ensemble models under class imbalance.

(3) Drawing on the evolution information of object-oriented programs, we propose two evolution metrics, derived from the historical defect rates of packages and from the degree of change of classes. Using a feature selection approach, we compare how strongly code metrics and evolution metrics are correlated with the class. The results show that the proposed evolution metrics are more relevant to the class than the code metrics, and that adding them effectively improves defect prediction performance.

(4) To address irrelevant or redundant features in cross-project datasets, we present a cross-project defect prediction approach based on feature selection. We apply both feature subset selection and feature ranking to demonstrate their effectiveness for cross-project defect prediction. The results show that, as in within-project defect prediction, feature selection can also improve cross-project defect prediction performance to a certain extent.

(5) To address heterogeneous features in cross-company datasets, we present a feature transfer approach for cross-company defect prediction. First, we design a feature matching algorithm that converts heterogeneous features into matched features according to the ‘distance’ between the distribution curves of different features. Then, we use transfer learning to transfer the features of the source project to the matched features of the target project, thereby enabling cross-company defect prediction. Finally, extensive experiments demonstrate the effectiveness of the proposed approach, and we also discuss how its performance varies with different factors.

In summary, this dissertation applies machine learning theory and methods to practical problems in software defect prediction. This work not only enriches and broadens the applications of machine learning theory but also demonstrates the practical value of machine learning methods. Moreover, it offers new research directions for software defect prediction and is significant for improving software quality and reliability.
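Contribution (1) describes a pipeline with three stages: update feature weights from sample similarity, rank features, then evaluate subsets taken from the ranking in sequence. The abstract specifies neither the similarity measure nor the exact update rule, so the minimal sketch below substitutes the classic Relief scheme, which uses the nearest same-class "hit" and the nearest other-class "miss". It assumes similarity = 1 / (1 + Euclidean distance) and treats the subsets as prefixes of the ranking. The names `relief_weights`, `rank_features` and `best_prefix_subset` are hypothetical, and `evaluate` stands in for whatever classifier and performance measure one chooses. This illustrates the general technique, not the dissertation's own method.

```python
import math


def _similarity(a, b):
    # Assumption: similarity = 1 / (1 + Euclidean distance); the abstract
    # does not say which similarity measure the dissertation uses.
    return 1.0 / (1.0 + math.dist(a, b))


def relief_weights(X, y):
    """Relief-style weights: reward features that differ between classes,
    penalise features that differ within a class."""
    n_features = len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    lo = [min(row[f] for row in X) for f in range(n_features)]
    hi = [max(row[f] for row in X) for f in range(n_features)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]
    w = [0.0] * n_features
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        hits = [j for j in others if y[j] == yi]
        misses = [j for j in others if y[j] != yi]
        if not hits or not misses:
            continue
        # Most similar sample of the same class and of the other class.
        hit = max(hits, key=lambda j: _similarity(xi, X[j]))
        miss = max(misses, key=lambda j: _similarity(xi, X[j]))
        for f in range(n_features):
            w[f] += (abs(xi[f] - X[miss][f]) - abs(xi[f] - X[hit][f])) / span[f]
    return [v / len(X) for v in w]


def rank_features(weights):
    # Feature indices sorted by weight, highest first.
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)


def best_prefix_subset(ranking, evaluate):
    """Score the top-1, top-2, ... prefixes of the ranking and keep the best.
    `evaluate` is a caller-supplied (hypothetical) scoring function."""
    best, best_score = None, float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        score = evaluate(subset)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score
```

Scoring nested prefixes means only n subsets are evaluated instead of 2^n. In practice `evaluate` would train a classifier on the chosen columns and return a cross-validated measure such as AUC or F1.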
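Contribution (2) mentions an algorithm that turns one imbalanced dataset into a series of datasets whose imbalance ratio increases one step at a time. The abstract does not give the construction. The minimal sketch below assumes that the imbalance ratio is majority count / minority count, that every minority (defective) sample is kept, and that majority samples are added in blocks from a single fixed shuffle. Under those assumptions each dataset is a superset of the previous one. `imbalance_series` is a hypothetical name.

```python
import random


def imbalance_series(X, y, minority_label=1, seed=0):
    """Hypothetical sketch: return [(ratio, X_r, y_r), ...] for
    ratio = 1, 2, 3, ..., where ratio = majority count / minority count.
    All minority samples are kept; majority samples are added in blocks
    of len(minority) taken from one fixed shuffle, so the datasets are nested."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority:
        raise ValueError("no minority-class samples")
    rng = random.Random(seed)  # fixed seed for reproducible series
    rng.shuffle(majority)
    m = len(minority)
    series = []
    for ratio in range(1, len(majority) // m + 1):
        idx = minority + majority[: ratio * m]
        series.append((ratio, [X[i] for i in idx], [y[i] for i in idx]))
    return series
```

Because consecutive datasets differ only by the added majority samples, a change in a model's score across the series can be attributed mainly to the growing imbalance rather than to a different sample of defective modules.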
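Contribution (3) derives two evolution metrics: one from the historical defect rates of packages and one from the degree of change of classes. The abstract does not define either formula, so both definitions below are assumptions made for illustration. The package defect rate is taken as the fraction of defective classes in each package in the previous release. The change degree is taken as the mean relative change of a class's code metrics between two releases, with a newly added class counted as fully changed. `package_defect_rate` and `change_degree` are hypothetical names.

```python
from collections import defaultdict


def package_defect_rate(prev_release):
    """prev_release: iterable of (class_name, package, is_defective).
    Assumed metric: defective classes / all classes, per package."""
    total = defaultdict(int)
    defective = defaultdict(int)
    for _, pkg, is_defective in prev_release:
        total[pkg] += 1
        defective[pkg] += int(is_defective)
    return {pkg: defective[pkg] / total[pkg] for pkg in total}


def change_degree(old_metrics, new_metrics):
    """Assumed metric: mean relative change over the code metrics shared by
    both releases (dicts of metric name -> value). A class with no previous
    version is treated as fully changed (1.0)."""
    if old_metrics is None:
        return 1.0
    keys = old_metrics.keys() & new_metrics.keys()
    if not keys:
        return 1.0
    total = 0.0
    for k in keys:
        old, new = old_metrics[k], new_metrics[k]
        denom = max(abs(old), abs(new)) or 1.0  # keeps each term in [0, 1]
        total += abs(new - old) / denom
    return total / len(keys)
```

For a class in the new release, the two values would be appended to its ordinary code metrics. For example, `package_defect_rate(prev).get(pkg, 0.0)` gives the first value, and `change_degree(old, new)` gives the second.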
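In the cross-project setting of contribution (4), features are selected using only the labelled source project, and the same columns are then kept in the target project. The minimal sketch below ranks features by the absolute Pearson correlation with the defect label. This simple filter was chosen here for brevity; the abstract does not say which subset-selection or ranking methods the dissertation evaluates. `select_on_source` and `project` are hypothetical names.

```python
import math


def pearson(xs, ys):
    # Pearson correlation; returns 0.0 for a constant column.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_on_source(X_src, y_src, k):
    """Rank features on the source project only and keep the top k."""
    n_features = len(X_src[0])
    scores = [abs(pearson([row[f] for row in X_src], y_src)) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:k]


def project(X, features):
    # Keep only the selected columns (applied to source and target alike).
    return [[row[f] for f in features] for row in X]
```

The target project's labels are never consulted, which matches the usual cross-project assumption that the target has no labelled history.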
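Contribution (5) matches heterogeneous source and target features by the ‘distance’ between their distribution curves. The abstract does not define that distance or the matching procedure. The minimal sketch below assumes the two-sample Kolmogorov-Smirnov statistic, computed on min-max-normalised values, as the distance. It then makes a greedy one-to-one assignment, pairing the closest features first. `ks_distance` and `match_features` are hypothetical names, and the subsequent transfer-learning step is not shown.

```python
from bisect import bisect_right


def _normalise(values):
    # Min-max scale to [0, 1] so features on different scales are comparable.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return sorted((v - lo) / span for v in values)


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic between empirical CDFs
    of the min-max-normalised samples (assumed distance measure)."""
    a, b = _normalise(a), _normalise(b)

    def cdf(sample, x):
        return bisect_right(sample, x) / len(sample)

    # The maximum CDF gap is attained at one of the observed values.
    points = sorted(set(a) | set(b))
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)


def match_features(source_cols, target_cols):
    """Greedy one-to-one matching: repeatedly pair the unused source and
    target features whose distributions are closest.
    Returns [(source_index, target_index, distance), ...]."""
    pairs = sorted(
        (ks_distance(s, t), i, j)
        for i, s in enumerate(source_cols)
        for j, t in enumerate(target_cols)
    )
    used_s, used_t, matches = set(), set(), []
    for d, i, j in pairs:
        if i not in used_s and j not in used_t:
            matches.append((i, j, d))
            used_s.add(i)
            used_t.add(j)
    return matches
```

Greedy assignment is simple but not globally optimal. An optimal alternative would solve the same cost matrix as a bipartite assignment problem, for example with the Hungarian method.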
Keywords/Search Tags: machine learning, software defect prediction, feature selection, software evolution, feature transfer