Research On Application Technology Of Feature Selection In Software Defect Prediction

Posted on: 2016-11-04; Degree: Master; Type: Thesis
Country: China; Candidate: S L Liu
GTID: 2308330461456532; Subject: Computer technology
Abstract
As software grows in scale and complexity, software quality receives increasing attention. Identifying defect-prone software entities early in development helps developers detect and repair defects, allocate limited testing resources effectively, and improve software reliability. By mining software history archives, analyzing historical defect information, and building defect prediction models, software defect prediction (SDP) can identify latent defect-prone software entities in advance.

When an SDP dataset is constructed from a large number of software metrics (features), redundant and irrelevant features are inevitable. These features increase the complexity of SDP models and degrade their performance. Designing an effective feature selection method that can identify and remove such features therefore has a significant influence on SDP.

By analyzing the correlation between pairs of features (FF-Correlation) and assessing the relevance of each feature to the target class (FC-Relevance), we propose a feature selection framework, FECAR (FEature Clustering And feature Ranking), which handles irrelevant and redundant features at the same time. First, FECAR clusters the features based on FF-Correlation to find redundant information among them. Then FECAR ranks the features in each cluster by their FC-Relevance and selects the highest-ranking features from each cluster to construct the final feature subset. Our studies of FECAR on the Eclipse and NASA project datasets show that it can effectively remove irrelevant and redundant features and further improve SDP performance.

The contributions of this thesis can be summarized as follows:

1. We conduct a detailed survey of the state of the art in software defect prediction. We first introduce the concepts and definition of SDP, then survey the state of the art from four aspects: software metrics, SDP models, feature selection, and evaluation methods.
2. We propose a new feature selection framework, FECAR, to handle irrelevant and redundant features in SDP. We present, in sequence, the motivation for FECAR, its framework, the FF-Correlation and FC-Relevance measures it uses, and its time complexity.
3. We design and perform empirical studies on real projects to show the effectiveness of the proposed framework. Using different empirical strategies on 13 real projects, we evaluate FECAR in terms of the redundancy rate of the feature subsets it selects and its influence on SDP performance.
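The two-stage idea behind FECAR (cluster features by FF-Correlation, then rank the features within each cluster by FC-Relevance and keep the top ones) can be illustrated with a minimal sketch. The abstract does not name the concrete measures or the clustering algorithm, so this example assumes absolute Pearson correlation for both FF-Correlation and FC-Relevance, and a threshold-based single-linkage clustering. The function names and the parameters `threshold`, `per_cluster`, and `min_relevance` are hypothetical and not taken from the thesis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_features(columns, threshold):
    """Single-linkage grouping: features with |corr| >= threshold share a cluster (union-find)."""
    parent = list(range(len(columns)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed stand-in for FF-Correlation: absolute Pearson correlation between two features.
    for i, j in combinations(range(len(columns)), 2):
        if abs(pearson(columns[i], columns[j])) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(columns)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def fecar_like_select(columns, labels, threshold=0.7, per_cluster=1, min_relevance=0.1):
    """Hypothetical FECAR-style selection: cluster, rank by |corr with label|, keep top features."""
    # Assumed stand-in for FC-Relevance: absolute Pearson correlation with the class label.
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    for cluster in cluster_features(columns, threshold):
        # Rank features inside the cluster by relevance, most relevant first.
        ranked = sorted(cluster, key=lambda i: relevance[i], reverse=True)
        # Drop (nearly) irrelevant features, then keep at most `per_cluster` of the rest.
        kept = [i for i in ranked if relevance[i] >= min_relevance][:per_cluster]
        selected.extend(kept)
    return sorted(selected)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]            # 1 = defective module (toy data)
    columns = [
        [0, 0, 0, 1, 1, 1],                # f0: strongly relevant
        [0, 0, 1, 1, 1, 1],                # f1: redundant with f0 (|r| about 0.71)
        [1, 0, 0, 1, 0, 1],                # f2: weakly relevant, not redundant
        [5, 5, 5, 5, 5, 5],                # f3: constant, irrelevant
    ]
    print(fecar_like_select(columns, labels))  # [0, 2]
```

On this toy data, f0 and f1 fall into one cluster and f0 is kept as the more relevant of the two, while the constant f3 forms its own cluster but is dropped by the relevance cutoff. Computing all pairwise correlations costs O(m²n) for m features and n instances in this sketch; that figure describes the sketch only, not FECAR's analyzed time complexity.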
Keywords: Software Defect Prediction, Feature Selection, Feature Clustering, Feature Ranking