
Research On Software Defect Prediction Technology Based On Data Mining

Posted on: 2013-02-08; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Y Chen; Full Text: PDF
GTID: 1118330371498876; Subject: Mechanical and electrical engineering
Abstract/Summary:
Software defects are an inherent property of software and a by-product of the development process. Their main hazards are reduced software quality, a longer development cycle, and higher development cost. Stage testing is an important tool for finding software defects in time and improving software quality, and accurately predicting the distribution of defects is of great significance for software testing. With the continuous development of computer technology, software size and complexity are growing exponentially, so more and more influencing factors must be analyzed to predict accurately how defects arise and how they are distributed. Traditional prediction methods struggle with inference that involves complex causal relationships and uncertain knowledge, and their results are often too broad to have practical significance. To solve this problem, researchers have begun to apply results from other disciplines to software defect prediction, and data mining is a commonly used tool.

Building on an in-depth study of data mining and software defect prediction techniques, this dissertation proposes three prediction algorithms based on two branches of data mining, Probabilistic Relational Models and clustering analysis, and develops a software defect management and prediction system to implement the three algorithms and verify their effectiveness. The main contributions and results are as follows:

1. Classification and prediction of software defects by test method. At present, the main research direction in software defect prediction is how to predict the number and the error level of software defects.
To make the prediction results more practical, this dissertation classifies and predicts software defects by test method, so that testers can draw up a targeted test plan from the results.

2. A software defect prediction algorithm based on Probabilistic Relational Models (PRM). PRM is a statistical relational learning method that evolved from Bayesian networks. It can model more complex classes and the relationships between classes, which strengthens the representation of, and reasoning over, uncertain knowledge. In the software development process, the entities that affect the generation and distribution of software defects can be regarded as classes; these entities, or their attributes, affect defects directly or indirectly. Software defect prediction is therefore essentially a problem of inference over uncertain knowledge with complex dependencies. Little research has so far used PRM to predict software defects, so this dissertation proposes a PRM-based software defect prediction algorithm and improves it.

3. A software defect distribution pattern prediction algorithm based on high-dimensional clustering. Clustering analysis is a typical unsupervised learning method. It divides the instances in an instance set into clusters according to some of their attributes, so that instances in the same cluster are similar to each other and instances in different clusters are dissimilar. On the one hand, test projects for software developed by people with similar abilities tend to show similar defect distribution patterns. On the other hand, because defects are classified and predicted by test method, the dimensionality of the defect data keeps growing.
Finding the implicit patterns in such high-dimensional data is a major problem, so this dissertation proposes a software defect distribution pattern prediction algorithm based on high-dimensional clustering. With this algorithm, hidden patterns in the defect data can be found, and software test item classes with similar attributes can be merged to increase the amount of reference data available for prediction and thus improve the prediction results.

4. A software defect prediction algorithm based on mixed-variable clustering of personnel capability. Another problem in software defect prediction is the cold start of the data. For example, the historical data contains no records for developers or testers who have newly joined, so predictions for these people cannot be made directly. To address this, the dissertation presents a method for measuring personnel capability and, based on it, proposes a software defect prediction algorithm based on mixed-variable clustering of personnel capability. The algorithm finds people whose capability is similar to that of the new person and makes a prediction from their data.

5. A software defect management and prediction system, used to implement and experimentally test the three algorithms. The experimental results show that each of the three algorithms has its own characteristics.
The PRM-based algorithm has high accuracy and low computational complexity on large-scale data, but its accuracy is lower on small-scale data. In comparison, the algorithm based on high-dimensional clustering has higher accuracy on small-scale data, but its computational complexity is higher on large-scale data. When the cold-start problem arises, the algorithm based on mixed-variable clustering of personnel capability can produce an approximate result by collecting instances with similar properties from the instance set. By flexibly selecting the appropriate algorithm for the actual situation, the quality of the prediction results can be improved and the time overhead of prediction reduced, which is important for improving software quality and reducing development cost.

This research enriches work on applying data mining to software defect prediction, improves the practical value of the prediction results, and proposes a solution to the cold-start problem. These results contribute positively to related research on software defect prediction.
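
The cold-start idea in contribution 4 can be illustrated with a small Python sketch. The abstract does not specify the capability measure or the clustering procedure, so everything here is an assumption made for illustration: a Gower-style distance stands in for the mixed-variable similarity, a k-nearest lookup stands in for the clustering step, and the attribute names (`years`, `degree`, `role`), the test-method names, and all numbers are hypothetical.

```python
from statistics import mean


def gower_distance(a, b, numeric_ranges):
    """Gower-style distance over mixed numeric and categorical attributes.

    Numeric attributes contribute |a - b| / range; categorical attributes
    contribute 0 on a match and 1 on a mismatch. The result is the average
    over all attributes, so it lies between 0 and 1.
    """
    total = 0.0
    for key, value in a.items():
        if key in numeric_ranges:
            span = numeric_ranges[key]
            total += abs(value - b[key]) / span if span else 0.0
        else:
            total += 0.0 if value == b[key] else 1.0
    return total / len(a)


def predict_for_newcomer(newcomer, history, k=2):
    """Average the per-test-method defect counts of the k most similar people."""
    # Range of each numeric attribute, used to normalise its differences.
    numeric_ranges = {}
    for key, value in newcomer.items():
        if isinstance(value, (int, float)):
            values = [p["profile"][key] for p in history] + [value]
            numeric_ranges[key] = max(values) - min(values)

    ranked = sorted(
        history,
        key=lambda p: gower_distance(newcomer, p["profile"], numeric_ranges),
    )
    nearest = ranked[:k]
    methods = nearest[0]["defects"].keys()
    return {m: mean(p["defects"][m] for p in nearest) for m in methods}


# Hypothetical historical records: a capability profile plus defect counts
# grouped by the test method that found them.
history = [
    {"profile": {"years": 1, "degree": "BSc", "role": "dev"},
     "defects": {"boundary": 8, "stress": 4}},
    {"profile": {"years": 2, "degree": "BSc", "role": "dev"},
     "defects": {"boundary": 6, "stress": 2}},
    {"profile": {"years": 10, "degree": "PhD", "role": "tester"},
     "defects": {"boundary": 1, "stress": 0}},
]

# A new team member with no defect history of their own.
newcomer = {"years": 1, "degree": "BSc", "role": "dev"}

prediction = predict_for_newcomer(newcomer, history, k=2)
print(prediction)
```

The estimate for the newcomer comes entirely from the records of the two most similar colleagues, which is the sense in which such a method can give only an approximate result when no personal history exists.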
Keywords/Search Tags: Software Defect Prediction, Probabilistic Relational Models, High-dimensional Clustering, Defect Distribution Pattern, Mixed-variable Clustering