
Software Fault Localization Based On Causal Inference And Cluster Analysis

Posted on: 2018-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 1318330566452273
Subject: Computer software and theory
Abstract/Summary:
Program debugging is an important way to ensure software reliability. When a failure is detected during software testing, debugging is carried out to locate and fix the faults so that the software behaves correctly. Unfortunately, locating a fault is never easy. Because a faulty state propagates through many statements, methods, modules and systems, the observed result may differ greatly from the initial faulty state, so it is difficult to find the faulty elements directly. At the same time, with advanced development techniques and growing user requirements, software has become increasingly complex and feature-rich. Not surprisingly, this dramatically increases the workload of program debugging and the difficulty of fault localization. Therefore, advanced fault localization techniques, which automatically narrow down the fault scope and provide guiding information, are in demand for improving the efficiency of program debugging and of software development as a whole.

Fault localization can be seen as a process of causal inference that aims to find the cause of an observed effect, namely the program failure. Because of its practical significance, fault localization has attracted many researchers, and many kinds of fault localization techniques have been proposed. However, current studies still face several challenges in practice:

(1) The locating result is not accurate. Most techniques do not take into account the confounding biases between program elements and execution results, so the locating result is distorted by the confounding effect.

(2) Some techniques do not work well in practice. Techniques that have shown their effectiveness in empirical studies may not work well in practice, because uncertain factors such as multiple faults and omission errors can violate their assumptions and thereby decrease their effectiveness.

(3) The locating process costs too much time. Many techniques, which aim to find the most likely faulty part by analyzing different executions, have to spend a lot of time collecting run-time information and inspecting execution results.

To overcome these problems, this dissertation focuses on fault localization and proposes a series of approaches. The main contributions are summarized as follows:

(1) We propose a causal inference based approach to improve the effectiveness of fault localization, and demonstrate through an empirical study that it increases fault localization accuracy. In this part, we first demonstrate the negative influence of confounding biases on predicate suspiciousness estimation, and then identify both the control dependence confounding bias and the data dependence confounding bias by using the back-door criterion. Finally, using linear regression analysis, we mitigate the confounding effect in suspiciousness estimation and provide a more accurate result. Moreover, we present a Variable Type-based Predicate Pre-filtering (VTPP) technique to improve the identification of fault-relevant predicates, and design a combined predicate collection technique with static slicing and dynamic reducing to improve the efficiency of fault localization. Experimental results show that our approach significantly improves both the effectiveness and the efficiency of predicate-based statistical fault localization.

(2) We propose a parameter-value replacement based approach to improve the effectiveness of fault localization, and demonstrate its effectiveness on programs with omission faults. In this part, we first explain the limitation of current techniques when locating omission faults, and then propose a method-level approach to overcome this shortcoming. For each method, we estimate its impact on the execution result by analyzing the differences between its parameter values in passed executions and in failed executions. For each method with a high impact, its Interesting Parameter Value Mapping Pair (IPVMP) is searched for by using parameter-value replacement. Finally, based on the impact and the IPVMP, a list of methods is provided for further debugging. Moreover, multithreading is used to improve the efficiency of parameter-value replacement. Experimental results show that omission faults are common in software, and that our approach performs better than other fault localization techniques.

(3) We provide a theoretical investigation of the interference among multiple faults in a program, and analyze the influence of fault interference on Spectrum-based Fault Localization (SBFL). In this part, we aim to answer two questions: (a) How should fault interference be defined? (b) How does fault interference affect the effectiveness of SBFL? For the first question, on the basis of the PIE model, we propose the E-IP model, which is more suitable for multi-fault programs, to describe the running behavior of each fault in a program, and then use the differences between the models to define two kinds of fault interference (i.e. E-IPI and RI). For the second question, based on the coverage frequencies in both the passed executions and the failed executions, we first divide the program into nine Mutually Exclusive Subsets (MES) and determine the checking priorities for a given SBFL result. We then analyze the impact of fault interference on the effectiveness of SBFL by examining how a fault's position among the MES changes when the program is affected by interference. Theoretical analysis shows that in most cases fault interference decreases the effectiveness of SBFL.

(4) We propose a Fuzzy C-Means (FCM) clustering based approach to improve the effectiveness of fault localization, and demonstrate its effectiveness on multi-fault programs. Building on the previous part, we first explain the negative influence that failed test cases unrelated to a specific fault have on suspiciousness estimation, and then use FCM clustering to characterize the membership of each failed test case in every fault cluster. Note that if a failure is caused by several faults, the corresponding test case should be assigned to multiple clusters rather than only one. Therefore, to avoid forcing such a test case into a single cluster, we choose a typical soft clustering technique, FCM, to cluster the failed test cases. Finally, we propose a membership matrix based method to estimate the suspiciousness of each statement, and provide a list of statements to check during further debugging. Experimental results show that our approach mitigates the negative influence of fault interference and improves the effectiveness of SBFL on multi-fault programs.

(5) We propose a distance based approach to improve the efficiency of test-suite reduction, and demonstrate through an empirical study that it reduces the time cost of fault localization. In this part, we first explain the cost of collecting the testing information of the whole original test suite, and then summarize three test requirements (i.e. the result based requirement, the coverage based requirement and the partition based requirement) that are helpful in fault localization. Finally, using a greedy algorithm, we select the subset of the original test suite that is most likely to meet all of the test requirements. Note that our approach is guided by the distances (e.g. string distance, functional distance) among the test cases rather than by the whole testing information. Therefore, our approach only needs to collect the testing information of a part of the test cases. Experimental results show that our approach effectively reduces both the size of the given test suite and the time cost of fault localization, while the fault localization effectiveness remains close to that of using the whole test suite.

All in all, this dissertation proposes an automated software fault localization approach based on causal inference and cluster analysis. It provides a new idea and method for solving the following critical scientific problems: increasing the accuracy and efficiency of fault localization, and improving its effectiveness on multi-fault programs and omission-fault programs.
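
To make the notion of suspiciousness estimation in spectrum-based fault localization more concrete, the following is a minimal sketch, not the dissertation's own method. It assumes the widely used Ochiai formula as the scoring function (the abstract does not say which formula is used), and the function name `ochiai_ranking`, the coverage matrix and the test outcomes are all hypothetical illustration data.

```python
import math


def ochiai_ranking(coverage, failed):
    """Rank statements by Ochiai suspiciousness.

    coverage[i][j] is True if test i executed statement j.
    failed[i] is True if test i failed.
    Ochiai is an assumed scoring formula, chosen only for illustration.
    """
    total_failed = sum(1 for f in failed if f)
    n_statements = len(coverage[0])
    scores = []
    for j in range(n_statements):
        # Count failed and passed tests that cover statement j.
        failed_cov = sum(1 for row, f in zip(coverage, failed) if row[j] and f)
        passed_cov = sum(1 for row, f in zip(coverage, failed) if row[j] and not f)
        denom = math.sqrt(total_failed * (failed_cov + passed_cov))
        scores.append(failed_cov / denom if denom > 0 else 0.0)
    # Most suspicious statement first.
    ranking = sorted(range(n_statements), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Hypothetical spectrum: 4 tests (rows) x 3 statements (columns).
coverage = [
    [True, True, False],   # test 0, passed
    [True, False, True],   # test 1, passed
    [True, True, True],    # test 2, failed
    [True, True, False],   # test 3, failed
]
failed = [False, False, True, True]

ranking, scores = ochiai_ranking(coverage, failed)
print(ranking)
```

In this toy spectrum, statement 1 is covered by both failed tests and only one passed test, so it is ranked first. A ranking of this kind is what confounding bias and fault interference, as discussed in contributions (1), (3) and (4), can distort.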
Keywords/Search Tags: program debugging, fault localization, causal inference, cluster analysis, distance estimation, program slice, spectrum-based fault localization, dependence analysis, suspiciousness estimation