
Research On Fault Localization Using Correlation Analysis

Posted on: 2015-02-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y Lei
GTID: 1108330509460988 | Subject: Computer Science and Technology
Abstract/Summary:
Fault localization is a key activity in software debugging and is important for improving debugging performance and software quality. In recent years, researchers in both academia and industry have devoted considerable effort to developing effective fault localization techniques and have made substantial progress. However, because faults manifest in software in complex ways, many challenges must still be addressed to further improve fault localization. This paper therefore conducts a systematic study of fault localization in five aspects: test case generation, the impact of test cases on fault localization, a semantic correlation model, fault localization without test oracles, and the impact of information types on fault localization. Based on this work, we propose a series of fault localization techniques, and large-scale experiments show that they significantly improve fault localization effectiveness. The main contributions of this paper are summarized as follows:

1) This paper proposes a new feedback-based test case generation approach, which offers a new solution to the lack of interaction between debugging engineers and fault localization techniques. Effective interaction between software debugging engineers and fault localization techniques can greatly improve fault localization performance. However, existing fault localization techniques usually ignore this interaction, and the resulting lack of interaction may leave fault localization with inadequate information, which can substantially affect its performance. To address this issue, the proposed approach simulates how debugging engineers apply their knowledge and experience during this interaction in order to improve fault localization performance. The approach first abstracts the debugging engineers' pattern into a feedback rule.
Then, it uses test case generation techniques to generate new test cases that satisfy the feedback rule. Finally, the new test cases are added to the original test suite, which then interacts with a fault localization technique; this process is repeated until a specified stopping condition is met. Experiments show that the approach significantly improves fault localization effectiveness.

2) We propose a random sampling approach to study the impact of test suites on fault localization, which offers a new perspective on fault localization and suggests new directions for research on test case optimization. Test suites provide the runtime information that fault localization analyzes to infer where faults are located in software. Understanding how test suites affect fault localization can benefit a wide range of research topics, such as test suite design, test case generation, and test suite reduction. This paper uses random sampling to generate a large number of random samples of the existing test suites used for fault localization, and applies a promising fault localization technique to the original test suites and all their samples to investigate the relationship between localization effectiveness and test suite size. The results show no strong correlation between localization effectiveness and test suite size. Furthermore, within a test suite, failing test cases and passing test cases that do not execute the faulty statements have a positive impact on fault localization effectiveness, whereas passing test cases that do execute the faulty statements can degrade localization performance. We further find that spectrum-based fault localization (SFL) always obtains the maximal benefit from the failing test cases, so the fluctuation among the samples is mainly caused by the passing test cases.
These results and analysis lead us to define a new metric, Passing Tests Discrimination (PTD), which quantifies how well the passing test cases of a test suite help increase the suspiciousness of a faulty statement. To validate our analysis and provide further insight into the fluctuation observed across test suites, we propose a PTD-based test suite optimization approach; the results show that, in general, the optimized test suites perform better than the original ones.

3) This paper presents a new fault localization approach based on semantic correlation, which breaks through the accuracy bottleneck imposed by weak correlation evaluation. Most current fault localization approaches analyze program entities in isolation and ignore the semantic correlations among them. Introducing more semantics into fault localization is vital to improving its effectiveness. This paper therefore uses backward slicing to define a semantic correlation model that incorporates more semantics into fault analysis. The model slices out the context that is correlated with test results; that is, it identifies whether the execution of a statement affects the output of a test case. We incorporate this semantic model into the state-of-the-art statistical methodology for fault localization, namely Spectrum-based Fault Localization (SFL). Experiments show that the proposed approach greatly improves fault localization effectiveness.

4) We propose a new fault localization approach that does not require test oracles, introducing a new methodology for alleviating the oracle problem in fault localization. In general, fault localization techniques require a test oracle to determine whether a test case passes or fails; without one, these techniques cannot be applied. In practice, however, test oracles are difficult or nearly impossible to obtain in many cases. This well-known problem is called the oracle problem.
To alleviate the oracle problem, this paper leverages the framework of metamorphic testing: the pass or fail result of an individual test case is replaced with the violation or non-violation result of a metamorphic test group. Test oracles are then no longer required for fault localization. We apply this replacement to the fault localization approach based on semantic correlation, and experiments demonstrate that the effectiveness of the proposed approach is comparable to that of existing techniques in cases where test oracles exist. Furthermore, the results show that backward slices outperform execution slices in improving fault localization effectiveness, and suggest that the formulas GP19 and ER1’ are more likely than the other formulas to perform well within the current statistical methodology.

5) This paper studies the effect of information types on fault localization and finds that Frequency Execution Count has a negative effect on localization effectiveness. These findings suggest new directions for selecting and designing information types for fault localization. The runtime information of a test suite must be described by a specific information type. Different information types have different expressive power and therefore affect fault localization effectiveness differently. Investigating the main information types used in fault localization is therefore necessary to guide their selection and design. This paper chooses four representative information types and systematically studies their impact on fault localization within the methodology of a popular technique, SFL. The four information types are Binary Information of Execution Count (BC), Frequency Execution Count (FC), Backward Slice (BS), and the combination of BS and FC (FC&BS).
The results show that frequency execution count carries a high risk of reducing fault localization effectiveness, and that dependence-based information types (e.g., BS) are more effective than code coverage (e.g., BC and FC) at improving it. We therefore recommend using program dependence to develop new information types that further improve fault localization.
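The iterative loop in contribution 1) can be sketched as below. The abstract does not state the actual feedback rule, so the rule here is purely hypothetical: add up to three unseen inputs from a small grid that execute the currently top-ranked statement. The toy program `buggy_max3`, the use of Python's built-in `max()` as the test oracle, the Ochiai formula for ranking, and the stopping condition (a round limit, or no new test found) are likewise illustrative assumptions, not the dissertation's implementation.

```python
import math
from itertools import product
from typing import List, Set, Tuple

Inp = Tuple[int, int, int]
Test = Tuple[Inp, Set[int], bool]  # (input, executed statement ids, failed?)

def buggy_max3(a: int, b: int, c: int, trace: Set[int]) -> int:
    """Hypothetical program under test; records the statements it runs."""
    trace.add(1)
    m = a
    if b > m:
        trace.add(2)
        m = b
    if c > m:
        trace.add(3)
        m = a  # seeded fault: should be `m = c`
    trace.add(4)
    return m

def run(inp: Inp) -> Test:
    trace: Set[int] = set()
    out = buggy_max3(*inp, trace)
    return inp, trace, out != max(inp)  # built-in max() serves as the oracle

def ochiai_ranking(suite: List[Test]) -> List[int]:
    total_failed = sum(f for _, _, f in suite)

    def score(s: int) -> float:
        ef = sum(1 for _, t, f in suite if f and s in t)
        ep = sum(1 for _, t, f in suite if not f and s in t)
        d = math.sqrt(total_failed * (ef + ep))
        return ef / d if d else 0.0

    stmts = set().union(*(t for _, t, _ in suite))
    return sorted(stmts, key=lambda s: (-score(s), s))  # ties: lower id first

def feedback_rule(ranking: List[int], suite: List[Test], budget: int = 3) -> List[Test]:
    """Hypothetical rule: new tests that execute the current top suspect."""
    target = ranking[0]
    seen = {inp for inp, _, _ in suite}
    new: List[Test] = []
    for inp in product(range(3), repeat=3):  # small, deterministic input grid
        if inp in seen:
            continue
        test = run(inp)
        if target in test[1]:
            new.append(test)
            if len(new) == budget:
                break
    return new

def localize_with_feedback(suite: List[Test], rounds: int = 3):
    for _ in range(rounds):  # stop after `rounds` or when nothing new is found
        new = feedback_rule(ochiai_ranking(suite), suite)
        if not new:
            break
        suite = suite + new  # feed the new tests back into the suite
    return ochiai_ranking(suite), suite

initial = [run((0, 1, 2)), run((2, 0, 1))]  # one failing, one passing test
print(ochiai_ranking(initial)[0])            # → 2 (fault not yet on top)
ranking, final_suite = localize_with_feedback(initial)
print(ranking[0], len(final_suite))          # → 3 9
```

With the initial two-test suite, statement 2 ties with the faulty statement 3 and wins the tie-break; the first round adds passing tests that execute statement 2, which lowers its score, and later rounds add failing tests that keep statement 3 on top.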
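The sampling study in contribution 2) can be approximated as follows. This is a hypothetical sketch on a synthetic, randomly generated spectrum, not the dissertation's subjects or data. It assumes Ochiai as the fault localization technique, measures effectiveness with an EXAM-style score (the fraction of statements inspected before reaching the fault, counting ties pessimistically), discards samples that contain no failing test, and reports the Pearson correlation between sample size and EXAM score. The printed value depends entirely on the synthetic data and says nothing about the dissertation's findings.

```python
import math
import random
from typing import List, Set, Tuple

Test = Tuple[Set[int], bool]  # (statements executed, failed?)

def ochiai(suite: List[Test], stmt: int) -> float:
    total_failed = sum(f for _, f in suite)
    ef = sum(1 for cov, f in suite if f and stmt in cov)
    ep = sum(1 for cov, f in suite if not f and stmt in cov)
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

def exam(suite: List[Test], n_stmts: int, fault: int) -> float:
    """Fraction of statements inspected before reaching the fault,
    counting ties pessimistically. Lower is better."""
    scores = [ochiai(suite, s) for s in range(n_stmts)]
    return sum(1 for sc in scores if sc >= scores[fault]) / n_stmts

def pearson(xs: List[float], ys: List[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sxy / (sx * sy) if sx and sy else 0.0

def size_vs_effectiveness(suite: List[Test], n_stmts: int, fault: int,
                          n_samples: int = 500, seed: int = 0) -> float:
    """Draw random sub-suites and correlate their size with EXAM."""
    if not any(f for _, f in suite):
        raise ValueError("suite needs at least one failing test")
    rng = random.Random(seed)
    sizes: List[float] = []
    exams: List[float] = []
    while len(sizes) < n_samples:
        k = rng.randint(1, len(suite))
        sample = rng.sample(suite, k)
        if not any(f for _, f in sample):
            continue  # assumption: samples without a failing test are discarded
        sizes.append(k)
        exams.append(exam(sample, n_stmts, fault))
    return pearson(sizes, exams)

# Synthetic suite (illustrative only): 10 statements, fault in statement 3.
rng = random.Random(42)
N_STMTS, FAULT = 10, 3
suite: List[Test] = [({0, FAULT, 5}, True)]  # guarantee one failing test
for _ in range(40):
    cov = {s for s in range(N_STMTS) if rng.random() < 0.5}
    suite.append((cov, FAULT in cov and rng.random() < 0.3))

r = size_vs_effectiveness(suite, N_STMTS, FAULT)
print(f"Pearson r(size, EXAM) = {r:.3f}")
```

Swapping `pearson` for a rank correlation, or keeping all failing tests and sampling only passing ones, are natural variations; which design the dissertation used is not stated in the abstract.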
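Contributions 3) and 5) both build on SFL, which ranks statements by a suspiciousness formula computed over per-test spectra. The Python sketch below shows how the choice of information type changes the spectrum fed to that formula. It is a minimal illustration under stated assumptions: it uses the standard Ochiai formula (the abstract does not say which formula each experiment uses), a hypothetical five-statement program whose statement 3 is faulty, and hand-written coverage (BC) and backward-slice (BS) sets rather than sets produced by a real tracer or slicer. FC and FC&BS are omitted because the abstract does not specify how execution counts enter the formula.

```python
import math
from typing import Dict, List, Set

def ochiai(spectra: List[Set[int]], failed: List[bool]) -> Dict[int, float]:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)).

    spectra[i] holds the statement ids that the chosen information type
    records for test i; failed[i] is True when test i failed.
    """
    total_failed = sum(failed)
    scores: Dict[int, float] = {}
    for s in set().union(*spectra):
        ef = sum(1 for sp, f in zip(spectra, failed) if f and s in sp)
        ep = sum(1 for sp, f in zip(spectra, failed) if not f and s in sp)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom else 0.0
    return scores

# Hypothetical program with statements 1-5; statement 3 is faulty.
failed = [True, False, False]
# BC: statements each test executed.
coverage = [{1, 2, 3, 5}, {1, 3, 5}, {1, 4, 5}]
# BS: statements in the backward slice of each test's output. In the failing
# test, statement 2 runs but cannot affect the output; in the first passing
# test, the faulty statement runs but cannot affect the output either.
slices = [{1, 3, 5}, {1, 5}, {1, 4, 5}]

bc = ochiai(coverage, failed)
bs = ochiai(slices, failed)
print(max(bc, key=bc.get), round(bc[3], 3))  # → 2 0.707
print(max(bs, key=bs.get), round(bs[3], 3))  # → 3 1.0
```

Under BC, the passing test that executes the fault without affecting its output lowers statement 3's score, and statement 2, executed only by the failing test, ranks first; the backward slice removes both irrelevant executions. This mirrors the abstract's observation that passing tests which exercise faulty statements can hurt localization.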
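For contribution 4), the sketch below replaces per-test pass/fail verdicts with per-group violation verdicts. Everything in it is an assumption chosen for illustration: the toy function `buggy_max3` with a seeded fault, the metamorphic relation (the maximum of three numbers does not depend on argument order, so reversing the arguments must not change the output), taking the union of the two executions' coverage as the group's spectrum, and Ochiai as the formula. The dissertation applies the replacement to its backward-slice approach; plain statement coverage is used here only to keep the example short.

```python
import math
from typing import Dict, List, Set, Tuple

def buggy_max3(a: int, b: int, c: int, trace: Set[int]) -> int:
    """Hypothetical program under test; records the statements it runs."""
    trace.add(1)
    m = a
    if b > m:
        trace.add(2)
        m = b
    if c > m:
        trace.add(3)
        m = a  # seeded fault: should be `m = c`
    trace.add(4)
    return m

def run_group(source: Tuple[int, int, int]) -> Tuple[Set[int], bool]:
    """Execute a source input and its follow-up input (the reversed triple).

    No expected output is consulted: only the two outputs are compared.
    """
    t_src: Set[int] = set()
    t_fup: Set[int] = set()
    out_src = buggy_max3(*source, t_src)
    out_fup = buggy_max3(*reversed(source), t_fup)
    # Group spectrum = union of both executions; "violated" replaces "failed".
    return t_src | t_fup, out_src != out_fup

def group_ochiai(groups: List[Tuple[Set[int], bool]]) -> Dict[int, float]:
    total_violated = sum(v for _, v in groups)
    scores: Dict[int, float] = {}
    for s in set().union(*(cov for cov, _ in groups)):
        ev = sum(1 for cov, v in groups if v and s in cov)
        en = sum(1 for cov, v in groups if not v and s in cov)
        denom = math.sqrt(total_violated * (ev + en))
        scores[s] = ev / denom if denom else 0.0
    return scores

groups = [run_group(src) for src in [(1, 2, 3), (1, 3, 2), (3, 1, 2)]]
scores = group_ochiai(groups)
print([v for _, v in groups])       # → [True, False, True]
print(max(scores, key=scores.get))  # → 3
```

The group built from (1, 3, 2) satisfies the relation, while the two groups that run the faulty branch violate it, which is enough to rank statement 3 first without any test oracle.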
Keywords/Search Tags: fault localization, program spectra, correlation analysis, backward slice, test case generation, test suite, test oracle, information type