
Software Fault Localization Based On Data Mining

Posted on: 2016-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H L Cao
GTID: 1108330479486213
Subject: Computer software and theory
Abstract/Summary:
In recent years, social development has brought new challenges to software testing and debugging: ever-changing conditions and demands place higher requirements on software. Software debugging is currently an important means of ensuring software quality, and fault localization is one of its most time-consuming activities. Fault localization aims to locate faults in software automatically, which improves software quality and reduces the cost of debugging. Research on fault localization is therefore of great significance; it is an important research topic both in China and abroad, with real scientific value and application prospects. Because of the complexity and heterogeneity of software systems, however, existing fault localization research still has clear shortcomings:
(1) Its effectiveness needs further improvement. Most existing approaches use statistical theory to refine their results, which improves precision only to a certain extent.
(2) Little attention has been paid to coincidental correctness (a test case executes the faulty code yet still produces the correct output), which can reduce the effectiveness of fault localization.
(3) Existing methods have difficulty locating multiple faults effectively at the same time, which makes it hard for developers to locate several faults simultaneously.
To alleviate these problems, this dissertation starts from program slicing and then applies data mining to locate faults. First, we propose a fault localization approach that combines association analysis with a rank strategy. Second, we study the effect of coincidental correctness on fault localization. Finally, for software containing multiple faults, we propose a multiple fault localization approach.
The main contributions of this dissertation are summarized as follows:
(1) A fault localization approach based on association analysis and a rank strategy (FLAR) was proposed. Association analysis captures the relationship between program statements and the corresponding execution results, which helps to locate faults, and the rank strategy is then used to prioritize program statements. Experimental results show that FLAR is effective in locating faults. Association analysis and the rank strategy were further extended to an improved dynamic slicing technique, yielding a fault localization approach (DS-FLAR) that combines dynamic slicing with association analysis. Empirical studies show that DS-FLAR is more effective than the approaches it was compared with.
(2) For the problem of coincidental correctness, we developed a theoretical framework based on function derivation to analyze its impact on suspiciousness formulas. The goal is to investigate, from a theoretical standpoint, how coincidental correctness affects the effectiveness of 30 suspiciousness formulas. This provides a theoretical basis for choosing a coincidental correctness identification method when locating faults in software.
(3) To improve the effectiveness of fault localization, a coincidental correctness identification approach was proposed. First, the program elements most suspicious of being involved in coincidental correctness are selected as feature elements of the program execution traces, and the traces are reduced to these feature elements. Second, fuzzy c-means is used to cluster the reduced execution traces in order to identify coincidentally correct test cases. Finally, fault localization is performed on the execution traces that remain after the coincidentally correct ones are removed.
Unlike existing approaches, this approach combines feature-element selection, used to reduce the program execution traces, with fuzzy c-means clustering in order to improve the effectiveness of fault localization. It was applied to three groups of programs, and the test suites with coincidentally correct test cases removed were used as input to four popular fault localization approaches (e.g., Tarantula). Results show that the approach is more effective than a k-means-based coincidental correctness identification approach and has lower false positive and false negative rates.
(4) For software containing multiple faults, a multiple fault localization approach based on Chameleon clustering was proposed. It proceeds as follows. First, the suspiciousness of program elements is computed by combining each failed execution trace with all passed execution traces; the most suspicious elements are selected as feature elements, and the execution traces are reduced to these feature elements. Second, the reduced failed execution traces are clustered, after which each failed cluster is taken to correspond to one fault. Third, each failed-trace cluster is merged with all passed execution traces, and the suspiciousness of program elements is computed. Finally, in parallel debugging, the multiple faults are located simultaneously by examining program elements in descending order of suspiciousness within each failed cluster. Results show that clustering the failed execution traces and ranking program elements by descending suspiciousness within each cluster locates multiple faults simultaneously and effectively.
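Contribution (1) pairs association analysis between executed statements and test outcomes with a ranking step. The abstract does not give FLAR's exact measures, so the Python sketch below is only an illustration under an assumed formulation: each test is treated as a transaction, the rule "statement s executed => test fails" is scored by its confidence and support, and statements are ranked by confidence with support as a tie-breaker. The function name `rank_by_association` and the toy coverage data are hypothetical.

```python
from typing import List, Set, Tuple


def rank_by_association(
    coverage: List[Set[int]], failed: List[bool]
) -> List[Tuple[int, float, float]]:
    """Rank statements by the rule 'statement executed => test fails'.

    Assumed formulation (not necessarily FLAR's):
      confidence = P(test fails | statement executed)
      support    = P(statement executed and test fails)
    Returns (statement, confidence, support) tuples, most suspicious first.
    """
    n_tests = len(coverage)
    statements = set().union(*coverage)
    ranked = []
    for s in statements:
        covering = [i for i in range(n_tests) if s in coverage[i]]
        fail_and_cover = sum(1 for i in covering if failed[i])
        confidence = fail_and_cover / len(covering)
        support = fail_and_cover / n_tests
        ranked.append((s, confidence, support))
    # Assumed rank strategy: higher confidence first, then higher support,
    # then lower statement id so that ties are deterministic.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


# Toy data: statement ids executed by each test, and whether that test failed.
coverage = [{1, 2, 3}, {1, 2, 4}, {1, 3}, {1, 2, 4}, {1, 2}]
failed = [False, True, False, True, False]

for stmt, conf, sup in rank_by_association(coverage, failed):
    print(f"statement {stmt}: confidence={conf:.2f}, support={sup:.2f}")
```

In this toy data statement 4 is executed only by failing tests, so it heads the ranking, followed by statements 2, 1 and 3.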
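Contribution (2) studies how coincidental correctness changes the output of suspiciousness formulas. The toy calculation below is not the dissertation's derivation-based framework; it simply evaluates Tarantula, which the abstract names, and Ochiai, a widely used formula chosen here for comparison, on a hypothetical faulty statement that all 4 failing tests execute while k of 10 passing tests are coincidentally correct (they also execute it).

```python
import math


def tarantula(ef: int, ep: int, nf: int, np_: int) -> float:
    """Tarantula: (ef/F) / (ef/F + ep/P), where F = ef + nf and P = ep + np_.

    ef / ep: failing / passing tests that execute the element;
    nf / np_: failing / passing tests that do not execute it.
    """
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np_) if ep + np_ else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def ochiai(ef: int, ep: int, nf: int, np_: int) -> float:
    """Ochiai: ef / sqrt(F * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


FAILED, PASSED = 4, 10
for k in (0, 2, 5):  # k = number of coincidentally correct passing tests
    # CC tests kept: they count as passing tests that execute the fault.
    kept = (tarantula(FAILED, k, 0, PASSED - k), ochiai(FAILED, k, 0, PASSED - k))
    # CC tests removed: only the PASSED - k genuine passing tests remain.
    removed = (tarantula(FAILED, 0, 0, PASSED - k), ochiai(FAILED, 0, 0, PASSED - k))
    print(f"k={k}: kept={kept[0]:.3f}/{kept[1]:.3f} "
          f"removed={removed[0]:.3f}/{removed[1]:.3f}")
```

As k grows, both scores of the faulty statement drop, while removing the coincidentally correct tests restores them in this toy setting; this is why identifying such tests before ranking can matter.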
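The identification steps in contribution (3) can be sketched as follows. The abstract does not say how feature elements are scored or how the coincidentally correct cluster is chosen, so this sketch assumes: Tarantula scores over all traces select the top `n_features` elements; only the passing traces are clustered, with a small pure-Python fuzzy c-means (fuzzifier m = 2, deterministic initialization); and passing tests whose strongest membership lies in the cluster whose center is closest to the mean failing trace are flagged. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence

Trace = Sequence[int]  # 1 if the element was executed, else 0


def tarantula_scores(traces: Sequence[Trace], failed: Sequence[bool]) -> List[float]:
    """Tarantula suspiciousness of every element over all traces."""
    n_fail = sum(failed)
    n_pass = len(failed) - n_fail
    scores = []
    for e in range(len(traces[0])):
        ef = sum(1 for t, f in zip(traces, failed) if f and t[e])
        ep = sum(1 for t, f in zip(traces, failed) if not f and t[e])
        fr = ef / n_fail if n_fail else 0.0
        pr = ep / n_pass if n_pass else 0.0
        scores.append(fr / (fr + pr) if fr + pr else 0.0)
    return scores


def fuzzy_c_means(points, c=2, m=2.0, max_iter=100, tol=1e-6):
    """Plain fuzzy c-means (requires c >= 2). Returns (centers, memberships)."""
    n, dim = len(points), len(points[0])
    # Deterministic start: point i leans towards cluster i % c.
    u = [[0.9 if j == i % c else 0.1 / (c - 1) for j in range(c)] for i in range(n)]
    centers: List[List[float]] = []
    for _ in range(max_iter):
        # Centers are membership-weighted means with weights u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships from relative distances to the centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in dists:  # point sits exactly on a center: crisp membership
                hit = dists.index(0.0)
                new_u.append([1.0 if j == hit else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1))
                                        for k in range(c)) for j in range(c)])
        shift = max(abs(a - b) for new_row, old_row in zip(new_u, u)
                    for a, b in zip(new_row, old_row))
        u = new_u
        if shift < tol:
            break
    return centers, u


def identify_cc(traces, failed, n_features=2):
    """Return indices of passing tests flagged as coincidentally correct."""
    scores = tarantula_scores(traces, failed)
    features = sorted(range(len(scores)), key=lambda e: (-scores[e], e))[:n_features]

    def project(t):  # reduce a trace to the selected feature elements
        return [float(t[e]) for e in features]

    passed_idx = [i for i, f in enumerate(failed) if not f]
    failed_rows = [project(t) for t, f in zip(traces, failed) if f]
    failed_mean = [sum(col) / len(failed_rows) for col in zip(*failed_rows)]
    centers, u = fuzzy_c_means([project(traces[i]) for i in passed_idx])
    cc_cluster = min(range(len(centers)), key=lambda j: math.dist(centers[j], failed_mean))
    return [passed_idx[k] for k, row in enumerate(u) if row.index(max(row)) == cc_cluster]


# Elements 0-5; element 3 is the hypothetical fault. The first two traces fail.
traces = [
    [1, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 0],                      # failing
    [1, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 0],                      # passing, execute fault
    [1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1],  # passing
]
failed = [True, True, False, False, False, False, False]
print(identify_cc(traces, failed))
```

Here the selected feature elements are 3 and 4, and the two passing tests that execute them are flagged; dropping them before re-running a formula such as Tarantula corresponds to the final step of contribution (3).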
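The pipeline of contribution (4) can be outlined in code, with one substitution stated plainly: Chameleon's graph-partitioning and merging phases are too long for a short example, so this sketch swaps in a much simpler single-linkage grouping (failing traces whose reduced vectors differ in at most `threshold` positions end up together). It illustrates the surrounding steps, not Chameleon itself. Feature selection assumes Tarantula with one failing trace plus all passing traces, and each cluster is ranked with Tarantula against all passing traces. Function names, parameters (`k`, `threshold`) and data are hypothetical.

```python
from typing import List, Sequence

Trace = Sequence[int]  # 1 if the element was executed, else 0


def tarantula(traces: Sequence[Trace], failed: Sequence[bool]) -> List[float]:
    """Tarantula suspiciousness of every element."""
    n_fail = sum(failed)
    n_pass = len(failed) - n_fail
    scores = []
    for e in range(len(traces[0])):
        ef = sum(1 for t, f in zip(traces, failed) if f and t[e])
        ep = sum(1 for t, f in zip(traces, failed) if not f and t[e])
        fr = ef / n_fail if n_fail else 0.0
        pr = ep / n_pass if n_pass else 0.0
        scores.append(fr / (fr + pr) if fr + pr else 0.0)
    return scores


def rank(scores: List[float]) -> List[int]:
    """Element ids by descending suspiciousness (ties: lower id first)."""
    return sorted(range(len(scores)), key=lambda e: (-scores[e], e))


def single_linkage(vectors: List[List[int]], threshold: int) -> List[List[int]]:
    """Stand-in for Chameleon: connected components under Hamming distance <= threshold."""
    labels = [-1] * len(vectors)
    n_clusters = 0
    for start in range(len(vectors)):
        if labels[start] != -1:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j, other in enumerate(vectors):
                if labels[j] == -1 and sum(a != b for a, b in zip(vectors[i], other)) <= threshold:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    return [[i for i, lab in enumerate(labels) if lab == c] for c in range(n_clusters)]


def localize_multiple(passed, failing, k=2, threshold=2):
    """Return one suspiciousness ranking per cluster of failing traces."""
    # Step 1: feature elements = top-k of each failing trace vs. all passing traces.
    features = set()
    for ft in failing:
        scores = tarantula(list(passed) + [ft], [False] * len(passed) + [True])
        features.update(rank(scores)[:k])
    feats = sorted(features)
    reduced = [[ft[e] for e in feats] for ft in failing]
    # Steps 2-4: cluster the reduced failing traces, then rank each cluster + all passing.
    rankings = []
    for members in single_linkage(reduced, threshold):
        group = [failing[i] for i in members]
        scores = tarantula(list(passed) + group, [False] * len(passed) + [True] * len(group))
        rankings.append(rank(scores))
    return rankings


passed = [[1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 1], [1, 1, 0, 0, 0, 1]]
failing = [
    [1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 0, 1],  # hypothetical fault in element 2
    [1, 1, 0, 1, 1, 0], [1, 0, 0, 0, 1, 1],  # hypothetical fault in element 4
]
for n, ranking in enumerate(localize_multiple(passed, failing)):
    print(f"cluster {n}: examine elements in order {ranking}")
```

With this toy data the failing traces split into two clusters by fault, and each cluster's ranking puts its own fault first, so two developers could debug the clusters in parallel.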
Keywords: software debugging, dynamic slicing, fault localization, association analysis, cluster analysis