
Using data mining techniques to improve software reliability

Posted on: 2007-01-07
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Li, Zhenmin
Full Text: PDF
GTID: 1448390005465558
Subject: Computer Science
Abstract/Summary:
Reliability has become ever more important. Unfortunately, software errors continue to be frequent and remain a major cause of system failures. To facilitate bug detection and fixing, it would be highly beneficial to first analyze and understand bug characteristics, and then detect the bugs automatically.

This dissertation proposes a novel approach that applies data mining techniques to extract information from large software and exploits the extracted information for bug detection. Thanks to the distinctive strengths of data mining, this approach can efficiently discover useful information in large bodies of software code and documents.

Specifically, to understand bug characteristics, this dissertation proposes applying text classification and information retrieval techniques to automatically classify tens of thousands of bug reports. The study shows that this approach can help developers analyze and understand bug characteristics efficiently, and can facilitate testing and bug detection so as to improve reliability.

One finding of the bug characteristic study is that semantic errors are the major root cause of bugs. Semantic bugs are application specific, so detecting them requires knowledge about the application. This dissertation proposes using data mining techniques to automatically detect semantic bugs, including PR-Miner, which extracts programming rules and detects violations, and CP-Miner, which detects copy-pasted code and related bugs.

Programs usually follow many implicit programming rules. When these rules are violated, bugs can easily be introduced. Therefore, it is highly desirable to extract such rules automatically and to detect violations of them automatically. PR-Miner uses frequent itemset mining to extract implicit programming rules from large software code, requiring little effort from programmers and no prior knowledge of the software.
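To make the idea of mining programming rules with frequent itemsets concrete, here is a minimal sketch. It is not PR-Miner's implementation: it only considers pairs of items, and the function name `mine_pair_rules`, the thresholds, and the toy input (each function reduced to the set of calls it makes) are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(functions, min_support=3, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) rules meaning 'lhs implies rhs'.

    `functions` maps a function name to the items (here: called functions)
    found in its body. Only 2-item itemsets are mined in this sketch.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in functions.values():
        unique = sorted(set(items))  # each item counts once per function
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue  # the pair is not a frequent itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy input: four functions and the calls they contain.
functions = {
    "f1": ["spin_lock", "spin_unlock", "kmalloc"],
    "f2": ["spin_lock", "spin_unlock"],
    "f3": ["spin_lock", "spin_unlock", "printk"],
    "f4": ["spin_lock", "kmalloc"],
}
rules = mine_pair_rules(functions, min_support=3, min_confidence=0.75)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs} (support={support}, confidence={confidence:.2f})")
```

In this toy input, the only frequent pair is the lock and unlock calls, so both directions of that pairing come out as candidate rules. No one had to write the rule down in advance, which is the point of the mining approach.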
In addition, PR-Miner can detect violations of the extracted programming rules, which are strong indications of bugs.

Copy-pasted code is very common in large software, but it is prone to introducing bugs. CP-Miner uses frequent sequence mining to efficiently identify copy-pasted code in large software and to detect copy-paste-related bugs. To further understand copy-paste in system software, this dissertation also analyzes some interesting characteristics of copy-paste in Linux and FreeBSD.
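The copy-paste detection idea can be illustrated with a small sketch. This is a deliberate simplification and not CP-Miner's algorithm: instead of frequent sequence mining, it normalizes each statement (identifiers become `ID`, numbers become `NUM`) and groups fixed-length windows of consecutive statements that normalize identically. The names `normalize` and `find_clones`, the keyword list, and the sample code are all hypothetical.

```python
import re
from collections import defaultdict

# A small, assumed keyword list; keywords are kept, other names are abstracted.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}


def normalize(statement):
    """Map identifiers to ID and numbers to NUM so renamed copies still match."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", statement)
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isdigit():
            out.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        else:
            out.append(tok)  # operators and punctuation are kept as-is
    return " ".join(out)


def find_clones(statements, min_len=3):
    """Return lists of start indices whose min_len-statement windows match."""
    normalized = [normalize(s) for s in statements]
    windows = defaultdict(list)
    for start in range(len(normalized) - min_len + 1):
        windows[tuple(normalized[start:start + min_len])].append(start)
    return sorted(pos for pos in windows.values() if len(pos) > 1)


# Hypothetical sample: the second loop is a renamed copy of the first.
code = [
    "for (i = 0; i < n; i++)",
    "total = total + a[i];",
    "count = count + 1;",
    "printf(msg);",
    "for (j = 0; j < m; j++)",
    "sum = sum + b[j];",
    "hits = hits + 1;",
]
clones = find_clones(code, min_len=3)
print(clones)
```

A fixed window keeps the sketch short but misses copies with inserted or deleted statements, which is the kind of variation a sequence-mining approach is better suited to handle.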
Keywords/Search Tags: Software, Data mining techniques, Characteristics, Bugs, Programming rules, Dissertation, Understand