Recent research shows that large software suites contain many implicit programming rules, which reflect the intrinsic features and specific requirements of the program. When programmers who are unaware of these rules, or who forget them, violate them, defects are easily introduced. Implicit programming rules are polymorphous, meaning that a rule may involve multiple types of program elements. They are also location-scattered, meaning that a rule may span multiple functions or files. These rules are therefore very difficult to find by manual review, which motivates extracting and documenting them automatically. However, previous work in this direction focuses only on simple function-pair based programming rules and does not consider rules composed of multiple program elements of various types, such as functions, variables, and data types. In addition, to extract programming rules, previous approaches require programmers to provide rule templates or specifications. To overcome these issues, this paper proposes a general and efficient method to automatically extract such rules and to automatically detect violations of the extracted rules. It leverages a data mining technique called closed frequent itemset mining to mine programming patterns from large software code bases; these patterns are then used to generate programming rules. It introduces the concept of the PoSitive orDer Rule (PSDRule) to avoid generating multiple redundant rules from the same programming pattern. Building on these steps, we also propose an efficient violation detection algorithm that detects program segments inconsistent with the extracted rules, which are strong indications of bugs. 
The whole process is automatic, requiring little effort from programmers and no prior knowledge of the target software. Experimental results on multiple large open source projects show that our method can automatically extract many implicit programming rules in general forms (without being constrained by any fixed rule templates or specifications) and can efficiently detect the code segments that violate the extracted rules. By auditing the experimental results, we find that our method extracts multiple reasonable rules and detects some real bugs.
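
To make the pipeline concrete, the following is a minimal sketch of the general idea, not the algorithm proposed in this paper. It assumes each function has already been reduced to a set of program elements, mines closed frequent itemsets by brute force, derives simple "antecedent implies element" rules from each pattern, and flags functions that match the antecedent but lack the element. The toy data, the function names, and the thresholds are all hypothetical, and the PSDRule ordering is not modeled.

```python
from itertools import combinations

# Hypothetical toy input: each function is reduced to the set of program
# elements (calls, variables, data types) it uses. All names are invented.
FUNCTIONS = {
    "read_cfg":  {"lock", "unlock", "struct cfg"},
    "write_cfg": {"lock", "unlock", "struct cfg"},
    "reset_cfg": {"lock", "unlock", "struct cfg"},
    "dump_cfg":  {"lock", "struct cfg"},   # uses the lock but never unlocks
    "log_msg":   {"printf"},
}


def support(itemset, transactions):
    """Return the names of the transactions that contain every item."""
    return {name for name, items in transactions.items() if itemset <= items}


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force closed frequent itemset mining (tiny inputs only).

    An itemset is frequent if at least min_support transactions contain it,
    and closed if no proper superset has the same support count.
    """
    all_items = sorted(set().union(*transactions.values()))
    frequent = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            count = len(support(itemset, transactions))
            if count >= min_support:
                frequent[itemset] = count
    # A superset with equal support is itself frequent, so checking the
    # frequent itemsets is enough to decide closedness.
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def find_violations(transactions, patterns, min_confidence):
    """Flag transactions that match a rule's antecedent but miss its element.

    For each pattern P and each element x in P, the candidate rule is
    (P - {x}) => x with confidence support(P) / support(P - {x}).
    """
    violations = []
    for pattern, pattern_count in sorted(patterns.items(), key=lambda p: sorted(p[0])):
        if len(pattern) < 2:
            continue
        for element in sorted(pattern):
            antecedent = pattern - {element}
            matching = support(antecedent, transactions)
            confidence = pattern_count / len(matching)
            if min_confidence <= confidence < 1.0:
                for name in sorted(matching):
                    if element not in transactions[name]:
                        violations.append((name, antecedent, element, confidence))
    return violations


if __name__ == "__main__":
    patterns = closed_frequent_itemsets(FUNCTIONS, min_support=3)
    for name, antecedent, element, conf in find_violations(FUNCTIONS, patterns, 0.7):
        print(f"{name}: uses {sorted(antecedent)} but not {element!r} "
              f"(rule confidence {conf:.2f})")
```

On this toy input the closed patterns are {lock, struct cfg} and {lock, struct cfg, unlock}, and the only flagged function is `dump_cfg`, which matches the rule "lock, struct cfg implies unlock" (confidence 0.75) but never calls `unlock`. The brute-force enumeration is exponential in the number of distinct elements, so a realistic implementation would need a dedicated closed itemset mining algorithm.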