
Duplicate Code Detection Based on AST

Posted on: 2016-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: C Wu
Full Text: PDF
GTID: 2308330461484841
Subject: Computer software and theory
Abstract/Summary:
In software development and maintenance, refactoring techniques have attracted increasing attention from researchers and programmers. As an effective way to reduce code smells, refactoring has gradually become an active field of software engineering. Refactoring can reduce bugs, optimize the internal structure of software, improve software quality, and prolong software lifetime. However, the first step of refactoring, code smell detection, still has to be done manually, which constrains the development and application of refactoring technology. This thesis takes duplicate code, the most common code smell, as its research object. Building on a study of existing duplicate code detection techniques in China and abroad, it proposes a detection method based on the abstract syntax tree (AST). The AST serves as the intermediate representation: using Micro CSParser, a simplified version of the open source project CSParser, together with a customized AST grammar, the AST corresponding to the source code is extracted from each source file being edited. The semantics of the source code are parsed into the nodes of the AST. While the AST is built, parse-tree details that only serve compilation are eliminated, as are redundant nodes, which makes it easier to extract the key information from the AST nodes. By traversing the AST at method level, the node information extracted from each function is stored as a string in a Hashtable. Next, the strings are processed by the SimHash algorithm through splitting, hashing, weighting, merging, and dimension reduction, which generates a digital signature for each information string.
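The pipeline described above (method-level AST traversal, a node-information string per function, then a SimHash signature) can be illustrated with a minimal sketch. It is not the thesis's implementation: Python's standard `ast` module stands in for Micro CSParser, only node type names are kept as "node information", and the 64-bit width, MD5 token hash, and frequency weighting are all assumptions made for illustration. The names `function_node_strings` and `simhash` are hypothetical.

```python
import ast
import hashlib
from collections import Counter

BITS = 64  # assumed fingerprint width; the abstract does not state one


def function_node_strings(source):
    """Map each function name to a string of its AST node type names.

    Stand-in for the method-level traversal: Python's ast module is used
    here instead of Micro CSParser (an assumption for illustration).
    """
    table = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Keep only node types, so identifier names and literal
            # values do not influence the resulting string.
            table[node.name] = " ".join(
                type(child).__name__ for child in ast.walk(node)
            )
    return table


def simhash(text, bits=BITS):
    """Compute a SimHash signature of a whitespace-separated string."""
    # Split the string into tokens and weight each token by its frequency
    # (the weighting scheme is an assumption).
    weights = Counter(text.split())
    totals = [0] * bits
    for token, weight in weights.items():
        # Hash: take the leading bits of an MD5 digest as the token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Weight and merge: add the weight where the bit is 1,
        # subtract it where the bit is 0.
        for i in range(bits):
            totals[i] += weight if (token_hash >> i) & 1 else -weight
    # Dimension reduction: a positive total becomes a 1 bit, otherwise 0.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


if __name__ == "__main__":
    sample = (
        "def add(a, b):\n"
        "    total = a + b\n"
        "    return total\n"
        "\n"
        "def plus(x, y):\n"
        "    result = x + y\n"
        "    return result\n"
    )
    for name, nodes in function_node_strings(sample).items():
        print(name, format(simhash(nodes), "016x"))
```

In this sketch the two sample functions differ only in identifier names, so their node-type strings, and therefore their signatures, are identical. Because tokens are weighted by frequency alone, the signature ignores node order; a real tool might hash short node sequences instead to retain some structure.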
Furthermore, duplicate code is identified by the Hamming distance between digital signatures. Finally, on the basis of this theory, the thesis designs and implements a simple duplicate code detection tool based on the abstract syntax tree and carries out a simple test. Although the tool's functionality is simple and its duplicate code detection has certain limitations, the work is a positive exploration of automatic code smell detection. With further optimization and support for more kinds of code smells, it can promote the development of refactoring techniques.
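The Hamming-distance judgment mentioned above can be sketched as follows, treating each digital signature as an integer. The function names and the threshold of 3 differing bits are hypothetical; the abstract does not say which threshold the tool uses.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def is_duplicate(a, b, threshold=3):
    """Treat two signatures as duplicates when few bits differ.

    The default threshold is an assumed value, not one from the thesis.
    """
    return hamming_distance(a, b) <= threshold


if __name__ == "__main__":
    first, second = 0b1011, 0b1001
    print(hamming_distance(first, second), is_duplicate(first, second))
```

A smaller threshold reports only near-identical functions, while a larger one tolerates more edits at the cost of more false positives.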
Keywords/Search Tags: code refactoring, code smell, abstract syntax tree, SimHash algorithm