Design And Implementation Of Cross-Language Code Clone Detection System

Posted on: 2017-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Liu
GTID: 2428330590968466
Subject: Software engineering
Abstract/Summary:
To attract more developers or to support different platforms, open source organizations and commercial companies tend to re-implement their projects in different programming languages. Research shows that about 33% of existing projects have versions in multiple languages. In these multi-language projects, developers must keep the versions consistent: they have to implement the same features or fix the same bugs in each version, which produces many functionally similar code fragments across the language versions. Such code fragments are called cross-language code clones.

Code clones are generally considered harmful because they increase software development and maintenance costs. For example, when a cloned fragment is modified, all other instances of that fragment may require the same modification. Code clones are nevertheless difficult to avoid in multi-language projects, so cross-language clone detection becomes an important part of maintenance. However, most tools can only detect clones within a single language and cannot effectively detect cross-language clones, such as those between Java and C#. Traditional clone detection tools such as DECKARD, CCFinder, and CP-Miner are limited to a single language. Cross-language clone detection tools such as C2D2 are built on the .NET Common Intermediate Language (CIL), so they can only detect clones between languages that compile to CIL and cannot detect cross-language clones between languages that share no intermediate language.

Code revision similarity refers to the similarity between code fragments that are changed in the revision history to implement the same features or fix the same bugs. Multi-language projects contain many similar code changes, which reflect the consistency of the developers' changes, and this consistency leads to many cross-language code clones. Cross-language code clones can therefore be detected by inspecting the revision similarity of code fragments.

Full-text search is a document retrieval technique that matches query terms against all of the text in the documents, and it is widely used in plain-text information retrieval. After language keywords and stop words are removed and API conversion is applied, the variable names and method names in the code fragments of open source projects can serve as input to a full-text search engine, and can thus be used to search for and match code fragments.

This thesis proposes a new cross-language clone detection approach for multiple platforms based on revision similarity and full-text search. The approach analyzes the revision history of projects, compares the similarity of code fragments, and leverages full-text search to match code fragments, which addresses the cross-language clone detection problem effectively. A cross-language clone detection system called DiffMatcher is implemented on top of the full-text search engine Elasticsearch. Experiments on two open source projects, ANTLR and FpML, show that DiffMatcher can effectively detect some of the code clones between Java and C# projects, which meets the initial goal.

The main contributions of this thesis are as follows:
1) It analyzes the pros and cons of existing work and introduces the problems that traditional clone detection tools encounter in cross-language clone detection.
2) It introduces the concept of code revision similarity, formalizes cross-language clone detection as a revision matching problem, and leverages full-text search to solve the code matching problem in cross-language clone detection.
3) It implements a cross-language clone detection system based on the above approach, built on Elasticsearch.
4) It selects appropriate open source projects as benchmarks and evaluates the effectiveness of the tool.
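
The matching idea described above can be illustrated with a minimal Python sketch. It is not DiffMatcher's implementation: the keyword and stop-word lists are abbreviated and hypothetical, the API conversion step is omitted, and a plain cosine similarity over term counts stands in for the scoring that a full-text search engine such as Elasticsearch would perform. The function names (`added_lines`, `tokenize`, `cosine`, `best_match`) and the sample diffs are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated lists; a real system would use complete
# per-language keyword lists and a tuned stop-word list.
KEYWORDS = {"public", "private", "static", "void", "int", "string", "bool",
            "boolean", "return", "if", "else", "for", "foreach", "while",
            "new", "class", "null", "this", "var"}
STOP_WORDS = {"get", "set", "the", "a", "of", "to"}


def added_lines(diff: str) -> str:
    """Keep only the lines a revision adds, skipping '+++' file headers."""
    return "\n".join(
        line[1:] for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def tokenize(fragment: str) -> list[str]:
    """Turn identifiers into lowercase words, dropping keywords and stop words."""
    tokens = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", fragment):
        if identifier.lower() in KEYWORDS:
            continue
        # Split camelCase, PascalCase and snake_case into separate words.
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier):
            word = part.lower()
            if word not in STOP_WORDS:
                tokens.append(word)
    return tokens


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def best_match(query_diff: str, candidate_diffs: dict[str, str]) -> tuple[str, float]:
    """Return the candidate revision whose added code is most similar to the query."""
    query = tokenize(added_lines(query_diff))
    scores = {name: cosine(query, tokenize(added_lines(diff)))
              for name, diff in candidate_diffs.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Invented sample revisions: one Java change and two C# changes.
java_diff = """\
--- a/Lexer.java
+++ b/Lexer.java
+    public Token nextToken() {
+        return lexer.readToken(input);
+    }
"""
csharp_diffs = {
    "Lexer.cs": """\
+++ b/Lexer.cs
+    public Token NextToken() {
+        return lexer.ReadToken(input);
+    }
""",
    "Parser.cs": """\
+++ b/Parser.cs
+    public void ReportError(string message) {
+        errors.Add(message);
+    }
""",
}

if __name__ == "__main__":
    print(best_match(java_diff, csharp_diffs))
```

In this sketch the Java revision is matched to the `Lexer.cs` revision because, once keywords are removed and identifiers are split and lowercased, both changes reduce to the same words despite the different naming conventions of the two languages.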
Keywords/Search Tags: clone detection, information retrieval, data mining, revision similarity