Research And Implementation Of Efficient Clonal Detection Method For Massive Open Source Code

Posted on: 2019-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Tuo
GTID: 2428330611993643
Subject: Software engineering
Abstract/Summary:
With the rapid development of open source software, software reuse has become an important means of project development. Extensive copying, pasting, and modification produce large numbers of similar code fragments across software projects, which are called code clones. Cloning is largely deliberate and in most cases benefits software development, but it can also harm software quality. Within a single project, clones threaten maintainability and consistency; between projects, they can undermine originality and copyright. Analyzing massive open source code to detect potential clones efficiently, both within and between projects, is difficult. The challenges include the diversity of code clones, the diversity of clone detection requirements, and the huge number of potential clones in large-scale software. This thesis focuses on plagiarism detection for massive code clones between software projects, and analyzes and evaluates detection methods in terms of availability, scalability, and resource occupancy. On this basis, it implements CopyCat, an efficient and scalable project-level clone plagiarism detection service for open source code, built on the Code Cloud. CopyCat is designed for code clone detection and copyright protection between open source projects. The main contents of this thesis are as follows:

(1) Evaluation of code clone detection methods and their improvement through loop and parallel optimization. Existing code clone detection methods were tested and analyzed with BigCloneBench in terms of applicability, scalability, and resource occupancy, and NiCad and CloneWorks were selected as suitable tools. After optimization, NiCad skips the unnecessary intra-project detection during inter-project detection, which greatly reduces the time cost without changing the detection results. For CloneWorks, the best code parsing method was selected by testing in four languages. The project name tag is used to extend CloneWorks into a tool that detects code clones between two projects. The code parsing and conversion stage and the result conversion and reporting stage are parallelized.

(2) Design of an automatic adaptation method for multi-granularity, multi-language clone detection between projects. To handle the multiple programming languages used across software projects, mechanisms for matching four programming languages and for concurrent multi-language clone detection are designed. For large-scale clone detection, mechanisms for estimating code size, allocating resources, and choosing the detection method are designed. To meet users' detection requirements, a parameter selection mechanism for clone detection is introduced.

(3) Design and implementation of an inter-project code clone detection system for massive open source code. In cooperation with Open Source China's Code Cloud, we implemented a tool that supports multi-language and large-scale code clone detection scenarios. The Code Cloud community's extensive open source project resources are integrated, together with platform optimization. A streamlined clone report mechanism lets users query the content of similar code fragments and displays the detection results visually. We also integrate the Gitee IDE to facilitate online editing and submission of the similar code fragments...
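
The project name tag idea from item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes, purely for illustration, that each file path in a detector's output begins with a project tag as its first path component, and it keeps only the clone pairs whose two sides carry different tags. The names `ClonePair`, `project_tag`, and `inter_project_pairs` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class ClonePair(NamedTuple):
    """One reported clone: two file paths (hypothetical output format)."""
    left: str
    right: str


def project_tag(path: str) -> str:
    """Return the project tag, assumed here to be the first path component."""
    return path.split("/", 1)[0]


def inter_project_pairs(pairs: Iterable[ClonePair]) -> List[ClonePair]:
    """Keep only pairs whose two files belong to different projects."""
    return [p for p in pairs if project_tag(p.left) != project_tag(p.right)]


# Illustrative data only; the paths are made up for this sketch.
reported = [
    ClonePair("projA/src/util.java", "projA/src/helper.java"),  # intra-project
    ClonePair("projA/src/util.java", "projB/lib/util.java"),    # inter-project
    ClonePair("projB/lib/io.java", "projB/lib/net.java"),       # intra-project
]
cross = inter_project_pairs(reported)
```

Filtering after detection only trims the report. The time saving described for NiCad comes from skipping intra-project comparisons during detection itself, which this sketch does not model.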
Keywords/Search Tags:Open Source Software, Software Reuse, Code Clone Detection, CopyCat, Gitee