| Code similarity detection, which uses a similarity detection algorithm to calculate how similar two pieces of code are, is an important means of establishing software copyright and identifying code plagiarism. Compared with traditional manual inspection, it can calculate code similarity and locate plagiarized passages quickly and accurately, and it also withstands more sophisticated plagiarism techniques such as renaming variables or reordering statements. This paper surveys a variety of standards and techniques for code similarity detection, then applies a fingerprint generation algorithm and a string matching algorithm to the problem. The main contributions are as follows: 1. The paper proposes a code similarity detection algorithm based on the Smith-Waterman algorithm. It adapts Smith-Waterman to code similarity detection by generating tokens from the code, splitting the tokens by function, and defining scoring criteria. 2. The paper proposes a code similarity detection algorithm based on the Winnowing algorithm. Unlike text fingerprinting, the algorithm generates fingerprints from tokens rather than raw text and then calculates code similarity by comparing the fingerprints. 3. The paper proposes parallel schemes for both algorithms based on the shared memory model. The Smith-Waterman-based algorithm is implemented in a data-parallel form, and the Winnowing-based algorithm is implemented in a task-parallel form. 4. The paper tests the two algorithms and compares them with JPlag on three experimental data sets. The results show that both algorithms can detect a variety of code plagiarism techniques and outperform JPlag when statements or functions are reordered. |
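
The token-based fingerprinting idea in contribution 2 can be illustrated with a minimal sketch. It is not the paper's implementation: the tokenizer, the keyword list, the k-gram size `k`, the window size `w`, the use of CRC32 as the hash, and the Jaccard ratio used as the similarity score are all assumptions chosen for illustration. The sketch normalizes identifiers and numbers into generic tokens, hashes every k-gram of tokens, keeps the minimum hash of each sliding window as a fingerprint, and compares the two fingerprint sets.

```python
import re
import zlib

# Hypothetical keyword list; a real tokenizer would follow the target language's grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Turn source text into normalized tokens: identifiers -> ID, numbers -> NUM."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")
        elif lexeme[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(lexeme)  # operators and punctuation are kept as-is
    return tokens


def kgram_hashes(tokens, k):
    """Hash every run of k consecutive tokens (CRC32 is an assumed choice of hash)."""
    return [
        zlib.crc32(" ".join(tokens[i:i + k]).encode("utf-8"))
        for i in range(len(tokens) - k + 1)
    ]


def winnow(hashes, w):
    """Keep the minimum hash of each window of w hashes, rightmost on ties."""
    selected = set()
    if not hashes:
        return selected
    for start in range(max(len(hashes) - w + 1, 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))  # (hash, position in the k-gram list)
    return selected


def fingerprint(source, k=5, w=4):
    """Return the set of selected hash values for one source text."""
    return {h for h, _ in winnow(kgram_hashes(tokenize(source), k), w)}


def similarity(source_a, source_b, k=5, w=4):
    """Jaccard ratio of the two fingerprint sets (an assumed similarity measure)."""
    fa, fb = fingerprint(source_a, k, w), fingerprint(source_b, k, w)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "int sum(int a, int b) { int c = a + b; return c; }"
renamed = "int add(int x, int y) { int z = x + y; return z; }"
unrelated = "while (x) { print(x); }"

print(similarity(original, renamed))
print(similarity(original, unrelated))
```

Because identifiers are collapsed into a single `ID` token before hashing, the renamed copy produces the same token sequence as the original, which is why fingerprinting tokens rather than raw text is robust to variable renaming. The positions kept by `winnow` are unused in the score here but could serve to locate matching regions.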