
Research And Application Of Program Code Similarity Detection Method

Posted on: 2013-07-20; Degree: Master; Type: Thesis
Country: China; Candidate: Z J Hu
GTID: 2248330374488939; Subject: Computer Science and Technology
Abstract/Summary:
Programming language courses are a core part of computer science and technology, so the teaching of programming is particularly important. With the rapid development of Internet technology, information-based (online) education has become a new mode of modern education. However, while the Internet makes education more convenient, it also makes program plagiarism easier. Program similarity detection can help teachers deal with plagiarized assignments, and it can also help identify software copyright infringement.

Code similarity detection techniques have been developed over several decades, and they fall into two categories: attribute counting and structure metrics. Attribute counting has been shown to work only when code is copied completely. The mature systems currently in use all rely on structure metrics based on the GST (Greedy String Tiling) string matching algorithm.

This thesis analyzes various program detection systems and then extends research on code similarity detection. First, it analyzes code similarity detection based on string matching and improves it. Second, it fully exploits the structure of program code and presents a method based on sub-graph isomorphism. This method, which relies on backtracking search, transforms program code into a program dependence graph (PDG) according to node types and edge types. The structural information of code consists mainly of data dependences and control dependences. The PDG is built by mapping each line of code to one of seven node types and searching for data and control dependences with a code-level tree. To remove useless code, the main data flow graph is extracted from the PDG. The maximum common sub-graph of the main data flow graphs of the two pieces of code under comparison is then found by backtracking, and the similarity is computed from the size of this common sub-graph.
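The abstract does not spell out the search procedure, so the following Python sketch only illustrates the general idea of finding a maximum common sub-graph by backtracking and scoring similarity by its size. It assumes a hypothetical graph encoding in which each PDG node carries a type label and each edge is tagged `data` or `control`. The node labels (`decl`, `assign`, `loop`, `output`), the requirement that mapped nodes keep exactly the same typed edges among themselves (a maximum common induced sub-graph, not necessarily connected), and the score 2 * |common| / (|V1| + |V2|) are all assumptions, not the thesis's own definitions.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical PDG encoding for illustration only:
#   nodes: node id -> node type label (the thesis maps each line to one of seven types)
#   edges: (source, target, kind) with kind "data" or "control"
Graph = Tuple[Dict[int, str], Set[Tuple[int, int, str]]]
EMPTY: FrozenSet[str] = frozenset()


def _adjacency(edges: Set[Tuple[int, int, str]]) -> Dict[Tuple[int, int], FrozenSet[str]]:
    """Group edge kinds by ordered node pair."""
    grouped: Dict[Tuple[int, int], Set[str]] = {}
    for u, v, kind in edges:
        grouped.setdefault((u, v), set()).add(kind)
    return {pair: frozenset(kinds) for pair, kinds in grouped.items()}


def max_common_subgraph(g1: Graph, g2: Graph) -> Dict[int, int]:
    """Largest node mapping g1 -> g2 that preserves node types and all typed edges
    among the mapped nodes (maximum common induced sub-graph), found by backtracking."""
    nodes1, nodes2 = g1[0], g2[0]
    adj1, adj2 = _adjacency(g1[1]), _adjacency(g2[1])
    order = sorted(nodes1)
    best: Dict[int, int] = {}

    def consistent(a: int, b: int, mapping: Dict[int, int]) -> bool:
        if nodes1[a] != nodes2[b] or adj1.get((a, a), EMPTY) != adj2.get((b, b), EMPTY):
            return False
        return all(
            adj1.get((a, x), EMPTY) == adj2.get((b, y), EMPTY)
            and adj1.get((x, a), EMPTY) == adj2.get((y, b), EMPTY)
            for x, y in mapping.items()
        )

    def search(i: int, mapping: Dict[int, int], used: Set[int]) -> None:
        nonlocal best
        # Prune: even matching every remaining node cannot beat the best found so far.
        if len(mapping) + (len(order) - i) <= len(best):
            return
        if i == len(order):
            best = dict(mapping)
            return
        a = order[i]
        for b in nodes2:
            if b not in used and consistent(a, b, mapping):
                mapping[a] = b
                used.add(b)
                search(i + 1, mapping, used)
                del mapping[a]
                used.discard(b)
        search(i + 1, mapping, used)  # also try leaving node a unmatched

    search(0, {}, set())
    return best


def pdg_similarity(g1: Graph, g2: Graph) -> float:
    """Assumed normalisation: 2 * |common sub-graph| / (|V1| + |V2|)."""
    total = len(g1[0]) + len(g2[0])
    if total == 0:
        return 1.0
    return 2 * len(max_common_subgraph(g1, g2)) / total


# Usage: g2 renumbers g1's nodes and adds one unrelated declaration node.
g1 = ({1: "decl", 2: "assign", 3: "loop", 4: "output"},
      {(1, 2, "data"), (3, 2, "control"), (2, 4, "data")})
g2 = ({10: "decl", 20: "assign", 30: "loop", 40: "output", 50: "decl"},
      {(10, 20, "data"), (30, 20, "control"), (20, 40, "data")})
print(pdg_similarity(g1, g2))  # 0.8888888888888888 (all 4 nodes of g1 matched)
```

The bound `len(mapping) + remaining <= len(best)` is the simplest possible pruning rule; the search is still exponential in the worst case, which is one practical reason to shrink each PDG to its main data flow graph before matching.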
Working on the main data flow graphs effectively removes irrelevant structs and variables and improves accuracy. A code similarity detection system based on GST and sub-graph isomorphism is designed and implemented, and each module of the system is described in detail. Finally, the system is tested on two program sets. The results show that it detects all common forms of plagiarism and reaches 90% accuracy in detecting higher-level plagiarism.
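GST is taken here to mean Greedy String Tiling, the token-matching algorithm used by JPlag-style detectors. The abstract says the string-matching approach was improved but not how, so the Python sketch below shows only the plain algorithm on normalised token streams. The minimum match length of 3, the token names, and the score 2 * covered / (|A| + |B|) are assumptions; the sketch also omits the Karp-Rabin hashing that practical implementations (RKR-GST) use to speed up the search.

```python
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int]  # (start in a, start in b, length)


def greedy_string_tiling(a: Sequence[str], b: Sequence[str], min_match: int = 3) -> List[Tile]:
    """Plain Greedy String Tiling over token sequences (no Karp-Rabin speed-up)."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: List[Tile] = []
    while True:
        max_match = min_match
        matches: List[Tile] = []
        # Find the longest maximal matches that use only unmarked tokens.
        for i in range(len(a)):
            if marked_a[i]:
                continue
            for j in range(len(b)):
                if marked_b[j]:
                    continue
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not marked_a[i + k] and not marked_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches, max_match = [(i, j, k)], k
        # Turn non-overlapping matches into tiles and mark their tokens.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue  # occluded by a tile placed earlier in this pass
            for d in range(k):
                marked_a[i + d] = marked_b[j + d] = True
            tiles.append((i, j, k))
        # Stop once no match longer than min_match was found in this pass.
        if max_match == min_match:
            return tiles


def gst_similarity(a: Sequence[str], b: Sequence[str], min_match: int = 3) -> float:
    """Assumed JPlag-style score: 2 * covered tokens / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


# Hypothetical normalised token streams: identifiers -> "ID", literals -> "NUM".
original = ["INT", "ID", "=", "NUM", ";",
            "WHILE", "(", "ID", "<", "ID", ")", "{", "ID", "++", ";", "}"]
reordered = original[5:] + original[:5]  # first statement moved to the end
print(greedy_string_tiling(original, reordered))  # [(5, 0, 11), (0, 11, 5)]
print(gst_similarity(original, reordered))        # 1.0
```

The example shows why GST tolerates statement reordering: each moved block is still covered by its own tile, so the score stays at 1.0.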
Keywords/Search Tags: code plagiarism, similarity, GST algorithm, sub-graph isomorphism