Research On Similarity Measure Method Of Program Code

Posted on: 2016-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhu
Full Text: PDF
GTID: 2308330464467969
Subject: Computer software and theory
Abstract/Summary:
The rapid development of computer network technology has made access to information resources faster and more convenient, but it has also made plagiarism easier to commit and harder to prevent. For example, in coursework and online examinations for programming courses, students commonly copy from one another. This seriously harms teaching quality and student development, and it undermines the fairness of examinations. In the commercial software field, copyright disputes over software products also occur frequently. Effective research on program code similarity measurement and its applications can substantially curb plagiarism in programming courses and strengthen intellectual property protection in the software industry.

This thesis surveys domestic and international research on program similarity measurement and compares the common similarity detection methods. On this basis, it studies three problems. First, the traditional GST (Greedy String Tiling) string matching algorithm compares strings character by character, which leads to high time complexity; the thesis therefore proposes a Java multi-threaded parallel design to speed up the GST matching process. Second, based on the time complexity and matching characteristics of GST, it designs a method that shortens the strings to be matched without affecting their semantics, further reducing matching time. Third, abstract syntax trees generated by existing tools contain a large amount of redundant information, which can waste considerable resources.
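For orientation, the baseline GST matching step can be sketched as below. This is a minimal, single-threaded sketch of Greedy String Tiling in its commonly described form, not the thesis's improved or multi-threaded variant. The class name `GreedyStringTiling`, the `minMatch` parameter (assumed to be at least 1), and the similarity formula `2 * covered / (|A| + |B|)` are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative, not taken from the thesis.
final class GreedyStringTiling {

    /** Counts tokens covered by non-overlapping tiles of at least minMatch tokens (minMatch >= 1). */
    static int coveredTokens(String[] a, String[] b, int minMatch) {
        boolean[] markedA = new boolean[a.length];
        boolean[] markedB = new boolean[b.length];
        int covered = 0;
        int maxMatch;
        do {
            maxMatch = minMatch;
            List<int[]> matches = new ArrayList<>(); // each entry: {startA, startB, length}
            for (int i = 0; i < a.length; i++) {
                if (markedA[i]) continue;
                for (int j = 0; j < b.length; j++) {
                    if (markedB[j]) continue;
                    // Extend the match while tokens agree and are still unmarked.
                    int k = 0;
                    while (i + k < a.length && j + k < b.length
                            && !markedA[i + k] && !markedB[j + k]
                            && a[i + k].equals(b[j + k])) {
                        k++;
                    }
                    if (k > maxMatch) {          // longer match found: discard shorter ones
                        matches.clear();
                        maxMatch = k;
                    }
                    if (k == maxMatch) {
                        matches.add(new int[] {i, j, k});
                    }
                }
            }
            // Turn the longest matches into tiles, skipping any that overlap an earlier tile.
            for (int[] m : matches) {
                if (occluded(markedA, m[0], m[2]) || occluded(markedB, m[1], m[2])) continue;
                for (int k = 0; k < m[2]; k++) {
                    markedA[m[0] + k] = true;
                    markedB[m[1] + k] = true;
                }
                covered += m[2];
            }
        } while (maxMatch > minMatch);
        return covered;
    }

    /** Assumed similarity measure: share of tokens in both sequences covered by tiles. */
    static double similarity(String[] a, String[] b, int minMatch) {
        int total = a.length + b.length;
        return total == 0 ? 0.0 : 2.0 * coveredTokens(a, b, minMatch) / total;
    }

    private static boolean occluded(boolean[] marked, int start, int length) {
        for (int k = 0; k < length; k++) {
            if (marked[start + k]) return true;
        }
        return false;
    }
}
```

The nested scan over every pair of start positions is the costly part of each round, which is presumably where a parallel design such as the one proposed here would divide the work among threads.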
This thesis presents an abstract syntax tree generation algorithm that creates each parent node before its child nodes and then attaches the associated class, structure, and method information, so that the tree reflects the useful information about the program's semantic structure. It also designs a storage table and an information table for class methods, used to store and update data so that the data is convenient to use. Once the abstract syntax tree has been built, it is traversed to generate a token sequence that expresses the semantic structure of the program. The improved string matching algorithm is then applied to these token sequences to compute the similarity. The resulting method has higher detection efficiency.

Based on the above methods, the thesis designs and implements an experimental plagiarism detection system for program code in Java; the system calculates the similarity between pairs of source programs. Its detection results are compared with those of Moss on the same test programs. The experimental results show that the approach effectively detects various means of plagiarism, is efficient in detection time, and is reliable in detection precision and accuracy.
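The step of turning an abstract syntax tree into a token sequence can be illustrated with a small sketch. It assumes a pre-order traversal and a hypothetical `AstNode` class whose `kind` labels (such as `METHOD` or `IF`) are invented for the example; the thesis's actual node types, tables, and traversal order are not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node; the thesis's real node structure is not given in this abstract.
final class AstNode {
    final String kind;                              // assumed token label, e.g. "METHOD", "IF"
    final List<AstNode> children = new ArrayList<>();

    AstNode(String kind) {
        this.kind = kind;
    }

    /** Appends a child to this (already created) parent and returns the parent for chaining. */
    AstNode add(AstNode child) {
        children.add(child);
        return this;
    }

    /** Pre-order traversal: emit the node's own token, then its children's tokens in order. */
    List<String> toTokens() {
        List<String> out = new ArrayList<>();
        collect(this, out);
        return out;
    }

    private static void collect(AstNode node, List<String> out) {
        out.add(node.kind);
        for (AstNode child : node.children) {
            collect(child, out);
        }
    }
}
```

For instance, `new AstNode("METHOD").add(new AstNode("ASSIGN")).add(new AstNode("IF").add(new AstNode("RETURN"))).toTokens()` yields the tokens `METHOD`, `ASSIGN`, `IF`, `RETURN`, which could then be handed to a string matching routine as one of the two sequences to compare.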
Keywords/Search Tags: Similarity measurement, GST algorithm, Token character string, Abstract syntax tree, Plagiarism detection