Research on a Visual Code Similarity Detection System Based on DFA and Feature Quantification

Posted on: 2016-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Zhang
Full Text: PDF
GTID: 2308330479985368
Subject: Software engineering
Abstract/Summary:
In recent years, program plagiarism has become widespread. It not only lowers the quality of teaching in university computer-related courses but also leads to software copyright disputes in commercial software development. Program code similarity detection has therefore been studied extensively. Detection methods have evolved from early attribute counting, which has low precision, to structure-metric techniques built on string matching algorithms, which are now the mainstream; the most popular of these replaces the code with tokens and then applies string matching. Although existing methods replace code with tokens before matching, there is still room to improve code recognition, and these methods focus mainly on reporting similarity values. To address these shortcomings, this thesis proposes a detection method based on DFA and feature quantification. To improve code recognition, the lexical analysis phase performs word recognition and semantic type identification designed around the characteristics of the programming language, and quantizes the features of the recognized words. In the code detection phase, the method constructs a multiple linear function to calculate the quantized feature value of each code statement, and discusses the coefficients of the system of equations in the multiple linear regression model to further improve code recognition. To improve detection accuracy, the matching process uses two matching operations: the first matches the quantized feature values of code statements, and the second matches the semantic types of statements whose quantized values are equal.
To make the detection results more readable and the analysis of plagiarism more intuitive, the quantified information from the first two phases serves as input to the visualization phase, where visualization tools generate a variety of result views. Finally, experiments comparing this method with existing traditional detection methods show that it has a clear advantage and that it is accurate within a certain error range.
Keywords/Search Tags: Similarity Detection, DFA, Feature Quantification, Visualization
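
The pipeline outlined in the abstract (DFA-based lexical analysis, feature quantification, and two-stage matching) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that DFA means a deterministic finite automaton used as the lexical scanner, and everything named here is hypothetical: the `KEYWORDS` set, the four semantic types, the integer coefficients in `WEIGHTS`, and the functions `tokenize`, `quantize`, and `similarity`. The thesis discusses how its coefficients are chosen; the values below are arbitrary placeholders.

```python
from collections import Counter

# Hypothetical keyword set for a C-like language (not taken from the thesis).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float"}

# Hypothetical coefficients of the linear function, one per semantic type.
WEIGHTS = {"KEYWORD": 7, "IDENT": 1, "NUMBER": 3, "OP": 11}


def tokenize(statement):
    """Scan one statement with a small hand-written DFA.

    States: START, IDENT, NUMBER. Returns the list of semantic types,
    so renamed identifiers and changed literals yield the same result.
    """
    types = []
    state, lexeme = "START", ""

    def flush():
        # Emit the token collected so far and return to the START state.
        nonlocal state, lexeme
        if state == "IDENT":
            types.append("KEYWORD" if lexeme in KEYWORDS else "IDENT")
        elif state == "NUMBER":
            types.append("NUMBER")
        state, lexeme = "START", ""

    for ch in statement:
        if state == "IDENT" and (ch.isalnum() or ch == "_"):
            lexeme += ch
        elif state == "NUMBER" and (ch.isdigit() or ch == "."):
            lexeme += ch
        else:
            flush()
            if ch.isalpha() or ch == "_":
                state, lexeme = "IDENT", ch
            elif ch.isdigit():
                state, lexeme = "NUMBER", ch
            elif not ch.isspace():
                types.append("OP")  # every other visible character
    flush()
    return types


def quantize(types):
    """Linear function of the per-type token counts of one statement."""
    counts = Counter(types)
    return sum(WEIGHTS[t] * n for t, n in counts.items())


def similarity(source_a, source_b):
    """Fraction of statements in A that have a two-stage match in B."""
    stmts_a = [tokenize(s) for s in source_a.splitlines() if s.strip()]
    stmts_b = [tokenize(s) for s in source_b.splitlines() if s.strip()]
    if not stmts_a:
        return 0.0
    by_value = {}
    for types in stmts_b:
        by_value.setdefault(quantize(types), []).append(types)
    matched = 0
    for types in stmts_a:
        # Stage 1: candidates share the same quantized value.
        candidates = by_value.get(quantize(types), [])
        # Stage 2: the semantic type sequences must also agree.
        if types in candidates:
            matched += 1
    return matched / len(stmts_a)


if __name__ == "__main__":
    original = "int total = a + 10;\nreturn total;"
    renamed = "int sum = x + 42;\nreturn sum;"
    print(similarity(original, renamed))  # prints 1.0
```

The second stage matters because different statements can collide on the same quantized value: with these placeholder weights, seven identifiers score the same as one keyword. Comparing the semantic type sequences filters out such false matches.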