Design And Implementation Of Code De-anonymization System For C++ Language

Posted on: 2022-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: X K Liu
GTID: 2518306338968439
Subject: Computer technology
Abstract/Summary:
Code plagiarism occurs frequently, and a large amount of malicious code and malicious software appears on the Internet, causing serious harm to many people. In response to this problem, source code de-anonymization technology is used to attribute source code to its author, which can effectively curb such behavior. This technology involves the study of dynamic features. Because programming languages differ in their language features and in their relevant databases, a de-anonymization program is difficult to migrate from one programming language to another. This thesis studies source code de-anonymization technology. Its main work is as follows:

(1) To address the difficulty of migrating from Python to C++, we propose a dynamic style feature design and extraction scheme for de-anonymizing C++ source code. The proposed feature design scheme treats time and space separately and differs from the existing Python-oriented dynamic feature design scheme. It adds dynamic features that characterize aspects of the author's programming style that the Python scheme ignores. The extraction scheme also differs: the existing Python-oriented scheme is implemented with Python's internal modules, whereas the proposed scheme is implemented with the performance analysis tools Gprof and Valgrind. Experiments showed that adding dynamic style features significantly improved the accuracy of source code de-anonymization. With 230 programming authors involved, accuracy increased by about 8.5% compared with experiments that did not include dynamic style features.

(2) A code de-anonymization system for C++ is designed and implemented in this thesis. The system is presented to the user as a web page and detects online the identity of the author of source code that the user uploads. In the system design, we fully consider the degree of coupling between modules, so that the system can be split quickly to withstand the pressure of high concurrency when business volume surges. Finally, 20 source code files were randomly selected, and the author identity of each was correctly identified. The system can be used effectively for code de-anonymization detection and meets expectations.
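As a rough illustration of the time-based side of such a scheme, the sketch below reads the text of a Gprof flat profile and reduces it to a few numbers. It is not the thesis's extraction code. The abstract does not list the actual features, so `ProfileRow`, `TimeFeatures`, `parseFlatProfile`, and `summarize` are hypothetical names, and the three features (function count, total calls, largest share of run time) are assumptions chosen for illustration. The parser assumes the usual seven-column flat-profile rows and, as a simplification, skips rows whose call-count columns are blank.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One data row of a gprof flat profile. Only the columns used below are kept.
struct ProfileRow {
    double percentTime = 0.0;  // "% time" column
    double selfSeconds = 0.0;  // "self seconds" column
    long calls = 0;            // "calls" column
    std::string name;          // function name (may contain spaces)
};

// Hypothetical time-based features; the thesis abstract does not enumerate its own.
struct TimeFeatures {
    std::size_t functionCount = 0;
    long totalCalls = 0;
    double maxPercentTime = 0.0;
};

// Parse rows shaped like:
//   % time | cumulative s | self s | calls | self ms/call | total ms/call | name
// Header lines, blank lines, and rows with blank call columns fail the numeric
// extraction and are skipped (a simplification for this sketch).
std::vector<ProfileRow> parseFlatProfile(const std::string& text) {
    std::vector<ProfileRow> rows;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        ProfileRow row;
        double cumulative = 0.0, selfPerCall = 0.0, totalPerCall = 0.0;
        if (fields >> row.percentTime >> cumulative >> row.selfSeconds
                   >> row.calls >> selfPerCall >> totalPerCall) {
            fields >> std::ws;                // drop padding before the name
            std::getline(fields, row.name);   // keep "f(int, int)" in one piece
            if (!row.name.empty()) {
                rows.push_back(row);
            }
        }
    }
    return rows;
}

// Reduce the parsed rows to the illustrative feature vector.
TimeFeatures summarize(const std::vector<ProfileRow>& rows) {
    TimeFeatures features;
    features.functionCount = rows.size();
    for (const ProfileRow& row : rows) {
        assert(row.percentTime >= 0.0);  // gprof percentages are non-negative
        features.totalCalls += row.calls;
        features.maxPercentTime = std::max(features.maxPercentTime, row.percentTime);
    }
    return features;
}
```

A caller would compile the program under test with `-pg`, run it once to produce `gmon.out`, capture the flat profile that `gprof` prints, and pass that text to `parseFlatProfile`. Space-based features would need a separate, analogous step over Valgrind output, which this sketch does not attempt.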
Keywords/Search Tags: Code de-anonymization, C++, Dynamic, Performance analysis