Plagiarism Detection Of Multi-threaded Programs Via Extraction And Representation Learning

Posted on: 2022-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2518306554964709
Subject: Software engineering
Abstract/Summary:
Software plagiarism seriously threatens the healthy development of the software ecosystem. Dynamic software birthmarking is the most effective technique in the field of software plagiarism detection. However, as multi-threaded programs become mainstream, traditional dynamic birthmarking techniques cannot withstand the interference introduced by thread interleaving, which makes their detection performance excessively random and can even cause misjudgments. Existing thread-aware birthmarking methods for multi-threaded programs generally extract birthmarks from a single execution trace corresponding to one particular input, and these methods have limitations of their own. In addition, existing methods construct birthmarks largely through manual extraction and empirical observation rather than any real training, so they generalize poorly to unseen samples. To address these problems, we propose three novel plagiarism detection methods for multi-threaded programs based on pattern extraction and representation learning. The main contributions of this work are as follows:

(1) A plagiarism detection method for multi-threaded programs based on behavioral motifs is proposed. Starting from the set of dynamic execution traces of a multi-threaded program, the method applies a carefully designed algorithm that prunes the traces and then matches and expands grams to extract behavioral motifs that characterize the program's semantics. On this basis, a thread-aware motifs birthmark is constructed. Experimental results show that the motifs birthmark is a reliable thread-aware birthmark that effectively resists current mainstream code obfuscation techniques, and a detection system integrating it outperforms existing methods across different evaluation metrics.

(2) A plagiarism detection method for multi-threaded programs based on frequent pattern mining is proposed. The method monitors the dynamic execution of a multi-threaded program to capture a set of execution traces, from which frequent patterns are extracted using data mining techniques. After these patterns are reduced, the dynamic thread-aware birthmark FPBirth is constructed. Experimental results show that FPBirth has good thread awareness and is resilient against code obfuscation. Compared with other methods, a plagiarism detection system integrating FPBirth handles plagiarism detection of multi-threaded programs better.

(3) A plagiarism detection method for multi-threaded programs based on a Siamese neural network is proposed. The method obtains high-level semantic feature vectors of programs through representation learning with a purpose-designed deep neural network model. The semantic feature vectors of the plaintiff and defendant programs are fused using a Siamese network structure, and the fused vector is fed into a multi-layer perceptron that measures similarity, replacing hand-crafted similarity metrics. Finally, the similarity between the plaintiff and the defendant is obtained by bagging the similarity values computed under multiple inputs. Based on this method, we developed NeurMPD, a plagiarism detection system for multi-threaded programs. Experiments on a public software plagiarism sample set show that NeurMPD achieves encouraging detection effectiveness and excellent resilience and credibility, outperforming alternative techniques.
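The abstract names the steps of the motif algorithm in contribution (1) (prune the traces, then match and expand grams) without giving their details. The Python sketch below is one possible reading under hypothetical assumptions: a trace maps each thread id to its ordered event names (for example, system calls), pruning collapses consecutive repeats of the same event, matching keeps grams that occur in every trace of the same program, and expansion grows a gram for as long as a longer shared gram extends it. The containment similarity at the end is likewise an illustrative choice rather than the thesis's metric, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Trace = Dict[int, List[str]]  # thread id -> ordered events (hypothetical format)
Gram = Tuple[str, ...]


def prune(seq: List[str]) -> List[str]:
    """Collapse runs of a repeated event (a hypothetical pruning rule)."""
    out: List[str] = []
    for ev in seq:
        if not out or out[-1] != ev:
            out.append(ev)
    return out


def thread_grams(trace: Trace, k: int) -> Set[Gram]:
    """All k-grams of every pruned thread sequence in one trace.

    Threads are scanned separately, so no gram ever spans two threads and
    the result does not depend on how the threads were interleaved.
    """
    result: Set[Gram] = set()
    for seq in trace.values():
        s = prune(seq)
        result |= {tuple(s[i:i + k]) for i in range(len(s) - k + 1)}
    return result


def extract_motifs(traces: List[Trace], k: int = 2) -> Set[Gram]:
    """Match k-grams shared by all traces, then expand them while possible."""
    frontier = set.intersection(*(thread_grams(t, k) for t in traces))
    motifs: Set[Gram] = set()
    while frontier:
        longer = set.intersection(*(thread_grams(t, k + 1) for t in traces))
        # A gram becomes a motif only when no shared (k+1)-gram extends it.
        for g in frontier:
            if not any(h[:-1] == g or h[1:] == g for h in longer):
                motifs.add(g)
        frontier = longer
        k += 1
    return motifs


def similarity(plaintiff: Set[Gram], defendant: Set[Gram]) -> float:
    """Containment: share of the plaintiff's motifs found in the defendant."""
    return len(plaintiff & defendant) / len(plaintiff) if plaintiff else 0.0
```

Because grams never cross thread boundaries, two runs that differ only in how their threads interleave produce the same motif set, which is the thread-awareness property the abstract emphasizes.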
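Contribution (2) mines frequent patterns from the trace set and then reduces them. The Python sketch below shows one standard way to perform both steps; it is not FPBirth itself. It assumes, hypothetically, that a trace is a list of per-thread event sequences, that a pattern is a contiguous run of events within one thread, that support is the fraction of traces containing the pattern, and that the reduction keeps only closed patterns (those not contained in a longer frequent pattern of equal support). The weighted Jaccard similarity and all function names are likewise illustrative.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]
Trace = List[List[str]]  # one event sequence per thread (hypothetical format)


def contains(seq: List[str], p: Pattern) -> bool:
    """True if p occurs as a contiguous run inside seq."""
    k = len(p)
    return any(tuple(seq[i:i + k]) == p for i in range(len(seq) - k + 1))


def mine_frequent(traces: List[Trace], min_support: float,
                  max_len: int = 5) -> Dict[Pattern, float]:
    """Level-wise (Apriori-style) mining of contiguous per-thread patterns.

    Support is the fraction of traces in which some thread contains the pattern.
    """
    n = len(traces)
    frequent: Dict[Pattern, float] = {}
    candidates: Set[Pattern] = {(ev,) for t in traces for seq in t for ev in seq}
    k = 1
    while candidates and k <= max_len:
        level: Dict[Pattern, float] = {}
        for p in candidates:
            support = sum(any(contains(seq, p) for seq in t) for t in traces) / n
            if support >= min_support:
                level[p] = support
        frequent.update(level)
        # Grow only frequent patterns by one event: an extension can never be
        # more frequent than the pattern it extends, so nothing is lost.
        candidates = {tuple(seq[i:i + k + 1]) for t in traces for seq in t
                      for i in range(len(seq) - k) if tuple(seq[i:i + k]) in level}
        k += 1
    return frequent


def reduce_closed(frequent: Dict[Pattern, float]) -> Dict[Pattern, float]:
    """Drop a pattern if a longer frequent pattern containing it has equal support."""
    return {p: s for p, s in frequent.items()
            if not any(len(q) > len(p) and s == sq and contains(list(q), p)
                       for q, sq in frequent.items())}


def similarity(a: Dict[Pattern, float], b: Dict[Pattern, float]) -> float:
    """Weighted Jaccard similarity over pattern supports."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0
```

Closedness is only one common reduction; keeping just maximal patterns (dropping anything contained in any longer frequent pattern) would shrink the birthmark further at the cost of discarding support information.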
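The abstract does not specify NeurMPD's encoder, fusion operator, or layer sizes, so the Python sketch below only traces the data flow of contribution (3) with random, untrained weights. Its choices are hypothetical: each program run is represented as a fixed-length numeric feature vector, a single shared tanh layer stands in for the deep encoder, fusion concatenates |a - b| with a * b, a one-hidden-layer MLP with a sigmoid output produces the similarity, and bagging is a plain average over the (plaintiff, defendant) pairs obtained under different inputs. In a real system the weights would be learned from labelled plagiarism pairs; here they are seeded random numbers, so the scores only illustrate the pipeline. The class and method names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def matvec(w: List[Vec], x: Sequence[float]) -> Vec:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class SiameseSimilarity:
    """Untrained forward pass: shared encoder -> fusion -> MLP -> score in (0, 1)."""

    def __init__(self, d_in: int, d_emb: int = 8, d_hidden: int = 8, seed: int = 0):
        rnd = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vec]:
            return [[rnd.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_enc = init(d_emb, d_in)          # one encoder, shared by both branches
        self.w_hid = init(d_hidden, 2 * d_emb)  # MLP hidden layer
        self.w_out = init(1, d_hidden)[0]       # MLP output layer

    def encode(self, x: Sequence[float]) -> Vec:
        return [math.tanh(v) for v in matvec(self.w_enc, x)]

    def score(self, x_a: Sequence[float], x_b: Sequence[float]) -> float:
        a, b = self.encode(x_a), self.encode(x_b)
        # Symmetric fusion (hypothetical choice): |a - b| followed by a * b.
        fused = [abs(u - v) for u, v in zip(a, b)] + [u * v for u, v in zip(a, b)]
        hidden = [max(0.0, h) for h in matvec(self.w_hid, fused)]
        logit = sum(w * h for w, h in zip(self.w_out, hidden))
        return 1.0 / (1.0 + math.exp(-logit))

    def bagged_score(self, pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
        """Average the scores of (plaintiff, defendant) pairs, one per program input."""
        return sum(self.score(a, b) for a, b in pairs) / len(pairs)
```

Sharing one encoder between both branches and using a symmetric fusion makes the score independent of which program is treated as the plaintiff; the abstract does not state whether NeurMPD's fusion has this property.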
Keywords/Search Tags: software plagiarism, multi-threaded program plagiarism detection, thread-aware birthmark, pattern extraction, representation learning