A Study On The Clone Detection Technology Of Smart Contract Code Through AST Subtree Matching

Posted on:2024-09-28

Degree:Master

Type:Thesis

Country:China

Candidate:X J Xu

GTID:2568307139996059

Subject:Computer Science and Technology

Abstract/Summary:

As an emerging technology, blockchain plays an important role in finance, logistics, business services, and other fields. Among its applications, smart contracts stand out: they execute autonomously and have been widely adopted. However, as developers seek greater programming efficiency, the reuse rate of smart contract code has risen considerably. This has spread defective code, including vulnerable code, which has led to frequent security incidents. These incidents not only cause large economic losses but also contribute to a crisis of public trust in blockchain technology. Smart contract code clone detection can discover code reuse and reduce the security risks it causes, which opens a new direction for smart contract security research. Currently, deep learning is the mainstream approach to similarity detection: features are extracted at the source code or bytecode level and then matched to assess the similarity between pieces of code. Although it fits the current challenges in this field, deep learning has inherent limitations, including weak interpretability and vanishing gradients in neural networks, which can compromise detection accuracy. To address these concerns, this thesis investigates two code similarity detection methods:

(1) Smart contract code clone detection based on syntactic-tree matching. We construct an abstract syntax tree (AST) of the contract source code and divide it into multiple syntactic trees through pre-order traversal. Each syntactic tree represents one complete line of the source code. To extract semantic information from the syntactic trees, we employ a syntactic-tree encoder based on a self-attention mechanism. Similarity between syntactic trees is measured by the semantic distance between their embeddings. Finally, we analyze the relationships between the syntactic trees of different contracts to determine the overall similarity of the contracts. This approach also provides an interpretable analysis at the statement level.

(2) Smart contract function clone detection based on structure-tree matching. To analyze the structure and features of smart contract function source code, we first convert a function into an abstract syntax tree. We then divide it into multiple structure trees using post-order traversal, taking into account the syntax structures specific to smart contracts. Next, we extract various features from the structure trees and use them to build a LightGBM model that predicts the similarity between structure trees. By combining the similarities between the structure trees of different functions, we determine the similarity between functions and provide an interpretable analysis at the structure level.

Experimental results demonstrate that subtree-based matching is an effective strategy for addressing long-range semantic extraction and the disruption of primary grammatical structure. Our technique also improves interpretability when detecting cloned code in smart contracts: by identifying the specific lines and structures that are similar, it facilitates subsequent code repair by developers.
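The statement-level pipeline of method (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Node` class, the `STATEMENT_KINDS` set, and all function names are hypothetical, the toy tree stands in for a real Solidity AST, and a simple bag-of-node-kinds vector with cosine similarity is swapped in for the self-attention encoder. The best-match averaging used for the contract-level score is likewise an assumption, since the abstract does not specify the aggregation rule.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """Toy AST node (hypothetical): a node kind plus ordered children."""
    kind: str
    children: list["Node"] = field(default_factory=list)


# Node kinds treated as statement roots; an illustrative assumption.
STATEMENT_KINDS = {"VariableDeclaration", "ExpressionStatement", "Return"}


def preorder(node):
    """Yield every node of a tree in pre-order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def split_into_subtrees(root):
    """Collect statement-level subtrees in pre-order, without descending into them."""
    subtrees, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.kind in STATEMENT_KINDS:
            subtrees.append(node)
        else:
            # Push children reversed so the leftmost child is visited first.
            stack.extend(reversed(node.children))
    return subtrees


def embed(subtree):
    """Stand-in embedding: counts of node kinds (NOT the self-attention encoder)."""
    return Counter(n.kind for n in preorder(subtree))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def contract_similarity(root_a, root_b):
    """Return (overall score, per-statement best matches) for two toy ASTs."""
    emb_a = [embed(s) for s in split_into_subtrees(root_a)]
    emb_b = [embed(s) for s in split_into_subtrees(root_b)]
    if not emb_a or not emb_b:
        return 0.0, []
    matches = []
    for i, ea in enumerate(emb_a):
        # Best-matching statement in the other contract (assumed aggregation rule).
        j, score = max(((j, cosine(ea, eb)) for j, eb in enumerate(emb_b)),
                       key=lambda pair: pair[1])
        matches.append((i, j, score))
    overall = sum(score for _, _, score in matches) / len(matches)
    return overall, matches
```

The per-statement `matches` list is what makes the result inspectable: each entry pairs a statement in one contract with its closest counterpart in the other, which mirrors the statement-level interpretability described above.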

Keywords/Search Tags:

smart contracts, code reuse, code clone detection, abstract syntax tree, subtree matching

Related items

1	Design And Implementation Of Code Clone Analysis System Based On Sequence Matching
2	Pyreview:A Python Source Code Analysis Tool Based On Abstract Syntax Tree Differencing Algorithm
3	Research On Algorithm Optimization And Application Of Code Clone Detection Task
4	Automatically Based On The Abstract Syntax Tree And Static Analysis Of The Cloned Code Refactoring
5	Research On Code Clone Detection Based On Deep Learning
6	Research On Clone Detection Based On Intermediate Representation Of Source Code
7	Detection Of Function-based The Structural Clone And The Semantic Clone
8	Research And Implementation Of Code Clone Detection Technology Based On Deep Learning
9	Research And Implementation Of Code Plagiarism Detection Based On Subtree Tracking
10	The Research Of Code Clone Detected Method Based On Graph Neural Network