As an emerging technology, blockchain plays an important role in finance, logistics, business services, and other fields. Smart contracts, which execute autonomously on the blockchain, are among its most notable and successful applications. However, as developers seek higher programming efficiency, smart contract code is increasingly reused. Unfortunately, this reuse has also spread defective code, including vulnerable code, leading to frequent security incidents. These incidents not only cause tremendous economic losses but also contribute to a crisis of public trust in blockchain technology. Smart contract code clone detection can reveal such reuse and reduce the security risks it introduces, offering a new direction for smart contract security research. Currently, deep learning is the mainstream approach to code similarity detection: features are extracted at the source code or bytecode level and then matched to assess the similarity between code fragments. Although this approach fits the current needs of the field, it has inherent limitations, including weak interpretability and vanishing gradients in neural networks, which can compromise detection accuracy. To address these concerns, this paper investigates two code similarity detection methods:

(1) Smart contract code clone detection based on syntactic-tree matching. We construct an abstract syntax tree (AST) of the contract source code and divide it into multiple syntactic-trees by pre-order traversal, each representing one complete line of source code. To extract semantic information from the syntactic-trees, we employ a syntactic-tree encoder based on a self-attention mechanism, and we measure the similarity of two syntactic-trees by the semantic distance between their embeddings. Finally, we analyze the relationships between the syntactic-trees of different contracts to determine the overall similarity of the contracts. This approach also provides interpretable analysis at the statement level.

(2) Smart contract function clone detection based on structure-tree matching. To analyze the structure and features of smart contract functions, we first parse each function into an AST and then divide it into multiple structure-trees by post-order traversal, taking into account syntax structures specific to smart contracts. Next, we extract various features from the structure-trees and use them to train a LightGBM model that predicts the similarity between structure-trees. By combining the similarities between the structure-trees of different functions, we determine the similarity between the functions and provide interpretable analysis at the structure level.

Experimental results demonstrate that decomposing code into subtrees is an effective strategy for addressing the difficulty of extracting long-range semantics and the disruption of the primary grammatical structure. Additionally, our techniques improve the interpretability of clone detection in smart contracts: by identifying the specific similar lines and structures, they facilitate subsequent code repair by developers.
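To make the statement-level pipeline of method (1) more concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. The `Node` class, `split_by_line`, `embed`, and `contract_similarity` are hypothetical names. A bag of AST node kinds stands in for the self-attention encoder, cosine similarity stands in for the semantic distance, and averaging each line's best match in both directions is an assumed rule for combining line-level similarities into a contract-level score.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # AST node type, e.g. "IfStatement" (hypothetical model)
    line: int                                   # source line where the node starts
    children: list = field(default_factory=list)


def split_by_line(root):
    """Pre-order traversal that groups AST nodes by source line.

    Each value is the pre-order list of node kinds on one line, standing in
    for that line's syntactic-tree."""
    groups = {}
    stack = [root]
    while stack:
        node = stack.pop()
        groups.setdefault(node.line, []).append(node.kind)
        stack.extend(reversed(node.children))   # visit children left to right
    return {line: groups[line] for line in sorted(groups)}


def embed(kinds):
    # Placeholder for the self-attention encoder: a bag of node kinds.
    return Counter(kinds)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def contract_similarity(tree_a, tree_b, threshold=0.9):
    """Return (overall score, matched (line_a, line_b) pairs)."""
    emb_a = {ln: embed(k) for ln, k in split_by_line(tree_a).items()}
    emb_b = {ln: embed(k) for ln, k in split_by_line(tree_b).items()}

    def directed(xs, ys):
        # Average, over lines in xs, of the best cosine match found in ys.
        return sum(max(cosine(x, y) for y in ys.values()) for x in xs.values()) / len(xs)

    score = (directed(emb_a, emb_b) + directed(emb_b, emb_a)) / 2
    pairs = [(la, lb) for la, x in emb_a.items() for lb, y in emb_b.items()
             if cosine(x, y) >= threshold]
    return score, pairs
```

The returned line pairs illustrate the statement-level interpretability: they indicate which lines of one contract drove the match with which lines of the other. Replacing `embed` with a learned encoder would leave the splitting and aggregation steps unchanged.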
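A similar hedged sketch for method (2) shows why post-order traversal is convenient. Inner structures are completed before their parents, so each one can be cut off as its own structure-tree, leaving only a placeholder token in the enclosing structure. The set `STRUCTURE_KINDS` is an assumption: node names follow solc's JSON AST, and `TryStatement` and `UncheckedBlock` represent Solidity-specific syntax. `pair_features` is a hypothetical feature row. The paper's actual feature set and its LightGBM training step are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # AST node type, e.g. "IfStatement" (hypothetical model)
    children: list = field(default_factory=list)


# Hypothetical choice of node kinds that start a new structure-tree; names
# follow solc's JSON AST, and UncheckedBlock/TryStatement are Solidity-specific.
STRUCTURE_KINDS = {
    "FunctionDefinition", "ModifierDefinition", "IfStatement", "ForStatement",
    "WhileStatement", "DoWhileStatement", "TryStatement", "UncheckedBlock",
}


def _post_order(node, out):
    """Return node's token sequence; append finished structure-trees to out."""
    tokens = []
    for child in node.children:
        tokens.extend(_post_order(child, out))
    tokens.append(node.kind)                    # post-order: parent after children
    if node.kind in STRUCTURE_KINDS:
        out.append(tokens)                      # inner structures close first
        return ["@" + node.kind]                # parent keeps only a placeholder
    return tokens


def structure_trees(root):
    out = []
    leftover = _post_order(root, out)
    if root.kind not in STRUCTURE_KINDS:
        out.append(leftover)                    # keep any top-level remainder
    return out


def pair_features(a, b):
    """Hypothetical feature row for one pair of structure-trees."""
    sa, sb = set(a), set(b)
    return [
        len(a), len(b), abs(len(a) - len(b)),
        len(sa & sb) / len(sa | sb),            # Jaccard overlap of node kinds
        float(a[-1] == b[-1]),                  # same root kind (root is last in post-order)
    ]
```

In a full pipeline, rows like these, labeled as similar or dissimilar, would be the training data for the structure-tree similarity model. Per-pair predictions would then be combined into a function-level score.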