With the development of blockchain technology, smart contracts on public chains such as Ethereum have become widely known. Smart contracts can be used to build a variety of decentralized applications, such as digital currency exchanges, decentralized finance platforms, games, and marketplaces. However, the security incidents that have followed not only cause serious losses of user assets but also significantly reduce users' trust in decentralized applications, which makes smart contract vulnerability detection particularly important. Current detection approaches based on techniques such as symbolic execution and taint analysis suffer from complicated workflows, limited automation, and low detection efficiency. As a result, they struggle to meet the automation and scale requirements created by the rapid growth in the number of smart contracts. Deep learning, by contrast, lends itself to processing vulnerability detection tasks automatically and in batches, which reduces manual intervention and improves efficiency. This paper therefore proposes a smart contract vulnerability detection method based on deep learning. After predicting vulnerability classes, the method further uses code similarity matching to locate the vulnerabilities. The main research contributions of this paper are as follows:

(1) A method for vectorizing smart contracts based on statement tree sequences is proposed. Drawing on data preprocessing techniques from natural language processing, it makes full use of the syntactic and semantic information of Solidity smart contracts to build their feature representation. First, each smart contract is parsed into an abstract syntax tree (AST). Because a recursive neural network over a large tree suffers from vanishing gradients, the AST is split at the statement level into statement subtrees. A bottom-up recursive neural network then extracts features from each statement subtree and produces its feature vector.

(2) An attention-based bidirectional GRU (BiGRU) model is constructed. By learning features from the statement tree sequence of a smart contract, it performs multi-label classification for five typical vulnerabilities: reentrancy, unchecked low-level calls, timestamp dependence, access control, and denial of service. Comparative experiments on the dataset covering three aspects show that the proposed vectorization method and model improve accuracy and are more efficient when detecting vulnerabilities in smart contracts in batches.

(3) Current deep learning classification models for smart contract vulnerability detection work at too coarse a granularity to locate vulnerabilities precisely. To address this, a vulnerability localization scheme based on code similarity matching is proposed. First, a smart contract vulnerability database is built from public data collected from various platforms: the key statements that carry each vulnerability are retained through manual screening, statement trees are generated from them, and Doc2Vec is used to convert these into feature vectors. Second, the same preprocessing is applied to the contracts under test to generate their feature vector matrices. In the localization phase, a similarity calculation identifies the regions whose similarity exceeds a threshold, and the mapping between feature vectors and source locations is then used to pinpoint each vulnerability. Finally, experiments evaluating coverage and precision demonstrate the feasibility and effectiveness of the scheme.
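The statement-level splitting and bottom-up encoding in contribution (1) can be sketched as follows. This is a minimal illustration under several assumptions, not the paper's implementation: the set of node kinds treated as statements is loosely modelled on solc's AST node names, `Node`, `embed` and the weight matrix `W` are hypothetical, and the node embeddings and weights are fixed pseudo-random values where a real model would learn them. Each statement subtree keeps its own expression nodes but drops nested statements and blocks, which are emitted as separate subtrees, so no single tree grows as large as the whole AST.

```python
import hashlib
import math
import random
from dataclasses import dataclass, field

# Assumption: node kinds treated as statements, loosely modelled on solc AST names.
STATEMENT_KINDS = {
    "ExpressionStatement", "VariableDeclarationStatement", "IfStatement",
    "ForStatement", "WhileStatement", "Return", "EmitStatement",
}
DIM = 4  # toy hidden size; a real model would use a much larger one


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)


def split_statements(root):
    """Cut the AST into statement subtrees, collected in preorder."""
    subtrees = []

    def prune(node):
        # Copy the node but drop nested statements and blocks;
        # visit() emits those as their own subtrees.
        kept = [prune(c) for c in node.children
                if c.kind not in STATEMENT_KINDS and c.kind != "Block"]
        return Node(node.kind, kept)

    def visit(node):
        if node.kind in STATEMENT_KINDS:
            subtrees.append(prune(node))
        for child in node.children:
            visit(child)

    visit(root)
    return subtrees


def embed(kind):
    # Hypothetical stand-in for a learned node-type embedding table.
    digest = hashlib.sha256(kind.encode()).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


_rng = random.Random(0)
# Hypothetical weights; in a trained model these would be learned.
W = [[_rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def encode(node):
    """Bottom-up recursive encoding: h = tanh(W . e(node) + sum of child states)."""
    e = embed(node.kind)
    h = [sum(W[i][j] * e[j] for j in range(DIM)) for i in range(DIM)]
    for child in node.children:
        h = [a + b for a, b in zip(h, encode(child))]
    return [math.tanh(x) for x in h]


# Usage: a function body containing an if-statement with a nested call.
ast = Node("FunctionDefinition", [Node("Block", [
    Node("IfStatement", [Node("BinaryOperation"), Node("Block", [
        Node("ExpressionStatement", [Node("FunctionCall")])])]),
    Node("ExpressionStatement", [Node("Assignment")]),
])])
subtrees = split_statements(ast)
sequence = [encode(t) for t in subtrees]  # one vector per statement, in order
```

This mirrors the motivation stated above: the recursion depth of each encoding is bounded by a single statement rather than the whole contract, and the resulting vector sequence is what a sequence model consumes next.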
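The attention pooling and multi-label output described in contribution (2) can be illustrated in plain Python. The sketch assumes the BiGRU has already produced one hidden state per statement (forward and backward states concatenated) and omits the GRU itself. It uses dot-product attention against a query vector, which is one common variant and not necessarily the paper's formulation; `attention_pool`, `predict`, the label identifiers, and all numbers are hypothetical. The key design point is that each label has its own sigmoid and threshold rather than a shared softmax, so a single contract can be flagged for several vulnerabilities at once.

```python
import math

# The five vulnerability types from contribution (2); identifiers are illustrative.
LABELS = ["reentrancy", "unchecked_low_level_call", "timestamp_dependence",
          "access_control", "denial_of_service"]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, query):
    """Weight each BiGRU output by its dot-product score against a query vector."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden))
               for d in range(dim)]
    return context, weights


def predict(context, W, b, threshold=0.5):
    """One independent sigmoid per label, so several labels can fire together."""
    probs = {}
    for label, row, bias in zip(LABELS, W, b):
        z = sum(r * c for r, c in zip(row, context)) + bias
        probs[label] = 1.0 / (1.0 + math.exp(-z))
    return {label for label, p in probs.items() if p >= threshold}, probs


# Toy inputs: three statements, each with a 2-d concatenated forward/backward state.
# A zero query gives uniform attention; a trained model would learn the query.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(hidden, query=[0.0, 0.0])
W = [[3.0, 3.0], [-3.0, -3.0], [0.0, 0.0], [3.0, 0.0], [0.0, -3.0]]
b = [0.0, 0.0, -1.0, 0.0, 0.0]
labels, probs = predict(context, W, b)
```

With these toy weights the context vector is the plain average of the three states, and two of the five labels clear the 0.5 threshold independently of each other.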
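The threshold-based localization step in contribution (3) can be sketched as a sliding-window comparison between the statement vectors of a contract under test and vulnerability patterns from the database. Assumptions: the vectors are given directly (in the paper they come from Doc2Vec), similarity is cosine similarity averaged over aligned statements in the window, and `Statement`, `locate` and the threshold value 0.9 are hypothetical. The start and end lines carried by each `Statement` play the role of the mapping from feature vectors to source locations.

```python
import math
from dataclasses import dataclass


@dataclass
class Statement:
    vector: list      # feature vector (Doc2Vec output in the paper; hand-written here)
    start_line: int   # source location this vector maps back to
    end_line: int


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat zero vectors as dissimilar
    return dot / (na * nb)


def locate(test_stmts, patterns, threshold=0.9):
    """Slide each pattern over the contract and report windows above the threshold."""
    hits = []
    for name, pattern in patterns.items():
        k = len(pattern)
        for i in range(len(test_stmts) - k + 1):
            window = test_stmts[i:i + k]
            score = sum(cosine(s.vector, p) for s, p in zip(window, pattern)) / k
            if score >= threshold:
                # Map the matching region back to source lines.
                hits.append((name, window[0].start_line, window[-1].end_line, score))
    return hits


# Usage with toy 3-d vectors.
test_stmts = [
    Statement([1.0, 0.0, 0.0], 10, 10),
    Statement([0.0, 1.0, 0.0], 11, 11),
    Statement([0.0, 0.0, 1.0], 12, 13),
]
patterns = {"reentrancy": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
hits = locate(test_stmts, patterns)
```

Here only the window covering lines 11 to 13 matches the two-statement pattern, so the report points at that region rather than at the contract as a whole, which is the finer granularity contribution (3) aims for.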