The decentralized, distributed architecture of blockchain provides a suitable operating environment for smart contracts. It lets them execute automatically without third-party guarantees, so they can be applied effectively to business scenarios that involve no trusted third party. Smart contracts therefore allow the blockchain ecosystem to adapt to a wider range of business scenarios. As blockchain has grown in popularity, however, smart contract security incidents have become frequent. Because valuable digital assets in the blockchain ecosystem are managed by smart contracts, these incidents often cause huge economic losses. Smart contract vulnerability detection has consequently become a focus of research in recent years. Blockchain is a young and rapidly evolving technology, so smart contract code is plentiful and easy to obtain, whereas labeled smart contract vulnerability datasets are scarce. Verifying vulnerabilities for research and practical use therefore demands considerable manpower and time. Active learning can reduce labeling cost as far as possible while maintaining the performance of the classification model, and uncertainty sampling is a common active learning query strategy. However, the uncertainty of the machine learning model itself distorts how the uncertainty of samples is estimated during active learning. Labeled vulnerability data can also harm the model when it contains label errors or outliers. In addition, applying active learning directly to deep learning leads to insufficient model training. To address these problems, this paper proposes active-learning-based smart contract vulnerability detection methods aimed at reducing data annotation effort. The main research contents are as follows.

(1) Smart contract vulnerability labeling based on backward Bayesian active learning. Building on a study of active learning, this paper proposes a new smart contract vulnerability labeling framework, BwdBAL. The framework consists of two stages: a forward active learning process and a backward active learning process. In the forward stage, BwdBAL selects the most informative Solidity source (.sol) files from the unlabeled dataset, queries their labels, and merges them with the currently labeled dataset to form a new training set. In the backward stage, BwdBAL applies backward noise removal to clean the labeled dataset and so improve the generalization ability of the model. The framework repeats the two stages until the target performance is reached. Extensive experiments on collected public data show that backward Bayesian active learning significantly reduces the cost of manual annotation and improves the model's automatic labeling ability.

(2) Smart contract vulnerability detection based on deep semi-supervised active learning. Building on the vulnerability data annotation work, this paper further explores the conflict between the training logic of active learning and that of deep learning. It proposes ASSbert, a vulnerability detection framework based on deep semi-supervised active learning. First, ASSbert uses active learning to select uncertain contract samples from the unlabeled dataset for experts to annotate, and merges the newly labeled samples into the existing labeled dataset to expand the training set. Then, to alleviate the shortage of training data, ASSbert uses semi-supervised learning to select high-confidence contract samples from the unlabeled dataset. Once these samples are pseudo-labeled, they are used for a second expansion of the training set, further improving the classification model. In this way, ASSbert updates the BERT classification model through both the active learning and the semi-supervised learning processes. Experiments show that ASSbert outperforms the baseline methods when only a small amount of annotated data is available.

(3) Building on these two techniques, this paper designs and implements an active-learning-based system for smart contract vulnerability labeling and detection. It consists of three modules: contract data service, vulnerability detection, and model center. The system provides functions such as intelligent labeling of smart contract vulnerability data and vulnerability detection. It is intended to help contract developers and researchers label vulnerability data and detect vulnerabilities more efficiently, and thereby help ensure the security of smart contracts.
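The abstract notes that the model's own uncertainty distorts the uncertainty assigned to samples. One standard Bayesian way to separate the two is the BALD score. BALD is the mutual information between a prediction and the model parameters, estimated from several stochastic forward passes such as MC dropout. The sketch below is an illustrative assumption only. The abstract does not specify BwdBAL's acquisition function, and `bald_score` and `select_queries` are hypothetical names. The sketch works on precomputed class probabilities, so any classifier can supply them.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs: List[Sequence[float]]) -> float:
    """Hypothetical BALD estimate from T stochastic forward passes.

    mc_probs[t][c] is the probability of class c (e.g. 0 = safe,
    1 = vulnerable) in pass t, e.g. with dropout left on at inference.
    Score = H(mean prediction) - mean H(per-pass prediction).
    """
    t = len(mc_probs)
    mean = [sum(p[c] for p in mc_probs) / t for c in range(len(mc_probs[0]))]
    expected = sum(entropy(p) for p in mc_probs) / t
    return entropy(mean) - expected


def select_queries(pool: List[List[Sequence[float]]], k: int) -> List[int]:
    """Indices of the k pool samples with the highest BALD score."""
    scores = [bald_score(mc) for mc in pool]
    return sorted(range(len(pool)), key=scores.__getitem__, reverse=True)[:k]


pool = [
    [[0.9, 0.1], [0.9, 0.1]],  # passes agree, confident
    [[0.9, 0.1], [0.1, 0.9]],  # passes disagree: model uncertainty
    [[0.5, 0.5], [0.5, 0.5]],  # passes agree it is ambiguous: data noise
]
print(select_queries(pool, 1))  # [1]
```

Sample 2 has the highest predictive entropy but a BALD score of zero, because every pass agrees it is a coin flip. Only sample 1, on which the passes disagree, is sent for labeling. This illustrates why a Bayesian score can avoid spending annotation budget on inherently ambiguous contracts.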
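The forward/backward cycle of BwdBAL can be sketched as follows. This is a minimal, runnable illustration under stated assumptions, not the thesis's implementation:

- A nearest-centroid classifier stands in for the real model.
- The forward step uses plain least-confidence sampling in place of the Bayesian acquisition.
- The backward step drops any labeled sample whose given label receives probability below a threshold `tau`.

The names `CentroidModel`, `forward_step`, `backward_step`, and `bwd_bal` are hypothetical, as are the fixed round budget and `tau`.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = List[Tuple[Point, int]]


class CentroidModel:
    """Hypothetical stand-in classifier: nearest class centroid, with
    probabilities from a softmax over negative Euclidean distances."""

    def fit(self, xs: Sequence[Point], ys: Sequence[int]) -> "CentroidModel":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}
        return self

    def predict_proba(self, x: Point) -> Dict[int, float]:
        logits = {y: -math.dist(x, c) for y, c in self.centroids.items()}
        m = max(logits.values())  # subtract the max for numerical stability
        exps = {y: math.exp(v - m) for y, v in logits.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}


def forward_step(model: CentroidModel, pool: List[Point], labeled: Labeled,
                 oracle: Callable[[Point], int], k: int) -> None:
    """Forward stage: query labels for the k least-confident pool samples."""
    ranked = sorted(pool, key=lambda x: max(model.predict_proba(x).values()))
    for x in ranked[:k]:
        pool.remove(x)
        labeled.append((x, oracle(x)))  # expert annotation


def backward_step(labeled: Labeled, tau: float) -> Labeled:
    """Backward stage: drop labeled samples the model considers mislabeled."""
    model = CentroidModel().fit([x for x, _ in labeled], [y for _, y in labeled])
    return [(x, y) for x, y in labeled if model.predict_proba(x).get(y, 0.0) >= tau]


def bwd_bal(labeled: Labeled, pool: List[Point], oracle: Callable[[Point], int],
            k: int = 2, rounds: int = 3, tau: float = 0.3):
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):  # fixed budget stands in for "target effect reached"
        if not pool:
            break
        model = CentroidModel().fit([x for x, _ in labeled], [y for _, y in labeled])
        forward_step(model, pool, labeled, oracle, k)
        labeled = backward_step(labeled, tau)
    final = CentroidModel().fit([x for x, _ in labeled], [y for _, y in labeled])
    return labeled, final


# Toy 1-D features; the contract at 0.5 is deliberately mislabeled as vulnerable.
init = [((0.0,), 0), ((0.2,), 0), ((10.0,), 1), ((9.8,), 1), ((0.5,), 1)]
pool = [(1.0,), (9.0,), (4.0,), (6.0,)]
clean, model = bwd_bal(init, pool, oracle=lambda x: int(x[0] > 5), k=2, rounds=2)
```

In this toy run, the backward step removes the mislabeled sample in the first round. The loop then ends with eight clean labels after the pool is exhausted.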
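The two training-set expansions in ASSbert can be illustrated with the sketch below. It is model-agnostic and takes the class probabilities that a classifier (BERT in the thesis) has produced for each unlabeled contract. Several details are assumptions, not details given in the abstract:

- the least-confidence query rule;
- the confidence `threshold`;
- the helper names `split_pool` and `expand_training_set`;
- the choice to re-score pseudo-labeled samples every round.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def split_pool(probs: Dict[str, Sequence[float]], k: int,
               threshold: float) -> Tuple[List[str], Dict[str, int]]:
    """Split the unlabeled pool into expert queries and pseudo-labels.

    probs maps a (hypothetical) contract id to the classifier's class
    probabilities, e.g. softmax outputs of a fine-tuned BERT model.
    """
    # Active-learning half: the k least-confident samples go to experts.
    by_conf = sorted(probs, key=lambda cid: max(probs[cid]))
    query = by_conf[:k]
    # Semi-supervised half: remaining samples whose top class probability
    # reaches the threshold receive that class as a pseudo-label.
    pseudo: Dict[str, int] = {}
    for cid in by_conf[k:]:
        p = probs[cid]
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= threshold:
            pseudo[cid] = best
    return query, pseudo


def expand_training_set(labeled: Dict[str, int], probs: Dict[str, Sequence[float]],
                        oracle: Callable[[str], int], k: int, threshold: float):
    """One round of the two-step expansion; returns
    (human-labeled set, training set for this round, remaining pool ids)."""
    query, pseudo = split_pool(probs, k, threshold)
    labeled = dict(labeled)
    for cid in query:                 # first expansion: expert labels
        labeled[cid] = oracle(cid)
    train = {**labeled, **pseudo}     # second expansion: pseudo-labels
    # Assumption: pseudo-labeled samples stay in the pool and are re-scored
    # by the retrained model next round instead of being fixed permanently.
    remaining = [cid for cid in probs if cid not in labeled]
    return labeled, train, remaining


probs = {"a": [0.55, 0.45], "b": [0.97, 0.03], "c": [0.10, 0.90],
         "d": [0.02, 0.98], "e": [0.60, 0.40]}
expert = {"a": 1, "e": 0}  # stand-in for human annotators
labeled, train, remaining = expand_training_set(
    {"x": 0}, probs, expert.__getitem__, k=2, threshold=0.95)
```

Here contracts `a` and `e` go to the experts, and `b` and `d` are pseudo-labeled. Contract `c`, whose top probability is 0.90, contributes no label this round. Keeping human labels and pseudo-labels in separate structures makes it easy to retrain on the union while never overwriting an expert label.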