Bitcoin, as a decentralized digital currency, is not regulated by any central bank, which allows some criminals to use it for a variety of illegal activities. Transaction participants are difficult to regulate by law, and their true identities are hard to determine. De-anonymization work aims to serve the regulatory system, in the hope that analytical tools will enable better regulation of Bitcoin. In this thesis, we approach the problem from two perspectives, Bitcoin addresses and Bitcoin transactions, proposing new address features and a new transaction data synthesis method, respectively. We then use machine learning to study Bitcoin address classification and illegal transaction detection. This work provides a reference for strengthening the regulation of Bitcoin transactions, improving transaction compliance, and reducing transaction risk. The main work of this thesis consists of the following two parts.

(1) By collecting labeled Bitcoin addresses, we construct a dataset of 17,185 addresses in five categories: exchange, mining pool, merchant services, gambling, and coin mixing. We perform fine-grained feature extraction on this dataset and propose two new features: higher-order amount moments and sample distribution features based on feature importance. These are computed by analyzing the transaction amounts of each address and the distribution of the values taken by the most important features. Finally, we train several supervised classification models on the resulting features. Experiments show that the two new features significantly improve classification performance on four metrics: accuracy, precision, recall, and F1 score.

(2) An analysis of illegal Bitcoin transactions reveals a severe class imbalance in Bitcoin transaction data. We therefore propose an improved data synthesis method: the data are first pre-processed with a Gaussian mixture model, and transaction data are then synthesized by a generative adversarial network combined with a multi-head attention mechanism. Experiments show that this data synthesis method helps detect illegal transactions more accurately and significantly improves the model's recall and F1 score.

This thesis puts forward new ideas and methods for Bitcoin de-anonymization. By identifying Bitcoin addresses and transactions, we can better explore the transaction behavior of addresses and help prevent illegal activities. The results also give Bitcoin regulators more data and information with which to understand the market and take appropriate regulatory measures.
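The abstract does not define "higher-order amount moments" precisely. The sketch below illustrates one plausible reading for part (1). It assumes the feature consists of the mean, the variance, and the standardized central moments of order 3 and above (skewness, kurtosis) of an address's transaction amounts, computed with population (1/n) estimators. The function name `amount_moments` and its parameters are hypothetical, not the thesis's implementation.

```python
import math
from typing import Dict, Sequence


def amount_moments(amounts: Sequence[float], max_order: int = 4) -> Dict[str, float]:
    """Hypothetical helper: moment features of one address's transaction amounts.

    Assumption: mean, variance, and standardized central moments of order
    3..max_order (3 = skewness, 4 = kurtosis), using population (1/n) estimators.
    """
    n = len(amounts)
    if n == 0:
        raise ValueError("address has no transactions")
    mean = sum(amounts) / n

    def central(k: int) -> float:
        # k-th central moment: average of (x - mean)^k over all amounts.
        return sum((x - mean) ** k for x in amounts) / n

    var = central(2)
    features = {"mean": mean, "variance": var}
    std = math.sqrt(var)
    for k in range(3, max_order + 1):
        # Standardized moment; set to 0.0 when all amounts are identical (std == 0).
        features[f"moment_{k}"] = central(k) / std ** k if std > 0 else 0.0
    return features


if __name__ == "__main__":
    # Hypothetical amounts (in BTC) for a single address.
    print(amount_moments([1.0, 2.0, 3.0, 10.0]))
```

Standardizing by the standard deviation keeps the higher-order features scale-free. Addresses that move very different volumes can then still be compared on the shape of their amount distributions, for example the long right tail of a gambling or mixing address.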
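Part (2) pre-processes the data with a Gaussian mixture model before GAN-based synthesis, but the abstract gives no details. A common way to do this is the mode-specific normalization popularized by CTGAN. The sketch below assumes something similar: each continuous value is encoded as a one-hot indicator of its GMM component plus a scalar normalized within that component, which a generator can then learn to produce. The GMM parameters are assumed to be already fitted, for example with an EM implementation such as scikit-learn's `GaussianMixture`. The function names are hypothetical. The mode is chosen by argmax here for determinism; CTGAN samples it instead. The GAN and multi-head attention stages are not shown.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a univariate normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def encode(x: float, weights: Sequence[float], means: Sequence[float],
           stds: Sequence[float]) -> Tuple[float, List[int]]:
    """Hypothetical mode-specific encoding of one value under a fitted GMM."""
    # Unnormalized responsibility of each component for x (enough for argmax).
    # If every score underflows to 0.0, max() returns component 0.
    scores = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    k = max(range(len(scores)), key=scores.__getitem__)
    # Normalize x within its chosen mode, scaled by 4 standard deviations.
    alpha = (x - means[k]) / (4 * stds[k])
    one_hot = [1 if i == k else 0 for i in range(len(weights))]
    return alpha, one_hot


def decode(alpha: float, one_hot: Sequence[int], means: Sequence[float],
           stds: Sequence[float]) -> float:
    """Invert encode(): map (alpha, mode indicator) back to the original value."""
    k = list(one_hot).index(1)
    return alpha * 4 * stds[k] + means[k]


if __name__ == "__main__":
    # Hypothetical two-mode GMM for transaction amounts: small payments vs. large transfers.
    weights, means, stds = [0.7, 0.3], [0.01, 50.0], [0.005, 10.0]
    for amount in (0.012, 45.0):
        alpha, mode = encode(amount, weights, means, stds)
        print(amount, alpha, mode, decode(alpha, mode, means, stds))
```

Encoding values per mode lets a generator reproduce multi-modal, heavy-tailed amount distributions that a single min-max or z-score normalization would squash. This matters when synthesizing the rare illegal-transaction class.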