Parallelized Ethereum Blockchain Data Collection and Storage Technology

Posted on: 2022-11-10, Degree: Master, Type: Thesis
Country: China, Candidate: Z Y He, Full Text: PDF
GTID: 2518306764976799, Subject: Computer Software and Application of Computer
Abstract/Summary:
As the infrastructure that stores all historical on-chain records, the blockchain is widely used, and the amount of data it carries keeps increasing. These massive volumes of data are extremely valuable and can be used for attack detection, financial analysis, underground industry tracking, and similar tasks. However, because of the huge data volume, blockchain information cannot be collected and stored within a reasonable time, which seriously hinders exploration of the blockchain. To tackle this problem, this thesis proposes a parallelized Ethereum data collection and storage technology and implements a prototype framework for big data collection and storage on the Ethereum blockchain, the world's largest blockchain platform that supports smart contracts. This is not an easy task, and it mainly faces the following two challenges:

(1) The data on the blockchain is complex and heterogeneous. A large amount of heterogeneous data is recorded on the Ethereum blockchain, including blocks, transactions, smart contracts, and other types, each generated and stored in a different way. According to the characteristics of these data, this thesis therefore records and stores them in a clear data structure. In addition, a fixed structured data set is not scalable. For this reason, this thesis also proposes a tracking and replay function, through which users can construct data sets at transaction granularity.

(2) The collection and storage of data are time-consuming. The massive amount of data contained in the blocks leads to a high time cost for collecting and storing block data. To this end, this thesis uses the scalability and rich computing resources of a distributed master-slave architecture to accelerate the collection of block data. Specifically, it runs all slave nodes in parallel, each collecting and storing the data of a different block height interval. This is extremely challenging, because it is often necessary to synchronize a complete blockchain to complete the proof of work, and if every data collection node had to process the complete blockchain, a great deal of time would inevitably be lost. To solve this problem, this thesis proposes a state snapshot technology that saves the block state, so that a slave node can complete its data collection work by synchronizing only its block segment. In addition, this thesis proposes three finer-grained optimization techniques, namely dynamic load balancing, rotation, and local execution trajectory collection, to further accelerate data collection and storage.

To evaluate its validity, this thesis deploys the framework on a cluster consisting of 1 master node and 10 slave nodes to collect and store the block data of the Ethereum blockchain at heights from 3 million to 4 million. The performance improvement brought by each optimization technique is measured through experiments. The final experimental results show that, with all optimizations enabled, the minimum overhead in data acquisition is 7% = (220 - 204)/204, and data collection and storage is 3.72 times faster than single-node sequential data collection and storage. The experiments also show that the framework has good scalability, because adding more data collection nodes improves the performance of the whole framework. Finally, this thesis proposes two applications, Transaction Dependency Check and Token Smart Contract Authentication Vulnerability Detection, to verify the validity of the dataset.
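The idea of assigning different block height intervals to parallel slave nodes can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `split_block_range` and `make_chunks`, the inclusive interval convention, and the chunk-based scheme for approximating dynamic load balancing are all assumptions made for illustration, since the abstract does not describe how intervals are actually assigned.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first_height, last_height); assumed convention


def split_block_range(start: int, end: int, num_workers: int) -> List[Interval]:
    """Split the inclusive range [start, end] into contiguous, near-equal
    intervals, one per worker (hypothetical static assignment)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    total = end - start + 1
    base, extra = divmod(total, num_workers)
    intervals: List[Interval] = []
    lo = start
    for i in range(num_workers):
        # The first `extra` workers take one additional block each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than blocks
        intervals.append((lo, lo + size - 1))
        lo += size
    return intervals


def make_chunks(start: int, end: int, chunk_size: int) -> List[Interval]:
    """Cut [start, end] into small fixed-size chunks that idle workers could
    pull from a shared queue (one assumed way to balance load dynamically)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(lo, min(lo + chunk_size - 1, end))
            for lo in range(start, end + 1, chunk_size)]


if __name__ == "__main__":
    # Heights 3,000,000 to 3,999,999 split across 10 collection nodes.
    for worker_id, interval in enumerate(split_block_range(3_000_000, 3_999_999, 10)):
        print(worker_id, interval)
```

With a static split, a node whose interval contains heavier blocks finishes last; handing out smaller chunks on demand, as `make_chunks` suggests, is one common way to even this out, at the cost of more coordination with the master node.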
Keywords/Search Tags:Blockchain, Ethereum, Distributed, Data Collection, Storage