
Research And Implementation Of Key Technologies For Efficient Biological Data Transmission Based On Distributed Computing System

Posted on: 2021-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: C W Zhang
GTID: 2480306308475154
Subject: Electronics and Communications Engineering
Abstract/Summary:
Driven by the Human Genome Project, the biological industry has undergone a revolution, and the volume of biological data being produced keeps growing. Compared with traditional data, biological data has an extremely complex internal structure, is huge in volume, comes in many types and from a wide range of sources, and places stricter quality-of-service (QoS) requirements on transmission. Biological data is the foundation of life-science research, and its transmission is an important prerequisite for integrating and sharing it. However, transmitting biological data efficiently remains a difficult and significant problem. Building on a survey and analysis of existing data transmission methods, this thesis studies data transmission based on distributed computing systems in depth and proposes improvements to its key transmission technologies. The main contributions are as follows:

1. Within an overall architecture for efficient biological data transmission based on a distributed computing system, two scenarios are discussed. The first is the business scenario in which biological data has no replicas. For this scenario, the thesis proposes a migration prediction algorithm based on the historical access table of biological data. Overloaded and non-overloaded nodes are identified by analyzing the historical access table, and a resource migration relationship is then built between them. Each overloaded node selects its most popular biological data for migration. In addition, to prevent unnecessary migrations, a prediction algorithm based on the number of accesses to biological data is proposed to predict its access trend in advance, so that the decision to migrate depends on how that trend changes. The algorithm reduces the time spent on resource queries, lowers the load on overloaded nodes, and improves the transmission speed of biological data. Experimental results show that the proposed algorithm substantially improves the efficiency of biological data transmission.

2. The second scenario is the multi-replica business scenario. Building on replica technology in distributed systems, the thesis proposes a multi-replica resource deployment method for biological data to further improve its transmission speed. Overloaded and non-overloaded nodes are again identified by analyzing the historical access table; a target node selection algorithm based on linear programming then constructs the mapping between overloaded and non-overloaded nodes, and replicas of the biological data are created according to this mapping. So that the number of replicas can change dynamically with the system state, an adaptive algorithm based on the number of replicas is also proposed, ensuring that the replicas of biological data adapt to the status of the distributed system. Experimental results show that the proposed algorithm greatly improves the transmission efficiency of biological data.

3. An efficient biological data transmission system is designed and implemented. Based on a requirements analysis, the system design environment is determined and the required development framework is selected, followed by the detailed design of the system modules: the data storage, data sharding, resource allocation, and user management modules. Beyond the system's basic functions, a front-end visual interface was developed for users' convenience, including registration, data upload, algorithm execution, biological data query, and information management interfaces. The proposed algorithms are implemented in this system, which has been applied in a research project of the biological data center of the national biological center, giving the proposed algorithms stronger practical value. This study can also be applied to the synchronization and transmission of other large-capacity, multi-source heterogeneous data.
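The migration step of the first contribution (no-replica scenario) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: the layout of the historical access table (node, then dataset, then per-window access counts), the overload rule (latest-window load above `factor` times the mean node load, with a hypothetical `factor=1.2`), and the use of a least-squares slope as the access-trend predictor are all hypothetical choices, as are the function names.

```python
from statistics import mean

# Hypothetical access-table layout: node -> dataset -> access counts per window (oldest first).
History = dict[str, dict[str, list[int]]]

def node_loads(history: History) -> dict[str, int]:
    """Load of a node = total accesses to its datasets in the most recent window."""
    return {node: sum(counts[-1] for counts in data.values())
            for node, data in history.items()}

def trend_slope(counts: list[int]) -> float:
    """Least-squares slope of access counts over time (assumed trend predictor)."""
    n = len(counts)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar, y_bar = (n - 1) / 2, mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def plan_migrations(history: History, factor: float = 1.2) -> list[tuple[str, str, str]]:
    """Return (dataset, source node, target node) moves; all thresholds are assumptions."""
    loads = node_loads(history)
    threshold = factor * mean(loads.values())
    overloaded = sorted((n for n in loads if loads[n] > threshold),
                        key=loads.get, reverse=True)
    targets = [n for n in loads if loads[n] <= threshold]
    plan = []
    for src in overloaded:
        if not targets:
            break
        data = history[src]
        # Most popular dataset on this node in the latest window.
        hot = max(data, key=lambda d: data[d][-1])
        # Skip the migration when demand for that dataset is predicted to fall.
        if trend_slope(data[hot]) < 0:
            continue
        dst = min(targets, key=loads.get)  # least-loaded non-overloaded node
        moved = data[hot][-1]
        loads[src] -= moved
        loads[dst] += moved
        plan.append((hot, src, dst))
        if loads[dst] > threshold:  # target is now overloaded itself; stop using it
            targets.remove(dst)
    return plan
```

For example, if an overloaded node's most popular dataset has counts `[10, 20, 30]`, the slope is 10.0 and the dataset is moved to the least-loaded non-overloaded node; with counts `[40, 30, 20]` the slope is negative and the migration is skipped, which is the kind of unnecessary move the trend prediction is meant to avoid.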
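The multi-replica scheme of the second contribution can be sketched in the same spirit. The thesis selects target nodes with a linear-programming formulation that the abstract does not spell out, so the sketch below swaps in a plain greedy rule (most spare capacity first) in its place. The adaptive replica-count rule (ceiling of recent accesses divided by an assumed per-replica capacity, clamped to a hypothetical range of 1 to 5) and all function and parameter names are likewise assumptions for illustration only.

```python
import math

def replica_count(recent_accesses: int, per_replica_capacity: int,
                  min_replicas: int = 1, max_replicas: int = 5) -> int:
    """Hypothetical adaptive rule: enough copies to absorb current demand, clamped."""
    needed = math.ceil(recent_accesses / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))

def adjust_replicas(holders: set[str], recent_accesses: int, per_replica_capacity: int,
                    spare: dict[str, int]) -> tuple[list[str], list[str]]:
    """Return (nodes that should receive a new copy, nodes that should drop theirs)."""
    want = replica_count(recent_accesses, per_replica_capacity)
    have = len(holders)
    if want > have:
        # Greedy stand-in for LP-based target selection (NOT the thesis's method):
        # prefer nodes without a copy that have the most spare capacity.
        candidates = sorted((n for n in spare if n not in holders and spare[n] > 0),
                            key=lambda n: spare[n], reverse=True)
        return candidates[: want - have], []
    if want < have:
        # Shrink: drop copies from the busiest holders (least spare capacity) first.
        busiest = sorted(holders, key=lambda n: spare.get(n, 0))
        return [], busiest[: have - want]
    return [], []
```

With 250 recent accesses and a per-replica capacity of 100, the rule asks for three copies, so a dataset held only on one node gains copies on the two non-holders with the most spare capacity; when demand later drops to 90 accesses, the same rule shrinks the set back to one copy. An LP formulation would instead optimize the whole overloaded-to-non-overloaded mapping jointly, which the greedy rule does not attempt.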
Keywords/Search Tags: Biological Data, Distributed Computing, Transmission Efficiency, Resource Migration, Data Replication