Research On Large-scale Structured And Semi-structured Biodata Query Method

Posted on: 2019-11-28  Degree: Master  Type: Thesis
Country: China  Candidate: Q R Liu
GTID: 2428330566998090  Subject: Computer Science and Technology
Abstract/Summary:
The launch and smooth implementation of the Human Genome Project have carried life-science research into the post-genome era. Biological data related to genomics, proteins, and diseases are growing explosively, and research on these massive data opens broad prospects for the life sciences and technology. However, this growth also places enormous computational pressure on traditional computing devices. Mining valuable information from "massive" biological data is the main goal of bioinformatics research, and it is also the main bottleneck restricting the development of biology. There is therefore an urgent need to process and analyze large-scale biological data. The big data and cloud computing technologies developed in recent years point to a new direction for the management and analysis of massive biological data.

This thesis discusses how to use big data technologies and the principles of cloud platforms to store and efficiently query large-scale structured and semi-structured biological data. Specifically, it studies a storage and query method for such data based on the distributed computing platform Hadoop and its distributed processing framework MapReduce. First, the large-scale biological data are mapped and transformed, then stored in the distributed database HBase, and corresponding query algorithms are designed on top of MapReduce to process massive biological data efficiently. Next, an HBase-based indexing method for non-primary keys is proposed to further optimize query performance. Building on this, we developed a large-scale biological data management system that covers data storage, query preprocessing, querying, and non-primary key indexing.

The system stores heterogeneous, large-scale structured and semi-structured biological data in HBase and, through the corresponding mapping transformation model, provides unified query processing over these heterogeneous data. It also takes full advantage of the distributed parallel framework MapReduce, so it adapts well to the ever-increasing demand for large-scale biological data management and improves the processing efficiency of biological big data. Finally, the proposed algorithms and system are validated through a series of comparative experiments. The results show that, compared with traditional storage and query processing methods, the methods presented in this thesis have clear advantages in processing performance.
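The abstract does not specify the thesis's actual mapping model, row-key design, or index layout. As a hedged illustration only, the Python sketch below simulates in memory three ideas the abstract names: flattening a semi-structured record into an HBase-style row (a row key plus `family:qualifier` cells), answering a query on a non-primary-key column with a MapReduce-style full scan, and answering the same query through a secondary index whose keys have the form `value + separator + row key`, read with a prefix range scan. Every name here (`map_record`, `scan_query`, `build_index`, `index_query`, the column family `d`, the `\x00` separator) and the toy records are hypothetical; no HBase or Hadoop API is used.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Simulated HBase-style table: row key -> {"family:qualifier": value}
Row = Dict[str, str]
Table = Dict[str, Row]

SEP = "\x00"  # hypothetical separator; assumes indexed values never contain it


def map_record(record: dict, key_field: str, family: str = "d") -> Tuple[str, Row]:
    """Hypothetical mapping transformation: flatten a nested (semi-structured)
    record into a row key plus 'family:qualifier' cells."""
    cells: Row = {}

    def flatten(prefix: str, value: object) -> None:
        if isinstance(value, dict):
            for k, v in value.items():
                flatten(f"{prefix}.{k}" if prefix else k, v)
        else:
            cells[f"{family}:{prefix}"] = str(value)

    flatten("", record)
    row_key = cells.pop(f"{family}:{key_field}")  # primary key becomes the row key
    return row_key, cells


def map_phase(row_key: str, row: Row, column: str, value: str) -> Iterator[Tuple[str, str]]:
    # Map step: emit (value, row key) only for rows whose column matches.
    if row.get(column) == value:
        yield value, row_key


def reduce_phase(pairs: Iterator[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Reduce step: group matching row keys by the queried value.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, row_key in pairs:
        grouped[key].append(row_key)
    return {key: sorted(keys) for key, keys in grouped.items()}


def scan_query(table: Table, column: str, value: str) -> List[str]:
    # Full scan: every row passes through the map step.
    pairs = (p for rk, row in table.items() for p in map_phase(rk, row, column, value))
    return reduce_phase(pairs).get(value, [])


def build_index(table: Table, column: str) -> List[str]:
    # Index "table": sorted keys of the form value + SEP + primary row key.
    return sorted(f"{row[column]}{SEP}{rk}" for rk, row in table.items() if column in row)


def index_query(index: List[str], value: str) -> List[str]:
    # Prefix range scan: keys sharing the prefix are contiguous in sorted order.
    prefix = value + SEP
    hits: List[str] = []
    for key in index[bisect.bisect_left(index, prefix):]:
        if not key.startswith(prefix):
            break
        hits.append(key[len(prefix):])
    return hits


# Toy records invented for illustration only.
records = [
    {"gene_id": "G1", "symbol": "TP53", "organism": {"name": "Homo sapiens", "taxid": 9606}},
    {"gene_id": "G2", "symbol": "Trp53", "organism": {"name": "Mus musculus", "taxid": 10090}},
    {"gene_id": "G3", "symbol": "BRCA1", "organism": {"name": "Homo sapiens", "taxid": 9606}},
]
table: Table = dict(map_record(r, "gene_id") for r in records)

taxid_index = build_index(table, "d:organism.taxid")
print(scan_query(table, "d:organism.taxid", "9606"))  # ['G1', 'G3']
print(index_query(taxid_index, "9606"))               # ['G1', 'G3']
```

Both paths return the same row keys, but the index lookup reads only the contiguous block of index keys that share the queried prefix, whereas the scan visits every row. Because HBase seeks and range-scans efficiently only by row key, storing the non-primary-key value at the front of an index row key is one common way to get that behavior; keeping such an index consistent on writes is omitted from this sketch.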
Keywords/Search Tags: Structured, semi-structured, biological data, big data, distributed computing platform, non-primary key indexing