
Research On Key Technologies Of Multi-Source Biodata Integration Based On MongoDB

Posted on: 2020-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Yang
Full Text: PDF
GTID: 2428330590473235
Subject: Computer technology
Abstract/Summary:
As life science has developed, new research problems and data needs have emerged. The field has acquired powerful data-production capabilities, which has driven the development of the various omics disciplines and given rise to biological big data. Because research methods differ, the data formats they generate differ as well. When massive amounts of heterogeneous biological data must be processed, relational databases become a bottleneck: they rely on fixed, relation-based schemas and scale poorly. NoSQL databases with flexible data models were therefore proposed; they handle the schema changes brought by large data volumes and scale through horizontal expansion. Among NoSQL databases, MongoDB is the most widely used. There is thus an urgent need to map multi-source heterogeneous data from its various formats to JSON and then store it in MongoDB for query processing.

This thesis studies the key technologies of multi-source biodata integration based on MongoDB: integrating multi-source heterogeneous biological data and managing it through MongoDB's storage mechanism. It details the association patterns among the data sources, integrated data storage, and data management. First, the data sources and their data formats are selected. Using multi-layer network theory combined with an automatic association pattern matching algorithm, an inter-layer node connection matrix is constructed. Then, mapping rules and algorithms are designed to convert different data formats (structured text files, XML, RDF, and OWL) to JSON, and MongoDB is used to store the resulting JSON data.

Based on these integration technologies, a MongoDB-based management system is developed. It covers the multi-source biological data format mapping model, index construction, keyword query, and advanced query functions, and it uses MongoDB's non-primary-key indexes to improve query efficiency. Finally, the proposed data format mapping algorithms are tested in a series of experiments on actual data sources. The experimental results show that MongoDB has an advantage in storage: its storage structure effectively reduces the redundant markup of semi-structured formats and saves storage space.
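
The abstract does not spell out the mapping rules themselves, so the following is only a minimal sketch of what an XML-to-JSON mapping of this kind might look like. The three rules used here (attributes become `@`-prefixed keys, repeated child tags become lists, text-only leaves collapse to strings), the function name `element_to_dict`, and the sample record are all assumptions made for illustration; they are not the thesis's algorithm or data.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map one XML element to a JSON-compatible value (illustrative rules only)."""
    node = {}
    # Assumed rule 1: attributes become keys prefixed with "@".
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Assumed rule 2: child elements become keys; repeated tags become a list.
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    # Assumed rule 3: a text-only leaf collapses to the string itself;
    # text next to attributes or children is kept under "#text".
    text = (elem.text or "").strip()
    if text:
        if node:
            node["#text"] = text
        else:
            return text
    return node


# Hypothetical record, invented for this sketch (not from any real data source).
SAMPLE_XML = """
<entry id="EX0001" source="example">
  <name>Example protein</name>
  <organism>Homo sapiens</organism>
  <xref db="DB_A">X1</xref>
  <xref db="DB_B">X2</xref>
</entry>
"""

root = ET.fromstring(SAMPLE_XML.strip())
document = {root.tag: element_to_dict(root)}

# The dict is JSON-serialisable, so it has the shape MongoDB stores as a document.
print(json.dumps(document, indent=2))
```

A dictionary of this shape could then be handed to a MongoDB driver, for example pymongo's `insert_one`, and a secondary index could be built on a frequently queried field. Those steps are left out here because they need a running MongoDB server.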
Keywords/Search Tags: data integration, multi-source heterogeneity, OWL, RDF, MongoDB