Research And Application Of Proteomic Data Storage Technology Based On NoSQL Database

Posted on: 2017-10-11  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhang  Full Text: PDF
GTID: 2348330533450192  Subject: Computer technology
Abstract/Summary:
With the completion of Human Genome sequencing, biological research has entered the post-genomic era, characterized by extensive information sharing. As researchers explore and analyze huge amounts of biological data, a family of "omics" research areas has emerged, among which proteomics is one of the most active. Proteomics research not only provides the material basis for understanding the laws of life activities, but also offers a theoretical basis and solutions for elucidating disease mechanisms. With the development of high-throughput techniques and the rapid expansion of proteomics research, a large amount of data has been produced. Traditional relational databases, however, cannot meet the storage and management requirements of such vast data, which creates an urgent demand for better data storage, analysis, and efficient use. New patterns of data storage and access must be studied to cope with the sharp increase in proteomics data.

NoSQL systems, which aim to support huge data volumes, high availability, and high scalability, address problems faced by relational databases and make up for their shortcomings. The research and application in this thesis are based on MongoDB, a typical representative of NoSQL systems. The main research contents are as follows:

First, theoretical research was carried out, elaborating on the concept, characteristics, and theoretical basis of NoSQL. The design principles and features of mainstream NoSQL systems were then analyzed and compared with those of traditional relational databases. From this analysis and comparison, the thesis concludes that NoSQL systems are better suited than traditional relational databases to mass data storage.

Second, the framework and principles of the MongoDB distributed cluster were studied, focusing on the chunk-based balancing algorithm in an Auto-Sharding cluster. To solve the problem of uneven data distribution, an improved algorithm was proposed, and its validity was verified by testing.

Finally, to address the problems of existing proteomics data storage systems, the feasibility of modeling with MongoDB was analyzed from both theoretical and practical perspectives. A highly available and highly scalable proteomics data storage system based on MongoDB was designed, and it passed a comprehensive assessment including functional and performance tests.
Keywords/Search Tags: Proteome, NoSQL, MongoDB, load balancing