The Design And Implementation Of Information Retrieval System On Genomic Big Data

Posted on: 2016-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: K Xu
GTID: 2308330479491064
Subject: Computer technology
Abstract/Summary:
Owing to the large scale of genomic studies and high-throughput data generation, bioinformatics data sets are growing rapidly, and traditional data management methods can no longer keep up with demand. Against this background, big data technologies such as the Hadoop platform provide a distributed framework for data management and computation. This thesis focuses on how to store, index and retrieve genomic big data. It introduces a general-purpose indexing and retrieval mechanism that can be applied to different kinds of databases. Since HDFS does not support random writes, HBase is used to store the data. HBase offers fast reads and writes, but it has a limitation: it supports indexing only on the Row Key. We therefore introduce Solr, a distributed indexing and search framework based on Lucene. When new data is written to HBase, the system first creates a virtual node that subscribes to the data changes, then uses a custom message mechanism to submit the data asynchronously to the Indexer. The Indexer applies a custom indexing strategy to generate documents that can be imported into Solr, which builds the indexes and stores them in HDFS. When data is retrieved, the request is distributed to every node in the cluster, and each node searches its own portion of the data for relevant records. Users can also submit sequence data and metadata to the database. The backend converts the submitted data into the proper format and imports it into HBase, where it is indexed in real time. The system uses the SRA database, including both its sequence data and its metadata. The final results show that the system answers queries on the data very quickly and that searching is much faster than with the traditional HBase search method. With this system, users can manage their data effectively and securely.
Keywords/Search Tags: Hadoop, Information Retrieval, Genomics Big Data, Genome
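The indexing flow described in the abstract (a write to the store, a subscription that forwards the change, asynchronous delivery to an indexer, then search by a non-key field) can be illustrated with a small in-memory sketch. This is not the thesis implementation and does not use the HBase or Solr APIs: `RowStore`, `Indexer`, the field names and the sample records are all hypothetical stand-ins chosen only to show why a secondary index avoids a full scan.

```python
import queue
import threading
from collections import defaultdict


class RowStore:
    """Hypothetical stand-in for an HBase table: direct lookup by row key only."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []

    def subscribe(self, callback):
        # Register a listener that is told about every write.
        self._subscribers.append(callback)

    def put(self, row_key, record):
        self._rows[row_key] = record
        for notify in self._subscribers:
            notify(row_key, record)

    def get(self, row_key):
        return self._rows[row_key]

    def scan(self, field, value):
        # Without a secondary index, every row has to be inspected.
        return sorted(k for k, r in self._rows.items() if r.get(field) == value)


class Indexer:
    """Hypothetical stand-in for the Indexer and search side.

    Updates arrive on a queue and are indexed by a background thread,
    so writers do not wait for indexing to finish. For brevity, stale
    entries are not removed when a row is overwritten.
    """

    def __init__(self):
        self.updates = queue.Queue()
        self._index = defaultdict(set)  # (field, value) -> set of row keys
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def enqueue(self, row_key, record):
        self.updates.put((row_key, record))

    def _run(self):
        while True:
            row_key, record = self.updates.get()
            for field, value in record.items():
                self._index[(field, value)].add(row_key)
            self.updates.task_done()

    def search(self, field, value):
        return sorted(self._index.get((field, value), set()))


if __name__ == "__main__":
    store = RowStore()
    indexer = Indexer()
    store.subscribe(indexer.enqueue)

    # Made-up sample records; keys and fields are illustrative only.
    store.put("run-001", {"organism": "Homo sapiens", "platform": "A"})
    store.put("run-002", {"organism": "Mus musculus", "platform": "A"})
    store.put("run-003", {"organism": "Homo sapiens", "platform": "B"})

    indexer.updates.join()  # wait until the asynchronous indexing has caught up
    print(indexer.search("organism", "Homo sapiens"))
```

In this sketch `scan` and `search` return the same row keys, but `scan` touches every row while `search` is a single dictionary lookup, which mirrors the motivation the abstract gives for pairing the row store with a separate index.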