
The Development And Application Of Biological Compressed Database Tools

Posted on: 2011-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wang
Full Text: PDF
GTID: 2120360308452780
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Since the completion of the Human Genome Project, large-scale "omics" approaches of various types have penetrated all fields of modern molecular biology research. The large, high-dimensional data sets brought by these technologies are turning modern biology into an information science concerned with building models and analyzing data. How to process and store these data is therefore becoming more and more important as the discipline develops.

At present, much large and stable biological data is maintained in regular flat files, such as the FASTA format. Usually an additional encoder is used to reduce file size by exploiting the statistical redundancy of the file content. However, extracting content from, or searching within, the resulting compressed stream is inefficient, because the stream must first be decoded completely. To this end, we propose Gene Zip Query Tools (GZQ), a new software package for biological data storage and management. GZQ addresses both the compression and the query performance problems of existing flat-file encoders.

The main idea behind GZQ is to compress each data field independently, in fixed-size blocks. On the one hand, by selecting an appropriate data block organization and compression algorithm for each data field, GZQ can achieve better compression ratio and compression time. On the other hand, GZQ supports a wide range of fast query methods through its compressor, indexer, and query modules. GZQ also provides an additional component, the view module, to handle different file formats.
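The block-wise idea described above can be illustrated with a minimal Python sketch. It is not GZQ's actual implementation or file format: the function names (`parse_fasta`, `compress_field`, `build_store`, `get_field`), the in-memory dictionary layout, the use of `zlib` for every field, and the choice to fix the block size as a number of records (rather than bytes) are all assumptions made for illustration. It shows only how compressing each field in independent blocks lets a single record be fetched by decoding one block instead of the whole stream.

```python
import zlib

BLOCK_RECORDS = 2  # hypothetical block size: number of records per block


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def compress_field(values, block_records=BLOCK_RECORDS):
    """Compress one field as independent blocks of `block_records` values."""
    blocks = []
    for start in range(0, len(values), block_records):
        chunk = "\n".join(values[start:start + block_records])
        blocks.append(zlib.compress(chunk.encode("utf-8")))
    return blocks


def build_store(records, block_records=BLOCK_RECORDS):
    """Store headers and sequences as two separately compressed fields."""
    return {
        "block_records": block_records,
        "count": len(records),
        "headers": compress_field([h for h, _ in records], block_records),
        "sequences": compress_field([s for _, s in records], block_records),
    }


def get_field(store, field, index):
    """Fetch one value by record number, decoding only the block holding it."""
    if not 0 <= index < store["count"]:
        raise IndexError(index)
    block_no, offset = divmod(index, store["block_records"])
    chunk = zlib.decompress(store[field][block_no]).decode("utf-8")
    return chunk.split("\n")[offset]


fasta = ">seq1 demo\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"
store = build_store(parse_fasta(fasta))
print(get_field(store, "headers", 2))    # seq3
print(get_field(store, "sequences", 0))  # ACGTACGT
```

Because the record number maps directly to a block number and an offset, no separate index is needed in this sketch. A real tool would presumably need an index for other query types, and might pick a different codec per field, as the abstract suggests.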
Keywords/Search Tags: biological database, database software, compression algorithm, block compression