Genome data modeling and data compression

Posted on: 2008-11-05
Degree: M.S.
Type: Thesis
University: University of Nevada, Reno
Candidate: Radhakrishnan, Radhika
Full Text: PDF
GTID: 2448390005472718
Subject: Computer Science
Abstract/Summary:
Genome data modeling is an important area of research, and several data models have been proposed for representing and storing genome data. Challenges in biological data management include storage, retrieval, redundancy, and integrity. In this thesis we propose two data models for representing and storing genome sequence data. Instead of storing the whole sequence for each gene separately, both models store each common subsequence only once, together with a sequence ID or GenBank identification number. The position number of each subsequence is also stored, so that the whole sequence can be retrieved correctly. This would significantly reduce storage space requirements and help maintain data integrity. The second model also includes a pre-coding routine to reduce storage requirements further. A study of randomness in genome data is also included. Both data models were tested, and the results were satisfactory: sequences were compressed when they shared a significant amount of common content, and the retrieval algorithm reconstructed the sequences correctly.
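The storage idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual data model: it assumes fixed-length fragments (the hypothetical `BLOCK` size), in-memory dictionaries in place of database tables, and illustrative gene IDs. The names `store` and `retrieve` are also made up for this example. Each distinct fragment is kept once, and each gene keeps only (position, fragment ID) pairs.

```python
from typing import Dict, List, Tuple

BLOCK = 4  # hypothetical fragment length; the thesis does not state one


def store(genes: Dict[str, str], block: int = BLOCK):
    """Split each sequence into fragments and keep every distinct fragment once.

    Returns (fragments, index): fragments is the shared fragment table, and
    index maps a gene ID to a list of (position, fragment_id) pairs.
    """
    fragment_ids: Dict[str, int] = {}   # fragment text -> fragment ID
    fragments: List[str] = []           # fragment ID -> fragment text
    index: Dict[str, List[Tuple[int, int]]] = {}

    for gene_id, seq in genes.items():
        entries = []
        for pos in range(0, len(seq), block):
            piece = seq[pos:pos + block]
            if piece not in fragment_ids:
                # First time this fragment is seen: store it once.
                fragment_ids[piece] = len(fragments)
                fragments.append(piece)
            entries.append((pos, fragment_ids[piece]))
        index[gene_id] = entries
    return fragments, index


def retrieve(gene_id: str, fragments: List[str],
             index: Dict[str, List[Tuple[int, int]]]) -> str:
    """Rebuild a sequence by joining its fragments in position order."""
    return "".join(fragments[fid] for _, fid in sorted(index[gene_id]))


if __name__ == "__main__":
    # Illustrative input only; these are not real GenBank records.
    genes = {"g1": "ACGTACGTTT", "g2": "ACGTGGGG"}
    fragments, index = store(genes)
    for gene_id, seq in genes.items():
        assert retrieve(gene_id, fragments, index) == seq
    print(len(fragments), "fragments stored for", len(genes), "genes")
```

In this sketch, savings appear only when sequences share fragments, which matches the abstract's observation that compression depended on the amount of commonality. The stored positions are what allow retrieval to restore the original order.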
Keywords/Search Tags: Data, Genome, Sequence