
Research And Application Of Index On Biological Sequences

Posted on: 2010-06-01 | Degree: Master | Type: Thesis
Country: China | Candidate: B R Qiu
GTID: 2178360275491622 | Subject: Computer software and theory
Abstract/Summary:
In biological research, querying a massive biological database for similar sequences is routine work, and it plays an important role in uncovering biological knowledge and the laws of life. However, when a large database holds a huge number of very long sequences, the naive approach of exhaustively scanning and matching every sequence is inefficient, so researchers turn to query optimization methods to improve query efficiency. An important query optimization method is to build index structures, which trade some storage space for fast query responses. An efficient index structure organizes biological sequence data effectively and significantly speeds up retrieval. Existing methods still have problems, however, and proposing new methods to solve them is an active topic of current research. In this thesis, we analyze existing index techniques for similarity search over biological data, present two new methods named BioIndex and SSQ_MF, and design and implement the Integrated Transcription Regulation Platform (ITREP). The major contributions of this thesis are as follows:

(1) A biological sequence index structure called BioIndex, together with a corresponding query algorithm. BioIndex provides an effective way to perform nearest-neighbor queries on biological sequences. The index is constructed by mining sequence patterns from the biological sequence set, and its size can be effectively controlled so that the whole structure fits in memory for fast querying. Experimental results show that the query algorithm based on BioIndex improves query efficiency.

(2) SSQ_MF, a similarity query algorithm for biological sequences that filters candidates with multiple index structures. To handle range queries over biological sequences, SSQ_MF builds three different index structures that serve as three filters, achieving a higher level of filtering than query algorithms that rely on a single filter. SSQ_MF estimates the filtering size of each filter and builds a model that determines the optimal filtering order from these estimates. Experimental results demonstrate that SSQ_MF outperforms both single-filter algorithms and multi-filter algorithms that apply their filters in random order.

(3) The design and implementation of ITREP, a data mining system for transcriptional regulatory sequences. Transcriptional regulation is one of the important research issues of the post-genomic era. We apply the methods above to queries for cis-regulatory elements (transcription factor binding sites) to improve query efficiency, providing a useful platform for biologists studying transcriptional regulatory mechanisms.
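The multi-filter idea in contribution (2) can be illustrated with a minimal Python sketch. The abstract does not say which three filters SSQ_MF uses, so this sketch assumes three standard lossless filters for edit-distance range queries (a length filter, a base-composition filter, and a q-gram count filter based on the q-gram lemma). It estimates each filter's pass rate on a sample of the database and applies the most selective filter first. All function names and parameters (`range_query`, `sample_size`, `q=3`, and so on) are hypothetical; this is not the SSQ_MF implementation.

```python
from collections import Counter
from typing import Callable, List

Filter = Callable[[str], bool]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def qgrams(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s (empty if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


# Hypothetical filters: each returns True if s MAY lie within edit distance k
# of the query, and never rejects a true match (they are lossless).
def length_filter(query: str, k: int) -> Filter:
    return lambda s: abs(len(s) - len(query)) <= k


def composition_filter(query: str, k: int) -> Filter:
    qc = Counter(query)

    def passes(s: str) -> bool:
        sc = Counter(s)
        # A substitution changes the base counts by at most 2, an indel by 1.
        diff = sum(abs(sc[c] - qc[c]) for c in set(sc) | set(qc))
        return diff <= 2 * k
    return passes


def qgram_filter(query: str, k: int, q: int = 3) -> Filter:
    qg = qgrams(query, q)

    def passes(s: str) -> bool:
        # q-gram lemma: each edit destroys at most q of the q-grams.
        shared = sum((qgrams(s, q) & qg).values())
        return shared >= max(len(s), len(query)) - q + 1 - k * q
    return passes


def pass_rate(f: Filter, sample: List[str]) -> float:
    """Fraction of the sample that the filter lets through."""
    return sum(f(s) for s in sample) / len(sample)


def range_query(db: List[str], query: str, k: int,
                sample_size: int = 100) -> List[str]:
    """Return every sequence in db within edit distance k of query."""
    if not db:
        return []
    filters = [length_filter(query, k),
               composition_filter(query, k),
               qgram_filter(query, k)]
    # Estimate each filter's selectivity on a sample; most selective runs first.
    sample = db[:sample_size]
    filters.sort(key=lambda f: pass_rate(f, sample))
    results = []
    for s in db:
        # all() stops at the first filter that rejects s; survivors are verified.
        if all(f(s) for f in filters) and edit_distance(s, query) <= k:
            results.append(s)
    return results
```

For example, `range_query(["ACGTACGT", "ACGTACGA", "TTTTTTTT", "ACGT", "ACGTTACGT"], "ACGTACGT", 1)` keeps the exact match, the one-substitution variant, and the one-insertion variant. Because every filter is lossless, the filter order changes only the running time and never the result: putting the filter that rejects the most candidates first lets `all()` short-circuit early and avoids the costly edit-distance verification.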
Keywords/Search Tags: Index, Biological sequence, Biological database, Query optimization