Research On Algorithm For Similarity Search Of Biological Sequence Database

Posted on: 2014-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: M H Ding
Full Text: PDF
GTID: 2268330425456428
Subject: Management Science and Engineering
Abstract/Summary:
With the launch of the international Human Genome Project, the volume of data in sequence databases, which hold a great deal of information and knowledge, has grown explosively. How to extract and use that information and knowledge is a problem worth studying. Similarity search over biological sequence databases is a basic problem in biological information processing: it compares a query sequence against the target sequences in a database and infers their homology from the degree of similarity, to meet the research and application needs of biologists. Although biological sequence databases provide storage, management, and similarity search for sequence data, the data are growing at an alarming rate, with the volume doubling every 14 months. This explosive growth poses a great challenge to the speed of similarity search. At present, most gains in search speed are obtained by sacrificing some sensitivity. How to design an algorithm that offers high sensitivity and high speed at the same time is therefore a question worth considering. The work of this thesis is as follows:
1. The thesis summarizes the dot plot method and the sequence alignment algorithms based on dynamic programming, which form the theoretical basis for similarity search of biological sequence databases and the reference standard for its sensitivity. It also introduces existing similarity search algorithms for biological sequence databases, such as BLAST, FASTA, and PatternHunter.
2. The thesis studies an important parameter of similarity search: the seed, which controls the trade-off between search speed and sensitivity. The contiguous seeds and spaced seeds currently in use have relatively low sensitivity, so designing an efficient seed is the key issue. A new matching pattern, the fuzzy matched seed, is therefore proposed. Theory and experiments show that, for the same search length, the fuzzy matched seed has higher sensitivity than contiguous seeds and spaced seeds. Using fuzzy matched seeds can thus improve the performance of similarity search of biological sequence databases.
3. The thesis gives a method for optimizing the fuzzy matched seed. In traditional similarity search, the seed length is set to an empirical value, which does not guarantee optimal sensitivity and adapts poorly to the data. The thesis formulates a mathematical programming model for the optimal sensitivity of the fuzzy matched seed and gives a method for solving it.
4. The thesis also studies the problem of chaining seeds within local similarity regions. The basic operations on gene sequences also include insertion and deletion, and the thesis describes the resulting problem as NP-hard. Chaining seeds within local similarity regions can quickly and effectively identify seeds that potentially belong to the same similarity region, which improves both the speed and the sensitivity of similarity search.
5. The thesis designs a highly efficient algorithm, FSSA, for similarity search of biological sequence databases, and describes its features and procedure in detail. Experiments show that FSSA achieves both high sensitivity and high speed.
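The contrast between contiguous and spaced seeds mentioned in item 2 can be illustrated with a minimal sketch. This is not the thesis's fuzzy matched seed or FSSA, whose definitions are not given in this abstract. The function name `seed_hits`, the seed patterns, and the toy sequences are assumptions chosen for illustration only. A pattern is written as a string in which `1` marks a position that must match and `0` marks a position that is ignored.

```python
def seed_hits(query, target, pattern):
    """Return (i, j) offsets where query[i:] and target[j:] agree
    at every '1' position of the seed pattern (hypothetical helper)."""
    care = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)

    # Index every window of the target by the letters at the "care" positions.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up each query window in the index to collect seed hits.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


# Toy sequences (assumed): they differ at positions 2 and 5.
query = "ACGTTGC"
target = "ACATTCC"

# Both seeds have weight 4 (four positions that must match).
contiguous = seed_hits(query, target, "1111")   # → []
spaced = seed_hits(query, target, "11011")      # → [(0, 0)]
print(contiguous, spaced)
```

In this toy case the contiguous seed finds no hit because no four consecutive letters agree, while the spaced seed of the same weight tolerates the mismatch at its ignored position. A single example like this only illustrates the mechanism; it says nothing about sensitivity in general.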
Keywords/Search Tags: biological sequence database, similarity searches, optimization of seeds, fuzzy matched seed, chaining seeds of local similarity