Font Size: a A A

Based Genetic Annealing Bioinformatics Learn Multiple Sequence Alignment Algorithm Study

Posted on:2010-01-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZhangFull Text:PDF
GTID:2208360275483399Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Within decades biotechnology has conceived massive data and added increasing pressure on traditional data analysis, which then caused development of bio-informatics——a new discipline with combination of biotechnology, information-technology, mathmatics.etc. As one of the most fundamental tasks in bio-information research, sequence alignment serves a role as a basic tool to discover functional, structual and evoluntionary information in biological sequences. Now pairwise sequence alignment research has achieved some markstone while multiple sequence alignment algorithms now available cannot ensure optimal results all the time. This paper does some research on status quo of multiple sequence alignment and implements an algorithm, which uese genetic algorithm and simulated annealing algorithm, to solve problems in multiple sequence alignment.Firstly effects of scoring matrix, gap penalty and objective function on sequence alignment results are analyzed, and two popular objective functions including SP and COFFEE function are implemented. Then a profound study is made on popular sequence alignment algorithms, including a decent analysis on two classical pairwise sequence alignment algorithms——Needleman-Wunsch algorithm and Smith-Waterman algorithm. After that dynamic programming(DP) in multiple sequence alignment is studied. Although accurate, this algorithm provides an exponentially growing time complexity and is rarely used during massive data analysis. Then classical CLUSTAL——a famous progressive method, is analyzed in detail as an example of progressive alignment algorithms.Based on research above, this paper introduces genetic algorithm(GA). GA is an artificial intelligence method simulating natural evolution, which uses three main operators selection, crossover and mutation to produce individuals with better fitness. To implement a new method MSA-GA, three steps are made: model building, coding and operator design. Then MSA-GA is implemented based on SP function, and tested in BAliBASE3.0 with comparison to CLUSTAL. The tests show that although slightly inaccurate than CLUSTAL, results of MSA-GA is acceptable.As the same with other genetic algorithms, MSA-GA has shortcomings resulted from premature phenomena. Some main genetic operators add too much emphasis on certain individuals in population during iteration, thus searching range shrinks dramaticlly and the algorithm converges to some local optimal results. To solve this problem, this paper adds simulated annealing algorithm(SA) to genetic algorithm and implements MSA-GSA. SA is another kind of optimization algorithms simulating the progress of solid annealing and uses acceptance rule and tempreture reduction to avoid local optimal results. This part adds SA to three main operators of GA, that is selection, crossover and mutation, using Metropolis rule during population regeneration period of GA to adjust genetic evoluntion, which ensures population diversity and accelerates convergence to avoid premature phenomenon. The test result shows that the accuracy of MSA-GSA has improved and the algorithm is effective in multiple sequence alignment.
Keywords/Search Tags:bio-informatics, multiple sequence alignment, genetic algorithm, premature phenomenon, simulated annealing algorithm
PDF Full Text Request
Related items