
Research On Algorithms Of The Two-Species Small Phylogeny Problem

Posted on: 2016-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: J W Wang
GTID: 2180330464453792
Subject: Computer application technology
Abstract/Summary:
With the rapid development of molecular biology and whole-genome sequencing technology, a large amount of gene data has been produced, which makes it possible to study the evolution of gene families at the molecular level. Since reconstructing the genealogy of gene families plays a critical role in resolving many fundamental biological problems, it has received extensive attention in recent years and has developed into one of the important research areas in comparative genomics. The two-species small phylogeny problem is an important sub-problem in constructing gene family phylogenies. In this thesis, algorithms for solving the problem are studied and a practical software package is developed. The specific work is as follows.

The two-species small phylogeny problem is studied based on the duplication-loss alignment model. First, an alignment algorithm, ALIGN, is proposed based on the principle of maximizing the number of matching characters. It obtains an alignment of two given gene sequences by inserting a certain number of spaces into them. Second, a labeling algorithm, LABLE, is presented for a given sequence alignment. LABLE labels the alignment with a sequence of duplication and loss operations in two directions, left-to-right and right-to-left, and chooses the operation sequence with the smaller evolution cost as the final label sequence. Based on the ALIGN and LABLE algorithms, a genetic algorithm, G2SP, is proposed to solve the two-species small phylogeny problem in the duplication-loss model. G2SP produces initial solutions with ALIGN and measures the fitness of a solution with LABLE. In addition, three intelligent mutation operators (re-matching gene block, smart moving gene block, and moving gene block) are introduced in G2SP to improve the convergence of the population and make the algorithm evolve toward the optimum more quickly.
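
One plausible reading of the "maximum matching characters" principle is a longest-common-subsequence alignment, in which every column is either a match or a gene paired with a space. The Python sketch below illustrates that reading only. The function name `align_max_matches`, the gap symbol, and the tie-breaking rule are assumptions made for illustration, not details taken from the thesis's ALIGN algorithm.

```python
def align_max_matches(a, b, gap="-"):
    """Align two gene sequences so that the number of matched columns is maximal.

    Assumption: a column is either a match (the same gene in both rows) or a
    gene paired with a gap; mismatched columns are never produced.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = maximum number of matches between a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start and emit one column per step.
    row_a, row_b = [], []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            row_a.append(a[i])
            row_b.append(b[j])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Skipping a[i] loses no matches: pair it with a gap.
            row_a.append(a[i])
            row_b.append(gap)
            i += 1
        else:
            row_a.append(gap)
            row_b.append(b[j])
            j += 1
    # Whatever remains in either sequence is aligned against gaps.
    for gene in a[i:]:
        row_a.append(gene)
        row_b.append(gap)
    for gene in b[j:]:
        row_a.append(gap)
        row_b.append(gene)
    return row_a, row_b


def count_matches(row_a, row_b, gap="-"):
    """Number of columns in which both rows hold the same gene."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
```

In a duplication-loss setting, the genes left facing a gap are the ones a labeling step would then have to explain as duplications or losses; that step is not sketched here.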
tRNA and rRNA gene data from six kinds of real bacteria, together with simulated data, were used to test the performance of the algorithms. The experimental results indicate that the G2SP algorithm achieves a lower evolution cost than the PBLB algorithm and that its running time is feasible for practical applications. The G2SP algorithm is therefore an effective method for solving the 2-SPP-DL problem.

Based on the G2SP algorithm, a practical software package for solving the two-species small phylogeny problem in the duplication-loss model was developed using C# and Visual Studio 2012. The software package runs on Windows XP and on Windows 7 or above, with .NET Framework 4.0. The main functional modules are setting parameters, inputting biological data, inferring the ancestor, viewing the results, and help. Experimental parameters such as the evolutionary cycle, population size, number of iterations, crossover rate, and mutation rate can be set in the "Setting parameters" module to suit the specific situation. Two gene sequences can be read from a text file with the "Inputting biological data" module; each sequence is made up of genes that represent specific gene families. While solving a problem, the software displays information such as the running time and the best solution of each iteration. The final results are recorded in files, which contain a labeled sequence alignment, an ancestral sequence, and an evolutionary history of the two gene sequences.

In this thesis, the two-species small phylogeny problem is studied based on the duplication-loss alignment model. An effective genetic algorithm, G2SP, is presented for solving the problem under this model, and it achieves good performance. The G2SP algorithm offers a new approach to solving phylogenetic problems. In addition, the developed software package provides a practical tool for solving the two-species small phylogeny problem in the duplication-loss model.
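
The population size, number of iterations, crossover rate, and mutation rate named among the software's settings are the usual controls of a genetic algorithm loop. The Python sketch below shows how such parameters typically interact. It is a generic loop run on a toy bit-string problem, not the G2SP algorithm: the function names, default values, selection scheme, and toy cost are all assumptions, and the "evolutionary cycle" setting is not modeled because the abstract does not define it.

```python
import random


def genetic_minimize(cost, random_solution, mutate, crossover,
                     population_size=30, iterations=100,
                     crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Generic genetic algorithm that minimizes `cost` (needs population_size >= 2)."""
    rng = random.Random(seed)
    population = [random_solution(rng) for _ in range(population_size)]
    best = min(population, key=cost)
    for _ in range(iterations):
        ranked = sorted(population, key=cost)
        parents = ranked[: max(2, population_size // 2)]  # keep the better half
        children = [ranked[0]]  # elitism: the best solution survives unchanged
        while len(children) < population_size:
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                child = crossover(p1, p2, rng)
            else:
                child = list(p1)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = children
        best = min(population + [best], key=cost)
    return best


# Toy stand-ins (hypothetical): a solution is a list of 12 bits and its
# "evolution cost" is the number of ones. In G2SP the solutions would be
# alignments and the cost would come from the labeling step.
def toy_cost(bits):
    return sum(bits)


def toy_random_solution(rng):
    return [rng.randint(0, 1) for _ in range(12)]


def toy_mutate(bits, rng):
    flipped = list(bits)          # copy so the parent is left untouched
    k = rng.randrange(len(flipped))
    flipped[k] = 1 - flipped[k]
    return flipped


def toy_crossover(p1, p2, rng):
    cut = rng.randrange(1, len(p1))  # single-point crossover
    return p1[:cut] + p2[cut:]


best = genetic_minimize(toy_cost, toy_random_solution, toy_mutate, toy_crossover)
print(toy_cost(best), best)
```

Because the best solution is carried over unchanged each iteration, the reported cost can never get worse as the iteration count grows; problem-specific mutation operators, such as the three gene-block operators described in the abstract, would take the place of `toy_mutate`.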
Keywords/Search Tags: duplication, loss, small phylogeny problem, sequence alignment, genetic algorithm