Research On Gene Sequence Alignment With Geometric Methods

Posted on: 2013-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: B Xue
Full Text: PDF
GTID: 2248330395490806
Subject: Computer application technology
Abstract/Summary:
Bioinformatics is an emerging discipline that uses computers to process and study biological information. With its rapid development, databases with differing characteristics are constantly emerging. Gene sequence data doubles every 14 months on average, so it must be stored, managed, and comparatively analyzed to meet the research and application needs of biologists; within this work, sequence similarity comparison is a basic operation of biological information processing. How to obtain a more accurate similarity comparison within acceptable time and space is an important problem at the intersection of the biological sciences and computer science.

This thesis focuses on gene sequence alignment, a geometry-based comparison method, the transformation between a character sequence and its Z curve, and a biological information system for comparing gene sequences. The main tasks are:

(1) A gene sequence analysis system based on the browser/server (B/S) structure has been developed, and it is easy to use. The system is divided into three layers: a database service layer, an analysis service layer, and a client interaction layer. The biggest advantage of the B/S structure is that the system can be used anywhere without installing any special software. Any computer with Internet access is sufficient, the client needs no maintenance, and the system is easier to extend. The system provides user management, gene sequence file management, alignment, and searching.

(2) The system can transform between a character sequence and its Z curve, and it can also display the Z curve. Some properties of the Z curve are extracted, such as the geometric center of the curve and its end point. The Fréchet distance is a measure of similarity between curves, and it is the origin of the man-dog distance model. The thesis combines the properties of the Z curve, the discrete Fréchet distance as a geometric similarity measure, and the mutation of DNA sequences during evolution.
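The transformation and the curve distance described in (2) can be illustrated with a minimal Python sketch. It assumes the standard Z curve definition (each base moves the curve by one fixed unit step in x, y, and z) and the usual dynamic-programming formulation of the discrete Fréchet distance. The function names are hypothetical, and the sketch does not reproduce the thesis's three-step man-dog model, whose details are not given here.

```python
from math import dist  # Euclidean distance between points, Python 3.8+

# Unit step contributed by each base under the standard Z curve definition:
# x separates purines/pyrimidines, y amino/keto, z weak/strong hydrogen bonds.
STEP = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
BASE = {step: base for base, step in STEP.items()}


def z_curve(seq):
    """Map a DNA string to its Z curve, a list of cumulative 3D points."""
    points = []
    x = y = z = 0
    for base in seq.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def to_sequence(points):
    """Inverse transform: recover the DNA string from a Z curve."""
    bases = []
    prev = (0, 0, 0)
    for point in points:
        step = tuple(c - p for c, p in zip(point, prev))
        bases.append(BASE[step])
        prev = point
    return "".join(bases)


def geometric_center(points):
    """Mean of all curve points, one of the properties named in the text."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point lists."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

In this sketch the end point of the curve is simply `z_curve(seq)[-1]`, and a smaller `discrete_frechet` value means the two curves, and therefore the two sequences, are more alike.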
Building on these elements, a geometry-based model for analyzing the similarity of two gene sequences is proposed, called the three-step man-dog model.

(3) A combined search and alignment system with a dedicated secondary database for specific species has been developed. The sequence data come from GenBank, and the alignment algorithms include both the classical dynamic programming algorithm and the geometry-based algorithm. When a user submits a gene sequence, the system returns one or more sequences that are highly similar to the input, and the results can be sorted.

(4) The E2B sequences of 10 species were selected. Similarity scores were calculated with the NW (Needleman-Wunsch) algorithm, the tail point of the Z curve, the sum of discrete Fréchet distances, and the standard deviation based on the three-step discrete Fréchet distance. After the scores were normalized, the results show that curve similarity can also reflect the similarity of two genes. The gbvrt5 flat file (45,112 KB), downloaded from GenBank and describing chordate biological information, is used in this thesis. The search and analysis tests show that the delt value, the allowed divergence in one of the 4 statistical properties, influences both the search time and the accuracy of the results.

Through this innovative attempt, which uses an intuitive discrete curve to determine the similarity of two gene sequences, the optimal set of sequences can be found in the database effectively and more quickly. An appropriate delt determines the reliability of the results and implies a certain time cost.
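The NW baseline and the score normalization mentioned in (4) can be sketched as follows. This is a minimal Python illustration, not the thesis's implementation: the scoring parameters (match +1, mismatch -1, gap -1) and the choice of min-max normalization are assumptions, since the abstract does not state them, and the function names are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score (Needleman-Wunsch), linear gap penalty.

    The default scoring values are illustrative assumptions.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against an empty string
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; assumed here as the normalization step."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

Note that a higher NW score means greater similarity, while a smaller curve distance means greater similarity, so one of the two scales has to be inverted before normalized values are compared side by side.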
Keywords/Search Tags: Z curve, DNA, alignment, curve similarity