Research On Several Problems For Phylogenetic Analysis And Structure Prediction

Posted on: 2013-01-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: S Y Ding  Full Text: PDF
GTID: 1110330371496669  Subject: Applied Mathematics
Abstract/Summary:
Rapid advances in biological sequencing technologies have produced an overwhelming amount of sequence data, and it is necessary to consider how to organize, analyze and store these data. The large and diverse body of biological data resources inevitably contains many important biological principles, and these principles are the key to solving many mysteries of life. Traditional methods are not sufficient to analyze heterogeneous data, so the use of computer science and network technologies to manage and process biological data effectively is imperative. Thus a new and developing interdisciplinary field, bioinformatics, has emerged. This dissertation focuses mainly on two research areas, phylogenetic analysis and structure prediction. The major results are as follows:
1. In Chapter 2, we propose two alignment-free methods and use them to reconstruct phylogenetic trees of DNA sequences. The first feature vector is composed of elements that characterize the relative difference between a biological sequence and a sequence generated by an independent random process; it is obtained by subtracting the random background from the k-word frequency. The phylogenetic trees of 24 transferrins and 48 Hepatitis E viruses are in good agreement with previous studies, which shows that the method is efficient and powerful. In addition, an indicator 6k is proposed to guide the selection of the word length k. The second feature vector, named the k-word average interval, is proposed to extract the distribution of k-word structure in DNA sequences. The k-words are further divided into n categories based on a newly proposed indicator, where n is the number of DNA sequences, and the contribution of each k-word category is discussed for k = 5, 6, 7, 8, 9. The phylogenetic trees of 30 Eutherian mammals are reconstructed, and the INDELible software is employed to show the reliability and robustness of the method.
2. In Chapter 3, we focus on the prediction of protein secondary structural classes and propose a new structure prediction method based on the Support Vector Machine. A total of 11 features are extracted from the predicted secondary structure sequence and the corresponding E-H sequence. Each of the features is essential for obtaining good prediction accuracy; to demonstrate this, calculations are carried out with one feature omitted each time. Among the 11 features, 4 novel features are newly designed to model the differences between the α/β class and the α+β class, and the other 7 are rational features proposed by previous researchers. To examine the performance of the method, a total of 5 low-similarity datasets are used to design and test it. The results show that the proposed method achieves competitive prediction accuracies and higher values of MCC compared with existing methods (SCPRED, MODAS, RKS-PPSC).
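The first Chapter 2 feature vector can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the simplest reading: the observed frequency of each k-word minus the frequency expected if nucleotides were drawn independently with the sequence's own base composition. The function name `kword_background_vector` and the choice of a plain difference (rather than a normalized one) are assumptions made for illustration, not the dissertation's definition.

```python
from collections import Counter
from itertools import product
from math import prod

ALPHABET = "ACGT"


def kword_background_vector(seq, k):
    """Observed k-word frequency minus an independent-model expectation.

    Assumption: the random background of a word is the product of the
    single-nucleotide frequencies of its letters in the same sequence.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    if n_words <= 0:
        raise ValueError("sequence is shorter than k")

    # Single-nucleotide composition, ignoring any non-ACGT symbols.
    base_counts = Counter(c for c in seq if c in ALPHABET)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    base_freq = {b: base_counts[b] / total for b in ALPHABET}

    # Overlapping k-word counts along the sequence.
    word_counts = Counter(seq[i:i + k] for i in range(n_words))

    vector = []
    for letters in product(ALPHABET, repeat=k):  # fixed lexicographic order
        word = "".join(letters)
        observed = word_counts[word] / n_words
        expected = prod(base_freq[b] for b in letters)
        vector.append(observed - expected)
    return vector
```

Two sequences could then be compared by any distance between their vectors (for example the Euclidean distance), and the resulting distance matrix passed to a standard tree-building method; which distance and tree method the dissertation uses is not stated in the abstract.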
Keywords/Search Tags: Phylogenetic tree, Protein secondary structural classes prediction, k-word
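For the Chapter 3 work, the kind of feature that can be read off a predicted secondary structure sequence is sketched below. These are not the 11 features of the dissertation, which the abstract does not list. The sketch assumes a three-state string over H (helix), E (strand) and C (coil), and assumes that the E-H sequence is obtained by dropping coil and keeping one letter per helix or strand segment. The function names and the three features are illustrative assumptions.

```python
from itertools import groupby


def reduce_to_segments(ss):
    """Collapse a secondary structure string into one letter per H or E segment.

    Assumption: coil (C) is dropped and each run of H or E becomes one letter.
    """
    return "".join(state for state, _ in groupby(ss) if state in "HE")


def illustrative_features(ss):
    """Return a few simple features of a predicted secondary structure string."""
    ss = ss.upper()
    if not ss:
        raise ValueError("empty secondary structure sequence")
    n = len(ss)
    segments = reduce_to_segments(ss)
    # Count neighbouring segment pairs that switch between helix and strand.
    transitions = sum(1 for a, b in zip(segments, segments[1:]) if a != b)
    pairs = len(segments) - 1
    return {
        "helix_content": ss.count("H") / n,
        "strand_content": ss.count("E") / n,
        # Fraction of neighbouring segment pairs that alternate between H and E.
        "transition_rate": transitions / pairs if pairs > 0 else 0.0,
    }
```

The transition rate is included because helices and strands tend to alternate in α/β proteins and to be segregated in α+β proteins, so an alternating string such as `CEEECHHHHCEEECHHHHC` scores higher than a segregated one such as `CHHHHCHHHHCEEECEEEC`. Feature dictionaries like these could be turned into numeric vectors and passed to a Support Vector Machine classifier.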