
DNA Sequence Assembly Algorithm and New ncRNA Gene Finding

Posted on: 2005-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Xu
Full Text: PDF
GTID: 2178360185995523
Subject: Computer application technology
Abstract/Summary:
Bioinformatics has progressed rapidly alongside the great advances in genomics. This thesis focuses on two important branches of bioinformatics: "DNA Sequence Assembly" and "Non-coding Gene Analysis". Specifically, we develop a novel assembly algorithm, study theoretical problems related to repeat separation, develop a novel method for finding potential ncRNA genes, and design siRNAs against SARS proteins. The main achievements are as follows.

A new DNA sequence assembly algorithm is developed. The key idea is to embed the sequence assembly problem into the "Shortest Common Superstring" (SCS) framework and to apply a "Local Search" algorithm to find a suboptimal solution of SCS. This is a new approach, and it can be expected to reduce the mis-assemblies generated by traditional assemblers. In addition, two effective optimization strategies, "Neighborhood Pruning" and "Complementary-validation", are adopted; they significantly improve both the speed and the result quality of the original algorithm.

We consider the "k-Closest Substring Problem" and the "k-Consensus Pattern Problem", which are two different formulations of the repeat separation problem. In this thesis, we adopt and extend the "random sampling strategy". As a result, we give a PTAS for the "O(1)-Closest Substring Problem" and a PTAS for the "O(1)-Consensus Pattern Problem". In addition, using a novel construction, we give a direct and neater proof of the NP-hardness of "(2 - ε)-approximation of the Hamming radius k-clustering problem", a special case of the "k-Closest Substring Problem" restricted to L=m. These theoretical results are original and can be expected to provide insights that guide the design of practical algorithms for the real repeat-related problems in assembly.

A novel method for finding potential ncRNA genes is developed based on the EST database. As a result, nine of the sequences we find are confirmed ncRNA genes, and one sequence may be a novel human ncRNA gene. These positive results confirm that there are ncRNA genes in the EST database and verify the validity of our ncRNA gene-finding method.

Focusing on the five genes that code for five crucial proteins of SARS-CoV, we obtain 348 siRNA candidate targets using bioinformatic methods. Potent siRNA duplexes specifically suppress expression of their corresponding SARS-CoV target genes while having no influence on the normal expression of human genes. This work lays a foundation for further experimental research on siRNA-like drug design against SARS-CoV.
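To make the assembly idea in the abstract more concrete, the following is a minimal sketch of local search over a shortest common superstring (SCS) formulation. It is an illustration only, not the algorithm of the thesis. It assumes error-free fragments from a single strand, uses a simple "swap two fragments" neighborhood, and implements neither "Neighborhood Pruning" nor "Complementary-validation". The function names `merge`, `superstring`, and `local_search_scs` are hypothetical.

```python
from itertools import combinations


def merge(left: str, right: str) -> str:
    """Append `right` to `left`, reusing the longest suffix/prefix overlap."""
    if right in left:
        return left  # fragment is already covered
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap at all


def superstring(order: list[str]) -> str:
    """Build a common superstring by merging fragments in the given order."""
    result = ""
    for fragment in order:
        result = merge(result, fragment)
    return result


def local_search_scs(fragments: list[str]) -> str:
    """Hill-climb over fragment orderings; the cost is superstring length.

    Assumption: the neighborhood is "swap two positions in the ordering".
    The search stops at a local optimum, so the result may be suboptimal.
    """
    order = list(fragments)
    best = superstring(order)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            candidate = superstring(order)
            if len(candidate) < len(best):
                best = candidate  # keep the improving swap
                improved = True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap


    return best


if __name__ == "__main__":
    reads = ["GTAC", "ACGT", "CGTA"]
    print(local_search_scs(reads))
```

Each candidate solution is an ordering of the fragments, and its cost is the length of the superstring obtained by overlapping consecutive fragments as far as possible. A real assembler would also have to handle sequencing errors, reverse-complement reads, and repeats, none of which this sketch addresses.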
Keywords/Search Tags: DNA Sequence Assembly, Algorithm optimization, ncRNA gene finding, K-Closest Substring, siRNA design