Bioinformatics Analysis And Algorithm Development Of Consensus Ranking For Biological High Throughput Data

Posted on: 2015-08-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Yan
GTID: 1310330467982938
Subject: Biochemistry and molecular biology
Abstract/Summary:
Solving biological questions with bioinformatics approaches is increasingly important in the post-genomic era. This thesis focuses on bioinformatics analysis and on algorithm development for consensus ranking of biological high-throughput data.
In molecular biology and genetics, RNA splicing is a modification of the nascent pre-messenger RNA (pre-mRNA) transcript in which introns are removed and exons are joined. The U2AF heterodimer has been well studied for its role in defining functional 3' splice sites in pre-mRNA splicing, but several critical questions remain open, including the functional impact of its cancer-associated mutations. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to define ~88% of functional 3' splice sites in the human genome. Numerous U2AF binding events also occur at other genomic locations, and metagene and minigene analyses suggest that upstream intronic binding events interfere with the immediate downstream 3' splice site, which is associated either with the alternative exon, causing exon skipping, or with a competing constitutive exon, inducing inclusion of the alternative exon. We further build a U2AF65 scoring scheme that predicts its target sites from the high-throughput sequencing data using a Maximum Entropy machine learning method; the scores for the up- and down-regulated cases are consistent with our regulation model. These findings reveal the genomic function and regulatory mechanism of U2AF, which helps us understand the associated diseases.
Ranking biological data is a crucial need. Instead of developing new ranking methods, Cohen-Boulakia and her colleagues proposed generating a consensus ranking that highlights the common points of a set of rankings while minimizing their disagreements, in order to combat the noise and error in biological data. However, finding such a consensus under the Kendall-tau distance is NP-hard even for only four rankings.
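To make the consensus objective concrete, here is a minimal Python sketch of the Kendall-tau distance and of an exhaustive search for the ranking that minimizes the summed distance to all inputs. It assumes strict rankings (permutations without ties) over the same elements, which is a simplification. The function names `kendall_tau_distance`, `kemeny_score`, and `brute_force_consensus` and the toy data are illustrative and do not come from the thesis.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Count the pairs of elements that the two rankings order differently.

    Assumption: r1 and r2 are strict rankings (no ties) of the same elements.
    """
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_score(candidate, rankings):
    """Total disagreement between a candidate consensus and all input rankings."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


def brute_force_consensus(rankings):
    """Try every permutation; only feasible for a handful of elements."""
    elements = rankings[0]
    return min(permutations(elements), key=lambda c: kemeny_score(c, rankings))


# Hypothetical toy input: three rankings of four items.
rankings = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
consensus = brute_force_consensus(rankings)
print(consensus, kemeny_score(consensus, rankings))
```

The exhaustive search examines n! candidates, so it only illustrates the objective; that cost is the reason heuristic and approximation algorithms are of interest for realistic data sizes.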
In this thesis, we propose a new variant of the pivot algorithms, named Consistent-Pivot. It uses a new strategy for selecting the pivot and for assigning the remaining elements, and it outperforms previous pivot algorithms in both computation time and accuracy.
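For readers unfamiliar with pivot algorithms, the sketch below shows a generic random-pivot scheme in the style of KwikSort. It is not the Consistent-Pivot algorithm, whose pivot selection and assignment rules are not described in this abstract. It assumes strict rankings without ties, and the names `majority_prefers` and `pivot_consensus` are illustrative.

```python
import random


def majority_prefers(a, b, positions):
    """True if a strict majority of the input rankings place a before b."""
    votes = sum(1 for pos in positions if pos[a] < pos[b])
    return 2 * votes > len(positions)


def pivot_consensus(elements, positions, rng):
    """Generic pivot scheme: choose a pivot, split the rest around it, recurse."""
    if len(elements) <= 1:
        return list(elements)
    pivot = rng.choice(elements)  # assumption: uniform random pivot choice
    before, after = [], []
    for x in elements:
        if x == pivot:
            continue
        # Place x before the pivot only if most rankings agree on that order.
        (before if majority_prefers(x, pivot, positions) else after).append(x)
    return (
        pivot_consensus(before, positions, rng)
        + [pivot]
        + pivot_consensus(after, positions, rng)
    )


# Hypothetical toy input: three strict rankings of four items.
rankings = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
# Precompute each element's position in every ranking for fast pairwise votes.
positions = [{x: i for i, x in enumerate(r)} for r in rankings]
result = pivot_consensus(rankings[0], positions, random.Random(0))
print(result)
```

Each recursion level compares every remaining element with the pivot only, so the scheme avoids enumerating permutations. The quality of the result depends on how the pivot is chosen and how the other elements are assigned, which is the part a variant such as Consistent-Pivot changes.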
Keywords/Search Tags: Bioinformatics analysis, High throughput sequencing, U2AF, RNA splicing, Algorithm, Consensus ranking