Bioinformatics Analysis And Algorithm Development Of Consensus Ranking For Biological High Throughput Data

Posted on:2015-08-17

Degree:Doctor

Type:Dissertation

Country:China

Candidate:B Yan


GTID:1310330467982938

Subject:Biochemistry and molecular biology

Abstract/Summary:

In the post-genomic era, bioinformatics approaches have become increasingly important for answering biological questions. This thesis focuses on the bioinformatics analysis of biological high-throughput data and on the development of consensus ranking algorithms for such data.

In molecular biology and genetics, RNA splicing is a modification of the nascent pre-messenger RNA (pre-mRNA) transcript in which introns are removed and exons are joined. The U2AF heterodimer has been well studied for its role in defining functional 3′ splice sites in pre-mRNA splicing, but several critical questions remain open, including the functional impact of its cancer-associated mutations. Through genome-wide analysis of U2AF-RNA interactions, we report that U2AF has the capacity to define ~88% of functional 3′ splice sites in the human genome. Numerous U2AF binding events also occur at other genomic locations, and metagene and minigene analyses suggest that upstream intronic binding events interfere with the immediately downstream 3′ splice site. When that site belongs to the alternative exon, the result is exon skipping; when it belongs to a competing constitutive exon, the result is inclusion of the alternative exon. We further build a scoring scheme that predicts U2AF65 target sites from the high-throughput sequencing data using a maximum entropy machine-learning method, and the scores of up- and down-regulated cases are consistent with our regulatory model. These findings reveal the genomic function and regulatory mechanism of U2AF and help in understanding the diseases associated with it.

Ranking biological data is a crucial need. Rather than developing new ranking methods, Cohen-Boulakia and colleagues proposed generating a consensus ranking that highlights the points shared by a set of rankings while minimizing their disagreements, thereby mitigating the noise and errors in biological data. However, finding an optimal consensus under the Kendall-tau distance (the number of element pairs that two rankings order differently) is NP-hard, even for only four rankings.
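To make the consensus objective concrete, here is a minimal sketch assuming complete rankings without ties over the same set of elements. It computes the Kendall-tau distance as the number of discordant pairs, scores a candidate consensus by its total distance to all input rankings (the Kemeny criterion), and finds the exact optimum by brute force. The function names are hypothetical and not taken from the thesis; the exhaustive n! search is included only to illustrate why exact consensus quickly becomes impractical.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Count element pairs that two rankings (best first) order differently."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    disagreements = 0
    for a, b in combinations(r1, 2):
        # A pair is discordant when its relative order flips between rankings.
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0:
            disagreements += 1
    return disagreements


def kemeny_score(candidate, rankings):
    """Total Kendall-tau distance from a candidate consensus to all inputs."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


def exact_consensus(rankings):
    """Brute-force optimum: tries all n! orderings, so only usable for tiny n."""
    elements = rankings[0]
    return min(permutations(elements),
               key=lambda p: kemeny_score(list(p), rankings))


if __name__ == "__main__":
    rankings = [
        ["A", "B", "C", "D"],
        ["B", "A", "C", "D"],
        ["A", "C", "B", "D"],
        ["A", "B", "D", "C"],
    ]
    best = list(exact_consensus(rankings))
    print(best, kemeny_score(best, rankings))  # ['A', 'B', 'C', 'D'] 3
```

Enumerating every ordering is feasible only for a handful of elements, which is why heuristics such as pivot algorithms are used for realistic input sizes.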
In this thesis, we propose a new variant of pivot algorithms, named Consistent-Pivot, which uses a new strategy for selecting the pivot and assigning the remaining elements. It outperforms previous pivot algorithms in both computation time and accuracy.
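The abstract does not spell out Consistent-Pivot's pivot-selection or assignment rules, so the following sketch shows only the classic randomized pivot scheme (KwikSort-style, after Ailon, Charikar and Newman) that this family of algorithms builds on. It assumes complete rankings without ties; a random pivot is chosen, every other element is placed before it if a strict majority of the input rankings put it first (ties go after it in this sketch), and both sides are solved recursively. All names, including the `seed` parameter, are hypothetical.

```python
import random
from collections import Counter


def preference_counts(rankings):
    """before[(a, b)] = number of input rankings that place a before b."""
    before = Counter()
    for r in rankings:
        for i, a in enumerate(r):
            for b in r[i + 1:]:
                before[(a, b)] += 1
    return before


def kwik_sort(elements, before, rng):
    """Randomized pivot aggregation driven by pairwise majority votes."""
    if len(elements) <= 1:
        return list(elements)
    pivot = rng.choice(elements)
    left, right = [], []
    for x in elements:
        if x == pivot:
            continue
        # x goes before the pivot only if a strict majority of rankings agree;
        # ties are sent to the right side in this sketch.
        if before[(x, pivot)] > before[(pivot, x)]:
            left.append(x)
        else:
            right.append(x)
    return kwik_sort(left, before, rng) + [pivot] + kwik_sort(right, before, rng)


def pivot_consensus(rankings, seed=0):
    """Consensus of complete rankings over the same elements."""
    rng = random.Random(seed)
    before = preference_counts(rankings)
    return kwik_sort(list(rankings[0]), before, rng)


if __name__ == "__main__":
    rankings = [
        ["A", "B", "C", "D"],
        ["B", "A", "C", "D"],
        ["A", "C", "B", "D"],
        ["A", "B", "D", "C"],
    ]
    print(pivot_consensus(rankings))  # ['A', 'B', 'C', 'D']
```

In this toy input the pairwise majorities are transitive, so every pivot choice yields the same order. With cyclic majorities (for example the three rankings A>B>C, B>C>A and C>A>B) the output depends on which pivot is drawn, which is where the choice of pivot-selection and assignment strategy matters.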

Keywords/Search Tags:

Bioinformatics analysis, High throughput sequencing, U2AF, RNA splicing, Algorithm, Consensus ranking

Related items

1	Bioinformatics Analysis Of Microbial Identification, Evolution And Drug Resistance Based On High-throughput Sequencing
2	Analysis Of Human And Animal Viral Metagenomes And Whole Genomes Using High-throughput Sequencing And Bioinformatics
3	Using High-throughput Sequencing To Detect The Heteroplasmy Of COI Gene In Fig Wasps And Their Effects On Molecular Identification
4	Data Analysis And Quality Control Of High-throughput Single Cell Sequencing
5	Non-Coding Rna Research Based On High-Throughput Sequencing Technology
6	Comprehensive Analyses Of RNA Editing Events In Rat Based On Large High-throughput RNA-Seq Data
7	Analysis Of Error Model For High-Throughput Sequencing And Decoding Solution Design
8	Analysis Of Transposable Elements In The Genome Of Asparagus Officinalis Based On The High Throughput Sequence Date
9	High-throughput Analysis Of Biomolecular Data Using Multiple Hierarchical Consensus Clustering
10	Differential Expression And Bioinformatics Analysis Of Circular RNA In Osteosarcoma And Normal Bone Tissue