Research And Implementation Of Transcription Regulatory Sequences Data Mining

Posted on:2009-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhou

Full Text:PDF

GTID:2178360272959188

Subject:Computer software and theory

Abstract/Summary:

In functional genomics, a fundamental challenge is to understand the regulatory mechanisms of eukaryotic organisms. A transcription factor is a special kind of protein that regulates gene expression by binding to cis-regulatory elements, which are usually located upstream of a gene. Identifying transcription factors and cis-regulatory elements is therefore a precondition for understanding gene expression. In the past, biologists relied on experimental methods to identify transcription regulatory sequences (both transcription factors and cis-regulatory elements), but such methods are expensive and time-consuming. Researchers now commonly use computational methods to predict transcription regulatory sequences and then verify the predictions experimentally, which is more efficient than the traditional approach. Existing prediction methods still have problems, however, and developing new methods to address them is an active research topic.

In this thesis, we analyze the shortcomings of existing prediction algorithms for transcription regulatory sequences and study the biological characteristics of these sequences; present new prediction algorithms for transcription factors and for cis-regulatory elements that incorporate domain knowledge of transcription; and design and implement the TBMiner (Transcription Factor Binding Site and Transcription Factor Miner) system. The major contributions of this thesis are as follows:

1. A transcription factor data mining algorithm based on the support vector machine (SVM). The algorithm represents each transcription factor as a vector of protein functional domains and constructs a training set containing both positive and negative samples, then trains an SVM classification model on that set. The model is used to predict whether a protein is a transcription factor and, if so, which group it belongs to. Experimental results show that it generalizes better than current methods, whose generalization performance is poor.

2. A semi-supervised SVM with a polynomial kernel for predicting cis-regulatory elements. Most existing methods use only single-residue frequency information, although the residues in cis-regulatory elements often have complex dependencies. This method uses a polynomial kernel to capture the dependencies between residues, which greatly improves the prediction results. Because kernel methods avoid an explicit transformation of the feature space, the algorithm is also efficient.

3. The design and implementation of TBMiner, a data mining system for transcription regulatory sequences. TBMiner integrates two frequently used motif-finding algorithms, MEME and AlignACE, and implements the new transcription factor and cis-regulatory element prediction algorithms described above. Users can try different parameters to obtain the best result, so the system provides a good platform for biologists studying transcription regulatory mechanisms.
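
The point in the second contribution, that a polynomial kernel captures dependencies between residues without an explicit space transformation, can be illustrated with a small sketch. It is not the thesis's implementation: the DNA alphabet `ACGT`, the one-hot encoding, the degree of 2, the constant term of 1, the function names, and the two example sites are all assumptions chosen for illustration. The sketch checks that the kernel value computed directly from two encoded sites equals the inner product of an explicit feature map that contains every pairwise product of position features.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet; the abstract does not specify an encoding


def one_hot(site):
    """Encode a DNA site as a flat 0/1 vector, four entries per position."""
    vec = []
    for base in site:
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def poly_kernel(x, y, degree=2, coef0=1):
    """Polynomial kernel (x . y + coef0) ** degree, computed in input space."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def explicit_degree2_map(x):
    """Explicit feature map whose inner product equals (x . y + 1) ** 2.

    It holds a constant, the single-position features scaled by sqrt(2),
    and every ordered pairwise product x_i * x_j. The pairwise products
    are the terms that express dependencies between positions.
    """
    feats = [1.0]
    feats.extend(math.sqrt(2) * a for a in x)
    feats.extend(a * b for a in x for b in x)
    return feats


# Two illustrative 6-base sites that differ at one position.
site_a = one_hot("TATAAT")
site_b = one_hot("TATGAT")

k_implicit = poly_kernel(site_a, site_b)
k_explicit = sum(
    p * q for p, q in zip(explicit_degree2_map(site_a), explicit_degree2_map(site_b))
)

print("kernel in input space:     ", k_implicit)
print("explicit feature-space dot:", k_explicit)
print("explicit feature count:    ", len(explicit_degree2_map(site_a)))
```

For a 6-base site the input vector has 24 entries, while the explicit degree-2 map has 601. The kernel gives the same value from the 24-entry vectors alone, which is the efficiency argument the abstract makes.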

Keywords/Search Tags:

Transcription factor, Cis-regulatory elements, Support vector machine, Polynomial kernel, Data mining, Bioinformatics

Related items

1	Research And Implementation Of Text Mining For Transcription Regulatory Information
2	Based Support Vector Machine Prediction Of Regulatory Networks Within The Genome-wide Study
3	Research On Feature Analysis And Computational Identification Of Transcriptional Regulatory Elements In Genomes
4	Support Vector Machine With Input Uncertainty And Its Application To Bioinformatics
5	Fast Polynomial Kernel Based Algorithms For Classification
6	Study On Some Issues Of Kernel Machine Learning Method
7	Study On Some Data Mining Methods For Biological Information And Their Application
8	Research On Classification Algorithm Of Data Mining Based On Improved Support Vector Machine
9	Research On Chinese Text Categorization Based On Support Vector Machine
10	The Research And Application Of Wavelet Support Vector Machines In Data Modeling