Research And Implementation Of Transcription Regulatory Sequences Data Mining

Posted on: 2009-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhou
GTID: 2178360272959188
Subject: Computer software and theory
Abstract/Summary:
In functional genomics, a fundamental challenge of current research is to understand the regulatory mechanisms of eukaryotic organisms. Transcription factors are a special class of proteins that regulate gene expression by binding to cis-regulatory elements, which are usually located upstream of the gene. Consequently, identifying transcription factors and cis-regulatory elements is a prerequisite for understanding gene expression. In the past, biologists relied on experimental methods to identify transcription regulatory sequences (including transcription factors and cis-regulatory elements), but these methods are expensive and time-consuming. Researchers therefore now use computational methods to predict transcription regulatory sequences and then verify the predictions experimentally, which is more efficient than the traditional approach. However, existing prediction methods have shortcomings, and developing new methods to address them is an active topic of current research.

In this thesis, we analyze the shortcomings of existing algorithms for predicting transcription regulatory sequences and study the biological characteristics of these sequences. We present new prediction algorithms for transcription factors and for cis-regulatory elements that incorporate domain knowledge of transcription, and we design and implement the TBMiner (Transcription Factor Binding Site and Transcription Factor Miner) system. The major achievements of this thesis are as follows:

1. A transcription factor data mining algorithm based on the support vector machine (SVM). The algorithm represents each protein as a vector of its protein functional domains, constructs a training set containing both positive and negative samples, and trains an SVM classification model on it. The model then predicts whether a protein is a transcription factor and, if so, which group it belongs to. Experimental results show that it improves generalization performance compared with current methods, which generalize poorly.
2. A semi-supervised SVM with a polynomial kernel for predicting cis-regulatory elements. Most existing methods use only the frequencies of individual residues, whereas the residues in cis-regulatory elements often have complex dependencies on one another. This method uses a polynomial kernel to capture the dependencies between residues of cis-regulatory elements, which greatly improves the prediction results. Because kernel methods avoid an explicit transformation into the feature space, the algorithm is also efficient.
3. The design and implementation of TBMiner, a data mining system for transcription regulatory sequences. TBMiner integrates two frequently used motif-finding algorithms, MEME and AlignACE, and implements the new transcription factor and cis-regulatory element prediction algorithms described above. Users can try different parameters to obtain the best results, which makes TBMiner a useful platform for biologists studying transcriptional regulatory mechanisms.
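The first contribution (domain vectors plus an SVM that first decides "transcription factor or not" and then assigns a group) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation. The abstract gives neither the domain set, the SVM solver, nor how groups are predicted, so the sketch assumes binary presence/absence vectors over a small hypothetical list of Pfam-style domain names, a linear SVM trained with the Pegasos stochastic sub-gradient method (a swapped-in solver chosen so the example needs only the Python standard library), and one-vs-rest group models. The training data, `train_two_stage` and `predict` are made up for illustration.

```python
import random

# Hypothetical, illustrative domain vocabulary (Pfam-style names). A real
# system would use a full domain annotation with far more domains.
VOCAB = ["zf-C2H2", "Homeobox", "HLH", "bZIP_1", "Pkinase", "Ras"]


def encode(domains):
    """Binary presence/absence vector over VOCAB, plus a constant bias feature."""
    present = set(domains)
    return [1.0 if d in present else 0.0 for d in VOCAB] + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, lam=0.01, iters=10000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal linear SVM objective.

    ys are +1/-1. The bias lives in the constant last feature, so it is
    regularised together with the other weights (a common simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        eta = 1.0 / (lam * t)
        violated = ys[i] * dot(w, xs[i]) < 1.0  # hinge loss is active
        w = [(1.0 - eta * lam) * wj for wj in w]  # step from the L2 regulariser
        if violated:
            w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def train_two_stage(tfs, non_tfs):
    """Hypothetical helper. Stage 1: TF vs non-TF. Stage 2: one-vs-rest per TF group."""
    xs = [encode(d) for d, _ in tfs] + [encode(d) for d in non_tfs]
    ys = [1] * len(tfs) + [-1] * len(non_tfs)
    is_tf = train_linear_svm(xs, ys)
    tf_xs = [encode(d) for d, _ in tfs]
    groups = {}
    for group in sorted({g for _, g in tfs}):
        group_ys = [1 if g == group else -1 for _, g in tfs]
        groups[group] = train_linear_svm(tf_xs, group_ys)
    return is_tf, groups


def predict(domains, is_tf, groups):
    """Return the predicted TF group, or None if the protein scores as a non-TF."""
    x = encode(domains)
    if dot(is_tf, x) <= 0.0:
        return None
    return max(groups, key=lambda g: dot(groups[g], x))


# Toy, made-up training data: (domains, group) for TFs, domains only for non-TFs.
TFS = [(["zf-C2H2"], "C2H2 zinc finger"), (["Homeobox"], "homeodomain"),
       (["HLH"], "bHLH"), (["bZIP_1"], "bZIP")]
NON_TFS = [["Pkinase"], ["Ras"], ["Pkinase", "Ras"]]

is_tf, groups = train_two_stage(TFS, NON_TFS)
print(predict(["Homeobox"], is_tf, groups))  # protein with a homeobox domain
print(predict(["Pkinase"], is_tf, groups))   # kinase-only protein
```

With real data the vocabulary would come from a domain annotation of each protein, and a library SVM or a nonlinear kernel could replace the hand-written solver. The two-stage split simply mirrors the abstract's wording: first decide whether a protein is a transcription factor, then which group it belongs to.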
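The efficiency claim in the second contribution rests on the kernel trick. Assuming an inhomogeneous polynomial kernel of degree 2 with offset c > 0 (the abstract states neither value) and a one-hot encoding of each candidate site, where feature x_(p,a) is 1 exactly when the site s has base a at position p, the kernel expands into an inner product over pairwise features:

```latex
K(\mathbf{x},\mathbf{z}) = \left(\mathbf{x}^{\top}\mathbf{z} + c\right)^{2}
  = \sum_{i}\sum_{j} (x_i x_j)(z_i z_j)
  + \sum_{i} \left(\sqrt{2c}\,x_i\right)\left(\sqrt{2c}\,z_i\right) + c^{2}
  = \phi(\mathbf{x})^{\top}\phi(\mathbf{z}),
\qquad
\phi(\mathbf{x}) = \left( (x_i x_j)_{i,j},\ (\sqrt{2c}\,x_i)_{i},\ c \right)

x_{(p,a)}\,x_{(q,b)} = 1 \iff s_p = a \text{ and } s_q = b,
\qquad
\mathbf{x}^{\top}\mathbf{z} = \left|\{\, p : s_p = t_p \,\}\right|
```

Here z encodes a second site t. Each pairwise feature fires only when two specific bases occur at two specific positions, which is the kind of dependency a model built from single-position frequencies cannot express. Evaluating K directly costs one inner product over the 4L one-hot features of a length-L site, whereas building φ explicitly needs (4L)^2 + 4L + 1 features, and the gap grows with the degree.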
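The kernel trick mentioned in the second contribution can also be checked numerically. This sketch assumes degree 2, offset c = 1 and a one-hot encoding; the two sites are arbitrary examples and the function names are hypothetical. It computes the kernel directly on the one-hot vectors and compares it with the inner product of explicitly constructed pairwise feature vectors.

```python
import math
from itertools import product

BASES = "ACGT"


def one_hot(site):
    """Encode a DNA site as 4 indicator features per position (A, C, G, T)."""
    return [1.0 if b == base else 0.0 for b in site for base in BASES]


def poly_kernel(x, z, degree=2, c=1.0):
    """Inhomogeneous polynomial kernel (x.z + c)^degree, evaluated directly."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def degree2_feature_map(x, c=1.0):
    """Explicit map phi with phi(x).phi(z) == (x.z + c)^2.

    Each x_i * x_j term is 1 exactly when two (position, base) indicators are
    both on, i.e. it encodes a pair of bases at a pair of positions.
    """
    pairs = [xi * xj for xi, xj in product(x, repeat=2)]
    singles = [math.sqrt(2.0 * c) * xi for xi in x]
    return pairs + singles + [c]


x, z = one_hot("TGACTCA"), one_hot("TGAGTCA")
implicit = poly_kernel(x, z)
explicit = sum(a * b for a, b in zip(degree2_feature_map(x), degree2_feature_map(z)))
print(len(x), len(degree2_feature_map(x)))  # 28 813
print(implicit, math.isclose(implicit, explicit))  # 49.0 True
```

The two sites agree at 6 of 7 positions, so the kernel is (6 + 1)^2 = 49, obtained from 28 one-hot features instead of 813 explicit ones. The explicit sum differs only by floating-point rounding from the sqrt(2) factors, hence the `math.isclose` comparison.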
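One way a semi-supervised SVM with a polynomial kernel might be organized is sketched below. The abstract does not say which semi-supervised scheme or solver the thesis uses, so this example swaps in self-training (train on the labeled sites, pseudo-label the most confident unlabeled site, retrain) around a kernelized Pegasos solver. It is not the thesis's algorithm. The sequences and labels are invented toy data, and `self_train`, `train_kernel_svm` and all parameter values are hypothetical.

```python
import random

BASES = "ACGT"


def one_hot(site):
    """4 indicator features per position (A, C, G, T)."""
    return [1.0 if b == base else 0.0 for b in site for base in BASES]


def poly_kernel(x, z, degree=2, c=1.0):
    """(x.z + c)^degree; the pairwise feature space is never built explicitly."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def train_kernel_svm(xs, ys, lam=0.01, iters=2000, seed=0):
    """Kernelised Pegasos (hypothetical helper): alpha[j] counts margin violations of sample j."""
    xs, ys = list(xs), list(ys)  # private copies, so callers may keep appending
    rng = random.Random(seed)
    alpha = [0] * len(xs)

    def score(x, t):
        return sum(a * y * poly_kernel(xj, x)
                   for a, y, xj in zip(alpha, ys, xs) if a) / (lam * t)

    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        if ys[i] * score(xs[i], t) < 1.0:
            alpha[i] += 1
    return lambda x: score(x, iters)


def self_train(labeled, unlabeled, per_round=1):
    """Hypothetical self-training loop: pseudo-label the most confident sites, retrain."""
    sites = [s for s, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    pseudo = {}
    while pool:
        decide = train_kernel_svm([one_hot(s) for s in sites], ys)
        pool.sort(key=lambda s: abs(decide(one_hot(s))), reverse=True)
        batch, pool = pool[:per_round], pool[per_round:]
        for s in batch:
            pseudo[s] = 1 if decide(one_hot(s)) > 0 else -1
            sites.append(s)
            ys.append(pseudo[s])
    return train_kernel_svm([one_hot(s) for s in sites], ys), pseudo


# Made-up toy data: +1 = candidate binding site, -1 = background.
labeled = [("TGACTCA", 1), ("GCGCGCG", -1)]
unlabeled = ["TGACTCT", "GCGCGCA"]
model, pseudo = self_train(labeled, unlabeled)
print(pseudo)
print(model(one_hot("TGAGTCA")) > 0, model(one_hot("GCGCGCC")) > 0)
```

A transductive SVM, which optimizes the margin over labeled and unlabeled points jointly, is another common semi-supervised formulation; self-training is used here only because it fits in a few lines and reuses an ordinary kernel SVM unchanged.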
Keywords/Search Tags: Transcription factor, Cis-regulatory elements, Support vector machine, Polynomial kernel, Data mining, Bioinformatics