
Study Of Identifying Transcription Factor Binding Sites Based On Graphical Cluste

Posted on: 2011-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
GTID: 2120360305489390
Subject: Computer application technology
Abstract/Summary:
A transcription factor binding site (TFBS) is a short segment of DNA, between 10 and 30 base pairs long, that is usually located upstream of the gene it regulates. Transcription regulation proteins must bind to these sequences in order to regulate the transcription of the gene, so identifying TFBSs is the first step in constructing a transcription regulation network. One kind of transcription regulation protein usually binds to many sites, and these sites are usually similar but not identical in sequence pattern. Identifying all binding sites of a given transcription regulation protein has therefore become one of the most challenging problems in bioinformatics.

This paper presents an approach that identifies unknown binding sites of a transcription factor by applying a graph-based clustering method to all of its known binding sites. Clustering the known TFBSs places the sequences with the highest similarity into the same group. A position-specific scoring matrix (PSSM) is then built for each group, yielding several PSSMs, which are combined into a model called the mix PSSM. This model is used to score each sequence in the training set, giving a score vector for each sequence. Finally, these score vectors are used to train a classifier. A well-trained classifier is able to identify the binding sites of the transcription factor.

In theory, the clustering method does not need the number of clusters to be fixed in advance; it adapts the number to the similarity of the sequences, so its clustering results improve on those of the classic method. Moreover, the mix PSSM derived from the clusters carries more information than a classic single PSSM, so its scores are less disturbed by random events. The scores are more reliable, the trained classifier performs better, and the method therefore has advantages over the classic method.

In the experiments, the method is first tested on the TFBSs of Ecli-12, where the results improve considerably over the classic PSSM method. Experiments on four kinds of yeast TFBSs then indicate that the method identifies TFBSs with higher sensitivity and specificity than the classic method. The method is therefore effective for TFBS identification.
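The pipeline described in the abstract (cluster the known sites, build one PSSM per cluster, score a sequence against every PSSM) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the clustering rule, so the sketch assumes one: two sites are linked when their fraction of matching positions reaches a hypothetical `threshold`, and clusters are the connected components of that graph, which lets the number of clusters follow from the data. The toy sequences, the pseudocount, the uniform background frequency, and all function names are likewise illustrative assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def similarity(a, b):
    """Fraction of matching positions between two equal-length sites."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def graph_clusters(sites, threshold):
    """Assumed rule: clusters are connected components of the graph that
    links two sites whenever their similarity is >= threshold."""
    n = len(sites)
    adj = {i: [j for j in range(n)
               if j != i and similarity(sites[i], sites[j]) >= threshold]
           for i in range(n)}
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append([sites[i] for i in sorted(component)])
    return clusters

def build_pssm(group, pseudocount=1.0, background=0.25):
    """Log-odds PSSM for one cluster: a list with one dict per position."""
    total = len(group) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(len(group[0])):
        counts = Counter(seq[pos] for seq in group)
        pssm.append({base: math.log2((counts[base] + pseudocount) / total / background)
                     for base in ALPHABET})
    return pssm

def score(pssm, seq):
    """Sum of the per-position log-odds scores of seq under one PSSM."""
    return sum(column[base] for column, base in zip(pssm, seq))

def score_vector(pssms, seq):
    """One score per cluster PSSM; this vector would feed the classifier."""
    return [score(p, seq) for p in pssms]

# Toy data (made up for illustration): two visibly different site families.
known_sites = ["ACGTAC", "ACGTAA", "ACGTTC", "TTGACA", "TTGACC", "TTGACG"]
clusters = graph_clusters(known_sites, threshold=0.8)
mix_pssm = [build_pssm(group) for group in clusters]
vec = score_vector(mix_pssm, "ACGTAC")
print(len(clusters), vec)
```

In this toy run, the candidate `ACGTAC` scores higher against the PSSM of its own family than against the other one. In the approach the abstract describes, such score vectors, computed for the sequences in the training set, would then be used to train a classifier; that step is omitted here because the abstract does not name the classifier.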
Keywords/Search Tags: bioinformatics, transcription regulation protein, TFBS, PSSM, sensitivity, specificity