
Study On Clustering Of Position Frequency Matrices For Transcription Factor Binding Site

Posted on: 2012-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: H L Yue
GTID: 2248330395955419
Subject: Computer software and theory
Abstract/Summary:
Transcription factor binding sites are important gene regulatory elements, and predicting and identifying them is key to understanding gene regulatory networks. This thesis focuses on similarity functions for, and clustering of, transcription factor binding sites in bioinformatics. The main contributions are as follows.

1. Similarity functions. Noise matrices are constructed from the transcription factor position frequency matrices (PFMs) in the JASPAR database to simulate noise. Each noise matrix is compared with its original matrix using a variety of similarity functions, and the ability of each function to identify the original matrix is evaluated. On this basis, a modified Euclidean distance formula is proposed, and the experimental comparison indicates that the modified formula is effective.

2. Clustering of PFMs. The MUPGMA algorithm and an algorithm for constructing consensus sequences from PFMs are proposed, and the JASPAR and TRANSFAC databases are clustered with MUPGMA. Different similarity functions are used to compute the proximity between data objects, and different inter-cluster proximity measures are used to compute the similarity between clusters. Comparing these similarity functions and inter-cluster measures on the experimental results shows that the MUPGMA algorithm and the consensus sequence construction algorithm are effective and feasible.
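The noise-matrix experiment in item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the abstract gives neither its noise model nor its modified Euclidean distance, so the code uses a hypothetical noise model (a uniform random count added to every cell) and the plain Euclidean distance averaged over columns. Matrices are assumed to be aligned and of equal length, and the helper names (`to_ppm`, `euclidean_distance`, `make_noise_matrix`, `identified`) and the toy matrices are illustrative only.

```python
import math
import random

# A PFM is a list of columns; each column holds counts for A, C, G, T.


def to_ppm(pfm):
    """Normalize each column of counts to probabilities."""
    return [[count / sum(col) for count in col] for col in pfm]


def euclidean_distance(p, q):
    """Plain Euclidean distance per column, averaged over two aligned matrices."""
    if len(p) != len(q):
        raise ValueError("matrices must have the same number of columns")
    per_column = (
        math.sqrt(sum((a - b) ** 2 for a, b in zip(cp, cq))) for cp, cq in zip(p, q)
    )
    return sum(per_column) / len(p)


def make_noise_matrix(pfm, max_noise, rng):
    """Hypothetical noise model: add a uniform random count in [0, max_noise] to every cell."""
    return [[count + rng.uniform(0, max_noise) for count in col] for col in pfm]


def identified(noisy, originals, true_index):
    """True if the original matrix closest to `noisy` is the one it was derived from."""
    noisy_ppm = to_ppm(noisy)
    # Matrices of a different length are treated as infinitely far away.
    dists = [
        euclidean_distance(noisy_ppm, to_ppm(m)) if len(m) == len(noisy) else math.inf
        for m in originals
    ]
    return min(range(len(dists)), key=dists.__getitem__) == true_index


# Two toy 3-column PFMs (consensus ACG and TGA); not JASPAR data.
originals = [
    [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]],
    [[0, 0, 0, 10], [0, 0, 10, 0], [10, 0, 0, 0]],
]
noisy = make_noise_matrix(originals[0], 2.0, random.Random(0))
print(identified(noisy, originals, 0))  # True
```

Repeating the identification test over many noisy copies and counting successes gives one simple measure of how well a similarity function recovers the source matrix.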
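One common way to summarize how well a similarity function separates matching pairs (a noise matrix and its source) from non-matching pairs is an ROC curve, which appears among the keywords. The sketch below assumes scores where higher means more similar (a distance can be negated first); `roc_points` and `auc` are hypothetical helpers, and the scores are toy values, not results from the thesis.

```python
def roc_points(pos, neg):
    """ROC points (FPR, TPR) from scores of matching (pos) and non-matching (neg) pairs.

    A pair is predicted to match when its score is >= the threshold. Higher scores
    mean more similar, so a distance should be negated before it is passed in.
    """
    points = [(0.0, 0.0)]
    # Sweep every distinct score as a threshold, from strictest to loosest.
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Toy scores for illustration only.
matching = [0.9, 0.8, 0.4]
non_matching = [0.7, 0.3]
print(auc(roc_points(matching, non_matching)))  # approximately 0.833 (5/6)
```

An AUC of 1.0 means every matching pair outscores every non-matching pair; 0.5 corresponds to chance.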
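Item 2 builds on UPGMA. The abstract does not describe what MUPGMA changes, so the sketch below shows only standard UPGMA (average linkage with size-weighted distance updates) over a precomputed distance matrix, which could be filled in with any of the PFM similarity functions. The function name `upgma` and the toy matrix are illustrative assumptions.

```python
def upgma(dist, labels):
    """Standard UPGMA (average linkage) on a symmetric distance matrix.

    Returns merges in order as (cluster_a, cluster_b, distance), where each
    cluster is a tuple of the original labels.
    """
    clusters = {i: (label,) for i, label in enumerate(labels)}
    sizes = {i: 1 for i in clusters}
    # d[(i, j)] with i < j is the current distance between clusters i and j.
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        merges.append((clusters[i], clusters[j], dij))
        merged = clusters.pop(i) + clusters.pop(j)
        si, sj = sizes.pop(i), sizes.pop(j)
        # Average linkage: new distance is the size-weighted mean of the old ones.
        updated = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            updated[k] = (si * dik + sj * djk) / (si + sj)
        # Drop every entry involving the two merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in updated.items():
            d[(k, next_id)] = v  # existing ids are always smaller than next_id
        clusters[next_id] = merged
        sizes[next_id] = si + sj
        next_id += 1
    return merges


# Toy distance matrix between four hypothetical PFMs.
labels = ["M1", "M2", "M3", "M4"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
for a, b, distance in upgma(dist, labels):
    print(a, b, distance)
```

Cutting the merge sequence at a distance threshold yields flat clusters; swapping the update rule in the inner loop is where a different inter-cluster proximity measure would plug in.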
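The abstract does not specify how consensus sequences are built from PFMs. As an assumed stand-in, the sketch below applies a common per-column thresholding heuristic: emit a single base when its frequency exceeds 0.5 and is more than twice the runner-up, emit the two-base IUPAC ambiguity code when the top two bases together exceed 0.75, and otherwise emit N. The thresholds and the name `consensus` are assumptions, not the thesis's algorithm.

```python
BASES = "ACGT"
# IUPAC ambiguity codes for two-base combinations.
IUPAC_PAIRS = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
}


def consensus(pfm, single=0.5, pair=0.75):
    """Consensus string from a PFM given as columns of A, C, G, T counts."""
    letters = []
    for col in pfm:
        total = sum(col)
        # (frequency, base) pairs, most frequent first.
        ranked = sorted(
            ((count / total, base) for count, base in zip(col, BASES)), reverse=True
        )
        (f1, b1), (f2, b2) = ranked[0], ranked[1]
        if f1 > single and f1 > 2 * f2:
            letters.append(b1)
        elif f1 + f2 > pair:
            letters.append(IUPAC_PAIRS[frozenset((b1, b2))])
        else:
            letters.append("N")
    return "".join(letters)


print(consensus([[9, 1, 0, 0], [5, 0, 5, 0], [3, 3, 2, 2], [0, 0, 0, 10]]))  # ARNT
```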
Keywords/Search Tags: Transcription factor binding site, Similarity function, Clustering, UPGMA, ROC curve