
Research On Detecting Helix-turn-helix Motif Using Sequence Information

Posted on: 2009-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: W W Xiong
Full Text: PDF
GTID: 2120360242478366
Subject: Analytical Chemistry
Abstract/Summary:
In recent years, the rapid development of the life sciences has brought many new challenges. With the sequencing of the human genome completed, a flood of sequence data has been presented to researchers like a wordless "book from heaven". Extracting meaningful information from this broad array of data has become crucial. To interpret this "book from heaven" and understand the complex mechanisms that regulate gene expression, we must first identify all the transcription factors, which make up the major part of DNA-binding proteins. Many known DNA-binding proteins contain regular structural motifs, and the helix-turn-helix (HTH) motif is the most common and most widely studied of them, accounting for the majority of these motifs.

With the advent of genomics projects, an increasing number of protein structures are being solved that have little or no sequence similarity to current PDB entries and little functional information. If protein structure and functional domain information (including the HTH motif) could be predicted from protein sequence information alone, we would understand much better the relationship between protein sequence, structure, and function, as well as the mechanisms by which biological genetic information is transmitted.

HTH prediction has therefore become a research hotspot in recent years, and a number of prediction methods have been proposed. Building on and extending this earlier work on HTH motif prediction will allow deeper exploration of how information is transferred from gene to protein, will support protein structure-function annotation, and will further our understanding of the mechanisms of life.

We first carried out a comprehensive analysis of previous work in HTH prediction and related fields.
We then retrieved the necessary raw data from the SMART and UniProtKB/TrEMBL databases, classified the data, and built our own HTH database. On the basis of statistics from the HTH data, we set up a training-and-prediction working model and tried a variety of sequence encoding methods. Eventually we developed an encoding method based on pattern variables, which achieved very satisfactory results.

This is a new variable-transformation method that we designed around pattern variables. By extracting features from the master set, we obtained collections of patterns associated with a given threshold. These patterns not only retain the statistical features of protein sequences but also take into account long-range interactions between residues, while avoiding an excessive number of variables. To some extent, pattern variables represent residue groups at specific locations, in other words, the factors that contribute to the formation of the HTH motif in proteins. Moreover, the relationships within certain pattern groups may determine whether a protein potentially contains an HTH motif.

The framework we constructed may be useful for solving similar problems in bioinformatics. The approach offers a workable idea for encoding samples of unequal length and for variable optimization.
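
The abstract does not define the pattern variables precisely, so the following is only a minimal sketch of the general idea under stated assumptions. Here a "pattern" is taken to be a gapped residue pair (residue, gap, residue), patterns are kept when they occur in at least a threshold fraction of the training sequences, and each sequence of any length is then encoded as a fixed-length binary vector. The function names, the `max_gap` and `min_support` parameters, and the toy sequences are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def gapped_pairs(seq, max_gap=3):
    """Return the set of (residue, gap, residue) patterns present in seq.

    ASSUMPTION: a gapped residue pair stands in for the thesis's
    'pattern'; the gap lets a pattern span non-adjacent residues.
    """
    found = set()
    for i, first in enumerate(seq):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j < len(seq):
                found.add((first, gap, seq[j]))
    return found


def select_patterns(training_seqs, min_support=0.5, max_gap=3):
    """Keep patterns found in at least min_support of the training sequences.

    ASSUMPTION: a simple support threshold plays the role of the
    'certain threshold' mentioned in the abstract.
    """
    counts = Counter()
    for seq in training_seqs:
        # Each sequence contributes at most once per pattern.
        counts.update(gapped_pairs(seq, max_gap))
    cutoff = min_support * len(training_seqs)
    return sorted(p for p, c in counts.items() if c >= cutoff)


def encode(seq, patterns, max_gap=3):
    """Encode a sequence of any length as a fixed-length 0/1 vector."""
    present = gapped_pairs(seq, max_gap)
    return [1 if p in present else 0 for p in patterns]


# Toy, made-up sequences of unequal length (not real HTH data).
toy_training = ["QRELA", "QAELK", "MQRELAG"]
patterns = select_patterns(toy_training, min_support=1.0, max_gap=2)
print(patterns)                      # [('E', 1, 'L'), ('Q', 2, 'E')]
print(encode("GGQTELGG", patterns, max_gap=2))  # [1, 1]
print(encode("AAAA", patterns, max_gap=2))      # [0, 0]
```

Because the vector length depends only on the number of selected patterns, sequences of different lengths map to vectors of the same size, and raising `min_support` is one simple way to limit the number of variables.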
Keywords/Search Tags:Helix-turn-helix motif, DNA-binding proteins, pattern recognition, biostatistics encoding methods, pattern-based variable