Feature Mining for the Sequence Structure of Peptide-Encoding sORFs in Prokaryotic Genomes

Posted on: 2022-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: B W Qian
Full Text: PDF
GTID: 2480306557971069
Subject: Bioinformatics
Abstract/Summary:
In recent years, small open reading frames (sORFs) found in non-coding regions have attracted great attention because of their peptide-coding capability. Earlier studies assumed that these short ORFs could not encode proteins because of their short sequences (fewer than 100 amino acids) and low expression levels, but the discovery of peptide-encoding sORFs has overturned this understanding of the genome, and their study has become an important research field in the life sciences. Traditional gene prediction algorithms reduce false positives by setting length thresholds that exclude sORFs, which has led to the long-term neglect of peptide-encoding sORFs. A large number of studies have now shown that peptide-encoding sORFs are ubiquitous in eukaryotic and prokaryotic organisms, but related research focuses mainly on a few eukaryotic model organisms such as humans and mice. A comprehensive understanding of their sequence and structural features is still lacking, and few studies focus on sORFs in prokaryotes. Systematic research is urgently needed to provide a basis for the accurate identification of peptide-encoding sORFs. This work therefore aims to perform a systematic and in-depth study of sequence feature mining for peptide-encoding sORFs in prokaryotic genomes by integrating current sORF data, which would provide a theoretical basis and new ideas for future work on peptide-encoding sORFs. The work covers the following three aspects.

1. Study of the structural characteristics of peptide-encoding sORFs. Because the sequences are short, the spatial structures that the encoded peptides can form are limited, and in-depth exploration of peptide-encoding sORFs is important for revealing their biological functional mechanisms. We first conducted a statistical study of the genomic distribution of 382,186 prokaryotic peptide-encoding sORFs (fewer than 60 amino acids) from the RefSeq database. In parallel, the intrinsic characteristics of sORFs of fewer than 100 amino acids in a number of eukaryotic model organisms (such as humans, mice, and Arabidopsis) were compared, mainly sequence complexity, sequence preference, secondary structure, intrinsic disorder, and codon preference. We found that 33.24% of prokaryotic peptide-encoding sORFs did not use ATG as the initiation codon. Cross-species analysis showed that sORFs prefer hydrophilic amino acids, and the structural analysis showed that the proportions of helical and disordered structures were the highest in secondary structure and in total ORFs, respectively. In addition, peptide-encoding sORFs from different species shared a high degree of common information in their sequence and structural features, which can provide an important basis for developing prediction and recognition algorithms.

2. Evaluation of prediction performance for peptide-encoding sORFs. Traditional methods are not effective for short sequences, so peptide-encoding sORFs are difficult to predict. Most published work predicts sORFs by directly applying traditional methods designed for long sequences, but the effectiveness and reliability of these methods have not been evaluated in a unified way because unified data standards are lacking. This study established four test sets by randomly shuffling the internal sequence of each positive sample while keeping the start and stop codons unchanged. At the same time, we evaluated nine software tools on an experimentally verified data set to assess their prediction performance. The results were unsatisfactory: traditional prediction models for long ORFs misclassified most positive sORFs as negative, while prediction programs designed specifically for sORFs misclassified most negative sORFs as positive. We designed a prediction model that uses only codon usage frequency as a feature, and it performed better than the other programs. This part of the work will provide important ideas for the future identification and prediction of peptide-encoding sORFs.

3. Construction of a data platform for prokaryotic peptide-encoding sORFs. For the convenience of researchers, this study developed an online data platform for prokaryotic peptide-encoding sORFs, which is freely available at http://biophy.dzu.edu.cn:8888/. In the current version, the platform contains 382,186 sORFs of fewer than 60 amino acids from more than 8,000 species. In addition to sequence and function information, the platform provides feature information such as secondary structure and intrinsically disordered regions, as well as the code for the peptide-encoding sORF prediction method developed in this study.

In summary, peptide-encoding sORFs play important roles in biological processes and pose a great challenge for genome annotation. Their study has important scientific significance and application value. This study systematically examined the relevant features of peptide-encoding sORFs in prokaryotic genomes using bioinformatics methods, then carried out in-depth analysis to predict peptide-encoding sORFs and developed a data platform, which will provide relevant theory and new ideas for sORF research.
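
The negative-sample construction and the codon usage feature described in part 2 of the abstract can be illustrated with a minimal Python sketch. This is not the thesis code. The abstract does not say whether the internal sequence was shuffled by nucleotide or by codon, so the sketch assumes nucleotide-level shuffling and does not filter out in-frame stop codons that shuffling may create. The function names (`shuffle_internal`, `codon_usage_frequency`, `non_atg_fraction`) and the toy sequence are hypothetical.

```python
import random
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def shuffle_internal(seq, rng):
    """Shuffle the nucleotides between the start and stop codons.

    Assumption: nucleotide-level shuffling; the first and last codons
    are kept unchanged, as the abstract describes.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        raise ValueError("expected whole codons including start and stop")
    start, middle, stop = seq[:3], list(seq[3:-3]), seq[-3:]
    rng.shuffle(middle)
    return start + "".join(middle) + stop


def codon_usage_frequency(seq):
    """Return 64 relative codon frequencies, ordered as in CODONS."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Codons with ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[c] for c in CODONS)
    return [counts[c] / total if total else 0.0 for c in CODONS]


def non_atg_fraction(orfs):
    """Fraction of ORFs whose first codon is not ATG."""
    return sum(1 for s in orfs if s[:3].upper() != "ATG") / len(orfs)


# Toy example (made-up sequence, not taken from the thesis data).
rng = random.Random(0)
positive = "ATGAAACGTCTGTAA"
negative = shuffle_internal(positive, rng)
features = codon_usage_frequency(positive)
```

The resulting 64-element vectors could be passed to any standard classifier. The abstract does not name the classifier used in the thesis, so that step is left out here.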
Keywords/Search Tags: sORFs (small open reading frames), sequence structure, gene prediction, database, genome, protein