
Theoretical Prediction Of Nucleosome Position And Online Software Development

Posted on: 2015-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: S H Guo
GTID: 2308330473952779
Subject: Biophysics
Abstract/Summary:
The nucleosome is the basic unit of eukaryotic chromatin. Nucleosomes participate in many cellular activities, such as DNA replication and DNA repair, and play a particularly important role in regulating gene expression. With the avalanche of genome sequences generated in the post-genomic age, developing automated methods that identify nucleosome positioning rapidly and effectively has become an active research topic in epigenetics. Although several computational methods have been proposed for this task, most of them neglect the intrinsic local structural properties and long-range correlation properties that may play important roles in determining nucleosome positioning in a genome.

In this thesis, a predictor called ‘iNuc-PseKNC’ was developed for predicting nucleosome positioning in the Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster genomes. For the predictor, we developed a novel feature vector called ‘pseudo k-tuple nucleotide composition’ to represent DNA sequence samples. The feature vector has two parts: the k-tuple nucleotide composition, which reflects local sequence information, and the structural properties of DNA dinucleotides, which are based on the six degrees of freedom of any object in physical space. It therefore provides an effective way to extract both short-range and long-range correlation features from DNA sequences. A support vector machine was then used to discriminate between nucleosomal and non-nucleosomal sequences. Rigorous cross-validation tests on three stringent benchmark datasets showed that the overall success rates achieved by iNuc-PseKNC in predicting nucleosome positioning in the three genomes were 86.27%, 86.90% and 79.97%, respectively. To demonstrate the advantage of iNuc-PseKNC, we also compared its prediction accuracy with that of methods from previous investigators on the same benchmark datasets. The results indicated that the current predictor remarkably outperformed its counterparts. For the convenience of experimental scientists, we built a web server that is freely accessible at http://lin.uestc.edu.cn/server/iNuc-PseKNC. This free software will be helpful for wet-lab researchers who focus on epigenetics.

Furthermore, the distribution of nucleosomes around transcription start sites (TSSs) was investigated. First, we examined the distribution of nucleosomes around 5015 experimentally confirmed TSSs in the Saccharomyces cerevisiae genome. The nucleosome-forming scores around the TSSs were very low, suggesting that promoter sequences do not readily form nucleosomes. The explanation offered for this is that the promoter sequence must interact with RNA polymerase or be bound by other regulatory proteins. In addition, iNuc-PseKNC was used to predict nucleosome-forming scores around TSSs in H. sapiens and D. melanogaster promoter sequences. Here too, the probability scores for nucleosome formation around TSSs were very low. These results agree with existing experimental results and also indicate that iNuc-PseKNC performs stably in predicting nucleosomes. Finally, to investigate the generality of the proposed method, the pseudo k-tuple nucleotide composition was extended to the prediction of meiotic recombination hot and cold spots. The prediction accuracy reached 82.2%, demonstrating again that the pseudo k-tuple nucleotide composition can extract features of DNA sequences effectively.
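
The two-part feature vector described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: it assumes the commonly used pseudo-composition form, in which the 4^k normalized k-tuple frequencies are followed by lam correlation factors scaled by a weight w. The function names, the parameters k, lam and w, and the TOY_PROPS table are all hypothetical. In particular, TOY_PROPS holds placeholder numbers, not measured dinucleotide structural properties.

```python
from itertools import product

BASES = "ACGT"

# Placeholder property table (an assumption for illustration only): two made-up
# numbers per dinucleotide. A real table would hold measured structural values.
TOY_PROPS = {a + b: (float(BASES.index(a)), float(BASES.index(b)))
             for a in BASES for b in BASES}


def ktuple_composition(seq, k):
    """Normalized frequency of every k-tuple, in lexicographic ACGT order."""
    words = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing non-ACGT symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]


def correlation_factors(seq, props, lam):
    """For each gap j = 1..lam, the mean squared property difference between
    dinucleotides that start j positions apart."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(dinucs) <= lam:
        raise ValueError("sequence too short for the requested lam")
    thetas = []
    for j in range(1, lam + 1):
        pairs = list(zip(dinucs, dinucs[j:]))
        total = 0.0
        for a, b in pairs:
            pa, pb = props[a], props[b]
            total += sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
        thetas.append(total / len(pairs))
    return thetas


def pseknc(seq, props, k=3, lam=2, w=0.5):
    """Concatenate local composition and weighted long-range correlation terms,
    normalized so that the whole vector sums to 1."""
    freqs = ktuple_composition(seq, k)
    thetas = correlation_factors(seq, props, lam)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseknc("ACGTTGCAACGT", TOY_PROPS, k=2, lam=2, w=0.5)
print(len(vec))  # 4**2 k-tuple terms plus 2 correlation terms
```

Vectors built this way, one per labelled sequence, are the kind of fixed-length input a support vector machine needs. The values of k, lam and w used in the thesis are not stated in the abstract, so the defaults here are arbitrary.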
Keywords/Search Tags: nucleosome position, pseudo nucleotide composition, support vector machine, machine learning, online service