Prediction Of Nucleosome Positioning Based On Sequence Energy Score Difference

Posted on: 2021-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Fu
Full Text: PDF
GTID: 2370330620476589
Subject: Physics
Abstract/Summary:
As the basic structural unit of chromatin in eukaryotes, the nucleosome and its particular packing form strongly affect how the genetic information encoded in DNA sequences comes into contact with the regulatory information carried by proteins. The precise positioning of nucleosomes and the way they are assembled along the genome are important factors in biological processes such as transcriptional regulation and DNA replication and repair, and they directly or indirectly regulate gene expression. With the rapid development of high-throughput sequencing technology and the arrival of the big-data era, building a prediction algorithm for nucleosome positioning that is accurate, concise and effective has become a new challenge in bioinformatics.

In this thesis, we took nucleosome positioning data for Saccharomyces cerevisiae as the main object of study and proposed a new prediction algorithm that achieved good prediction results. When building the datasets, we not only extracted two Saccharomyces cerevisiae nucleosome positioning datasets used in previous studies, but also took the average length of linker DNA sequences as the benchmark; the resulting new nucleosome sequences were used as the positive set of a new dataset. Once good results had been obtained, the new algorithm was also applied to H. sapiens, C. elegans and D. melanogaster in order to evaluate its performance more objectively.

This thesis defines three scoring methods:
1. Based on the frequencies of neighboring dinucleotides at each site of nucleosome DNA and linker DNA sequences, position probability matrices were built, and position weight matrices (PWMs) were derived from them. The position correlation information of nearest-neighbor dinucleotides in the two kinds of sequences was computed mathematically, and the first scoring function was defined.
2. Because the total energy of a sequence can be regarded as the sum of the base energies at each site along the sequence, and given the sequence preference of nucleosome positioning, we hypothesized that the total energies of nucleosome DNA and linker DNA sequences differ. Six flexibility parameters were introduced to analyze the energy of the neighboring dinucleotides at each site. The PWM was multiplied by the flexibility parameters at each site of the sequence to define the second scoring function.
3. To further explore the influence of the interaction energy of neighboring dinucleotides on nucleosome positioning, we combined six kinds of structural information to define the third scoring function.

The nucleosome DNA and linker DNA sequences in the six datasets form the positive and negative sets, respectively. The three scoring functions were used to score every sequence in the datasets and thereby predict nucleosome positioning in each of the six datasets. The prediction results of the three scoring methods were evaluated by 10-fold cross-validation. The prediction success rates for the three Saccharomyces cerevisiae datasets reached 98.35%, 99.61% and 83.49%, and those for the H. sapiens, C. elegans and D. melanogaster datasets were 70.65%, 87.02% and 71.69%, respectively. In addition, compared with similar prediction algorithms, our results are better than those of previous studies for Saccharomyces cerevisiae. For the other three model organisms, the sensitivity for H. sapiens increased by 11.96% and the specificity for D. melanogaster by 5.52%; the sensitivity for C. elegans rose by 11.96%, and its prediction success rate improved by 3.47%. The work also explains the sequence preference of nucleosomes and the determinants of their spatial structure in terms of sequence energy, pointing to a more biologically grounded direction for prediction.
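The idea behind the first two scoring methods can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the pseudocount, the log-odds form of the PWM, the toy sequences and the zero decision threshold are all assumptions made for illustration. The optional `weights` argument stands in for per-dinucleotide flexibility or energy parameters, whose actual values are not given in the abstract.

```python
import math
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def position_probability_matrix(seqs, pseudocount=1.0):
    """Per-position dinucleotide probabilities for equal-length sequences.

    The pseudocount (an assumption of this sketch) avoids zero probabilities.
    Sequences must contain only A, C, G and T.
    """
    length = len(seqs[0])
    ppm = []
    for i in range(length - 1):
        counts = {d: pseudocount for d in DINUCS}
        for s in seqs:
            counts[s[i:i + 2]] += 1
        total = sum(counts.values())
        ppm.append({d: c / total for d, c in counts.items()})
    return ppm


def log_odds_pwm(ppm_pos, ppm_neg):
    """Log-odds weight matrix: nucleosome (positive) vs. linker (negative)."""
    return [
        {d: math.log(p[d] / q[d]) for d in DINUCS}
        for p, q in zip(ppm_pos, ppm_neg)
    ]


def score(seq, pwm, weights=None):
    """Sum the PWM entries of the dinucleotides along the sequence.

    `weights` is a hypothetical dict mapping each dinucleotide to a
    flexibility/energy parameter; None gives the unweighted score.
    """
    total = 0.0
    for i, column in enumerate(pwm):
        d = seq[i:i + 2]
        w = 1.0 if weights is None else weights[d]
        total += w * column[d]
    return total


# Toy data (invented for illustration only, not from the thesis datasets).
nucleosome_seqs = ["AATTGC", "AATTGG", "AATAGC"]
linker_seqs = ["GCGCGC", "GCGCTA", "GGGCGC"]

pwm = log_odds_pwm(
    position_probability_matrix(nucleosome_seqs),
    position_probability_matrix(linker_seqs),
)

# Assumed decision rule: a positive score predicts nucleosome DNA.
for s in ("AATTGC", "GCGCGC"):
    label = "nucleosome" if score(s, pwm) > 0 else "linker"
    print(s, label)
```

In a cross-validated evaluation such as the 10-fold scheme mentioned in the abstract, the matrices would be built from the training folds only and the score applied to the held-out fold.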
Keywords/Search Tags: nucleosome localization, position correlation rating function, sequence energy rating function, sequence preference