iDHS-LR: Identify DNase I Hypersensitive Sites

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2370330614953554
Subject: Statistics
Abstract/Summary:
DNase I hypersensitive sites (DHSs) provide important information about chromatin status in animal and plant cells. Accurately identifying DHSs is an effective way to locate regions enriched in transcriptional regulatory elements, including promoters, enhancers, silencers, and insulators. These cis-regulatory elements control the intensity and specificity of gene expression, so they matter greatly for research on human disease. Identifying DHSs helps scientists explore the transcriptional regulatory mechanisms of DNA, deepens our understanding of chromatin accessibility, and improves our knowledge of disease, genetic evolution, and aging. Thanks to advances in high-throughput sequencing, several new experimental techniques have been applied to detect DHSs, but complete sequencing is time-consuming, labor-intensive, and expensive, which slows subsequent experiments. Fast and effective computational methods for identifying these sites are therefore needed. Using DNA sequence information and a machine learning model, this thesis proposes a logistic-regression-based method for predicting DHSs, called iDHS-LR. The method selects an optimal feature subset from a feature set comprising dinucleotide-based spatial autocorrelation (DSA), k-mers, and trinucleotide physicochemical properties (TPCP). It achieves an AUC of 0.915 and an accuracy of 88.79%. Cross-validation results show that it outperforms other existing methods.
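
To make the general idea concrete, here is a minimal sketch of sequence-based classification with k-mer features and logistic regression. It is not the thesis's implementation. It covers only the k-mer part of the feature set, because the abstract does not define the DSA and TPCP features, and it skips feature subset selection. The function names, the toy sequences, the choice of k = 2, and the plain gradient-descent training loop are all assumptions made for illustration.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over the alphabet ACGT (4**k values)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(seq, w, b, k=2):
    """Estimated probability that seq belongs to the positive class."""
    x = kmer_frequencies(seq, k)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy data (NOT real DHS sequences): label 1 = positive, 0 = negative.
positives = ["GCGCGCGGCC", "CCGGCGCGCG", "GGCCGCGCGC"]
negatives = ["ATATTAATAT", "TTAATATATA", "AATTATATTA"]
X = [kmer_frequencies(s) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

w, b = train_logistic_regression(X, y)
print(predict_proba("GCGCGGCCGC", w, b), predict_proba("ATATATTAAT", w, b))
```

In a realistic setting, the feature vector would be extended with the other descriptors, and performance would be estimated by cross-validation using metrics such as AUC and accuracy, as the abstract reports.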
Keywords/Search Tags: DNase I hypersensitive site, k-mer, Logistic regression, Dinucleotide-based spatial autocorrelation (DSA)