The Prediction Of Human DNase I Hypersensitive Sites By Using DNA Sequence Information

Posted on: 2021-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
Full Text: PDF
GTID: 2370330611455144
Subject: Biophysics
Abstract/Summary:
In genetics, a hypersensitive site is a relatively short chromatin region that can be found on all active genes. DNase I hypersensitive sites (DHSs) are chromosomal regions that have lost their higher-order structure and can therefore be recognized, bound, and cleaved by the DNase I enzyme. Because the DNA in these regions is exposed, it is more accessible to the enzymes that act on it. Studying DHSs in DNA sequences is important for understanding the mechanism of transcriptional regulation and for locating cis-regulatory elements (such as promoters, enhancers, insulators, silencers, and locus control regions). Identifying DHSs has therefore become an effective way to find functional DNA elements in non-coding sequences. Although many experimental methods for identifying DHSs have been proposed in the post-genomic era, they require considerable labor, raw materials, equipment, and time. At the same time, these experimental methods provide valuable data for follow-up research. Developing a computational method for predicting DHS sequences is therefore both significant and an indispensable step in advancing the field.

In this thesis, we propose a classifier that identifies human DHSs from DNA sequence information. The benchmark dataset, which contains 1017 sample sequences, was established through reliable experimental methods. The sequences are about 240 bp long, and redundant sequences have been removed. Our model used six feature extraction methods, including k-mer, dinucleotide physicochemical value, type II pseudo nucleotide component, two-window-based type II pseudo nucleotide component, g-gap k-mer, natural vector of DNA sequence, and k-mer combined with a physicochemical property matrix, to construct feature vectors. The best-performing feature extraction algorithm and the mRMR algorithm were used to filter the features, and the F-score was used to obtain the best feature subset. Finally, by comparing SVM with random forest, we found that the SVM-based classification model produced the best predictions. Because recognizing DHSs can aid the subsequent discovery of regulatory factors in non-coding sequences, the proposed classification model is a useful reference. The AUC and final accuracy of the model reach 0.85 and 0.87, respectively, and the model is more convenient to use than existing DHS prediction models. We also provide an online website for researchers in related fields.
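As a rough illustration of two of the steps named above (k-mer feature extraction and F-score feature ranking), the following Python sketch may help. It is not the thesis's implementation: the function names, the restriction to the alphabet ACGT, and the particular F-score formula (the common between-class over within-class variance ratio for two classes) are all assumptions made for this example, and the mRMR, SVM, and random forest steps are not shown.

```python
from itertools import product
from statistics import mean


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies, ordered lexicographically over ACGT.

    Hypothetical helper: windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def f_score(pos, neg):
    """F-score of a single feature (assumed two-class definition).

    pos and neg hold the feature's values in positive and negative samples;
    each list needs at least two values for the sample variances.
    """
    overall, mean_pos, mean_neg = mean(pos + neg), mean(pos), mean(neg)
    between = (mean_pos - overall) ** 2 + (mean_neg - overall) ** 2
    within = (
        sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
        + sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    )
    return between / within if within else 0.0


def rank_features(pos_seqs, neg_seqs, k=2):
    """Rank k-mer features by F-score, highest first.

    Returns (feature indices in ranked order, list of scores by feature index).
    """
    pos_rows = [kmer_frequencies(s, k) for s in pos_seqs]
    neg_rows = [kmer_frequencies(s, k) for s in neg_seqs]
    n_features = 4 ** k
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

In a pipeline of this kind, the top-ranked features would typically be added one at a time to a classifier such as an SVM, keeping the subset that gives the best cross-validated performance; the exact procedure used in the thesis is not stated in the abstract.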
Keywords/Search Tags: DNase I hypersensitive site, machine learning, feature selection, support vector machine