The Prediction Of Gene Modification Sites Based On Sequences

Posted on: 2022-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Lin
GTID: 2480306746973919
Subject: Computer technology
Abstract/Summary:
DNA and RNA modifications are closely tied to many life processes, notably cell function, structural stability, and translation, so accurately identifying DNA and RNA modification sites is of great significance. This thesis focuses on identifying DNA N4-methylcytosine (4mC) modification sites and RNA pseudouridine (Ψ) modification sites. Taking DNA and RNA sequences as the research objects, and using sequence features extracted either manually or by deep learning as input, it constructs prediction models for DNA and RNA modification sites. By combining machine learning, deep learning, and data mining techniques, it explores the relationships between DNA and RNA sequences and specific sites. The main research results are as follows:

(1) Taking RNA pseudouridine (Ψ) modification site sequences as the research object, a fusion predictor of gene modification sites is proposed, based on particle swarm optimization feature selection and ensemble learning. The predictor uses six feature representation methods and applies binary particle swarm optimization (BPSO) to select an optimal subset of each of the six feature representations, improving how well the features represent the sequence. A support vector machine (SVM) is then trained on each of the six optimal feature subsets, and the six resulting prediction models are combined through a parallel fusion strategy to form the PsoEL-PseU predictor. Experimental results show that PsoEL-PseU outperforms the state-of-the-art methods on the evaluation datasets of three species. A free, publicly available online prediction platform for RNA pseudouridine site identification has also been built.

(2) Taking DNA N4-methylcytosine (4mC) modification site sequences as the research object, an end-to-end gene modification site predictor based on the attention mechanism and a long short-term memory (LSTM) network is proposed. The predictor represents each nucleotide in the DNA sequence using the K-spacer nucleotide coding proposed in this thesis, combined with trigonometric-function positional encoding. Following the idea of residual networks, a multi-head attention mechanism and a feed-forward network are then used to build the DNA attention distribution feature extraction module. To further capture nucleotide dependencies in the DNA sequence, the method also employs an LSTM network for high-order feature extraction, which greatly improves the recognition accuracy of DNA 4mC modification sites. On the constructed benchmark dataset and the independent test set, the proposed Trans4mcPred predictor achieves significant improvements over the current state-of-the-art methods on five indicators across six species.
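
The binary particle swarm optimization step in result (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the hyperparameters (`w`, `c1`, `c2`, swarm size), and the toy fitness function are all assumptions made for illustration. In the thesis setting the fitness would presumably be a classifier score, such as SVM cross-validation accuracy on the selected feature columns.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sigmoid-based binary PSO: returns (best_mask, best_fitness).

    Each particle is a 0/1 mask over features; `fitness` maps a mask to a
    score to maximize. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep sigmoid unsaturated
                vel[i][d] = v
                # The velocity sets the probability that this bit becomes 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: reward picking "informative" features,
# with a small penalty per selected feature to favor compact subsets.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_fit = bpso_select(toy_fitness, n_features=8)
print(best_mask, best_fit)
```

Running one such search per feature representation would yield six masks, each of which could then be used to train its own classifier before fusion.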
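
The trigonometric-function positional encoding mentioned in result (2) can be sketched as below, assuming it follows the standard sinusoidal scheme used in Transformer models; the abstract does not confirm the exact formula. The function name, the base of 10000, and the example sizes are assumptions, and the thesis's K-spacer nucleotide coding is not reproduced here because the abstract does not define it.

```python
import math


def sinusoidal_positions(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i/d_model)) and the following odd
    column holds the matching cosine, as in the standard Transformer scheme.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# Illustrative sizes only: a 41-position sequence with 8-dimensional encodings.
encodings = sinusoidal_positions(41, 8)
print(encodings[0])
```

In a model of this kind, each row would typically be added element-wise to the embedding of the nucleotide at that position before the attention layers.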
Keywords/Search Tags: DNA, RNA, Modification site recognition, Machine learning, Deep learning