| DNA methylation is an important mechanism for regulating gene expression, and studying it is of great significance for understanding gene regulation and for discovering new therapeutic methods and diagnostic markers. Nanopore sequencing and methylation recognition on the ONT (Oxford Nanopore Technologies) platform offer advantages such as high resolution and sensitivity, but they also suffer from problems such as high data noise and low classification accuracy. This paper therefore aims to classify nanopore methylation data efficiently with deep learning and machine learning algorithms, in order to improve the accuracy and efficiency of data analysis. It also evaluates how different factors, namely the training data split ratio and Batch_Size, affect classification performance, in order to identify ways of optimizing the models. 5mC is introduced at specific sites in Lambda DNA by PCR, and nanopore sequencing is performed on both the PCR products and the methylated PCR products. To identify DNA methylation effectively in nanopore sequencing data, this paper builds prediction and classification models based on the SVM, random forest, Transformer, and LSTM algorithms. The experimental results show that, when 60% of the nanopore sequencing methylation data is used as the training set, the random forest algorithm reaches a classification accuracy of 93.75% on the test set in 7 minutes. The Transformer and LSTM models reach test-set classification accuracies of 91.84% and 84.83%, respectively, while the SVM model reaches 77.53%. These results indicate that the random forest model classifies small-batch, noisy DNA methylation data better than the other models. To explore how different factors affect the performance of machine learning and deep learning in classifying nanopore methylation data, this paper studies the impact of different training data split ratios and of Batch_Size on classification performance. The results show that the random forest model performs best when the methylation data is split into a 70% training set, a 15% validation set, and a 15% test set, whereas the deep learning models perform best with an 80% training set, a 10% validation set, and a 10% test set. In addition, for methylation data from nanopore sequencing, a larger Batch_Size is more advantageous for the training performance of the deep learning models. These results can provide a useful reference for the classification of nanopore methylation data. |