A Classification Research For Metagenomics Sequences Based On Deep Learning

Posted on: 2020-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: D R Tang
GTID: 2370330575994248
Subject: Computer application technology
Abstract/Summary:
The rapid development of gene sequencing technology has led to an exponential decline in sequencing costs, and next-generation sequencing is now widely used. As a result, the genomes of the many different microorganisms in a complex environment can be sequenced, producing large amounts of microbial genetic data. Metagenomics obtains microbial DNA sequences directly from a sample through 16S rRNA amplification, uses this sequence information to estimate the abundance of each species in the microbial community, and then infers the functions of the community from the abundance information. The fragments generated by 16S rRNA sequencing are both conserved and universal: their conservation can be used to trace the origin of a species, and their universality allows different species to be identified.

Studies have shown that the human intestinal flora is closely related to disease and metabolism, and metagenomic analysis has become an important auxiliary method for studying microbial communities. A key step in metagenomics is assigning microbial sequences to taxa. Many methods have been proposed for this task, but their classification accuracy still leaves considerable room for improvement.

To address metagenomic classification, this thesis proposes a hybrid model that combines a deep convolutional neural network with a fully connected neural network. The convolutional stage reduces the dimensionality of the data features, and the fully connected stage learns the nonlinear relationships among the resulting features. The model was trained and tested on three data sets from the RDP and Greengenes databases, which together contain 16S sequences of bacteria and archaea and ITS sequences of fungi. Once trained, the model assigns a database taxonomy label to a given query sequence without requiring any reference database, and it uses the GPU to classify multiple query sequences in parallel.

The main contributions of this thesis are:
(1) Feature extraction for metagenomic sequences. Two feature extraction methods are used. The first is based on k-mers: each sequence is split into substrings of k bases, which together form the feature space. The second is based on alignment: sequences of unequal length are first converted into sequences of the same length by global alignment. Because a gene sequence is a string of symbols, it is encoded before training, and the encoding takes the actual biological meaning of the sequence into account.
(2) A hybrid deep neural network for classifying metagenomic sequences. The network learns the nonlinear features of the gene data layer by layer and then uses these hierarchical features to represent metagenomic sequences. The trained model can be saved and visualized.
(3) Data preparation and evaluation. The three data sets from the two databases were processed into a consistent format. Three different methods were trained and tested on each data set, with the RDP classifier run using its default parameters. For the model proposed in this thesis, the parameters were determined through multiple sets of experiments. The classification performance of the three methods was evaluated using Precision, Recall, and F1-score.
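The k-mer feature extraction in contribution (1) can be sketched as follows. This is a minimal Python sketch, assuming overlapping windows (a stride of one base), the four-letter alphabet ACGT, and normalisation of counts to frequencies; the abstract does not state these details, and the function names `kmer_vocabulary` and `kmer_features` are hypothetical. Each sequence, whatever its length, maps to a vector of 4^k k-mer frequencies.

```python
from itertools import product


def kmer_vocabulary(k: int, alphabet: str = "ACGT") -> list[str]:
    # All 4^k possible k-mers in lexicographic order; each one is a feature dimension.
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_features(seq: str, k: int = 3) -> list[float]:
    """Slide a window of width k over seq (assumed stride 1) and return k-mer frequencies."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Windows containing ambiguous symbols such as N are skipped (an assumption).
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # Normalise so sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts
```

For example, `kmer_features("ACGTA", k=2)` returns a 16-dimensional vector with 0.25 at the positions of AC, CG, GT and TA. The vector length grows as 4^k, so a larger k gives finer resolution at the cost of sparser features.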
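The alignment-based route in contribution (1) turns sequences of unequal length into sequences of the same length by global alignment. One simple way to do this, sketched below, is to align each query to a single reference sequence with the Needleman-Wunsch algorithm and keep exactly one column per reference base. This is an assumption: the thesis may instead use a multiple sequence alignment, and the scoring values (match +1, mismatch -1, linear gap -2) and the helper names `needleman_wunsch` and `project_to_reference` are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Global alignment of a and b with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def project_to_reference(reference: str, query: str) -> str:
    """Return a string of len(reference): the query symbol aligned to each reference base."""
    ref_aln, qry_aln = needleman_wunsch(reference, query)
    # Columns where the reference has a gap (query insertions) are dropped,
    # so every projected sequence has the same length as the reference.
    return "".join(q for r, q in zip(ref_aln, qry_aln) if r != "-")
```

For example, `project_to_reference("ACGTACGT", "ACGACGT")` yields `ACG-ACGT`: the missing base becomes a gap, and every output has the reference length. Dropping query insertions loses some information, which is the usual trade-off of reference-based projection.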
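Contribution (1) also notes that sequences are encoded before training in a way that reflects their biological meaning. The abstract does not give the scheme; one common choice, assumed here, is a one-hot encoding in which IUPAC ambiguity codes spread their weight evenly over the bases they may stand for, and alignment gaps become all-zero rows. The names `IUPAC`, `BASES` and `encode` are hypothetical.

```python
# IUPAC nucleotide codes and the DNA bases each one may stand for (U is read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "ACGT"


def encode(aligned_seq: str) -> list[list[float]]:
    """Return a len(seq) x 4 matrix with columns A, C, G, T."""
    matrix = []
    for symbol in aligned_seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        # Gaps ('-', '.') and unknown symbols map to "" and stay all zeros.
        options = IUPAC.get(symbol, "")
        for base in options:
            # An ambiguous symbol shares its weight equally among its possible bases.
            row[BASES.index(base)] = 1.0 / len(options)
        matrix.append(row)
    return matrix
```

For instance, `R` (A or G) encodes as `[0.5, 0.0, 0.5, 0.0]` and `N` as four values of 0.25, so the network sees the ambiguity instead of an arbitrarily chosen base.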
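The hybrid model of contribution (2) places a convolutional stage, which reduces feature dimensionality, in front of a fully connected stage that learns nonlinear relationships among the features. The pure-Python forward pass below illustrates only that structure. Layer sizes, ReLU activations, global max pooling, the softmax output, and the names `HybridCNN` and `init_weights` are assumptions rather than the thesis's configuration, and the weights are random rather than trained. The input is an L x 4 numeric matrix, such as a one-hot encoded aligned sequence, with L at least the filter width.

```python
import math
import random


def init_weights(shape, rng):
    """Nested lists of small Gaussian weights with the given shape (hypothetical initialiser)."""
    if len(shape) == 1:
        return [rng.gauss(0.0, 0.1) for _ in range(shape[0])]
    return [init_weights(shape[1:], rng) for _ in range(shape[0])]


def conv1d_relu(x, filters, bias):
    """Valid 1-D convolution of an L x C input with F kernels of shape W x C, then ReLU.

    Returns an (L - W + 1) x F feature map.
    """
    width = len(filters[0])
    fmap = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel, b in zip(filters, bias):
            s = b
            for offset in range(width):
                for w, value in zip(kernel[offset], x[start + offset]):
                    s += w * value
            row.append(max(0.0, s))
        fmap.append(row)
    return fmap


def global_max_pool(fmap):
    # One value per filter regardless of sequence length: the dimensionality reduction.
    return [max(column) for column in zip(*fmap)]


def dense(v, weights, bias, relu=True):
    out = [b + sum(w * x for w, x in zip(row, v)) for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class HybridCNN:
    """Convolution + global max pooling, then fully connected layers (untrained sketch)."""

    def __init__(self, n_classes, n_filters=8, width=5, hidden=16, channels=4, seed=0):
        rng = random.Random(seed)
        self.filters = init_weights((n_filters, width, channels), rng)
        self.conv_bias = [0.0] * n_filters
        self.w_hidden = init_weights((hidden, n_filters), rng)
        self.b_hidden = [0.0] * hidden
        self.w_out = init_weights((n_classes, hidden), rng)
        self.b_out = [0.0] * n_classes

    def predict_proba(self, x):
        """x is an L x 4 matrix; returns one probability per taxon label."""
        features = global_max_pool(conv1d_relu(x, self.filters, self.conv_bias))
        hidden = dense(features, self.w_hidden, self.b_hidden)
        return softmax(dense(hidden, self.w_out, self.b_out, relu=False))
```

Because global max pooling keeps one value per filter, sequences of different lengths yield feature vectors of the same size before the fully connected layers. Training (backpropagation over labelled sequences, batched on a GPU) would in practice be done in a deep learning framework and is not shown.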
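Contribution (3) compares the methods by Precision, Recall, and F1-score. For each taxon label, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes these from lists of true and predicted labels; averaging the per-label scores with equal weight (macro averaging) is an assumption, since the abstract does not say how the scores were aggregated. The function names are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all labels (assumed aggregation)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
```

With true labels `A, A, B, B` and predictions `A, B, B, B`, label A gets precision 1.0 and recall 0.5, and label B gets precision 2/3 and recall 1.0.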
Keywords: deep learning, convolutional neural network, classification, metagenomics