
Phylogeny Inference Based On Autoencoder And Monte Carlo Tree

Posted on: 2022-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: M Liu
GTID: 2480306521964369
Subject: Software engineering
Abstract/Summary:
Phylogenetics is the discipline that studies how to construct phylogenetic trees relating species. It is central to understanding biodiversity, evolutionary history, ecology, and related fields. In paleontological phylogeny inference, the only information available is the morphological data extracted from fossils. Because fossils form rarely and are difficult to excavate, these data contain both missing entries and inapplicable entries, which affect the construction of the phylogenetic tree. To address these problems, the main work of this thesis is as follows:

1) In simulation experiments on morphological data, we explore how the proportion and the pattern of missing data affect the construction of a phylogenetic tree, and compare the effect of ignoring versus deleting missing data in different situations. The results provide guidance on how to handle missing data in paleontological phylogeny inference.

2) To reduce the influence of missing data on phylogeny inference, a two-stage imputation method based on autoencoders is proposed. First, Multiple Imputation by Chained Equations (MICE) is used to obtain pre-imputed values. These values are then corrected over multiple rounds by an autoencoder trained on the known data, while the linear relationships in the data and the correlations among the hidden-layer dimensions are both taken into account. The method achieves high imputation accuracy on both continuous UCI data sets and discrete morphological data sets, which mitigates the effect of missing data on the construction of the phylogenetic tree.

3) To reduce the influence of inapplicable data, the Monte Carlo tree search algorithm is combined with the inapplicable Fitch algorithm to infer phylogenetic trees from data sets that contain inapplicable entries. Compared with the common practice of treating inapplicable data as missing data or as a new character state, the inapplicable Fitch algorithm is based on the assumption of maximum homology and therefore handles inapplicable data more reasonably. Searching for the phylogenetic tree with Monte Carlo tree search effectively avoids the problem of local optima. In addition, branch length is used as a pruning criterion in the Monte Carlo tree, which narrows the search range so that a good phylogenetic tree can be found quickly.

In summary, this thesis first provides guidance on how to handle missing data through simulation experiments; then, for missing data, it proposes a two-stage imputation method based on autoencoders; finally, for phylogenetic tree construction with inapplicable data, it uses Monte Carlo tree search combined with the inapplicable Fitch algorithm. Compared with current phylogenetic methods, which are mainly based on molecular data, the treatment of missing and inapplicable data in this thesis is more reasonable.
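For readers unfamiliar with the parsimony scoring that the third contribution builds on, the following is a minimal sketch of the standard Fitch pass for a single character. It is not the inapplicable Fitch algorithm used in the thesis: it deliberately treats both missing (`?`) and inapplicable (`-`) entries as "any state", which is the baseline treatment the abstract argues against. The tree encoding, the function name `fitch_length`, and the two marker symbols are assumptions made for illustration only.

```python
# Standard Fitch parsimony for one character on a rooted binary tree.
# Assumption: a tree is either a leaf name (str) or a pair (left, right).
# Assumption: "?" marks missing data and "-" marks inapplicable data;
# both are treated as "any state" here, i.e. the baseline approach,
# not the inapplicable-aware variant described in the abstract.

MISSING = "?"
INAPPLICABLE = "-"


def fitch_length(tree, states, alphabet="01"):
    """Return the minimum number of state changes needed on `tree`.

    `states` maps each leaf name to its observed state for one character.
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):  # leaf
            s = states[node]
            if s in (MISSING, INAPPLICABLE):
                return set(alphabet)  # unknown: any state is allowed
            return {s}
        left, right = node
        a = state_set(left)
        b = state_set(right)
        common = a & b
        if common:
            return common  # children agree: no change needed here
        changes += 1  # children disagree: one change on this subtree
        return a | b

    state_set(tree)
    return changes


tree = (("A", "B"), ("C", "D"))
print(fitch_length(tree, {"A": "0", "B": "0", "C": "1", "D": "1"}))
print(fitch_length(tree, {"A": "0", "B": "?", "C": "1", "D": "-"}))
```

A tree search such as Monte Carlo tree search would call a scoring function of this kind, summed over all characters, to evaluate each candidate tree.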
Keywords/Search Tags: Morphological phylogeny, missing data, inapplicable data, autoencoder, Monte Carlo tree search