| Phylogenetic tree construction is the end product of paleontological phylogenetic analysis and an important research area in computational biology. Phylogenetic trees help us understand the origin, evolution, and future direction of organisms. In paleontological phylogenetic analysis, morphological data collected from fossils are typically used to construct a phylogenetic tree that reflects the evolutionary relationships among species according to their morphological characteristics. Research in this area currently faces three problems. First, because fossils are incompletely preserved, species differ from one another, and features are hierarchically related, paleontological morphological data contain both missing and inapplicable entries. Second, as the number of species increases, the number of possible tree topologies grows exponentially, and tree-building methods based on a single optimization criterion conflict with one another when selecting the optimal topology, which affects the accuracy of the constructed tree. Third, when no standard tree is available, there is no gold standard for evaluating the constructed trees. To address these problems, the main work of this paper is as follows.
(1) The characteristics of paleontological morphological data are simulated: simulated data with known standard trees are used, missing data are introduced in different proportions, and inapplicable data are treated as missing data. Simulation experiments explore how the number of species and the number of features in the feature matrix, together with the proportion of missing data, affect the choice of optimization objective when building a phylogenetic tree. The optimization objectives are ranked for each type of data according to the topological difference between the result set of each single-objective tree-building method and the standard tree of the simulated data. This ranking provides guidance for selecting optimization objectives in phylogenetic analysis with missing data, and it also supports the selection of objectives for the multi-objective optimization in the subsequent work.
(2) The results obtained under different single optimization objectives conflict with one another. This paper therefore proposes a method for constructing phylogenetic trees based on dynamic multi-objective optimization. Building on the conclusions of the preceding chapter, the objectives of the multi-objective optimization are selected dynamically according to the size of the data and the proportion of missing data, and a multi-objective optimization method is used to construct a set of relatively optimal phylogenetic trees that satisfy multiple objectives. Experimental results show that, compared with the maximum parsimony method, the maximum likelihood method, the ratchet method, and the random method, the constructed tree set has a smaller RF distance (that is, a smaller topological difference) from the standard tree. The method ensures relative optimality under dynamically chosen objectives, resolves the conflict between objectives that arises in single-objective tree construction, and yields phylogenetic trees closer to the true tree, giving paleontologists an informative tool for phylogenetic analysis.
(3) Paleontological phylogenetic analysis often has no standard tree, and there is currently no gold standard for evaluating phylogenetic trees. For optimization-based tree construction methods, this paper proposes, from the perspective of topological diversity, a hypervolume-based model for evaluating the topological diversity of the resulting tree set. The hypervolume measures the spread and uniformity of a tree set. The result tree set produced by an optimization-based construction method is mapped into the solution space, where solution sets that are more widely dispersed receive higher hypervolume scores. Selecting by hypervolume therefore favors more diverse tree topologies, giving paleontologists more choices when building phylogenetic trees. |
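
To make the hypervolume idea concrete, the following is a minimal sketch rather than the evaluation model of this paper. It assumes each tree in a result set has already been mapped to a point of two objective values, both minimized (for example a parsimony score and a negative log-likelihood; the abstract does not specify this mapping). The names `pareto_front` and `hypervolume_2d`, the reference point, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized (assumption)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    for p in sorted(set(points)):
        # After sorting, a point is non-dominated only if it strictly
        # improves the best second objective seen so far.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], reference: Point) -> float:
    """Area dominated by the points and bounded by the reference point."""
    # Points that are not strictly better than the reference contribute nothing.
    inside = [p for p in points if p[0] < reference[0] and p[1] < reference[1]]
    volume = 0.0
    prev_second = reference[1]
    for x, y in pareto_front(inside):
        # Each front point adds a rectangle not covered by the earlier ones.
        volume += (reference[0] - x) * (prev_second - y)
        prev_second = y
    return volume


if __name__ == "__main__":
    reference = (4.0, 4.0)  # hypothetical worst-case bound on both objectives
    clustered = [(1.0, 3.0), (1.1, 2.9), (1.2, 2.8)]
    spread = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    print("clustered:", hypervolume_2d(clustered, reference))
    print("spread:   ", hypervolume_2d(spread, reference))
```

In this toy case the widely spread set scores higher than the tightly clustered one, which matches the behavior described above: more dispersed solution sets receive higher hypervolume scores. With more than two objectives, the same quantity is a volume rather than an area and calls for a more general algorithm.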