| Gobioidei (Actinopterygii: Perciformes) is the most species-rich clade of perciforms, comprising more than 270 genera and 2,000 species, which account for 23% of all perciform species. Gobioids are widely distributed in fresh and coastal waters of tropical, subtropical and temperate regions, and most are benthic or burrow-dwelling. Because many gobioid clades diversified into many species within a very short period of time, their phylogenetic relationships are very difficult to resolve, a phenomenon often seen in phylogenetic studies of other taxa. Previous phylogenetic studies of gobioid fishes were based either on morphological characters or on one or a few gene markers (mitochondrial or nuclear), which is particularly problematic for resolving interrelationships in such a fast-evolving group. With the development of high-throughput sequencing, more and more phylogenetic studies are based on genome-wide data rather than being confined to a few molecular markers. Although genome-scale data can now be obtained easily, the huge number of short reads makes data processing and analysis very difficult. Given the large number of available gene markers, it is extremely important to decide how to select the markers whose phylogenetic signal is strong enough for a given phylogenetic question. Marker selection has long been a focus of research, but previous studies were based mainly on simulated data or a few empirical loci rather than being tested on large empirical data sets. A thorough study of strategies for selecting multi-locus molecular markers should therefore be carried out using thousands of empirical gene loci. This thesis is organized into two chapters. 
In the first chapter, tens of thousands of single-copy nuclear coding sequences were collected to reconstruct the phylogenetic interrelationships among the Gobioidei and their outgroups using three different tree reconstruction strategies. The divergence time of each gobioid clade was estimated using five fossil calibrations. In the second chapter, a large empirical data set was used to test a suite of indices for molecular marker selection, in order to address existing problems in analyzing genome-scale data. The main results of this thesis are as follows: 1. I sampled 36 species representing each family of the suborder Gobioidei, and 7 other perciform species as outgroups. Using target enrichment and Illumina high-throughput sequencing, I obtained a large number of raw reads, from which an average of ~9,795 (range 6,099 to 14,567) targeted single-copy nuclear coding sequences (CDS) were assembled after quality control, short-read assembly, orthologous gene sequence identification, multiple sequence alignment and other data preprocessing steps. Both species tree (MP-EST, STAR) and gene tree (RAxML) methods were used to reconstruct the phylogenetic interrelationships of the Gobioidei and the outgroups. A relaxed molecular clock method, calibrated with five fossil records, was used to estimate the divergence time of each clade, and the Shimodaira-Hasegawa (SH) test was used to test alternative hypotheses. The results show that: 1) The species trees and gene trees, reconstructed from a very large number of molecular markers, are similar to each other; 2) The Gobioidei is a monophyletic group that can be classified into 7 families: Gobiidae, Gobionellidae, Eleotridae, Butidae, Milyeringidae, Odontobutidae and Rhyacichthyidae. 
Gobiidae and Gobionellidae are derived groups, while Rhyacichthyidae is the most basal; 3) The traditionally defined Eleotrinae and Butinae are both monophyletic, and I propose raising them to family rank as Eleotridae and Butidae; 4) The blind cave gobies are monophyletic and should be classified in their own family, Milyeringidae, instead of being assigned to Eleotridae as in the traditional classification; 5) Oxudercinae and Amblyopinae, nested within Gobionellidae, are paraphyletic groups; 6) Among the five putative outgroup species of gobioids used in this study, the family Apogonidae is the sister group of Gobioidei, and this clade in turn groups with the family Kurtidae; 7) The suborder Gobioidei originated in the early Eocene (~61 Mya). 2. From the data collected for the 43 samples mentioned above, I selected 694 coding sequences (CDS) with no missing data and used them to reconstruct individual gene trees as well as a concatenated tree. Taking the concatenated tree as the reference tree, I calculated the tree distance between every gene tree and the reference tree, and calculated five indices for molecular marker selection: pairwise distance (p-distance), relative composition variability (RCV), stemminess, phylogenetic informativeness (PI) and molecular clock-likeness (MCL). I sorted the gene markers by the value of each index and selected the 100 best and the 100 worst markers to build consensus trees, in order to compare the usefulness of each index for selecting gene markers for phylogenetic inference. The results are: 1) Comparing the consensus trees built from the markers selected by each of the five indices with the reference tree, I found no differences at the family level, whereas differences of varying degrees exist below the family level; 2) MCL was positively correlated with sequence length: the longer the gene sequence, the higher the MCL value. 
The MCL values should therefore be calibrated against sequence length; 3) Tree distance was clearly not correlated with p-distance, RCV or PI, and showed no obvious correlation with stemminess, whereas MCL1 (calibrated MCL) was clearly correlated with tree distance: as the MCL1 value of a gene increased, the distance between its gene tree and the reference tree increased; 4) The amount of phylogenetic information contained in a gene differs among time ranges. The higher the PI value, the more phylogenetic information a gene holds and the more useful it is for resolving phylogenetic questions concerning a given period; 5) The higher the MCL1 value, the more clock-like the phylogenetic tree. |
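
To illustrate two of the simpler marker-selection indices named above, the following is a minimal Python sketch and not the pipeline used in the thesis. It assumes each locus is a dict mapping a taxon name to an aligned sequence of equal length. The function names (`p_distance`, `mean_p_distance`, `rcv`, `rank_markers`) are hypothetical. RCV is computed under the commonly cited definition: the sum over taxa of the absolute deviations of each base count from its across-taxon mean, divided by the number of taxa times the number of sites. Whether low or high values count as "best" depends on the index, so the ranking helper only returns the lowest and highest scoring loci.

```python
from itertools import combinations

BASES = "ACGT"

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguity
    code are skipped rather than counted as differences.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def mean_p_distance(alignment):
    """Mean p-distance over all pairs of taxa in one locus."""
    seqs = list(alignment.values())
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)

def rcv(alignment):
    """Relative composition variability of one locus (assumed definition)."""
    seqs = [s.upper() for s in alignment.values()]
    n_taxa = len(seqs)
    n_sites = len(seqs[0])
    # Base counts per taxon, then the mean count of each base across taxa.
    counts = [{b: s.count(b) for b in BASES} for s in seqs]
    means = {b: sum(c[b] for c in counts) / n_taxa for b in BASES}
    total = sum(abs(c[b] - means[b]) for c in counts for b in BASES)
    return total / (n_taxa * n_sites)

def rank_markers(scores, k):
    """Return the k lowest and k highest scoring locus names."""
    ordered = sorted(scores, key=scores.get)
    return ordered[:k], ordered[-k:]

# Toy data for illustration only (not from the thesis).
toy_locus = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTACGA",
    "sp3": "ACGTTCGA",
}
toy_scores = {"gene1": 0.3, "gene2": 0.1, "gene3": 0.2}
lowest, highest = rank_markers(toy_scores, 1)
```

In the workflow described above, a score of this kind would be computed for each of the 694 loci, and the top and bottom 100 loci per index would then be used to build the consensus trees that are compared with the reference tree.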