Abstract

Species are fundamental units in biological research and can be defined on the basis of various operational criteria. There has been growing use of molecular approaches for species delimitation. Among the most widely used methods, the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) were designed for the analysis of single-locus data but are often applied to concatenations of multilocus data. In contrast, the Bayesian multispecies coalescent approach in the software Bayesian Phylogenetics and Phylogeography (BPP) explicitly models the evolution of multilocus data. In this study, we compare the performance of GMYC, PTP, and BPP using synthetic data generated by simulation under various speciation scenarios. We show that in the absence of gene flow, the main factor influencing the performance of these methods is the ratio of population size to divergence time, while number of loci and sample size per species have smaller effects. Given appropriate priors and correct guide trees, BPP shows lower rates of species overestimation and underestimation, and is generally robust to various potential confounding factors except high levels of gene flow. The single-threshold GMYC and the best strategy that we identified in PTP generally perform well for scenarios involving more than a single putative species when gene flow is absent, but PTP outperforms GMYC when fewer species are involved. Both methods are more sensitive than BPP to the effects of gene flow and potential confounding factors. Case studies of bears and bees further validate some of the findings from our simulation study, and reveal the importance of using an informed starting point for molecular species delimitation. Our results highlight the key factors affecting the performance of molecular species delimitation, with potential benefits for using these methods within an integrative taxonomic framework.

Source link