Abstract

Bayesian analysis of morphological data is becoming increasingly popular, mainly (but not only) because it allows for time-calibrated phylogenetic inference using relaxed morphological clocks and tip dating whenever fossils are available. As with molecular data, recent studies have shown that modeling among-character rate variation (ACRV) in morphological matrices greatly improves phylogenetic inference. In a likelihood framework this may be accomplished, for instance, by employing a hidden Markov model to assign characters to rate categories drawn from a (discretized) $\Gamma$ distribution and/or by partitioning data sets according to rate heterogeneity and estimating per-partition branch lengths, conditioned on a single topology. While the first approach is available in many phylogenetic analysis programs, there is still no clear consensus on how to partition data, except perhaps in the simplest cases (e.g., “by codon” partitioning of coding sequences). Additionally, there is a trade-off between improvement in likelihood scores and the number of free parameters in the analysis, which rises quickly with the number of partitions. This trade-off may be dealt with by employing statistics that penalize overfitting of complex models, such as the Akaike or Bayesian information criteria, or the more recently introduced stepping-stone method for marginal likelihood approximation. We applied the latter to three distinct matrices of discrete morphological data and demonstrated that sorting characters by homoplasy scores (obtained from implied weighting parsimony analysis) outperformed other partitioning strategies (anatomically based and PartitionFinder2). The method was in fact so efficient in segregating characters by rates of evolution that no within-partition ACRV modeling was necessary, while among-partition rate variation was adequately accommodated by rate multipliers.
We conclude that partitioning by homoplasy is a powerful and easy-to-implement strategy to address ACRV in complex data sets. We provide some guidelines focusing on morphological matrices, although this approach may also be applicable to molecular data sets.
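
As a rough illustration of the partitioning idea, the sketch below groups characters that share the same homoplasy score into one partition. It is a minimal sketch under stated assumptions, not the procedure used in the study: the score is taken to be 1 - f, where f = k / (k + h) is the implied-weighting fit for a character with h extra steps; the concavity constant k = 3 and the extra-step counts are purely illustrative; and the function names `implied_weight_fit` and `partition_by_homoplasy` are hypothetical.

```python
from collections import defaultdict


def implied_weight_fit(extra_steps, k=3.0):
    """Implied-weighting fit f = k / (k + h) for h extra steps.

    k is the concavity constant; 3.0 is only an illustrative value.
    """
    return k / (k + extra_steps)


def partition_by_homoplasy(extra_steps_per_char, k=3.0):
    """Group 1-based character indices by homoplasy score (1 - fit).

    Assumption: characters with identical scores form one partition.
    Other binning rules are possible.
    """
    groups = defaultdict(list)
    for idx, h in enumerate(extra_steps_per_char, start=1):
        # Round so that equal scores compare equal as dictionary keys.
        score = round(1.0 - implied_weight_fit(h, k), 6)
        groups[score].append(idx)
    # Order partitions from least to most homoplastic.
    return [groups[score] for score in sorted(groups)]


# Hypothetical extra-step counts for eight characters.
extra_steps = [0, 2, 0, 1, 5, 2, 1, 0]
partitions = partition_by_homoplasy(extra_steps)
for n, chars in enumerate(partitions, start=1):
    print(f"partition {n}: characters {chars}")
```

Each resulting list of character indices could then be declared as a character set in the Bayesian analysis, with rate multipliers linking the partitions.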
