Orthology identification and phylogenetic inference
We inferred PCG sequence orthology across eleven arthropod species: one crustacean (D. magna), one dipluran (Catajapyx aquilonaris), four insects (Zootermopsis nevadensis, T. castaneum, A. mellifera, D. melanogaster), and five collembolans (Sinella curviseta, Orchesella cincta, Holacanthella duospinosa, FCDK, FCSH). Protein sequences of C. aquilonaris and H. duospinosa were downloaded from i5K, those of S. curviseta were obtained from FigShare (https://doi.org/10.6084/m9.figshare.7286231.v2), and the remaining data were obtained from NCBI. After removing redundant isoforms, we inferred orthogroups (gene families) using OrthoFinder v2.5.2 (Emms & Kelly, 2019), with Diamond run in ultra-sensitive mode for sequence alignment (‘-S diamond_ultra_sens’).
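The isoform-filtering step that precedes orthogroup inference can be sketched as follows. The text does not state which criterion was used to remove redundant isoforms; this minimal sketch assumes the common convention of keeping the longest protein per gene. The functions `parse_fasta` and `longest_isoform_per_gene`, the `gene_of` callback, and the `gene|isoform` header layout are hypothetical and introduced here for illustration only.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(records, gene_of):
    """Keep one protein per gene: the longest one (first seen wins ties).

    ASSUMPTION: 'longest isoform' is the filtering criterion; the source
    only says that redundant isoforms were removed.
    gene_of maps a FASTA header to its gene identifier.
    """
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Hypothetical input: headers of the form "gene|isoform".
demo = ">geneA|iso1\nMKT\n>geneA|iso2\nMKTAYL\n>geneB|iso1\nMSS\n"
kept = longest_isoform_per_gene(parse_fasta(demo), lambda h: h.split("|")[0])
for gene, (header, seq) in kept.items():
    print(gene, header, len(seq))
```

In practice the mapping from protein to gene depends on the annotation source (for example, a GFF file rather than the FASTA header), so `gene_of` would need to be adapted for each data set.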
Single-copy orthologues identified by OrthoFinder were used to infer the phylogeny and divergence times. We aligned the protein sequences of each orthologue using MAFFT v7.394 (Katoh & Standley, 2013) with the high-accuracy L-INS-i method, trimmed unreliably aligned sites using BMGE v1.12 (Criscuolo & Gribaldo, 2010) with stringent parameters (‘-m BLOSUM90 -h 0.4’), and concatenated the individual alignments into a matrix. We then estimated substitution models and partitioning schemes and reconstructed the phylogeny using IQ-TREE v2.0.7 (Minh et al., 2020). Genes that violated SRH (stationary, reversible and homogeneous) assumptions were excluded (‘--symtest-remove-bad --symtest-pval 0.10’). To reduce the computational burden, the model was restricted to LG (‘--mset LG’) and only the top 10% of partitioning schemes were considered (‘--rclusterf 10’). Node support was assessed with 1,000 ultrafast bootstrap replicates and 1,000 SH-like approximate likelihood ratio test replicates (‘-B 1000 --alrt 1000’). We estimated divergence times using MCMCTree within the PAML v4.9j package (Yang, 2007), applying the JC69 substitution model, the independent-rates clock model, and approximate likelihood calculation with ML estimation of branch lengths. Each run lasted 60,000 generations, with the first 10,000 discarded as burn-in, and runs were repeated at least twice to ensure convergence. Five fossils from the PBDB database (https://www.paleobiodb.org/navigator/) were used for node calibration: one Branchiopoda (<541 Mya), one Hexapoda (<485.4 Mya), the most recent common ancestor (MRCA) of Diplura and Insecta (>407.6 Mya), one Holometabola (315.2‒382.7 Mya), and the MRCA of Coleoptera and Diptera (>295.5 Mya).
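The concatenation of trimmed per-orthologue alignments into a single matrix, together with the site ranges needed for partitioned analysis, can be sketched as below. This is an illustrative sketch, not the authors' script: the function `concatenate_alignments`, the in-memory dictionary layout, and the gap-filling of absent taxa are assumptions (single-copy orthologues present in all species would not need gap-filling).

```python
def concatenate_alignments(alignments, taxa, gap="-"):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}
    taxa: list of taxon names to include in the matrix
    Returns (matrix, partitions), where matrix maps taxon -> concatenated
    sequence and partitions is a list of (gene, start, end) tuples with
    1-based inclusive coordinates.
    """
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene in sorted(alignments):  # fixed order keeps output reproducible
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # ASSUMPTION: a taxon absent from a gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, gap * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Hypothetical toy alignments for two orthogroups and two taxa.
toy = {
    "og2": {"A": "MK-", "B": "MKT"},
    "og1": {"A": "AC", "B": "A-"},
}
matrix, partitions = concatenate_alignments(toy, ["A", "B"])
print(matrix)
print(partitions)
```

The partition tuples can then be written out in whatever partition-file format the tree-building software expects, so that each orthologue can receive its own model or be merged with others during partition-scheme selection.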