Orthology inference, phylogeny and gene family evolution
We inferred PCG orthology across ten hexapods and one crustacean and clustered 90.1% of the genes into 21,942 orthogroups (gene families) (Figure 3a, Table S7). Of these, 4,009 orthogroups, including 1,372 single-copy orthogroups, were present in all species, and 4,640 orthogroups, containing 20,304 genes, were species-specific. A total of 526 orthogroups, containing 4,601 genes, were unique to Collembola. FCDK had 13,970 gene families, of which 572 (2,285 genes) were unique; FCSH had 13,675 gene families, of which 204 (820 genes) were unique (Table S7).
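The orthogroup categories above (present in all species, single-copy, species-specific) can be illustrated with a minimal sketch. It assumes a per-species gene-count table of the kind orthology tools typically produce; the table layout, the function name `classify_orthogroups`, and the toy species labels and counts are hypothetical and are not taken from this study.

```python
from typing import Dict, List


def classify_orthogroups(
    counts: Dict[str, Dict[str, int]], species: List[str]
) -> Dict[str, List[str]]:
    """Sort orthogroups into the categories used in the text.

    counts[og][sp] is the number of genes of species sp in orthogroup og
    (assumed layout; a missing species means zero genes).
    """
    result: Dict[str, List[str]] = {
        "universal": [],         # present in every species
        "single_copy": [],       # exactly one gene in every species
        "species_specific": [],  # genes from a single species only
    }
    for og, row in counts.items():
        present = [sp for sp in species if row.get(sp, 0) > 0]
        if len(present) == len(species):
            result["universal"].append(og)
            if all(row[sp] == 1 for sp in species):
                result["single_copy"].append(og)
        elif len(present) == 1:
            result["species_specific"].append(og)
    return result


# Toy input: three species labels and four orthogroups (illustrative only).
toy_species = ["FCDK", "FCSH", "OTHER"]
toy_counts = {
    "OG1": {"FCDK": 1, "FCSH": 1, "OTHER": 1},
    "OG2": {"FCDK": 3, "FCSH": 2, "OTHER": 1},
    "OG3": {"FCDK": 4},
    "OG4": {"FCDK": 1, "FCSH": 2},
}
summary = classify_orthogroups(toy_counts, toy_species)
```

Single-copy orthogroups are counted as a subset of the universal ones, matching the phrasing "4,009 orthogroups, including 1,372 single-copy orthogroups". Orthogroups shared by some but not all species (such as `OG4` in the toy input) fall into none of the three categories.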
After alignment, trimming and filtering, a final matrix of 499,899 amino acid sites from 1,304 single-copy genes was used for phylogenetic inference and divergence time estimation. The inferred phylogenetic relationships were consistent with those of recent phylogenomic studies (Misof et al., 2014): Collembola was located at the base of Hexapoda, and Diplura was sister to insects (Figure 3a). Folsomia (Isotomoidea) separated from Entomobryoidea in the Late Triassic to Early Jurassic (186.6‒216.8 Mya). FCDK and FCSH diverged in the middle Neogene (11.7‒14.7 Mya), indicating that the two strains have been separated for long enough to be considered independent species.
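As an illustration of how per-gene alignments are combined into a single matrix for phylogenetic inference, the following is a minimal concatenation sketch. It is not the pipeline used in this study: the function name, the taxon labels and the toy sequences are hypothetical, and the gap-padding of a taxon missing from a gene is an assumption about how incomplete genes might be handled.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: List[Dict[str, str]], taxa: List[str], gap: str = "-"
) -> Dict[str, str]:
    """Join trimmed per-gene alignments into one supermatrix.

    Each element of gene_alignments maps taxon -> aligned sequence.
    A taxon absent from a gene is padded with gap characters.
    """
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, gap * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}


# Toy input: two genes, two taxa (illustrative sequences only).
toy_genes = [
    {"A": "MK-", "B": "MKL"},
    {"A": "QQ"},  # taxon B is missing for this gene
]
matrix = concatenate_alignments(toy_genes, ["A", "B"])
n_sites = len(matrix["A"])  # total number of columns in the supermatrix
```

The number of columns in the result is the sum of the per-gene alignment widths, which is how a figure such as the total site count of a concatenated matrix arises.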
Gene family evolution analyses with CAFE identified a large number of significantly expanded families in most collembolan species (i.e., Entomobryomorpha). Among these families, 224 and 81 were expanded in FCDK and FCSH, respectively (Figure 3a, Table S8). The two strains shared large expansions of the cytochrome P450, ABC transporter, zinc finger, Sec14, pickpocket, lactase-phlorizin hydrolase, chitin-binding type-2 domain-containing protein, F-box, and ionotropic receptor families (Figure 3b, c; Table S9), which have generally been found to be expanded in two other Entomobryomorpha species (Faddeeva-Vakhrusheva et al., 2017; Zhang et al., 2019) and play an important role in the adaptive evolution of Collembola. Relative to FCSH, FCDK showed additional expansions of the histone, glutathione S-transferase, lytic polysaccharide monooxygenase (LPMO), exoskeleton protein, bacillopeptidase F, beta-lactamase, tenascin, ATP-dependent DNA helicase, Down syndrome cell adhesion molecule-like protein, chymotrypsinogen, gustatory receptor and neuroligin families, which are related to genetic modification, detoxification, cuticle and nervous system development, digestion, chemosensation, antibiotic biosynthesis and lignocellulose degradation (Figure 3b, c). GO and KEGG enrichment analyses of the expanded and species-specific families in FCDK identified, in addition to terms related to the regulation of translation and transcription factors, terms involved in symbiotic interactions and related biological processes (Figure S3a‒d). However, these terms were absent or generally received few annotations in FCSH (Figure S3e‒h).
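The statistical test behind the GO and KEGG enrichment analyses is not stated in this section. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched below as an assumption rather than as the method used here; the function name and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background set
    K: background genes carrying the term of interest
    n: genes in the test set (e.g. genes in expanded families)
    k: test-set genes carrying the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 background genes, 3 carry the term,
# and all 3 genes in the test set carry it.
p_value = hypergeom_enrichment_p(k=3, n=3, K=3, N=10)
```

In practice, one p-value is computed per term and the resulting set is corrected for multiple testing before terms are reported as enriched.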