Orthology inference, phylogeny and gene family evolution
We inferred PCG orthology across ten hexapods and one crustacean and
clustered 90.1% of the genes into 21,942 orthogroups (gene families)
(Figure 3a, Table S7). Of these, 4,009 orthogroups, including 1,372
single-copy orthogroups, were present in all species, and 4,640
orthogroups, containing 20,304 genes, were species-specific. A total of
526 orthogroups, containing 4,601 genes, were unique to Collembola. FCDK
had 13,970 gene families, 572 of which (2,285 genes) were unique; FCSH
had 13,675 gene families, 204 of which (820 genes) were unique (Table
S7).
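The orthogroup categories above (shared by all species, single-copy, species-specific and clade-specific) can be tallied from a per-orthogroup, per-species gene-count table of the kind reported by orthology tools such as OrthoFinder. The Python sketch below is a minimal illustration under stated assumptions, not the pipeline used here: the input is a hypothetical in-memory dictionary, `summarize_orthogroups` is a hypothetical helper, and "clade-specific" is assumed to mean present in two or more species of the clade and in no species outside it, so that it does not overlap the species-specific count.

```python
from typing import Dict, Iterable, List

# orthogroup ID -> {species name: number of member genes}
GeneCounts = Dict[str, Dict[str, int]]


def summarize_orthogroups(counts: GeneCounts, species: List[str],
                          clade: Iterable[str]) -> Dict[str, int]:
    """Count orthogroups in each category used in the text.

    shared           -- at least one gene in every species
    single_copy      -- exactly one gene in every species (subset of shared)
    species_specific -- genes from exactly one species
    clade_specific   -- genes from two or more species, all inside `clade`
                        (assumed definition; excludes species-specific groups)
    """
    all_species = set(species)
    clade_set = set(clade)
    summary = {"shared": 0, "single_copy": 0,
               "species_specific": 0, "clade_specific": 0}
    for per_species in counts.values():
        # Species with a zero or missing count are treated as absent.
        present = {sp for sp, n in per_species.items() if n > 0}
        if present == all_species:
            summary["shared"] += 1
            if all(per_species[sp] == 1 for sp in all_species):
                summary["single_copy"] += 1
        if len(present) == 1:
            summary["species_specific"] += 1
        elif len(present) > 1 and present <= clade_set:
            summary["clade_specific"] += 1
    return summary


# Hypothetical toy data: species A and B form the clade of interest.
toy = {
    "OG1": {"A": 1, "B": 1, "C": 1},  # shared, single-copy
    "OG2": {"A": 2, "B": 1, "C": 1},  # shared, multi-copy
    "OG3": {"A": 3},                  # specific to A
    "OG4": {"A": 1, "B": 2},          # specific to the A+B clade
    "OG5": {"B": 1, "C": 1},          # none of the special categories
}
print(summarize_orthogroups(toy, ["A", "B", "C"], ["A", "B"]))
# {'shared': 2, 'single_copy': 1, 'species_specific': 1, 'clade_specific': 1}
```

Summing the gene counts within each category in the same loop would give gene totals such as those reported alongside the orthogroup numbers.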
After alignment, trimming and filtering, a final matrix of 499,899 amino
acid sites from 1,304 single-copy genes was used for phylogenetic
inference and divergence-time estimation. The inferred phylogenetic
relationships were consistent with previous phylogenomic studies (Misof
et al., 2014): Collembola was placed at the base of Hexapoda, and Diplura
was recovered as the sister group to insects (Figure 3a). Folsomia
(Isotomoidea) diverged from Entomobryoidea in the Late Triassic–Early
Jurassic (186.6–216.8 Mya). FCDK and FCSH diverged in the middle Neogene
(11.7–14.7 Mya), suggesting that the two strains have been separated
long enough to be considered independent species.
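A matrix like the one described above is typically built by concatenating the trimmed per-gene alignments into a single supermatrix, whose width is the sum of the per-gene alignment lengths. The following Python sketch assumes each alignment is a hypothetical dictionary mapping taxon names to equal-length aligned sequences; taxa missing from a gene are padded with gaps, and the 1-based, inclusive coordinates of each gene are returned so that a partitioned model could be specified. It is illustrative only and does not reproduce the trimming or filtering steps.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned, trimmed sequence


def concatenate(alignments: Dict[str, Alignment], taxa: List[str]
                ) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one supermatrix.

    Returns the supermatrix (taxon -> concatenated sequence) and, for each
    gene, its 1-based inclusive start and end column in the supermatrix.
    Taxa not listed in `taxa` are ignored.
    """
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene in sorted(alignments):  # fixed gene order for reproducibility
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: empty alignment or unequal lengths")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this gene gets a run of gaps.
            rows[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((gene, start, start + width - 1))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


genes = {
    "gene1": {"A": "MK-L", "B": "MKVL"},
    "gene2": {"A": "QR", "C": "QK"},
}
matrix, parts = concatenate(genes, ["A", "B", "C"])
print(matrix)  # {'A': 'MK-LQR', 'B': 'MKVL--', 'C': '----QK'}
print(parts)   # [('gene1', 1, 4), ('gene2', 5, 6)]
```

For strictly single-copy genes present in every taxon, no gap padding occurs; the padding only matters if filtering retains genes with missing taxa.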
Gene family evolution analyses with CAFÉ identified many significantly
expanded families in most of the collembolan species analyzed (i.e.,
members of Entomobryomorpha). Of these, 224 families were expanded in
FCDK and 81 in FCSH (Figure 3a, Table S8). Both strains showed large
expansions of the cytochrome P450, ABC transporter, zinc finger, Sec14,
pickpocket, lactase-phlorizin hydrolase, chitin-binding type-2
domain-containing protein, F-box, and ionotropic receptor families
(Figure 3b, c; Table S9). These families have also been found to be
expanded in two other Entomobryomorpha species (Faddeeva-Vakhrusheva et
al., 2017; Zhang et al., 2019) and are considered important in the
adaptive evolution of Collembola. Relative to FCSH, FCDK showed
additional expansions of histone, glutathione S-transferase, lytic
polysaccharide monooxygenase (LPMO), exoskeleton protein,
bacillopeptidase F, beta-lactamase, tenascin, ATP-dependent DNA
helicase, Down syndrome cell adhesion molecule-like protein,
chymotrypsinogen, gustatory receptor and neuroligin families, which are
related to genetic modification, detoxification, cuticle and nervous
system development, digestion, chemosensation, antibiotic biosynthesis
and lignocellulose degradation (Figure 3b, c). GO and KEGG enrichment
analyses of the expanded and species-specific families of FCDK revealed,
in addition to terms related to the regulation of translation and to
transcription factors, terms involved in symbiotic interactions and
related biological processes (Figure S3a–d). In FCSH, however, these
terms were absent or received few annotations (Figure S3e–h).