Annotation of xenobiotic detoxification-related gene families
In contrast to insects, xenobiotic detoxification-related gene families
are greatly expanded in Collembola, possibly due to their adaptations to
complex soil environments (Faddeeva-Vakhrusheva et al., 2017; Manni et
al., 2020). The copy numbers of these families may differ between FCDK
and FCSH, since parthenogenetic strains of F. candida show a wider
distribution and better adaptability than sexual strains. We annotated
genes of five detoxification-related families, namely cytochrome P450
(CYP), ATP-binding cassette transporter (ABC), carboxyl/cholinesterase
(CCE), UDP-glycosyltransferase (UGT) and glutathione-S-transferase (GST),
using the BITACORA v1.3 pipeline (Vizueta et al., 2020), and then checked
the annotations manually.
BITACORA first ran BLASTP searches against the proteins predicted by the
automated MAKER pipeline and TBLASTN searches against the genome
assembly, and then confirmed each gene model by detecting the protein
domains characteristic of its family with HMMER (Altschul, 1997; Eddy,
2011). Reference protein sequences of D. melanogaster, B. mori and F.
candida for the ABC, CCE, GST and UGT families were obtained from the
NCBI RefSeq database, whereas CYP reference sequences were taken from
Dermauw et al. (2020). HMM profiles of each family were downloaded from the Pfam
database: ABC (PF00005), CCE (PF00135), GST (PF14497, PF02798), CYP
(PF00067), and UGT (PF00201). An E-value cut-off of 1e-5 was applied to
both the BLAST and HMMER searches. Novel genes were predicted from
TBLASTN alignments using a close-proximity algorithm with a maximum
intron length of 15,000 bp. The resulting CYP sequences were manually
examined for the conserved P450 fold, which is characterized by a
four-helix bundle (D, E, I and L), helices J and K, two sets of β-sheets
and a coil known as the ‘meander’. The putative functions of the
predicted proteins were checked by online BLASTP searches against the
NCBI non-redundant protein database (nr). Phylogenetic trees were
constructed to help classify the members of each family and to detect
possible sequence errors. For each of the five families, amino acid
sequences were aligned with MAFFT using the L-INS-i method and trimmed
with trimAl v1.4.1 (Capella-Gutiérrez et al., 2009) in ‘gappyout’ mode.
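The alignment and trimming step above can be sketched as a small Python wrapper. This is an illustration under stated assumptions, not the authors' script: the `<family>.faa` file-naming scheme and the function names `alignment_commands` and `align_and_trim` are hypothetical. The MAFFT options `--localpair --maxiterate 1000` are the documented way to request the L-INS-i strategy, and `-gappyout` is trimAl's automated gap-based trimming mode.

```python
import subprocess
from pathlib import Path


def alignment_commands(family: str, workdir: Path = Path(".")) -> tuple[list[str], list[str], Path]:
    """Build MAFFT (L-INS-i) and trimAl (gappyout) commands for one gene family.

    The '<family>.faa' naming scheme is a hypothetical convention for this sketch.
    Returns (mafft_cmd, trimal_cmd, aligned_path); MAFFT writes the alignment
    to stdout, so the caller must redirect it into aligned_path.
    """
    raw = workdir / f"{family}.faa"            # unaligned amino acid sequences
    aligned = workdir / f"{family}.mafft.faa"  # MAFFT alignment
    trimmed = workdir / f"{family}.trim.faa"   # trimmed alignment for tree building
    # '--localpair --maxiterate 1000' selects MAFFT's L-INS-i strategy.
    mafft = ["mafft", "--localpair", "--maxiterate", "1000", str(raw)]
    # '-gappyout' lets trimAl choose the trimming threshold from the gap distribution.
    trimal = ["trimal", "-in", str(aligned), "-out", str(trimmed), "-gappyout"]
    return mafft, trimal, aligned


def align_and_trim(family: str, workdir: Path = Path(".")) -> None:
    """Run both steps in order; requires mafft and trimal on PATH."""
    mafft, trimal, aligned = alignment_commands(family, workdir)
    with open(aligned, "w") as out:
        subprocess.run(mafft, stdout=out, check=True)
    subprocess.run(trimal, check=True)
```

Calling `align_and_trim(fam)` for each of "CYP", "ABC", "CCE", "UGT" and "GST" would process the five families in turn. Building the commands separately from running them keeps the exact parameters inspectable, which is convenient when reporting methods.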
Phylogenetic trees were inferred with IQ-TREE using automatic model
selection and 1,000 ultrafast bootstrap replicates, and the tree figures
were annotated and rendered with the online tool EvolView v3
(Subramanian et al., 2019).
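The tree-inference step (IQ-TREE with automatic model selection and 1,000 ultrafast bootstrap replicates) can be sketched as below. This is a hedged illustration: the source does not state the IQ-TREE version, so both the IQ-TREE 2 syntax (`-B`, `--prefix`) and the 1.6 syntax (`-bb`, `-pre`) are shown, and `-m MFP` is assumed to be how "automatic model selection" was requested. The helper names are hypothetical, and the 95% support threshold follows the IQ-TREE documentation's guidance for ultrafast bootstrap rather than anything stated in the source. The parser assumes the tree file carries only UFBoot values as internal-node labels.

```python
import re


def iqtree_command(alignment: str, prefix: str, iqtree2: bool = True) -> list[str]:
    """IQ-TREE command with ModelFinder (-m MFP) and 1,000 ultrafast bootstraps."""
    if iqtree2:
        return ["iqtree2", "-s", alignment, "-m", "MFP", "-B", "1000", "--prefix", prefix]
    # IQ-TREE 1.6.x spells ultrafast bootstrap as -bb and the prefix option as -pre.
    return ["iqtree", "-s", alignment, "-m", "MFP", "-bb", "1000", "-pre", prefix]


def ufboot_supports(newick: str) -> list[float]:
    """Extract numeric internal-node labels (assumed UFBoot values) from Newick text."""
    # Support values are written as labels immediately after a closing parenthesis.
    return [float(x) for x in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def count_well_supported(newick: str, threshold: float = 95.0) -> int:
    """Count internal nodes whose UFBoot support reaches the threshold."""
    return sum(1 for s in ufboot_supports(newick) if s >= threshold)
```

For example, on the toy tree `((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.1)72:0.03,E:0.2);` the parser finds supports of 100 and 72, so only one clade passes the 95% threshold. A quick count like this can flag families whose classification rests on weakly supported clades before the trees are annotated for figures.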