Chromosome-level genome assembly and annotation
We produced 121.19 Gb and 62.21 Gb of total sequencing data for the
genome assemblies of FCDK and FCSH, respectively (Table 1). After
primary assembly, polishing, redundancy removal, Hi-C scaffolding, and
contaminant detection, we generated two highly contiguous, nearly
complete chromosome-level genomes. Detailed assembly statistics are
summarized in Table 1. FCDK and FCSH had genome sizes of 219.08 Mb and
153.90 Mb, scaffold N50 lengths of 38.47 Mb and 25.75 Mb, and GC
contents of 37.49% and 38.54%, respectively. More than 97% of the
genome was anchored to seven pseudochromosomes in both strains (Figure
1i, Table 1). Each chromosome of FCDK was 24.65%–54.11% longer than
the corresponding homologous chromosome of FCSH (Figure 1j, Table S1).
Assembly completeness was supported by high proportions of single-copy
BUSCO genes (97.3% and 97.0%), and very low proportions of duplicated
genes (0.8% and 0.9%) indicated no obvious redundancy in the assemblies.
High mapping ratios of long and short reads (> 94%) further confirmed
the quality of the two assemblies. In addition, a scaffold of
~0.5 Mb corresponding to a Wolbachia endosymbiont was detected in the
FCDK assembly, showing high similarity
(99.9%) to Wolbachia sequences assembled from the FCBL strain
(Faddeeva-Vakhrusheva et al., 2007).
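For readers less familiar with the contiguity metrics quoted above, the following is a minimal sketch of how a scaffold N50 and a relative chromosome-length difference can be computed. The function names and the toy lengths are illustrative assumptions, not the pipeline or data used in this study.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken from the longest scaffold downwards, first reaches half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_longer(length_a, length_b):
    """Return by how many percent length_a exceeds length_b."""
    return (length_a - length_b) / length_b * 100


# Hypothetical toy values (arbitrary units), not taken from Table 1 or Table S1.
toy_scaffolds = [40, 30, 20, 10]
toy_n50 = scaffold_n50(toy_scaffolds)
toy_difference = percent_longer(150, 100)
```

In practice the lengths would be read from the assembly FASTA index, and `percent_longer` would be applied to each pair of homologous chromosomes.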
Repetitive regions accounting for 22.61% (49.53 Mb) and 10.03% (15.43
Mb) of the FCDK and FCSH genomes, respectively, were masked (Table 1,
Table S2 and Table S3). DNA, LINE, LTR and unclassified transposable
elements were significantly enriched in the FCDK-specific regions of
Chr1, 3, 4, and 7
(Figure 1i). Relative to FCSH, many repeat families were markedly
expanded in FCDK (Figure 1k), particularly TcMar-Tc1, CMC-EnSpm,
Penelope, Gypsy, and Pao.
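The masked proportions reported above follow directly from the masked lengths and the assembly sizes given in the text. A short check, with `masked_percent` as an illustrative helper name:

```python
def masked_percent(masked_mb, genome_mb):
    """Return the masked fraction of a genome as a percentage."""
    return masked_mb / genome_mb * 100


# Masked lengths and genome sizes (Mb) as reported in the text.
fcdk_masked = masked_percent(49.53, 219.08)
fcsh_masked = masked_percent(15.43, 153.90)

print(f"FCDK: {fcdk_masked:.2f}% masked")
print(f"FCSH: {fcsh_masked:.2f}% masked")
```

Rounded to two decimals, both values agree with the percentages stated in the text (22.61% and 10.03%).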
Using the Infernal and tRNAscan-SE automatic prediction pipelines, 396
and 334 ncRNAs were identified in FCDK and FCSH, respectively (Table 1,
Table S4 and Table S5). Both strains possessed 21 isotypes of tRNA and
lacked suppressor tRNAs (Supres). With regard to snRNAs, FCDK/FCSH exhibited 33/21
spliceosomal RNAs (U1, U2, U4, U5, U6, and U11), 3/3 minor spliceosomal
RNAs (U4atac, U6atac, and U12), 6/7 C/D box small nucleolar RNAs
(snoRNAs), 2/3 H/ACA box snoRNAs, and 1/1 other snoRNA (SCARNA8).
We predicted 25,139 and 21,609 protein-coding genes (PCGs) for FCDK and FCSH, respectively
(Table 1 and Table S6). The annotation statistics of the two strains
were very similar in terms of the mean lengths of genes
(~4,000 bp), exons (~250 bp) and CDSs
(~200 bp) and the mean numbers of exons
(~8) and introns (~6.5) per gene.
However, FCDK showed a longer mean intron length than FCSH (312.1 vs.
263.5 bp). The BUSCO completeness of predicted proteins exceeded 97%
for both strains. In addition, the chromosomal distributions of PCGs and
transposons showed opposite trends: regions of high gene density usually
had low transposon density, and vice versa (Figure 1i). Protein domains
were identified by InterProScan for approximately two-thirds of the
genes of both strains, and nearly half of the predicted genes were
annotated with GO terms and KEGG pathways (Table S6).
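One simple way to quantify the opposite trends of gene and transposon density is to count features in fixed-size windows along a chromosome and correlate the two count vectors. The sketch below assumes 0-based feature start positions and uses a Pearson correlation. The positions, window size, and function names are hypothetical and do not reproduce the analysis behind Figure 1i.

```python
from math import sqrt


def window_counts(positions, chrom_length, window):
    """Count features per fixed-size window (0-based start positions)."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy chromosome of length 40 with windows of size 10.
gene_starts = [1, 2, 3, 12, 15, 25, 38]
te_starts = [5, 14, 18, 22, 27, 29, 31, 33, 35, 39]

gene_density = window_counts(gene_starts, 40, 10)
te_density = window_counts(te_starts, 40, 10)
r = pearson(gene_density, te_density)
```

A clearly negative `r` across windows would be consistent with gene-rich regions being transposon-poor. A rank-based correlation could be substituted if the counts are strongly skewed.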