Chromosome-level genome assembly and annotation
We produced 121.19 Gb and 62.21 Gb of total sequencing data for the genome assemblies of FCDK and FCSH, respectively (Table 1). After primary assembly, polishing, redundancy removal, Hi-C scaffolding, and contaminant detection, we generated two highly contiguous, nearly complete chromosome-level genomes. Detailed assembly statistics are summarized in Table 1. FCDK and FCSH had genome sizes of 219.08 Mb and 153.90 Mb, scaffold N50 lengths of 38.47 Mb and 25.75 Mb, and GC contents of 37.49% and 38.54%, respectively. More than 97% of the genome was anchored to seven pseudochromosomes in both strains (Figure 1i, Table 1). Each chromosome of FCDK was 24.65–54.11% longer than its homologous chromosome in FCSH (Figure 1j, Table S1). Three lines of evidence supported the quality of the two assemblies: high proportions of single-copy BUSCO genes (97.3% and 97.0%) indicated high completeness, very low proportions of duplicated BUSCO genes (0.8% and 0.9%) indicated no obvious redundancy, and high mapping ratios of long and short reads (> 94%) indicated that the assemblies represented the sequencing data well. In addition, a scaffold of ~0.5 Mb corresponding to a Wolbachia endosymbiont was detected in the FCDK assembly; it showed high similarity (99.9%) to Wolbachia sequences assembled from the FCBL strain (Faddeeva-Vakhrusheva et al., 2007).
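For readers unfamiliar with the scaffold N50 statistic reported above, the following minimal Python sketch illustrates its standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy scaffold lengths are hypothetical and serve only as an illustration; they are not the FCDK or FCSH scaffold lengths, and this is not the software used to produce Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length; the length
    of the sequence that crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in Mb), for illustration only.
toy_scaffolds = [40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(f"Toy scaffold N50: {toy_n50} Mb")
```

In the toy example, the two longest scaffolds (40 Mb and 30 Mb) together cover 70 Mb of the 100 Mb total, so the N50 is 30 Mb.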
Repetitive regions accounting for 22.61% (49.53 Mb) and 10.03% (15.43 Mb) of the FCDK and FCSH genomes, respectively, were identified and masked (Table 1, Table S2 and Table S3). DNA, LINE, LTR and unclassified transposable elements were significantly enriched in the FCDK-specific regions of Chr1, 3, 4, and 7 (Figure 1i). Relative to FCSH, many repeat families were markedly expanded in FCDK (Figure 1k), particularly TcMar-Tc1, CMC-EnSpm, Penelope, Gypsy, and Pao.
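The masked fractions follow directly from the masked lengths and the genome sizes given above. The short Python sketch below recomputes them from the values stated in the text as a simple consistency check; the variable and function names are hypothetical and are not part of the annotation pipeline.

```python
# Genome sizes and masked repeat lengths (Mb), as reported in the text.
genome_mb = {"FCDK": 219.08, "FCSH": 153.90}
masked_mb = {"FCDK": 49.53, "FCSH": 15.43}


def masked_percent(masked, total):
    """Return the masked fraction of a genome as a percentage (2 decimals)."""
    return round(100 * masked / total, 2)


percents = {
    strain: masked_percent(masked_mb[strain], genome_mb[strain])
    for strain in genome_mb
}

for strain, pct in percents.items():
    print(f"{strain}: {pct}% of the genome masked as repeats")
```

Both recomputed values agree with the reported percentages (22.61% and 10.03%).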
Using the Infernal and tRNAscan-SE automatic prediction pipelines, we identified 396 and 334 ncRNAs in FCDK and FCSH, respectively (Table 1, Table S4 and Table S5). Both strains possessed 21 tRNA isotypes and lacked suppressor (Supres) tRNAs. Among snRNAs, FCDK/FCSH had 33/21 spliceosomal RNAs (U1, U2, U4, U5, U6, and U11), 3/3 minor spliceosomal RNAs (U4atac, U6atac, and U12), 6/7 C/D box small nucleolar RNAs (snoRNAs), 2/3 H/ACA box snoRNAs, and 1/1 other snoRNA (SCARNA8).
We predicted 25,139 and 21,609 protein-coding genes (PCGs) for FCDK and FCSH, respectively (Table 1 and Table S6). The annotation statistics of the two strains were very similar in terms of the mean lengths of genes (~4,000 bp), exons (~250 bp) and CDSs (~200 bp) and the mean numbers of exons (~8) and introns (~6.5) per gene. However, FCDK had a longer mean intron length than FCSH (312.1 vs. 263.5 bp). The BUSCO completeness of the predicted proteins exceeded 97% for both strains. In addition, PCGs and transposons showed opposite distribution patterns along the chromosomes; i.e., regions of high gene density usually had a low transposon density, and vice versa (Figure 1i). InterProScan identified protein domains in approximately two-thirds of the genes of both strains, and nearly half of the predicted genes were annotated with GO terms and KEGG pathways (Table S6).