Genome assembly
PacBio long reads were assembled using Flye v2.7.1 (Kolmogorov et al., 2019) with a minimum read overlap of 1,000 bp and two rounds of self-polishing (‘-m 1000 -i 2’). Primary contigs were then polished for two rounds with Illumina short reads using NextPolish v1.3.1 (Hu et al., 2020). Prior to polishing, the short reads were quality-controlled using the ‘bbduk.sh’ script in the BBTools package v38.82 (Bushnell, 2014): quality trimming (> Q20), length filtering (> 15 bp), polymer trimming (> 10 bp) and correction of overlapping paired reads. Redundant haplotypic duplications were removed using Purge_Dups v1.0.1 (Guan et al., 2020) with default settings. All sequence alignments in the polishing and purging steps above were performed using Minimap2 v2.17 (Li, 2018). For Hi-C scaffolding, read alignment to the assembly, duplicate removal, and Hi-C contact extraction were performed using Juicer v1.6.2 (Durand et al., 2016), with BWA v0.7.17 (Li & Durbin, 2009) as the aligner. We then used the 3D-DNA v180922 pipeline (Dudchenko et al., 2017) to anchor contigs into pseudochromosomes. Possible assembly errors, such as misjoins, translocations, and inversions, were manually corrected using the Assembly Tools module in Juicebox v1.11.08 (Durand et al., 2016). Potential contaminants were detected using MMseqs2 v11 (Steinegger & Söding, 2017) to perform BLASTN-like searches against the NCBI nucleotide (nt) and UniVec databases. Genome quality was further evaluated based on genome completeness and the mapping rate of raw reads. Genome completeness was assessed using BUSCO v3.0.2 (Waterhouse et al., 2018) against the arthropod gene set (arthropoda_odb10, n = 1,013). Raw PacBio and Illumina reads were aligned to the assembly using Minimap2, and mapping rates were calculated using SAMtools v1.9 (Danecek et al., 2021).
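The mapping-rate step can be illustrated with a minimal Python sketch. It assumes that the rate is taken from the text summary printed by ‘samtools flagstat’; the text above names only SAMtools, not the subcommand, so this is an assumption rather than the documented procedure. The function name ‘mapping_rate’ and the read counts in the example are hypothetical and purely illustrative.

```python
import re

# Each flagstat summary line has the form "<QC-passed> + <QC-failed> <label>".
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def mapping_rate(flagstat_text):
    """Return mapped / total for QC-passed records in a flagstat summary."""
    total = None
    mapped = None
    for line in flagstat_text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match is None:
            continue
        passed = int(match.group(1))
        label = match.group(3)
        if label.startswith("in total"):
            total = passed
        elif label.startswith("mapped ("):
            # Skips "primary mapped" and "with mate mapped ..." lines.
            mapped = passed
    if total is None or mapped is None:
        raise ValueError("no 'in total' or 'mapped' line found")
    if total == 0:
        raise ValueError("flagstat reports zero records")
    return mapped / total


# Illustrative summary with made-up counts (not results from this study).
example = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
20 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
"""

print(f"{mapping_rate(example):.2%}")
```

Note that the flagstat totals count alignment records, including secondary and supplementary alignments, so for long reads a rate computed this way may differ slightly from a per-read rate.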