Genome sequencing, assembly and annotation
One living individual of T. polyphylla was collected from the
Chongdugou scenic spot in Henan, China (111°39′41.64″ E, 33°56′23.87″
N) for whole-genome sequencing. We sequenced and assembled the genome
using a combination of Illumina short-read sequencing and Nanopore
long-read sequencing. The completeness of the genome assembly was
assessed using both the Core Eukaryotic Genes Mapping Approach (CEGMA; Parra et al., 2007) and Benchmarking Universal
Single-Copy Orthologs (BUSCO; Simão et al., 2015). For
repetitive element annotation, simple sequence repeats (SSRs), tandem
repeats and transposable elements (TEs) were identified in the T.
polyphylla genome. We combined de novo, homology-based, and RNA
sequencing-aided methods for gene prediction. For details, see
Supporting Information Methods S1.
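BUSCO results are conventionally reported in a one-line notation in which complete BUSCOs (C) are the sum of complete single-copy (S) and complete duplicated (D) genes, reported alongside fragmented (F) and missing (M) genes, all as percentages of the n BUSCO groups searched. The short Python sketch below reproduces that arithmetic. The class name and the example counts are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Counts of BUSCO groups per category (illustrative values only)."""
    single: int      # complete, single-copy (S)
    duplicated: int  # complete, duplicated (D)
    fragmented: int  # fragmented (F)
    missing: int     # missing (M)

    @property
    def total(self) -> int:
        # n: number of BUSCO groups searched in the lineage dataset
        return self.single + self.duplicated + self.fragmented + self.missing

    def summary(self) -> str:
        """Format the counts in BUSCO's one-line summary notation."""
        n = self.total

        def pct(k: int) -> str:
            # Percentage of n, rounded to one decimal place
            return f"{100.0 * k / n:.1f}%"

        complete = self.single + self.duplicated  # C = S + D
        return (f"C:{pct(complete)}[S:{pct(self.single)},D:{pct(self.duplicated)}],"
                f"F:{pct(self.fragmented)},M:{pct(self.missing)},n:{n}")


print(BuscoCounts(single=1500, duplicated=100, fragmented=10, missing=4).summary())
# prints C:99.1%[S:92.9%,D:6.2%],F:0.6%,M:0.2%,n:1614
```

BUSCO's own rounding may differ slightly from Python's formatting in edge cases.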
Hi-C library construction and chromosome assembly
To generate a chromosome-level assembly of the T. polyphylla genome, a Hi-C library was constructed following the protocol of Rao et al.
(2014). Fresh leaf tissue was fixed in 1% formaldehyde to cross-link
chromatin. After tissue lysis and homogenization, the cross-linked DNA
was digested with the DpnII restriction endonuclease, labelled with
biotin-14-dCTP, and ligated using T4 DNA ligase. After
reversal of the cross-links, the ligated DNA was purified and sheared
into 300–600 bp fragments. Biotinylated DNA fragments were extracted
using streptavidin beads to construct the Hi-C fragment library. After
PCR enrichment, high-quality libraries were sequenced on an Illumina
NovaSeq 6000 platform, yielding approximately 160.46 Gb of data.
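DpnII recognizes the palindromic site GATC and cuts immediately 5′ of it, leaving a 5′-GATC overhang. In the Rao et al. (2014) protocol the overhangs are filled in (incorporating the biotinylated nucleotide) and blunt-end ligated, so a DpnII ligation junction reads GATCGATC. The Python sketch below counts sites, digests a sequence in silico, and trims a read at its first ligation junction, a common Hi-C read-preprocessing step. The function names are hypothetical and this is not the pipeline used in the study.

```python
DPNII_SITE = "GATC"             # DpnII recognition site (palindromic)
LIGATION_JUNCTION = "GATCGATC"  # two filled-in DpnII ends after blunt ligation


def site_positions(seq: str, site: str = DPNII_SITE) -> list[int]:
    """0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def count_sites(seq: str, site: str = DPNII_SITE) -> int:
    """Number of restriction sites (one strand suffices for a palindrome)."""
    return len(site_positions(seq, site))


def digest(seq: str, site: str = DPNII_SITE) -> list[int]:
    """Fragment lengths after cutting immediately 5' of each site (^GATC)."""
    bounds = [0] + site_positions(seq, site) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def trim_at_junction(read: str, junction: str = LIGATION_JUNCTION) -> str:
    """Keep the read up to its first junction, retaining the first GATC half."""
    idx = read.upper().find(junction)
    return read if idx < 0 else read[: idx + len(junction) // 2]


print(count_sites("AAGATCTTTTGATCCC"))        # 2
print(digest("AAGATCTTTTGATCCC"))             # [2, 8, 6]
print(trim_at_junction("ACGTGATCGATCTTAA"))   # ACGTGATC
```

Trimming matters because the part of a chimeric read beyond the junction comes from a different genomic locus and would otherwise fail end-to-end alignment.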
The cleaned Hi-C data were mapped to the initial genome assembly using
BOWTIE2 v2.3.2 (Langmead & Salzberg, 2012) in end-to-end mode
(--very-sensitive -L 30), and only uniquely mapped read pairs were
retained for further analysis. Then, the valid read pairs were used
for chromosome-level scaffolding: the contigs of the draft genome were
clustered into chromosomal groups and then ordered and oriented within
each group using the LACHESIS pipeline (Burton et al., 2013) with
the following parameters: CLUSTER_MIN_RE_SITES = 100,
CLUSTER_MAX_LINK_DENSITY = 2.5, CLUSTER_NONINFORMATIVE_RATIO = 1.4,
ORDER_MIN_N_RES_IN_TRUNK = 60, and ORDER_MIN_N_RES_IN_SHREDS = 60.
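How "uniquely mapped" was decided is not specified beyond the Bowtie2 settings. One common heuristic, assumed here, treats a primary alignment as unique when its MAPQ reaches a threshold and Bowtie2 reported no XS:i tag (the score of a second-best alignment). The sketch below applies that rule to paired SAM records and tallies links between different contigs. The MAPQ threshold of 10 and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# SAM FLAG bits (SAM specification)
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def is_unique(fields, min_mapq=10):
    """Assumed uniqueness rule: mapped, primary, MAPQ >= min_mapq, no XS:i tag."""
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    has_xs = any(tag.startswith("XS:i:") for tag in fields[11:])
    return mapq >= min_mapq and not has_xs


def unique_pairs(sam_lines, min_mapq=10):
    """Yield (name, (contig1, pos1), (contig2, pos2)) for pairs whose mates are both unique."""
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & (SECONDARY | SUPPLEMENTARY):
            continue  # group only primary records by read name
        by_name[fields[0]].append(fields)
    for name, recs in by_name.items():
        if len(recs) == 2 and all(is_unique(r, min_mapq) for r in recs):
            a, b = recs
            yield name, (a[2], int(a[3])), (b[2], int(b[3]))


def contig_links(pairs):
    """Count read pairs linking two different contigs (unordered contig pairs)."""
    links = Counter()
    for _, (c1, _), (c2, _) in pairs:
        if c1 != c2:
            links[tuple(sorted((c1, c2)))] += 1
    return links
```

Pairs with both mates on the same contig are kept by unique_pairs but excluded from the inter-contig tally, since only links between contigs inform clustering and ordering.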
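To make the three CLUSTER_* thresholds more concrete, the sketch below is a simplified reading of what they control, not LACHESIS's implementation. Contigs with fewer restriction sites than CLUSTER_MIN_RE_SITES are set aside; contigs whose links per restriction site exceed CLUSTER_MAX_LINK_DENSITY times the average are treated as likely repeats; and a set-aside contig is assigned to its best-linked cluster only if that linkage is at least CLUSTER_NONINFORMATIVE_RATIO times the runner-up. The ORDER_* parameters are not modelled, and the function names are hypothetical.

```python
from statistics import mean

# Parameter values reported in the text; the logic is a simplified illustration.
CLUSTER_MIN_RE_SITES = 100
CLUSTER_MAX_LINK_DENSITY = 2.5
CLUSTER_NONINFORMATIVE_RATIO = 1.4


def informative_contigs(re_sites, total_links,
                        min_re_sites=CLUSTER_MIN_RE_SITES,
                        max_link_density=CLUSTER_MAX_LINK_DENSITY):
    """Return the contigs kept for clustering.

    re_sites: contig -> number of restriction sites in the contig
    total_links: contig -> number of Hi-C links touching the contig
    """
    # Too few restriction sites: too little Hi-C signal to cluster reliably.
    candidates = [c for c, n in re_sites.items() if n >= min_re_sites]
    # Links per restriction site, compared with the average over candidates.
    density = {c: total_links.get(c, 0) / re_sites[c] for c in candidates}
    if not density:
        return set()
    avg = mean(density.values())
    # Unusually high link density suggests a repetitive contig.
    return {c for c, d in density.items() if d <= max_link_density * avg}


def assign(links_to_clusters, ratio=CLUSTER_NONINFORMATIVE_RATIO):
    """Assign a set-aside contig to its best-linked cluster, or None if ambiguous.

    links_to_clusters: cluster id -> links between the contig and that cluster
    """
    ranked = sorted(links_to_clusters.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (best, n_best), (_, n_second) = ranked[0], ranked[1]
    return best if n_best >= ratio * n_second else None


print(informative_contigs({"c1": 150, "c2": 200, "c3": 50, "c4": 120},
                          {"c1": 300, "c2": 400, "c3": 1000, "c4": 3000}))
print(assign({"g1": 30, "g2": 10}), assign({"g1": 12, "g2": 10}))  # g1 None
```

In the first call, c3 is dropped for having too few sites and c4 for its outlying link density (25 links per site against a limit of about 24.2), leaving c1 and c2.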