2.2 | Data for the target species and references
Raw data (FASTQ files) for the target species, C. batrachus, T. bimaculatus, T. flavidus, and T. buxtoni, were
downloaded from the European Nucleotide Archive (ENA) website
(https://www.ebi.ac.uk/ena/browser/home, SRR7440020, SRR8285222,
SRR7881551, SRR6913452, SRR6913453, SRR6913455). PCR duplicates were
removed using Prinseq (Schmieder & Edwards, 2011). Adapters and
low-quality bases were removed using Trim Galore
(https://github.com/FelixKrueger/TrimGalore). Next, the reads were
corrected using the k-mer-based tool BFC (Li, 2015). The multiplicity distribution of
23-mers was computed using Jellyfish2 (Marçais & Kingsford, 2011),
and genome coverage was estimated using KrATER
(https://github.com/mahajrod/KrATER). After processing, the final genome
coverage of C. batrachus, T. bimaculatus, T. buxtoni, and the simulated
ancient DNA clean reads was more than 30× in every case
(Table S2). The insert sizes of the paired-end reads were 180 bp, 300 bp,
250 bp, and 350 bp for C. batrachus, T. bimaculatus, T. flavidus,
and T. buxtoni, respectively.
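The coverage estimate described above rests on the k-mer multiplicity distribution: the main peak gives the k-mer coverage, which can be converted to base coverage using the read length. The following is a minimal Python sketch of that idea, not the Jellyfish2 or KrATER implementation. The function names, the `min_multiplicity` cutoff for skipping error k-mers, and the peak-picking rule are assumptions made for illustration only.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=23):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def peak_multiplicity(hist, min_multiplicity=2):
    """Pick the most frequent multiplicity, ignoring low-multiplicity (likely error) k-mers.

    The cutoff is an illustrative assumption; real data usually need the
    valley between the error peak and the genomic peak to be located first.
    """
    candidates = {m: n for m, n in hist.items() if m >= min_multiplicity}
    if not candidates:
        raise ValueError("no multiplicities at or above the cutoff")
    # On ties, prefer the smaller multiplicity.
    return max(candidates, key=lambda m: (candidates[m], -m))


def base_coverage(kmer_coverage, read_length, k=23):
    """Convert k-mer coverage to base coverage: C = C_k * L / (L - k + 1)."""
    return kmer_coverage * read_length / (read_length - k + 1)
```

For example, a k-mer peak at multiplicity 30 with 100 bp reads and k = 23 corresponds to a base coverage of 30 × 100 / 78, or roughly 38×.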
Reference genome assemblies of C. macrocephalus, A. melas, T.
rubripes, T. flavidus, T. nigroviridis, T.
bimaculatus, M. mola, T. scriptus, T.
strepsiceros, B. grunniens, and M. moschiferus were
downloaded from the National Center for Biotechnology Information
(NCBI; Tables S3-S5). Repeats in these genomes were masked
using RepeatMasker (http://repeatmasker.org/).
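One simple way to sanity-check a masked reference is to measure how much of the sequence is masked. The sketch below assumes a FASTA file in which soft-masked repeats appear as lowercase bases and hard-masked or unknown positions appear as N. Whether the assemblies here were soft- or hard-masked is not stated, and the function name `masked_fractions` is hypothetical.

```python
def masked_fractions(fasta_lines):
    """Return the fractions of soft-masked (lowercase) and N bases in FASTA text.

    Assumption: lowercase marks soft-masked repeats. N positions are reported
    separately because they may be hard-masked repeats or assembly gaps.
    """
    total = soft = n_bases = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and sequence headers.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        for base in line:
            if base in "Nn":
                n_bases += 1
            elif base.islower():
                soft += 1
    if total == 0:
        return {"soft": 0.0, "n": 0.0}
    return {"soft": soft / total, "n": n_bases / total}
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `masked_fractions(open("genome.masked.fa"))`, where the file name is a placeholder.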