2.3 | Generating in silico mate-pair libraries using the original pipeline
Multiple sets of in silico mate pairs were generated using the original in silico pipeline “cross-mates” (Fig. 2; Grau et al., 2018). First, reads of the target organism were mapped onto the repeat-masked reference genome using BWA-MEM (Li, 2013) with default settings. A consensus was then computed using samtools/bcftools with the samtools legacy variant calling model (Li, 2011). Read pairs (mate pairs) were sampled from the consensus in systematic mode, that is, using exact insert sizes and sampling fragments at regularly spaced offsets, while skipping regions with coverage lower than three. For the test assemblies, in silico mate pairs were generated, each with at least 30x coverage, at multiple insert sizes ranging from 500 bp to 200 kb (500 bp, 1 kb, 1.5 kb, 2 kb, 5 kb, 10 kb, 20 kb, 50 kb, 100 kb and 200 kb). In silico mate pairs generated using reference genomes from different taxonomic levels were labelled ‘species name*’.
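The systematic sampling mode described above can be illustrated with a minimal Python sketch. It is not the cross-mates implementation: the function names, the read orientation (forward read from the fragment start, reverse-complemented read from the fragment end), the rule that a pair is skipped when any base under either read has depth below the threshold, and the approximation of sequence coverage as 2 × read length / offset spacing are all assumptions made for illustration.

```python
from typing import Iterator, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def step_for_coverage(read_len: int, target_cov: float) -> int:
    """Offset spacing that gives roughly the requested sequence coverage.

    Assumption: every sampled fragment contributes two reads, so the
    depth is approximately 2 * read_len / step.
    """
    return max(1, int(2 * read_len / target_cov))


def systematic_mate_pairs(
    consensus: str,
    depth: Sequence[int],
    insert_size: int,
    read_len: int,
    step: int,
    min_depth: int = 3,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (fragment_start, read1, read2) at regularly spaced offsets.

    Fragments have exactly `insert_size` bases. A pair is skipped when any
    base under either read has depth below `min_depth` (hypothetical
    interpretation of the coverage filter).
    """
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    if read_len > insert_size:
        raise ValueError("read_len must not exceed insert_size")

    last_start = len(consensus) - insert_size
    for start in range(0, last_start + 1, step):
        end = start + insert_size
        left = slice(start, start + read_len)
        right = slice(end - read_len, end)
        if min(depth[left]) < min_depth or min(depth[right]) < min_depth:
            continue  # low-coverage region: no pair is emitted
        yield start, consensus[left], reverse_complement(consensus[right])


if __name__ == "__main__":
    # Toy consensus with one low-coverage position.
    toy_consensus = "ACGTACGTAC" * 3
    toy_depth = [5] * len(toy_consensus)
    toy_depth[12] = 1
    for pair in systematic_mate_pairs(toy_consensus, toy_depth, 10, 4, 5):
        print(pair)
```

Under these assumptions, one library per insert size would be produced by calling `systematic_mate_pairs` once for each insert size, with the offset spacing chosen by `step_for_coverage` for the desired depth.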