Abstract
A combination of next-generation sequencing technologies and mate-pair
libraries of large insert sizes is used as a standard method to generate
genome assemblies with high contiguity. Third-generation sequencing
techniques are also used to improve the quality of assembled genomes.
However, both mate-pair libraries and third-generation libraries
require high-molecular-weight DNA, making these libraries unsuitable
for samples with only degraded DNA. An in silico method that generates
mate-pair libraries using a reference genome was
devised for the task of assembling target genomes. Although the
contiguity and completeness of assembled genomes were significantly
improved by this method, the resulting assemblies contained a high
level of errors, and the strategies for using reference genomes were
not optimized. Here, we tested different strategies for using reference
genomes to generate in silico mate-pairs. The results showed that using
a closely related reference genome from the same genus was more
effective than using divergent references.
Using in silico mate-pairs that were conserved between two references
to guide genome assembly reduced the number of misassemblies
(18.6% – 46.1%) and increased the contiguity of assembled genomes
(9.7% – 70.7%), while maintaining gene completeness at a level that was
either similar to or marginally lower than that obtained via the
current method. Finally, we
compared the optimized method with another reference-guided assembler,
RaGOO. We found that RaGOO produced longer scaffolds (17.8 Mbp vs 3.0
Mbp), but resulted in a much higher misassembly rate (85.68%) than our
optimized in silico mate-pair method.
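
To illustrate the general idea of in silico mate-pairs, the following is
a minimal Python sketch that cuts read pairs out of a reference sequence
at a fixed insert size. It is not the pipeline used in this work: the
function names, the fixed tiling step, and the read orientation (first
read on the forward strand, second read reverse-complemented) are all
assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def in_silico_mate_pairs(reference, insert_size, read_len, step):
    """Tile a reference sequence into simulated read pairs.

    Hypothetical helper: every `step` bases, a fragment of `insert_size`
    bases is taken from the reference. The first read is the left end of
    the fragment; the second read is the reverse complement of the right
    end. The orientation convention is an assumption of this sketch.
    """
    if read_len > insert_size:
        raise ValueError("read_len must not exceed insert_size")
    if step < 1:
        raise ValueError("step must be a positive integer")

    pairs = []
    for start in range(0, len(reference) - insert_size + 1, step):
        fragment = reference[start:start + insert_size]
        left = fragment[:read_len]
        right = reverse_complement(fragment[-read_len:])
        pairs.append((left, right))
    return pairs


# Toy usage on a 16-base sequence (real insert sizes are in kilobases).
toy_reference = "ACGTACGTAAGGCCTT"
toy_pairs = in_silico_mate_pairs(toy_reference, insert_size=12, read_len=4, step=4)
```

In practice, the simulated pairs would be written out as paired read
files and passed to an assembler or scaffolder alongside the short reads
from the target genome.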