Discussion
Many SV and CNV tools for exome data rely on depth-of-coverage signals
to identify likely candidates for structural changes in the genome in
short-read Illumina data. For both exome and genome data, the
effectiveness of this approach is limited by the availability of good
normalized control data from other genomic regions in the same
individual or from other individuals of the same sequencing run. In the
case of the trio-exome sequencing experiment from our patient, this
baseline was formed by other unrelated samples sequenced in parallel.
The depth and variability of the coverage in certain genomic regions
also influence the ability of those callers to detect structural changes
in the genome. Other CNV detection methods rely on a mix of other factors
to find likely candidates for variation. Pindel incorporates signals
from split reads. These are read pairs in which one of the two reads
cannot be aligned to the reference genome and is assumed to carry the
precise breakpoint information of insertion or deletion events. Similar
metrics are also used by the other callers that were applied in the
subsequent genome sequencing data analysis (e.g. Manta, DELLY, LUMPY).
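The depth-of-coverage principle described above can be illustrated with a minimal sketch. It is not the algorithm of any of the callers named here. The function names, the pseudocount, the median-based normalization, and the threshold of -0.8 are assumptions chosen for illustration, and the depths are toy values.

```python
import math
from statistics import median


def normalize(depths):
    """Divide each target's depth by the sample median (crude library-size scaling)."""
    m = median(depths)
    if m <= 0:
        raise ValueError("sample has no usable coverage")
    return [d / m for d in depths]


def log2_ratios(sample, controls, pseudocount=0.01):
    """Per-target log2 ratio of the sample against the median of the controls."""
    norm_sample = normalize(sample)
    norm_controls = [normalize(c) for c in controls]
    ratios = []
    for i, depth in enumerate(norm_sample):
        baseline = median(c[i] for c in norm_controls)
        ratios.append(math.log2((depth + pseudocount) / (baseline + pseudocount)))
    return ratios


def candidate_deletions(ratios, threshold=-0.8):
    """Indices of targets whose log2 ratio falls below the (hypothetical) threshold."""
    return [i for i, r in enumerate(ratios) if r < threshold]


# Toy data: five targets, three unrelated controls sequenced in parallel,
# and one sample in which target 0 has half the expected depth.
controls = [
    [100, 200, 150, 120, 130],
    [80, 160, 120, 96, 104],
    [120, 240, 180, 144, 156],
]
sample = [45, 180, 135, 108, 117]

ratios = log2_ratios(sample, controls)
flagged = candidate_deletions(ratios)
print(flagged)
```

A heterozygous deletion that covers a whole target halves its depth and gives a log2 ratio near -1. A deletion that reaches only a few base pairs into a target changes the mean depth very little, so the ratio stays close to 0 and the target is not flagged.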
The initial negative result using other CNV calling methods is due to
the suboptimal coverage distribution at some of the KANSL1 exons
and intronic regions and the fact that the deletion reaches only 46 bp
into the exon. The variant in question lies mainly at the end of
intron 7, making coverage-based detection of structural changes
substantially more difficult in exome data than in genome sequencing
data. From sequence analysis, 130 pathogenic or likely pathogenic
variants have been reported for KANSL1 in the ClinVar database
(Landrum et al., 2020). In contrast, the 4.7 kb deletion that we
identified is the first ClinVar entry with a variant length between
51 bp and 50 kb.
In conclusion, we reported a 4.7 kb deletion in KANSL1 that is
mainly non-coding and was therefore first detected by genome sequencing.
However, retrospectively it could also be confirmed in exome sequencing
data with fine-tuning of the filter settings. Since high accuracy in CMA
analysis is limited to variants of 50 kb or larger, and in exome
analysis to variants of 50 bp or smaller, deletions on the order of a
few kilobases are not detected by the diagnostic tests most often used
today. In genome sequencing data, on the other hand, SVs and CNVs in
this size range can be identified more easily but are usually more
difficult to interpret if they are non-coding.
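The size gap described in this paragraph can be written as a small check. This is only a sketch of the arithmetic. The two cutoffs are the values stated in the text, while the function names are hypothetical.

```python
EXOME_MAX_BP = 50      # exome analysis: high accuracy for variants of 50 bp or smaller
CMA_MIN_BP = 50_000    # CMA: high accuracy for variants of 50 kb or larger


def high_accuracy_methods(length_bp):
    """Return the routine tests whose high-accuracy size range includes this variant."""
    methods = []
    if length_bp <= EXOME_MAX_BP:
        methods.append("exome")
    if length_bp >= CMA_MIN_BP:
        methods.append("CMA")
    return methods


def in_detection_gap(length_bp):
    """True if neither routine test covers this variant size with high accuracy."""
    return not high_accuracy_methods(length_bp)


# The 4.7 kb deletion reported here falls between the two ranges.
print(in_detection_gap(4_700))
```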
Our case therefore exemplifies how computer-assisted analysis of the
portrait can make a significant contribution to the diagnostic process.
First, NGP has the potential to speed up data analysis. If our Koolen-de
Vries patient had carried the recurrent microdeletion, an SNV, or an
indel, the high gestalt score would have made the molecular confirmation
of the suspected clinical diagnosis straightforward using protocols such
as the PEDIA workflow (Hsieh et al., 2019). Second, highly suggestive
results of NGP can be used to request genome sequencing if exome or CMA
analysis were inconclusive. Third, NGP can help with the classification
of the pathogenicity of novel variants found in the genome.
According to the ACMG/AMP guidelines from 2015, a matching phenotype is
considered only supporting evidence for the pathogenicity of a sequence
variant (PP4) (Richards et al., 2015). However, experienced
dysmorphologists may attribute a higher level of evidence to the
pathogenicity of a variant in a gene if the associated phenotype is
highly specific (Zhang et al., 2020). Most clinicians who are
confronted for the first time with such a specific diagnosis will be
hesitant to apply these higher weights. Here, computer-assisted analysis
could help, since syndromic distinctiveness can be measured and the
similarity of a portrait to other molecularly confirmed cases can be
quantified (Hsieh et al., 2022). In this way, NGP makes the visual
inspection of a patient applicable to a Bayesian classification
framework (Tavtigian et al., 2018). Interestingly, the specificity of
the facial gestalt of Koolen-de Vries syndrome ranks only in the upper
half of dysmorphic phenotypes and is exceeded, for example, by the
distinctiveness of Baraitser-Winter syndrome or Seckel syndrome. For
disorders in this category, high gestalt scores should therefore be
handled with even greater attention and could justify more comprehensive
tests such as genome sequencing if molecular confirmation is still
pending.
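The effect of weighting phenotype evidence more heavily can be sketched with the point estimates of the Bayesian framework of Tavtigian et al. (2018): a prior probability of 0.10 and odds of pathogenicity of 350 for very strong evidence, scaled by exponents of 1/2, 1/4, and 1/8 for strong, moderate, and supporting evidence. The evidence combination below is hypothetical and is not the classification of the variant reported here. The function names are likewise assumptions.

```python
PRIOR = 0.10               # prior probability of pathogenicity
ODDS_VERY_STRONG = 350.0   # odds of pathogenicity for very strong evidence
EXPONENT = {"supporting": 1 / 8, "moderate": 1 / 4, "strong": 1 / 2, "very_strong": 1.0}


def combined_odds(pathogenic_evidence):
    """Multiply the odds of pathogenicity of all pathogenic evidence items."""
    odds = 1.0
    for strength in pathogenic_evidence:
        odds *= ODDS_VERY_STRONG ** EXPONENT[strength]
    return odds


def posterior(odds, prior=PRIOR):
    """Posterior probability of pathogenicity for the given combined odds."""
    return odds * prior / ((odds - 1.0) * prior + 1.0)


def classify(p):
    """Map a posterior to the pathogenic side of the five-tier scale."""
    if p > 0.99:
        return "pathogenic"
    if p >= 0.90:
        return "likely pathogenic"
    return "uncertain significance or below"


# Hypothetical variant with one strong and one moderate criterion, plus
# phenotype evidence (PP4) applied at three different strengths.
other_evidence = ["strong", "moderate"]
for pp4_strength in ("supporting", "moderate", "strong"):
    p = posterior(combined_odds(other_evidence + [pp4_strength]))
    print(pp4_strength, round(p, 3), classify(p))
```

In this hypothetical combination the variant stays likely pathogenic when the phenotype counts as supporting or moderate evidence, and it crosses the pathogenic threshold only when the phenotype is weighted as strong evidence.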