Discussion
Many SV and CNV tools for exome data rely on depth-of-coverage signals to identify candidate structural changes in short-read Illumina data. For both exome and genome data, the effectiveness of this approach is limited by the availability of well-normalized control data from other genomic regions in the same individual or from other individuals in the same sequencing run. In the trio-exome sequencing experiment of our patient, this baseline was formed by unrelated samples sequenced in parallel. The depth and variability of coverage in certain genomic regions also influence the ability of these callers to detect structural changes. Other CNV detection methods rely on additional signals to find candidate variants. Pindel incorporates signals from split reads. These are read pairs in which one of the two reads cannot be aligned to the reference genome and is assumed to carry the precise breakpoint information of insertion or deletion events. Similar signals are also used by the callers applied in the subsequent genome sequencing data analysis (e.g. manta, delly, lumpy).
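The depth-of-coverage principle described above can be illustrated with a minimal sketch. It is not the implementation of any caller used in this study. The function names, the pseudocount and the log2 threshold of -0.6 are assumptions chosen for illustration, and the depth values are invented. Each sample is scaled by its own median depth, the baseline per exon is the median across control samples, and exons with a strongly negative log2 ratio are flagged as candidate deletions.

```python
import math
from statistics import median


def normalize(depths):
    """Scale per-exon depths by the sample median so that samples
    with different sequencing yields become comparable."""
    m = median(depths)
    if m <= 0:
        raise ValueError("median depth must be positive")
    return [d / m for d in depths]


def log2_ratios(test_depths, control_depths, pseudo=0.01):
    """Log2 ratio of the test sample against the per-exon median
    of the normalized control samples (hypothetical helper)."""
    test = normalize(test_depths)
    controls = [normalize(c) for c in control_depths]
    ratios = []
    for i, t in enumerate(test):
        baseline = median(c[i] for c in controls)
        # The pseudocount avoids log2(0) for exons without coverage.
        ratios.append(math.log2((t + pseudo) / (baseline + pseudo)))
    return ratios


def flag_deletions(ratios, threshold=-0.6):
    """Indices of exons whose log2 ratio is at or below the threshold.
    A heterozygous deletion is expected near log2(0.5) = -1."""
    return [i for i, r in enumerate(ratios) if r <= threshold]


# Invented example: five exons, the third with roughly half the depth.
test_sample = [100, 98, 52, 101, 99]
control_samples = [
    [100, 100, 100, 100, 100],
    [200, 196, 204, 202, 198],
    [50, 51, 49, 50, 50],
]
candidates = flag_deletions(log2_ratios(test_sample, control_samples))
print(candidates)
```

In this toy example only the third exon is flagged. With noisier control samples or low baseline coverage, the ratio of a truly deleted exon moves closer to zero, which is the limitation discussed above.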
The initial negative result using other CNV calling methods is due to the suboptimal coverage distribution at some of the KANSL1 exons and intronic regions and to the fact that the deletion reaches only 46 bp into the exon. The variant in question lies mainly at the end of intron 7, making coverage-based detection of structural changes substantially more difficult in exome data than in genome sequencing data. As a result, 130 pathogenic or likely pathogenic variants from sequence analysis have been reported for KANSL1 in the ClinVar database (Landrum et al., 2020). In contrast, the 4.7 kb deletion that we identified is the first ClinVar entry with a variant length between 51 bp and 50 kb.
In conclusion, we report a 4.7 kb deletion in KANSL1 that is mainly non-coding and was therefore first detected by genome sequencing. Retrospectively, however, it could also be confirmed in exome sequencing data after fine-tuning of the filter settings. Since high accuracy is limited to variants of 50 kb or larger in CMA and to variants of 50 bp or smaller in exome analysis, deletions on the order of a few kilobases are not detected by the diagnostic tests most often used today. In genome sequencing data, on the other hand, SVs and CNVs in this size range can be identified more easily but are usually more difficult to interpret if they are non-coding.
Our case therefore exemplifies how computer-assisted analysis of the portrait can make a significant contribution to the diagnostic process. First, NGP has the potential to speed up data analysis. If our Koolen-de Vries patient had carried the recurrent microdeletion, an SNV or an indel, the high gestalt score would have made the molecular confirmation of the suspected clinical diagnosis straightforward using protocols such as the PEDIA workflow (Hsieh et al., 2019). Second, highly suggestive NGP results can be used to request genome sequencing when exome or CMA analysis is inconclusive. Third, NGP can help to classify the pathogenicity of novel variants found in the genome.
According to the 2015 guidelines, a matching phenotype is considered only supporting evidence for the pathogenicity of a sequence variant (PP4) (Richards et al., 2015). However, experienced dysmorphologists may attribute a higher level of evidence to the pathogenicity of a variant in a gene if the associated phenotype is highly specific (Zhang et al., 2020). Most clinicians who are confronted with such a specific diagnosis for the first time will be hesitant to apply these higher weights. Here, computer-assisted analysis could help, since syndromic distinctiveness can be measured and the similarity of a portrait to other molecularly confirmed cases can be quantified (Hsieh et al., 2022). By this means, NGP makes the visual inspection of a patient applicable to a Bayesian classification framework (Tavtigian et al., 2018). Interestingly, the specificity of the facial gestalt of Koolen-de Vries syndrome ranks only in the upper half of dysmorphic phenotypes and is exceeded, for example, by the distinctiveness of Baraitser-Winter syndrome or Seckel syndrome. For disorders in this category, high gestalt scores should therefore be handled with even greater attention and could justify more comprehensive tests such as genome sequencing if molecular confirmation is still pending.
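How the weight given to the phenotype criterion affects a Bayesian classification can be sketched as follows. The sketch assumes the parameters proposed for that framework as we understand them (Tavtigian et al., 2018): a prior probability of 0.10 and odds of pathogenicity of 350 for very strong evidence, with strong, moderate and supporting evidence scaled by exponents of 1/2, 1/4 and 1/8. The evidence combination is hypothetical and does not represent the classification of the variant reported here. How a gestalt score should be mapped to an evidence strength is not specified and is left open.

```python
PRIOR = 0.10              # assumed prior probability of pathogenicity
ODDS_VERY_STRONG = 350.0  # assumed odds for very strong evidence

# Each lower evidence level halves the exponent of the level above it.
EXPONENT = {
    "supporting": 1 / 8,
    "moderate": 1 / 4,
    "strong": 1 / 2,
    "very_strong": 1.0,
}


def odds(strength):
    """Odds of pathogenicity for a single piece of evidence."""
    return ODDS_VERY_STRONG ** EXPONENT[strength]


def posterior(pathogenic_evidence, prior=PRIOR):
    """Posterior probability after combining independent pathogenic
    evidence by multiplying the individual odds."""
    combined = 1.0
    for strength in pathogenic_evidence:
        combined *= odds(strength)
    return combined * prior / ((combined - 1.0) * prior + 1.0)


# Hypothetical variant with one strong criterion plus the phenotype
# criterion PP4, applied at its default or at an upgraded strength.
other_evidence = ["strong"]
pp4_default = posterior(other_evidence + ["supporting"])
pp4_upgraded = posterior(other_evidence + ["strong"])
print(f"PP4 supporting: {pp4_default:.3f}")
print(f"PP4 strong:     {pp4_upgraded:.3f}")
```

Under these assumptions, upgrading PP4 from supporting to strong raises the posterior probability from roughly 0.81 to roughly 0.97, which shows why the weight assigned to a highly specific phenotype can change the final classification.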