Validation of the phasing approach using WES data
The parent-of-origin of the DNMs was initially assessed from the short-read WES data. Short (100 bp) reads spanning both a DNM and an iSNP were available in the WES data for 14 of the 109 DNMs, but only 9 of these (8% of all DNMs; Figure 1.b1) had acceptable coverage (>10x per allele; Table 1). All 14 DNMs were also target-amplified and sequenced with ONT long-read sequencing to provide a control cohort for our long-read phasing method. For all 9 DNMs with acceptable WES coverage, the parent-of-origin assignments agreed between the two approaches, and the DNM allele frequencies were comparable (Table 1). Notably, the same held for the 5 WES-phased samples with limited sequencing coverage, suggesting that the stringent requirement of >10x coverage for WES phasing could be relaxed to as little as 4x. With both short- and long-read sequencing, 11 of the 14 DNMs showed allele frequencies of around 50% (± 10%) with no significant third allelic form (Supplementary Table 11), indicating germline DNM events. Of the three DNMs deviating from prezygotic allele frequencies, two (in SORCS2 and C10orf71) were determined to be postzygotic, and one (in SIGLEC10) may reflect allelic sequencing bias (Supplementary Table 12). ONT coverage for SIGLEC10 was significantly lower than the average ONT coverage, and although both the WES and ONT allele data showed deviations of the DNM allele frequency from 50%, no significant third allelic form was observed. In addition, the DNM base frequencies in both the WES and ONT data fell within the prezygotic range of 50% ± 10%.
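The coverage filter and allele-frequency check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names and the representation of phased reads as two read counts (wild-type allele and DNM allele) are hypothetical. Only the >10x per-allele threshold and the 50% ± 10% prezygotic range come from the text.

```python
def passes_coverage(wt_reads: int, dnm_reads: int, threshold: int = 10) -> bool:
    """Per-allele coverage filter: both alleles need more than `threshold` reads.

    The default mirrors the >10x per-allele criterion; a relaxed threshold
    can be passed explicitly (hypothetical helper).
    """
    return wt_reads > threshold and dnm_reads > threshold


def dnm_allele_frequency(wt_reads: int, dnm_reads: int) -> float:
    """Fraction of phased reads that carry the DNM allele."""
    total = wt_reads + dnm_reads
    if total == 0:
        raise ValueError("no phased reads at this DNM")
    return dnm_reads / total


def within_prezygotic_range(freq: float, expected: float = 0.5,
                            tolerance: float = 0.10) -> bool:
    """True if the DNM allele frequency lies within 50% +/- 10%."""
    return abs(freq - expected) <= tolerance


if __name__ == "__main__":
    # Illustrative, made-up read counts (not data from this study).
    wt, dnm = 24, 21
    freq = dnm_allele_frequency(wt, dnm)
    print(passes_coverage(wt, dnm), round(freq, 3), within_prezygotic_range(freq))
```

Keeping the threshold as a parameter makes it straightforward to test how a relaxed WES coverage requirement would change which DNMs can be phased.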
For the DNM affecting C10orf71 in patient 01247, the de novo mutated allele (T-A) had much lower allele frequencies of 9% and 17% in the WES and ONT data, respectively, with similar values observed for the DNM base frequencies. In addition to the wt allele and the DNM allele, we clearly observed a third allelic form representing the wt version of the DNM allele, that is, the haplotype carrying the DNM-linked iSNP base but the wt base at the DNM position. The presence of this third allelic form indicates that the DNM most likely arose as a postzygotic event, and in this case it was detected in both the WES and ONT data. Postzygotic mutations can, however, be missed when WES coverage is low, as for the DNM in SORCS2 in patient 01209. In the WES base frequencies, this postzygotic DNM shows an average deviation of 15% from prezygotic norms, but no third allelic form can be detected in the WES allele data because coverage there is only 1x. The ONT base frequencies show a larger average deviation of 38% from prezygotic norms, and with several thousand times more coverage, the ONT data reveal a wt version of the DNM allele at 18%.
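The classification reasoning in this and the preceding paragraph (germline, postzygotic, or possible allelic bias) can be sketched as a simple decision rule. The sketch assumes reads have already been phased into three allelic forms; the class name `PhasedCounts`, the minimum depth of 20 reads and the 5% cutoff for a "significant" third allelic form are hypothetical choices, not values reported in this study.

```python
from dataclasses import dataclass


@dataclass
class PhasedCounts:
    """Read counts for the allelic forms at one DNM (hypothetical structure)."""
    wt: int     # wt allele (haplotype without the DNM)
    dnm: int    # DNM allele (DNM-linked iSNP base + mutated base)
    third: int  # wt version of the DNM allele (DNM-linked iSNP base + wt base)


def classify_dnm(
    c: PhasedCounts,
    min_total: int = 20,               # hypothetical minimum phased depth
    min_third_fraction: float = 0.05,  # hypothetical cutoff for a "significant" third form
    tolerance: float = 0.10,           # prezygotic range: 50% +/- 10%
) -> str:
    total = c.wt + c.dnm + c.third
    if total < min_total:
        # Too few reads to rule out a third allelic form (e.g. 1x WES coverage).
        return "insufficient coverage"
    if c.third / total >= min_third_fraction:
        # A wt version of the DNM allele implies mosaicism, i.e. a postzygotic event.
        return "postzygotic"
    if abs(c.dnm / total - 0.5) <= tolerance:
        return "germline"
    # Deviating frequency without a third form: possible allelic sequencing bias.
    return "deviating, possible allelic bias"


if __name__ == "__main__":
    # Illustrative, made-up read counts (not data from this study).
    examples = {
        "balanced": PhasedCounts(wt=48, dnm=52, third=0),
        "mosaic": PhasedCounts(wt=50, dnm=20, third=30),
        "skewed": PhasedCounts(wt=70, dnm=30, third=1),
        "low depth": PhasedCounts(wt=1, dnm=0, third=0),
    }
    for name, counts in examples.items():
        print(name, "->", classify_dnm(counts))
```

The depth check comes first so that, as with the 1x WES coverage at SORCS2, an absent third allelic form is reported as uninformative rather than as evidence against a postzygotic event.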