DISCUSSION
This study aimed to identify loci and possible candidate genes associated with the color phenotype of tamar-stage (dry) date fruit. Our association study used genotypic and image data from 188 dry date fruit samples that passed quality-control filtering. Using the FarmCPU GWAS method and the R/B color phenotype, we identified six SNPs significantly associated with the color phenotype at the FDR-adjusted p-value cut-off (Table 2). These SNPs lie on linkage groups LG3, LG4, LG5 and LG10 and on one unplaced scaffold, MU008982. Among these, the highly significant SNP LG4s65268794 is linked to the previously discovered VIR gene responsible for yellow or red fresh fruit color. The remaining five SNPs, new loci identified in this study, are also significantly associated with fruit color (Wilcoxon test; Figure 3a), although their effects likely relate either only to the dry fruit stage or to finer color detail in fresh fruit. Possible candidate genes were mapped from the regions surrounding the significant SNPs. As expected from previous studies (K. M. Hazzouri et al., 2015; Khaled M. Hazzouri et al., 2019), the R2R3-MYB transcription factor gene lies on LG4, 16 kb from the significant SNP LG4s65268794. We confirmed that this previously identified VIR genotype also has a major effect on dry fruit color in our samples, likely stemming from the yellow or red starting color at the fresh fruit stage (Supplementary Figure 10).
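This section does not name the FDR procedure behind the adjusted p-value cut-off. A common choice is the Benjamini-Hochberg step-up adjustment, so the minimal Python sketch below assumes BH and a hypothetical FDR level of 0.05 to show how raw GWAS p-values become adjusted values and how SNPs are retained at the cut-off. The SNP IDs and p-values in the usage lines are hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order (assumed procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending raw p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_snps(pvals: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """SNP IDs whose adjusted p-value is at or below the (hypothetical) FDR cut-off."""
    ids = list(pvals)
    adjusted = benjamini_hochberg([pvals[s] for s in ids])
    return [s for s, q in zip(ids, adjusted) if q <= fdr]


# Hypothetical SNP IDs and p-values, for illustration only.
pvals = {"snpA": 1e-9, "snpB": 0.01, "snpC": 0.04, "snpD": 0.5}
print(significant_snps(pvals))  # ['snpA', 'snpB']
```

Here snpC has a raw p-value below 0.05 but an adjusted value of about 0.053, so it falls out, which is the kind of filtering that separates the six reported SNPs from near-threshold ones.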
Beyond the fruit color classification provided by the genotype of the R2R3-MYB transcription factor gene, we investigated whether genotypic variation at the other SNPs (Table 2) is associated with the color phenotype of dry fruit; that is, do these SNPs make a further genetic contribution to the dry date fruit color phenotype? To that end, we assessed the SNP associations separately within the homozygous VIR wild-type and homozygous VIRIM groups. The results revealed that the newly identified SNPs, except SNP LG4s19036701, further resolve light from dark fruit beyond the color classification given by the VIR genotype (Figure 3b and 3c). For some SNPs, the small number of samples in the two groups (wild type, n = 25; VIRIM, n = 61) may explain the lack of a statistically significant association. Overall, however, the newly identified loci are associated with the color phenotype and can distinguish dry fruit color beyond the color observed in fresh fruit. This association may reflect genetic control during the fruit ripening process.
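The stratified analysis above can be sketched as a Wilcoxon rank-sum (Mann-Whitney U) test run separately inside each VIR genotype group. This minimal Python sketch assumes a normal approximation with tie correction and no continuity correction; the tuple layout, the genotype codes 0 and 2 (taken here to mean the two homozygous classes at the SNP), the stratum labels and the minimum group size are all hypothetical, not the study's settings.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided rank-sum test via the normal approximation, with tie
    correction and no continuity correction. Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def compare_within_vir_groups(
    samples: Iterable[Tuple[str, int, float]], geno_a: int, geno_b: int,
    min_n: int = 2,
) -> Dict[str, Tuple[float, float]]:
    """Compare R/B values between two SNP genotype classes inside each VIR stratum.
    samples: (vir_genotype, snp_genotype, rb_value) tuples (hypothetical layout)."""
    strata: Dict[str, Dict[int, List[float]]] = {}
    for vir, geno, rb in samples:
        strata.setdefault(vir, {}).setdefault(geno, []).append(rb)
    results = {}
    for vir, by_geno in strata.items():
        a, b = by_geno.get(geno_a, []), by_geno.get(geno_b, [])
        if len(a) >= min_n and len(b) >= min_n:  # skip under-sampled strata
            results[vir] = wilcoxon_rank_sum(a, b)
    return results


# Hypothetical samples, for illustration only.
samples = [("WT", 0, 1.0), ("WT", 0, 1.1), ("WT", 0, 1.2),
           ("WT", 2, 1.5), ("WT", 2, 1.6), ("WT", 2, 1.7),
           ("VIRIM", 0, 2.0), ("VIRIM", 2, 2.5), ("VIRIM", 2, 2.6)]
print(compare_within_vir_groups(samples, 0, 2))  # VIRIM stratum skipped: too few samples
```

The sketch skips strata where either genotype class is too small, which mirrors the point above that small groups limit statistical power. For small genotype classes an exact test would be the more careful choice; the normal approximation is used here only to keep the example dependency-free.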
Based on a literature search and Blast2GO gene ontology analysis, we identified six genes in the regions surrounding the significant SNPs (Table 2) that play a role in fruit ripening and pigmentation in other plants (Table 3). RNA-Seq analysis revealed that many of these genes are differentially expressed early or late in development in both light-colored (Khalas) and dark-colored (Kenezi) fruit (Figure 5). Among them, the Ethylene-responsive transcription factor 12 gene and a RING/U-box superfamily protein gene lie in the candidate genomic region of SNP LG10s12886617 (LG10). The Ethylene-responsive transcription factor 12 gene is similar to the DORNROSCHEN-like protein in Arabidopsis thaliana; it contains an AP2 domain and plays a significant role in the ethylene-activated and cytokinin signalling pathways (Das et al., 2012; Phukan, Jeena, Tripathi, & Shukla, 2017). In Arabidopsis, cytokinin signalling increases sugar-induced anthocyanin biosynthesis (Das et al., 2012). The LG10 candidate region also contains an uncharacterised protein (gene id: PDK50.r1.LG10G00073880) with a Myb DNA-binding 3 domain (Ambawat, Sharma, Yadav, & Yadav, 2013). The Protochlorophyllide reductase gene lies within the region surrounding SNP LG3s906369 and plays a vital role in the chlorophyll biosynthetic pathway (Garrone, Archipowa, Zipfel, Hermann, & Dietzek, 2015; Yamazaki, Nomata, & Fujita, 2006).
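Identifying genes in the regions surrounding each SNP is essentially an interval query against the genome annotation. The minimal Python sketch below assumes that SNP IDs encode `<sequence name>s<base-pair position>` (inferred from IDs such as LG10s12886617, not stated in the text) and that gene coordinates are 1-based and inclusive. The `Gene` records, gene IDs and the 20 kb window are hypothetical; the study's actual window size is not given in this section.

```python
import re
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# ASSUMPTION: IDs look like '<sequence name>s<bp position>', e.g. 'LG4s65268794'.
SNP_ID = re.compile(r"^([A-Za-z]+\d+)s(\d+)$")


def parse_snp_id(snp_id: str) -> Tuple[str, int]:
    """Split an assumed-format SNP ID into (sequence name, position)."""
    match = SNP_ID.match(snp_id)
    if match is None:
        raise ValueError(f"unrecognised SNP id: {snp_id}")
    return match.group(1), int(match.group(2))


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Base pairs from the SNP to the nearest gene boundary (0 if inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def genes_near_snp(snp_id: str, genes: List[Gene], window_bp: int) -> List[Tuple[str, int]]:
    """(gene_id, distance) for genes within window_bp of the SNP, nearest first."""
    chrom, pos = parse_snp_id(snp_id)
    hits = [(g.gene_id, distance_to_gene(pos, g)) for g in genes if g.chrom == chrom]
    return sorted((h for h in hits if h[1] <= window_bp), key=lambda h: h[1])


# Hypothetical annotation records and window size, for illustration only.
genes = [Gene("geneA", "LG4", 65_250_000, 65_252_000),
         Gene("geneB", "LG4", 65_300_000, 65_310_000),
         Gene("geneC", "LG5", 65_268_000, 65_269_000)]
print(genes_near_snp("LG4s65268794", genes, window_bp=20_000))  # [('geneA', 16794)]
```

Reporting the distance alongside each hit makes statements such as "located 16 kb away from the significant SNP" directly checkable; the functional filtering (literature search, Blast2GO) would then be applied to the returned gene list.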
SNPs that were filtered out because of high FDR, yet remained near the top of our list, also identified regions containing many genes related to fruit ripening and pigmentation (Kaler, Gillman, Beissinger, & Purcell, 2020; Y. M. Zhang, Jia, & Dunwell, 2019). We used an unadjusted p-value of 10e-7 as the cut-off for identifying these SNPs (Supplementary Table 2); however, other SNPs may fall just short of the significance threshold given the sample size used here. The candidate region of SNP LG13s8766984 (LG13) contains an AP2-like ethylene-responsive transcription factor. The region around SNP LG5s4683788 (LG5) contains 4-coumarate-CoA ligase, 4-coumarate:CoA ligase 3, a Myb family transcription factor family protein, an AP2/B3-like transcriptional factor family protein, and chalcone-flavanone isomerase. Chalcone isomerase is a critical enzyme in anthocyanin biosynthesis (J. H. Kang et al., 2014; Sun et al., 2019), and 4-coumarate:CoA ligase is a key enzyme in plant phenylpropanoid metabolism (Y. Li, Kim, Pysh, & Chapple, 2015; C. H. Wang et al., 2016). A metabolome study of dates detected enrichment of phenylpropanoids early in date fruit development (Diboun et al., 2015). Our gene expression analysis shows that the 4-coumarate:CoA ligase genes (gene ids: PDK50.r1.LG5G00393990 and PDK50.r1.LG5G00393890) are more highly expressed early in development in dark-colored fruit than in light-colored fruit, peaking at 45-75 days post pollination (Supplementary Figure 13). These genes may be candidates for further study if larger samples confirm that they are significantly associated with fruit color.
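The two-tier selection described above, keeping SNPs that pass an unadjusted p-value cut-off but fail the FDR cut-off, can be expressed as a simple filter. This Python sketch assumes the adjusted p-values come from whichever FDR procedure was applied to the main results; the function name, the inclusive comparisons and the default FDR level of 0.05 are hypothetical, and `raw_cutoff` should be set to the study's own unadjusted threshold.

```python
from typing import Dict, List


def suggestive_snps(raw_p: Dict[str, float], adj_p: Dict[str, float],
                    raw_cutoff: float, fdr: float = 0.05) -> List[str]:
    """SNPs passing the unadjusted cut-off but not the FDR cut-off, strongest first."""
    hits = [s for s, p in raw_p.items() if p <= raw_cutoff and adj_p[s] > fdr]
    return sorted(hits, key=lambda s: raw_p[s])


# Hypothetical SNP IDs, p-values and cut-off, for illustration only.
raw = {"s1": 1e-9, "s2": 5e-8, "s3": 3e-7, "s4": 2e-8}
adj = {"s1": 0.01, "s2": 0.2, "s3": 0.4, "s4": 0.1}
print(suggestive_snps(raw, adj, raw_cutoff=1e-7))  # ['s4', 's2']
```

Here s1 is excluded because it already passes the FDR cut-off and s3 because its raw p-value exceeds the illustrative unadjusted threshold, leaving only the near-threshold candidates.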
By combining genotypic data from highly diverse samples collected in 14 countries with the color phenotype of dry (tamar-stage) fruit, we successfully performed a GWAS using the FarmCPU method and identified multiple significant loci and possible candidate genes associated with fruit color variation. These newly identified SNP associations with dry date fruit color will help add resolution to our understanding of the genetic control of commercially important phenotypes in this fruit crop.