Phylogenetics
Sequence diversity in our soil samples was high: 559 soil clones yielded 393 unique sequences. ML analysis recovered the four canonical Frankia ‘Clusters’ that are used as working taxonomic groups based on phylogenetic patterns recovered across multiple loci (Figure 1, Figure S2) (Normand et al., 1996; Nouioui et al., 2011; Pozzi et al. 2018a; Swensen & Benson, 2008), including Ghodhbane-Gtari et al. (2010), the source for our reference IGS alignment. All four Frankia Clusters were well supported statistically (>75% bootstrap) in our tree; however, Cluster 3 rather than Cluster 4 was basal within Frankia, and Cluster 2 fell within the large group that contained both reference sequences derived from Alnus nodules and our soil clones. The position of Cluster 2 reflects the strongest ambiguity in our tree, as it differs from the standard position of these reference sequences, falling within rather than adjacent to a larger Alnus-infective/Casuarina-infective cluster that is broadly accepted.
All of our soil-derived sequences fell into four strongly supported clades that were well within the group of previously described Frankia at the genus level (Figure 1, Figure S2). However, within the genus none of our sequences were closely related to any of the Frankia sequences in our reference alignment (Figure 1, Figure S2). Only a small minority of our sequences (37 sequences, 6.6%) clustered near previously described Alnus-infective Frankia (‘Cluster 1’), forming a poorly supported group (BS=52%) with well-supported affinity for the canonical ‘Cluster 1’ Frankia (BS=89%). This small group included sequences from nodules of A. viridis, which contain genotypes of Frankia we have previously observed to have high specificity for this host species (Anderson et al. 2009). The majority of our soil-derived sequences (374 sequences, 67.0%) fell into two well-supported clades that formed a larger group with the ‘typical’ Alnus-infective strains, but were clearly differentiated from them. One clade, which we call the ‘AT clade’ (BS=100%), includes nearly all the sequence variation we have observed in A. tenuifolia nodules in our Alaskan field sites. This group includes 162 soil-derived sequences, or 29.0% of the total from this study. A second clade, which we call ‘Clade B’, is sister to this group (though this placement is not statistically supported), and contains only soil-derived sequences observed in this study (211 sequences, 37.7%). A large portion of our sequences (148 sequences, 26.5%) formed a unique fourth clade, which we call ‘Clade A’, that fell into the ambiguous portion of our tree, near previously described divergent sequences we have observed occasionally in our prior studies (unpublished data).
Sequences from the mid-succession-only alignment displayed a broadly similar phylogenetic pattern (Figure S3, Figure S4). Most soil-derived sequences (128 sequences, 59.3%) fell into the AT clade with nodule-forming reference sequences. The rest formed two clades: one sister to the AT clade, which we called ‘MID_1’, using numeric rather than letter designation to emphasize the different derivation of the mid-succession sequences in our method, and one large clade that we split into two smaller groups (‘MID_2’ and ‘MID_3’). A total of 165 sequences were shared between the large rIGS and mid-succession-only alignments. Comparison of the clade assignments of these sequences in the two alignments indicated that mid-succession clades MID_2 and MID_3 are equivalent to Clade A from the larger alignment, and MID_1 is equivalent to Clade B.
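The comparison of clade assignments for shared sequences amounts to a cross-tabulation. The following is a minimal sketch of that idea, not the procedure actually used in this study: the function names (`cross_tabulate`, `majority_mapping`), the sequence identifiers, and the toy assignments are all hypothetical and serve only to illustrate the logic.

```python
from collections import Counter, defaultdict


def cross_tabulate(first, second):
    """Count shared sequence IDs for each (clade in first, clade in second) pair.

    Both arguments map sequence ID -> clade label in one alignment.
    """
    shared = first.keys() & second.keys()
    return Counter((first[s], second[s]) for s in shared)


def majority_mapping(table):
    """For each clade in the first alignment, return the clade in the
    second alignment that holds most of its shared sequences."""
    by_first = defaultdict(Counter)
    for (a, b), n in table.items():
        by_first[a][b] += n
    return {a: counts.most_common(1)[0][0] for a, counts in by_first.items()}


# Hypothetical toy assignments (not data from this study).
mid_only = {"seq01": "MID_1", "seq02": "MID_1", "seq03": "MID_2",
            "seq04": "MID_3", "seq05": "AT", "seq06": "MID_2"}
full = {"seq01": "Clade B", "seq02": "Clade B", "seq03": "Clade A",
        "seq04": "Clade A", "seq05": "AT", "seq07": "Clade A"}

table = cross_tabulate(mid_only, full)
mapping = majority_mapping(table)
```

With real data, any off-diagonal counts in `table` would flag shared sequences whose clade membership differs between the two alignments.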
Both Clade A and Clade B contained several well-supported sub-clades (Figure S2). The AT clade also contained three discernible sub-groups: a basal group (‘RF1_2_16’) composed of genotypes typical of late-succession nodules (9 clones, 1.6%), a derived sub-clade (‘RF1_2_3_14’) that included most of the sequence types we have found in nodules of this host species (126 clones, 22.5%), and a sub-clade (‘RF7’) nested within it that includes sequences found only in early-succession nodules of A. tenuifolia (27 clones, 4.8%). All three sequence groups have been previously observed to have high specificity for A. tenuifolia hosts in our sites (Anderson et al. 2009). Our sub-clade designations reflect this close affinity to previously observed nodule sequences by including the ‘RF’ + number designation we have utilized in previous studies of nodule Frankia (Anderson et al. 2009, 2013). All of the above-described patterns were robust to removal of indels from the alignment using GBlocks.
Comparison of OTUs defined on the basis of sequence similarity thresholds with clade-based OTUs illustrated a clear difference in sequence diversity between OTUs matching sequences previously found in nodules and those found only in soils. Clade-based OTUs that matched nodule sequences mostly contained sequences with >99% similarity, whereas clade-based non-nodule OTUs often included sequences with <95% similarity. When all clone sequences were assigned to OTUs based on a 99% threshold, the number of OTUs defined, compared to clade-based classification, decreased from four to three for nodule-typical OTUs, but increased from nine to 49 for soil-only OTUs.
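To illustrate how a similarity threshold partitions sequences into OTUs, the following is a minimal sketch under stated assumptions. The clustering algorithm is not specified in this section, so single-linkage grouping is used here purely as an illustrative choice; the identity measure (fraction of matching positions over columns where neither aligned sequence has a gap), the function names, and the toy sequences are likewise assumptions and not the authors' pipeline.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions over aligned columns where
    neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def cluster_otus(seqs, threshold=0.99):
    """Single-linkage OTUs: two sequences share an OTU if a chain of
    pairwise identities >= threshold connects them (union-find)."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    otus = {}
    for name in seqs:
        otus.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in otus.values())


# Hypothetical aligned toy sequences (not data from this study).
toy = {
    "s1": "A" * 200,
    "s2": "A" * 199 + "C",       # 99.5% identical to s1
    "s3": "A" * 180 + "C" * 20,  # 90% identical to s1
}
otus_99 = cluster_otus(toy, threshold=0.99)
otus_85 = cluster_otus(toy, threshold=0.85)
```

In this toy case the 99% threshold separates `s3` from the other two sequences, while a looser threshold merges all three. This mirrors the way a divergent soil-only clade can split into many OTUs at a 99% threshold while a clade of near-identical nodule sequences does not.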