Phylogenetics
Sequence diversity in our soil samples was high: 559 soil clones yielded
393 unique sequences. ML analysis recovered the four canonical Frankia
‘Clusters’ that serve as working taxonomic groups based on phylogenetic
patterns observed across multiple loci (Figure 1, Figure S2) (Normand
et al., 1996; Nouioui et al., 2011; Pozzi et al., 2018a; Swensen &
Benson, 2008), including Ghodhbane-Gtari et al. (2010), the source of
our reference IGS
alignment. All four Frankia Clusters were well supported
statistically (>75% bootstrap) in our tree; however,
Cluster 3 rather than Cluster 4 was basal within Frankia, and
Cluster 2 fell within the large group that contained both reference
sequences derived from Alnus nodules and our soil clones. The
position of Cluster 2 is the main source of ambiguity in our tree: it
departs from the broadly accepted placement of these reference
sequences, falling within rather than adjacent to a larger
Alnus-infective/Casuarina-infective cluster.
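Bootstrap support values such as the >75% criterion above are conventionally obtained by resampling alignment columns with replacement, inferring a tree from each pseudo-replicate, and reporting the percentage of replicate trees that contain a given clade. The minimal sketch below illustrates only the resampling and counting steps; tree inference on each replicate is left to the ML program and is not shown. The function names, and the representation of each replicate tree as a set of clades (frozensets of hypothetical sequence IDs), are assumptions for illustration rather than part of the original workflow.

```python
import random
from typing import FrozenSet, List, Sequence, Set

Clade = FrozenSet[str]  # a clade is represented as the set of its tip labels


def resample_columns(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Build one bootstrap pseudo-replicate by drawing columns with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(clade: Clade, replicate_trees: Sequence[Set[Clade]]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_trees if clade in clades)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical example: four replicate trees, three of which recover {clone_1, clone_2}.
target = frozenset({"clone_1", "clone_2"})
replicates = [
    {frozenset({"clone_1", "clone_2"})},
    {frozenset({"clone_1", "clone_2"}), frozenset({"clone_3", "clone_4"})},
    {frozenset({"clone_1", "clone_3"})},
    {frozenset({"clone_1", "clone_2"})},
]
print(bootstrap_support(target, replicates))  # 75.0
```

A clade with exactly 75% support would fall just short of the >75% threshold used here; real analyses use far more than four replicates.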
All of our soil-derived sequences fell into four strongly supported
clades that fell well within previously described Frankia at the genus
level (Figure 1, Figure S2). However, within the genus, none of our
sequences were closely related to any of the Frankia sequences in our
reference alignment (Figure 1, Figure S2). Only a small minority of our
sequences (37 sequences, 6.6%) clustered near previously described
Alnus-infective Frankia (‘Cluster 1’), forming a poorly supported group
(BS=52%) with well-supported affinity for the canonical ‘Cluster 1’
Frankia (BS=89%). This small group included sequences from A. viridis
nodules, which contain genotypes of Frankia that we have previously
observed to have high specificity for this host species (Anderson et
al., 2009). The majority of our soil-derived sequences (374 sequences,
67.0%) fell into two well-supported clades that formed a larger group
with the ‘typical’ Alnus-infective strains but were clearly
differentiated from them. One clade, which we call the ‘AT clade’
(BS=100%), includes nearly all the sequence variation we have observed
in A. tenuifolia nodules at our Alaskan field sites. This group includes
162 soil-derived sequences, or 29.0% of the total from this study. A
second clade, which we call ‘Clade B’, is sister to this group (though
this placement is not statistically supported) and contains only
soil-derived sequences observed in this study (211 sequences, 37.7%). A
large portion of our sequences (148 sequences, 26.5%) formed a fourth,
distinct clade, which we call ‘Clade A’; it fell into the ambiguous
portion of our tree, near divergent sequences that we have occasionally
observed in our prior studies (unpublished data).
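The percentages in this paragraph are shares of all 559 soil clones rather than of the 393 unique sequences. The short check below recomputes them from the reported counts; the dictionary keys and the function name are illustrative only.

```python
TOTAL_CLONES = 559  # soil clones sequenced in this study

# Clone counts per group, as reported in the text above.
group_counts = {
    "near Cluster 1": 37,
    "AT clade": 162,
    "Clade B": 211,
    "Clade A": 148,
}


def percent_of_total(count: int, total: int = TOTAL_CLONES) -> float:
    """Share of all clones, rounded to one decimal place as in the text."""
    return round(100.0 * count / total, 1)


for group, n in group_counts.items():
    print(f"{group}: {n} clones = {percent_of_total(n)}%")
# near Cluster 1: 37 clones = 6.6%
# AT clade: 162 clones = 29.0%
# Clade B: 211 clones = 37.7%
# Clade A: 148 clones = 26.5%
```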
Sequences from the mid-succession-only alignment displayed a broadly
similar phylogenetic pattern (Figure S3, Figure S4). Most soil-derived
sequences (128 sequences, 59.3%) fell into the AT clade with
nodule-forming reference sequences. The remaining sequences formed two
clades: one sister to the AT clade, which we called ‘MID_1’, and one
large clade that we split into two smaller groups, ‘MID_2’ and ‘MID_3’.
We used numeric rather than letter designations to emphasize that the
mid-succession sequences were derived differently in our method. A total
of 165 sequences were shared between the larger rIGS alignment and the
mid-succession-only alignment. Comparison of the clade assignments of
these shared sequences in the two alignments indicated that
mid-succession clades MID_2 and MID_3 are equivalent to Clade A from the
larger alignment, and MID_1 is equivalent to Clade B.
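One way to formalize this comparison of clade assignments for shared sequences is to cross-tabulate each sequence's clade in the two alignments and map every mid-succession clade to the larger-alignment clade that holds most of its members. The sketch below assumes clade assignments are available as dictionaries keyed by sequence ID; the IDs and assignments are hypothetical, and the majority rule is an illustrative assumption rather than the procedure used in the study.

```python
from collections import Counter, defaultdict
from typing import Dict


def map_clades(full: Dict[str, str], mid: Dict[str, str]) -> Dict[str, str]:
    """Map each mid-succession clade to the full-alignment clade holding most of its shared sequences."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for seq_id in full.keys() & mid.keys():  # only sequences present in both alignments
        table[mid[seq_id]][full[seq_id]] += 1
    # most_common(1) picks the majority clade; ties would be broken arbitrarily.
    return {m: counts.most_common(1)[0][0] for m, counts in sorted(table.items())}


# Hypothetical assignments (sequence ID -> clade) in each alignment.
full_aln = {"s1": "A", "s2": "A", "s3": "B", "s4": "AT", "s5": "A"}
mid_aln = {"s1": "MID_2", "s2": "MID_3", "s3": "MID_1", "s4": "AT", "s5": "MID_2", "s6": "MID_1"}
print(map_clades(full_aln, mid_aln))
# {'AT': 'AT', 'MID_1': 'B', 'MID_2': 'A', 'MID_3': 'A'}
```

It is also worth inspecting the full cross-tabulation, since a mid-succession clade split across several larger-alignment clades would not be captured by a single majority label.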
Clades A and B each contained several well-supported sub-clades
(Figure S2). The AT clade also contained three discernible sub-groups: a
basal group (‘RF1_2_16’) composed of genotypes typical of
late-succession nodules (9 clones, 1.6%), a derived sub-clade
(‘RF1_2_3_14’) that included most of the sequence types we have found
in nodules of this host species (126 clones, 22.5%), and a sub-clade
(‘RF7’) nested within it that includes sequences found only in
early-succession nodules of A. tenuifolia (27 clones, 4.8%). All three
sequence groups have previously been observed to have high specificity
for A. tenuifolia hosts at our sites (Anderson et al., 2009). Our
sub-clade designations reflect this close affinity to previously
observed nodule sequences by including the ‘RF’+number designation we
have used in previous studies of nodule Frankia (Anderson et al., 2009,
2013). All of the above-described patterns were robust to removal of
indels from the alignment using Gblocks.
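Gblocks selects blocks of conserved alignment positions using several parameters, including how many gapped positions are allowed and a minimum block length. As a much simpler stand-in for checking whether a result depends on indel-containing regions, the sketch below drops every column in which any sequence has a gap. It is an illustration under that simplifying assumption, not a reimplementation of Gblocks, and the function name is hypothetical.

```python
from typing import List, Sequence


def drop_gapped_columns(alignment: Sequence[str], gap_chars: str = "-.") -> List[str]:
    """Remove every alignment column in which any sequence has a gap character."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(alignment[0]))
            if not any(seq[i] in gap_chars for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


aln = ["ACG-TA", "AC-GTA", "ACGGTA"]
print(drop_gapped_columns(aln))  # ['ACTA', 'ACTA', 'ACTA']
```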
Comparison of OTUs defined on the basis of sequence similarity
thresholds with clade-based OTUs revealed a clear difference in sequence
diversity between OTUs matching sequences previously found in nodules
and those found only in soils. Clade-based nodule OTUs mostly contained
sequences with >99% similarity, whereas non-nodule OTUs often included
sequences with <95% similarity. When all clone sequences were assigned
to OTUs at a 99% similarity threshold, the number of OTUs (relative to
clade-based classification) decreased from four to three for
nodule-typical OTUs but increased from nine to 49 for soil-only OTUs.
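This section does not specify how the similarity-threshold OTUs were built. As one common approach, the sketch below uses greedy centroid clustering: each sequence joins the first existing OTU whose founding (centroid) sequence it matches at or above the threshold, and otherwise founds a new OTU. Pairwise identity is computed on pre-aligned sequences, ignoring columns where either sequence has a gap; this identity definition, the function names, and the 100-bp sequences are all hypothetical assumptions for illustration.

```python
from typing import Dict, List


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical bases over aligned columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def greedy_otus(seqs: Dict[str, str], threshold: float = 0.99) -> List[List[str]]:
    """Greedy centroid clustering; the first member of each OTU is its centroid."""
    otus: List[List[str]] = []
    for seq_id, seq in seqs.items():
        for otu in otus:
            if pairwise_identity(seqs[otu[0]], seq) >= threshold:
                otu.append(seq_id)
                break
        else:  # no centroid was similar enough, so this sequence founds a new OTU
            otus.append([seq_id])
    return otus


# Hypothetical 100-bp aligned sequences.
base = "ACGT" * 25
seqs = {
    "clone_1": base,
    "clone_2": "T" + base[1:],       # 1 mismatch -> 99% identity to clone_1
    "clone_3": "GTACGT" + base[6:],  # 6 mismatches -> 94% identity to clone_1
}
print(greedy_otus(seqs, 0.99))  # [['clone_1', 'clone_2'], ['clone_3']]
print(greedy_otus(seqs, 0.90))  # [['clone_1', 'clone_2', 'clone_3']]
```

Because the greedy procedure depends on input order, OTU membership can change if the same sequences are presented in a different order.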