Data analysis
The serotype, virulence genes, and stx subtypes were identified from the
assembled genomes using SerotypeFinder 1.1 (Joensen,
Tetzschner, Iguchi, Aarestrup, & Scheutz, 2015) and VirulenceFinder
1.2 (Joensen et al., 2014) on the Center
for Genomic Epidemiology (CGE) website, together with NCBI BLASTN.
For the CGE tools, the identity threshold was set to 85% and the
minimum overlapping gene length to 60%
(Ferdous et al., 2016).
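The two thresholds act as a joint filter on each candidate gene hit. A minimal sketch of that filter follows; the `Hit` record, its field names, and the example hits are hypothetical illustrations rather than CGE output, and treating a value exactly at the threshold as passing is an assumption.

```python
from dataclasses import dataclass

MIN_IDENTITY = 85.0  # percent identity threshold stated in the text
MIN_COVERAGE = 60.0  # minimum percent of the reference gene length overlapped


@dataclass
class Hit:
    """Hypothetical record for one gene hit (not the CGE output format)."""
    gene: str
    identity: float      # percent identity of the alignment
    aligned_length: int  # reference-gene bases covered by the alignment
    gene_length: int     # full length of the reference gene


def passes_thresholds(hit: Hit) -> bool:
    """Keep a hit only if it meets both the identity and the overlap cutoff."""
    coverage = 100.0 * hit.aligned_length / hit.gene_length
    return hit.identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


# Invented example hits, for illustration only.
hits = [
    Hit("stx2a", 99.8, 1241, 1241),  # passes both cutoffs
    Hit("eae", 84.9, 2800, 2805),    # fails on identity
    Hit("ehxA", 97.0, 1500, 2997),   # fails on overlap (about 50%)
]
kept = [h.gene for h in hits if passes_thresholds(h)]
print(kept)
```

A hit must satisfy both conditions, so a near-identical but heavily truncated match is rejected just as a full-length but divergent one is.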
Based on the assembled genomes, Parsnp
(Treangen, Ondov, Koren, & Phillippy,
2014) in the Harvest package was used to align the core genomes of the
STEC strains, followed by the creation of a maximum likelihood (ML)
tree. The ML tree was generated with default parameters, using the
General Time Reversible (GTR) evolutionary model and 1,000 resamples
for bootstrapping. The tree and the molecular features of each isolate
were visualized using iTOL (Letunic &
Bork, 2016).
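For readers unfamiliar with resampling-based branch support, the generic idea of a nonparametric bootstrap can be sketched as drawing alignment columns with replacement to build each replicate data set. How the tree-building software used here resamples internally is not stated in the text, so this is an illustration of the general technique only; the toy alignment and the function name `bootstrap_alignment` are hypothetical.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one replicate built by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy core-genome alignment (invented sequences of equal length).
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGAAC",
    "isolate_3": "ACCTACGAAC",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(len(replicates))
```

In a full analysis, a tree would be inferred from every replicate and the support for a branch reported as the fraction of replicate trees that contain it.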
The seven housekeeping genes (i.e., adk, fumC, gyrB, icd, mdh, purA, and recA) for
MLST were defined according to the E. coli MLST website
(http://mlst.warwick.ac.uk/mlst/dbs/E.coli)
(Larsen et al., 2012). Sequence types (STs) for each
isolate were assigned based on the allelic profile of the seven
housekeeping genes, which was then used to analyze the phylogenetic
relationships among bacterial isolates. MLST sequence information about
HUSEC isolates was retrieved from the online database (www.ehec.org)
(Mellmann et al., 2008). A minimum
spanning tree was created based on STs of our isolates and HUSEC strains
using BioNumerics (Meng et al., 2014).
Detailed information is available in Supplemental Fig. S1.
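The two MLST steps described above, looking up an ST from a seven-locus allelic profile and linking STs in a minimum spanning tree, can be sketched as follows. This is an illustration under stated assumptions, not the BioNumerics procedure: the ST labels and allele numbers are invented placeholders, the distance is simply the number of differing loci, and the tree is built with Prim's algorithm without the priority rules that dedicated software may apply.

```python
from typing import Dict, List, Optional, Tuple

LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")
Profile = Tuple[int, ...]

# Invented placeholder profiles; real allele numbers come from the MLST database.
ST_PROFILES: Dict[str, Profile] = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 1, 2),
    "ST-C": (1, 1, 1, 1, 1, 3, 2),
    "ST-D": (4, 4, 4, 1, 1, 3, 2),
}
PROFILE_TO_ST = {profile: st for st, profile in ST_PROFILES.items()}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Look up the ST for an isolate's alleles; None if the profile is unknown."""
    return PROFILE_TO_ST.get(tuple(alleles[locus] for locus in LOCI))


def locus_differences(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm over STs, using locus differences as edge weights."""
    names = sorted(profiles)
    in_tree = {names[0]}
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to any ST not yet in it;
        # ties are broken alphabetically by the tuple comparison.
        dist, u, v = min(
            (locus_differences(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


isolate = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2)))
st = assign_st(isolate)
edges = minimum_spanning_tree(ST_PROFILES)
print(st, edges)
```

In this toy example, STs that differ at a single locus are joined directly, while the more divergent profile attaches through its nearest neighbor.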