Data analysis
Serotypes, virulence genes, and stx subtypes were identified from the assembled genomes using SerotypeFinder 1.1 (Joensen, Tetzschner, Iguchi, Aarestrup, & Scheutz, 2015) and VirulenceFinder 1.2 (Joensen et al., 2014), both hosted on the Center for Genomic Epidemiology (CGE) website, together with NCBI BLASTN. For the CGE tools, the identity threshold was set to 85% and the minimum overlapping gene length to 60% (Ferdous et al., 2016).
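The two thresholds above act as a joint filter on each candidate gene hit. The following is a minimal sketch of that filter, not the CGE implementation: the `Hit` record, the function names, the use of inclusive (>=) comparisons, and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the text; inclusive comparison is an assumption.
MIN_IDENTITY_PCT = 85.0   # identity threshold
MIN_COVERAGE_PCT = 60.0   # minimum overlapping gene length


@dataclass
class Hit:
    """Hypothetical record for one alignment of a reference gene to an assembly."""
    gene: str
    identity_pct: float   # percent identity over the aligned region
    aligned_len: int      # number of reference-gene bases covered by the alignment
    gene_len: int         # full length of the reference gene


def coverage_pct(hit: Hit) -> float:
    """Percentage of the reference gene length covered by the alignment."""
    return 100.0 * hit.aligned_len / hit.gene_len


def passes_thresholds(hit: Hit) -> bool:
    """A hit is kept only if it meets both the identity and the coverage cutoff."""
    return (hit.identity_pct >= MIN_IDENTITY_PCT
            and coverage_pct(hit) >= MIN_COVERAGE_PCT)


def filter_hits(hits: list[Hit]) -> list[Hit]:
    """Return the hits that would be reported under the two thresholds."""
    return [h for h in hits if passes_thresholds(h)]


# Made-up example values, for illustration only.
example_hits = [
    Hit("geneA", identity_pct=99.1, aligned_len=1200, gene_len=1241),  # passes both
    Hit("geneB", identity_pct=84.9, aligned_len=2800, gene_len=2805),  # identity too low
    Hit("geneC", identity_pct=92.0, aligned_len=1500, gene_len=2997),  # coverage too low
]
kept = [h.gene for h in filter_hits(example_hits)]
```

In this sketch only `geneA` is retained, because `geneB` falls just under the identity cutoff and `geneC` covers roughly half of its reference gene.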
Parsnp (Treangen, Ondov, Koren, & Phillippy, 2014) from the Harvest package was used to align the core genomes of the STEC strains from the assembled genomes, and a maximum likelihood (ML) tree was then constructed. The ML tree was generated with default parameters, using the General Time Reversible (GTR) evolutionary model and 1,000 bootstrap resamples. The tree and the molecular features of each isolate were visualized using iTOL (Letunic & Bork, 2016).
The seven housekeeping genes (i.e., adk, fumC, gyrB, icd, mdh, purA, and recA) for MLST were defined according to the E. coli MLST website (http://mlst.warwick.ac.uk/mlst/dbs/E.coli) (Larsen et al., 2012). Each isolate was assigned an ST based on its allelic profile across the seven housekeeping genes, which was then used to analyze the phylogenetic relationships among the bacterial isolates. MLST sequence information for the HUSEC isolates was retrieved from the online database (www.ehec.org) (Mellmann et al., 2008). A minimum spanning tree was created from the STs of our isolates and the HUSEC strains using BioNumerics (Meng et al., 2014). Detailed information is available in Supplemental Fig. S1.
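The ST assignment and the minimum spanning tree can be illustrated with a short sketch. This is not the BioNumerics procedure, which applies its own algorithm and tie-breaking rules; it is a plain Prim's algorithm over the number of differing loci, and the ST labels, allele numbers, and function names are all hypothetical.

```python
# Locus order follows the seven housekeeping genes named in the text.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def assign_st(profile: dict, st_table: dict):
    """Look up the ST for an allelic profile; returns None for an unknown profile."""
    key = tuple(profile[locus] for locus in LOCI)
    return st_table.get(key)


def allelic_distance(a: tuple, b: tuple) -> int:
    """Number of loci at which two allelic profiles differ (0 to 7)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(profiles: dict) -> list:
    """Prim's algorithm; returns edges as (st_in_tree, st_added, distance)."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge leaving the tree; ties broken by ST label.
        dist, u, v = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical allelic profiles (allele numbers are invented for illustration).
profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 1, 2),
    "ST-C": (1, 1, 1, 1, 1, 2, 2),
    "ST-D": (5, 5, 5, 5, 5, 5, 5),
}
st_table = {alleles: st for st, alleles in profiles.items()}

isolate_profile = dict(zip(LOCI, (1, 1, 1, 1, 1, 2, 2)))
isolate_st = assign_st(isolate_profile, st_table)
mst_edges = minimum_spanning_tree(profiles)
```

Here the example isolate resolves to the hypothetical `ST-C`, and the tree links `ST-A`, `ST-B`, and `ST-C` through single-locus differences while `ST-D` joins through an edge of distance seven.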