Phylogenetic and antigenic site analysis of H9N2 HA gene/protein
Collection of sequence data: HA gene sequences of H9N2-AIV isolated in
recent years (2014-2020) were collected from the Global Initiative on
Sharing All Influenza Data (GISAID) platform. Of the 4935 sequences
downloaded, 312 records that were duplicates or shorter than 1600 bp
were removed, leaving 4623 sequences (taxa) for the subsequent analyses.
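A minimal sketch of this filtering step in Python is given below. It assumes the download is a FASTA file and that a "duplicate" is a record whose header (isolate name) has already been kept. The text does not say how duplicates were defined, so matching on identical sequences would be an equally plausible reading. The function names `parse_fasta` and `filter_records` are hypothetical and are not part of any tool used in this study.

```python
from typing import Iterable, List, Optional, Tuple

MIN_LENGTH = 1600  # bp; shorter records are removed, as stated in the text

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA-formatted lines into (header, sequence) pairs."""
    records: List[Record] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_records(records: Iterable[Record], min_length: int = MIN_LENGTH) -> List[Record]:
    """Drop records shorter than min_length, then drop repeated headers.

    Assumption (not stated in the source): a duplicate is a record whose
    header was already kept.
    """
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        if len(seq) < min_length:
            continue  # too short; checked first so it never blocks a full-length copy
        if header in seen:
            continue  # duplicate isolate name
        seen.add(header)
        kept.append((header, seq))
    return kept
```

Running `parse_fasta` over the downloaded file and passing the result to `filter_records` gives the retained set; `len(records) - len(kept)` is the number of removed records. The length check runs before the duplicate check so that a short first copy of an isolate does not cause a later full-length copy to be discarded.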
Phylogenetic analysis: Because of the large size of the dataset, the
number of sequences used to construct the phylogenetic tree was reduced.
To prevent overrepresentation of certain years, up to 70 sequences were
randomly sampled from each collection year with more than 70 sequences,
and all sequences were retained from collection years with 70 or fewer
sequences.
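The per-year cap can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the input format (pairs of record ID and collection year), the helper name `subsample_by_year`, and the fixed random seed are assumptions added so that the draw is reproducible.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

MAX_PER_YEAR = 70  # cap stated in the text


def subsample_by_year(
    records: Sequence[Tuple[str, int]],
    max_per_year: int = MAX_PER_YEAR,
    seed: int = 2020,  # hypothetical seed, fixed only for reproducibility
) -> List[str]:
    """Return record IDs with at most `max_per_year` IDs per collection year."""
    rng = random.Random(seed)
    by_year: Dict[int, List[str]] = defaultdict(list)
    for record_id, year in records:
        by_year[year].append(record_id)

    selected: List[str] = []
    for year in sorted(by_year):
        ids = by_year[year]
        if len(ids) > max_per_year:
            # Over-represented year: draw a random subset without replacement.
            ids = rng.sample(ids, max_per_year)
        # Years with max_per_year or fewer sequences are kept in full.
        selected.extend(ids)
    return selected


if __name__ == "__main__":
    demo = [(f"A/2016/{i}", 2016) for i in range(150)] + [(f"A/2017/{i}", 2017) for i in range(40)]
    print(len(subsample_by_year(demo)))  # 70 from 2016 + all 40 from 2017 = 110
```

Sampling without replacement within each year keeps every retained sequence distinct, and years at or below the cap pass through unchanged, which matches the rule described above.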
The sequences were aligned using MUSCLE (v3.8.4) (Edgar, 2004), and the
proportion of unique mutations in each subclade was determined using
Geneious Prime. A time-scaled phylogenetic tree was generated using
BEAST (v1.8.4) (Drummond, Nicholls, Rodrigo, & Solomon, 2002; Pybus &
Rambaut, 2002; Suchard et al., 2018). Evolutionary model parameters
were set as previously reported (Xia et al., 2020). Briefly, the
GTR + I + Γ4 nucleotide substitution model was selected, together with
an uncorrelated log-normal relaxed molecular clock (preferred over a
strict clock) and a non-parametric Bayesian skyline tree prior
(Baele et al., 2012; Baele, Li, Drummond, Suchard, & Lemey, 2013;
Drummond, Rambaut, Shapiro, & Pybus, 2005). The Markov chain Monte
Carlo (MCMC) analysis was run for 50 million generations with sampling
every 10,000 steps, and mixing and convergence were assessed in Tracer
(v1.6) after discarding the first 10% of samples as burn-in (Rambaut,
Drummond, Xie, Baele, & Suchard, 2018). A maximum clade credibility
(MCC) tree was generated in TreeAnnotator (v1.8.4) and visualized in
FigTree (v1.4.3). The specific amino acid mutations of each subclade
were also counted by comparing the amino acid sequences in Geneious
Prime.
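For readers less familiar with these model components, the likelihood of one alignment column under GTR + I + Γ4, the branch rates of the uncorrelated log-normal clock, and the sample budget of the MCMC run can be written out as below. This is a textbook-style summary, not a transcription of the BEAST configuration used here; the symbols are introduced for illustration only.

```latex
\begin{aligned}
L_s &= p_{\mathrm{inv}}\,\delta_s\,\pi_{x_s}
      + \frac{1-p_{\mathrm{inv}}}{4}\sum_{k=1}^{4} L_s(r_k), \\
d_b &= \rho_b\, t_b, \qquad \ln \rho_b \sim \mathcal{N}(\mu, \sigma^2)
      \ \text{independently for each branch } b, \\
N_{\text{retained}} &= \frac{5 \times 10^{7}}{10^{4}} \times (1 - 0.10)
      = 5000 \times 0.9 = 4500 .
\end{aligned}
```

Here $L_s$ is the likelihood of column $s$, $p_{\mathrm{inv}}$ is the proportion of invariable sites, $\delta_s = 1$ if the column is constant with state $x_s$ (and no missing data) and $0$ otherwise, $\pi_{x_s}$ is the GTR stationary frequency of that state, and $L_s(r_k)$ is the pruning likelihood with every branch length multiplied by the rate $r_k$ of the $k$-th of four equiprobable discrete gamma categories. In the clock line, $d_b$ is the expected number of substitutions per site on branch $b$, $t_b$ its duration, and $\rho_b$ its rate. Implementations differ in exactly how the $r_k$ are rescaled when invariable sites are included, and the sample count ignores the state logged at generation 0.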
Antigenic site counting: Keyword searches of the PubMed, Google
Scholar, and China National Knowledge Infrastructure (CNKI) databases
were used to identify and count reported H9N2-AIV antigenic sites
(Table S1). Mutations at these antigenic sites and their frequencies in
2014, 2016, 2018, and 2019-2020 were analyzed using BioAider (v1.334)
(Zhou, Qiu, Pu, Huang, & Ge, 2020). To determine whether a single
high-frequency mutation near the receptor-binding site (RBS) could drive
antigenic drift of H9N2-AIV circulating in China during three recent
years, substitutions at high-frequency mutation sites in or near the RBS
with a frequency of ≥ 20% were selected as candidate (pre-selected)
substitutions for subsequent antigenicity verification.
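Once the antigenic sites are fixed, the frequency screen can be sketched in Python as below. This is not BioAider's implementation but a hypothetical stand-in. It assumes aligned amino acid sequences, a single reference sequence, and 1-based positions counted along the alignment (the HA numbering scheme used in the study is not reproduced). The names `substitution_frequencies` and `preselect`, and the rule that a substitution qualifies if it reaches 20% in any one period, are assumptions. The list of sites in or near the RBS must be supplied by the user.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence

THRESHOLD = 0.20  # substitutions at >= 20% frequency are pre-selected


def substitution_frequencies(
    seqs_by_group: Mapping[str, Sequence[str]],
    reference: str,
    sites: Iterable[int],
) -> Dict[str, Dict[str, float]]:
    """Frequency of each substitution at the given sites, per group.

    Sequences are aligned amino acid strings the same length as `reference`;
    sites are 1-based alignment positions (numbering scheme assumed).
    The denominator is the number of sequences in the group.
    """
    site_list = list(sites)
    result: Dict[str, Dict[str, float]] = {}
    for group, seqs in seqs_by_group.items():
        counts: Counter = Counter()
        for seq in seqs:
            for pos in site_list:
                ref_aa, aa = reference[pos - 1], seq[pos - 1]
                if aa != ref_aa and aa not in "-X":  # skip gaps and ambiguous residues
                    counts[f"{ref_aa}{pos}{aa}"] += 1  # e.g. "C2Y"
        n = len(seqs)
        result[group] = {sub: c / n for sub, c in counts.items()} if n else {}
    return result


def preselect(
    freqs: Mapping[str, Mapping[str, float]], threshold: float = THRESHOLD
) -> List[str]:
    """Substitutions reaching the threshold in at least one group (assumed rule)."""
    return sorted(
        {sub for group in freqs.values() for sub, f in group.items() if f >= threshold}
    )
```

Grouping the sequences by subclade instead of by sampling period gives the same kind of per-subclade amino acid mutation table described for the phylogenetic analysis.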