Phylogenetic and antigenic site analysis of H9N2 HA gene/protein
Collection of sequence data: HA gene sequences of H9N2-AIV isolated between 2014 and 2020 were collected from the Global Initiative on Sharing All Influenza Data (GISAID) platform. A total of 4935 sequences were downloaded, of which 312 records were removed because they were duplicates or had a sequence length < 1600 bp. The remaining 4623 sequences (taxa) were used for the subsequent analyses.
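The filtering step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the text does not state how duplicates were defined, so the sketch assumes that a duplicate is a record whose nucleotide sequence is identical to one already kept, and the function name `filter_records` is hypothetical.

```python
from typing import Dict

MIN_LENGTH = 1600  # bp threshold stated in the text


def filter_records(records: Dict[str, str], min_length: int = MIN_LENGTH) -> Dict[str, str]:
    """Drop sequences shorter than min_length and exact duplicate sequences.

    records maps a record name to its nucleotide sequence.
    Assumption: "duplicate" means an identical sequence string
    (case-insensitive, gap characters ignored).
    """
    kept: Dict[str, str] = {}
    seen = set()
    for name, seq in records.items():
        normalized = seq.upper().replace("-", "")
        if len(normalized) < min_length:
            continue  # too short
        if normalized in seen:
            continue  # identical to a sequence already kept
        seen.add(normalized)
        kept[name] = seq
    return kept
```

Applied to the counts reported above, such a filter would take the 4935 downloaded records as input and leave 4623 after removing 312.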
Phylogenetic analysis: Because of the large amount of data, we reduced the number of sequences used to construct the phylogenetic tree. Up to 70 sequences were randomly sampled from each collection year with more than 70 sequences, while all sequences were retained for collection years with fewer than 70 sequences. This was done to prevent overrepresentation of certain years.
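The per-year subsampling described above can be illustrated with a short sketch. The function name `subsample_by_year`, the `(name, year)` input format, and the fixed random seed are assumptions made for illustration; the text does not specify the tool or seed used.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def subsample_by_year(
    records: List[Tuple[str, int]], max_per_year: int = 70, seed: int = 0
) -> List[Tuple[str, int]]:
    """Randomly keep at most max_per_year records from each collection year.

    records is a list of (sequence_name, collection_year) pairs.
    Years with max_per_year records or fewer are kept in full.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_year = defaultdict(list)
    for name, year in records:
        by_year[year].append(name)

    sampled = []
    for year in sorted(by_year):
        names = by_year[year]
        if len(names) > max_per_year:
            names = rng.sample(names, max_per_year)  # sampling without replacement
        sampled.extend((name, year) for name in names)
    return sampled
```

Capping each year at the same maximum keeps densely sampled years from dominating the tree while leaving sparsely sampled years untouched.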
The taxa were aligned using MUSCLE (v3.8.4) (Edgar, 2004), and the proportion of unique mutations was identified by subclade using Geneious Prime. A time-scaled phylogenetic tree was generated using BEAST (v1.8.4) (Drummond, Nicholls, Rodrigo, & Solomon, 2002; Pybus & Rambaut, 2002; Suchard et al., 2018). The parameters of the evolutionary model were set as previously reported (Xia et al., 2020). Briefly, a nucleotide GTR + I + Γ4 substitution model was selected, with an uncorrelated log-normal relaxed molecular clock model in preference to a strict clock, and a non-parametric Bayesian skyline demographic model as the tree prior (Baele et al., 2012; Baele, Li, Drummond, Suchard, & Lemey, 2013; Drummond, Rambaut, Shapiro, & Pybus, 2005). A total of 50 million Markov chain Monte Carlo (MCMC) generations were run, sampling every 10,000 steps, and sufficient mixing and convergence were assessed using Tracer (v1.6) after discarding the first 10% of samples as burn-in (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). A maximum clade credibility (MCC) tree was generated in TreeAnnotator (v1.8.4) and visualized in FigTree (v1.4.3). The specific amino acid mutations of each subclade were also counted by comparing the amino acid sequences in Geneious Prime.
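As a quick check on the chain settings given above, the number of posterior samples implied by the chain length, sampling interval, and burn-in can be worked out as below. The helper name `mcmc_sample_counts` is hypothetical, and the calculation ignores the initial state, which the software may also log.

```python
from typing import Tuple


def mcmc_sample_counts(
    chain_length: int, log_every: int, burnin_percent: int
) -> Tuple[int, int, int]:
    """Return (total samples, burn-in samples, retained samples).

    Integer arithmetic is used throughout to avoid floating-point rounding.
    """
    n_samples = chain_length // log_every
    n_burnin = n_samples * burnin_percent // 100
    return n_samples, n_burnin, n_samples - n_burnin


# Settings from the text: 50 million generations, sampled every 10,000 steps,
# first 10% discarded as burn-in.
total, burnin, retained = mcmc_sample_counts(50_000_000, 10_000, 10)
print(total, burnin, retained)  # 5000 500 4500
```

Under these settings, roughly 4,500 sampled trees and parameter states remain after burn-in for convergence assessment and MCC tree construction.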
Antigenic site counting: Keyword searches of the PubMed, Google Scholar, and China National Knowledge Infrastructure (CNKI) databases were used to count the reported H9N2-AIV antigenic sites (Table S1). The mutations at antigenic sites and their frequencies in 2014, 2016, 2018, and 2019-2020 were analyzed using BioAider (v1.334) (Zhou, Qiu, Pu, Huang, & Ge, 2020). To test whether a single high-frequency mutation site near the receptor-binding site (RBS) could drive antigenic drift of H9N2-AIV circulating in China in three recent years, high-frequency mutation sites in or near the RBS were identified, and substitutions with a frequency of ≥ 20% were chosen as candidate substitutions for subsequent antigenicity verification.
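The ≥ 20% frequency criterion can be illustrated with a minimal sketch. It is not the BioAider implementation: the function name `frequent_substitutions`, the choice of a reference sequence, and the use of all aligned sequences as the denominator are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def frequent_substitutions(
    sequences: List[str], reference: str, position: int, min_freq: float = 0.20
) -> Dict[str, float]:
    """Return substitutions at one alignment column whose frequency >= min_freq.

    sequences are aligned amino acid sequences of equal length;
    position is a 0-based column index, reported 1-based in the labels.
    Assumption: frequency = count of the residue / number of sequences.
    """
    ref_residue = reference[position]
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    result = {}
    for residue, n in counts.items():
        freq = n / total
        if residue != ref_residue and freq >= min_freq:
            result[f"{ref_residue}{position + 1}{residue}"] = freq
    return result
```

For example, with a toy reference `"AKT"` and ten sequences of which two carry R and one carries Q at the second position, only the substitution labeled `K2R` (frequency 0.2) passes the threshold.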