Protein Quantification
Mass spectra were processed using “MassPike”, SEQUEST-based software for quantitative proteomics developed by Professor Steven Gygi and colleagues at Harvard Medical School. In MassPike, MS spectra were converted to mzXML format using an extractor built upon Thermo Fisher’s RAW File Reader library (version 4.0.26). During extraction and conversion, the standard mzXML format was augmented with additional customisations that are specific to ion trap and Orbitrap mass spectrometry and essential for TMT quantitation. These customisations include ion injection times for each scan, Fourier Transform-derived baseline and noise values calculated for every Orbitrap scan, isolation widths for each scan type, scan event numbers, and elapsed scan times.
Mass spectra were searched against a combined protein sequence database that included human proteins, HCMV proteins, and possible protein contaminants that might be introduced into samples. The human UniProt database was downloaded on 26th January 2017. The HCMV protein database was assembled from the HCMV strain Merlin UniProt database, non-canonical human cytomegalovirus ORFs described by Stern-Ginossar et al. (23180859), and a six-frame translation of HCMV strain Merlin filtered to include all potential ORFs of ≥8 residues (delimited by stop-stop rather than requiring ATG-stop). The database also included common contaminants (bovine serum albumin, porcine trypsin, and annotated human protein contaminants such as keratins). Searches were performed using a 20 ppm precursor ion tolerance and a fragment ion tolerance of 1.0 Th.
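The stop-to-stop ORF extraction described above can be illustrated with a minimal Python sketch. It is not the script used to build the database: the function names are hypothetical, the input is assumed to be an upper-case DNA string, and segments at the ends of each frame (bounded by a stop codon on one side only) are kept, which is an assumption about how sequence ends were handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unrecognised codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna, min_len=8):
    """Collect stop-delimited ORFs of at least min_len residues from all six frames."""
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Splitting on '*' yields stop-to-stop segments; no ATG is required.
            orfs.extend(seg for seg in protein.split("*") if len(seg) >= min_len)
    return orfs
```

For example, six_frame_orfs("TAA" + "GCT" * 8 + "TAG") includes the eight-residue ORF "AAAAAAAA" read from the first forward frame.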
TMT tags on lysine residues and peptide N termini (229.162932 Da) and carbamidomethylation of cysteine residues (57.02146 Da) were set as static modifications, while oxidation of methionine residues (15.99492 Da) was set as a variable modification.
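As an illustration of how these modification masses combine, the following Python sketch computes the total mass added to an unmodified peptide. The function name and the oxidised_met argument are hypothetical and serve only to show the arithmetic; the sketch is not part of the search software.

```python
# Modification masses (Da) as stated in the text.
TMT_MASS = 229.162932
CARBAMIDOMETHYL_MASS = 57.02146
OXIDATION_MASS = 15.99492


def modification_mass_shift(peptide, oxidised_met=0):
    """Mass added to an unmodified peptide by the static and variable modifications.

    Static: one TMT tag on the N terminus and on each lysine, and
    carbamidomethylation of each cysteine.
    Variable: oxidation of `oxidised_met` methionine residues.
    """
    if not 0 <= oxidised_met <= peptide.count("M"):
        raise ValueError("oxidised_met must lie between 0 and the number of methionines")
    tmt_sites = 1 + peptide.count("K")
    return (
        tmt_sites * TMT_MASS
        + peptide.count("C") * CARBAMIDOMETHYL_MASS
        + oxidised_met * OXIDATION_MASS
    )
```

Because oxidation is a variable modification, a peptide containing one methionine would be searched in two forms, corresponding to oxidised_met=0 and oxidised_met=1.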
Candidate peptide spectral matches (PSMs) were ranked by cross-correlation score (XCorr) and accepted in rank order, because the likelihood that a match is correct decreases down the ranking. A target-decoy strategy was employed to ensure the quality of peptide identification (Elias and Gygi, 2007). A decoy database was generated by reversing the sequences of the composite protein database detailed above. Any assignment of a peptide from this decoy database was considered a “false discovery”, and peptide identification was terminated before the false discovery rate reached 1%. Correct and incorrect spectral matches were distinguished from one another using linear discriminant analysis based on several parameters, including XCorr, the XCorr difference between the top and second-ranked candidate peptides (ΔCn), precursor mass error, and charge state.
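The target-decoy procedure can be sketched in Python as follows. This is a simplified illustration, not the MassPike implementation: it ranks PSMs by a single score rather than a linear discriminant score, it estimates the false discovery rate as decoys divided by targets (an assumption, since the exact estimator is not stated here), and the function names are hypothetical.

```python
def make_decoy(sequence):
    """Build a decoy protein by reversing a target protein sequence."""
    return sequence[::-1]


def filter_by_fdr(psms, max_fdr=0.01):
    """Accept target PSMs in rank order, stopping before the FDR reaches max_fdr.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Returns the scores of the accepted target PSMs.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    accepted = []
    targets = decoys = 0
    for score, is_decoy in ranked:
        new_decoys = decoys + (1 if is_decoy else 0)
        new_targets = targets + (0 if is_decoy else 1)
        # Assumed FDR estimate: decoy hits divided by target hits so far.
        fdr = new_decoys / max(new_targets, 1)
        if fdr >= max_fdr:
            break  # including this PSM would reach the threshold
        decoys, targets = new_decoys, new_targets
        if not is_decoy:
            accepted.append(score)
    return accepted
```

With 150 high-scoring target PSMs followed by two lower-scoring decoy hits, the first decoy gives an estimated FDR of 1/150 and is tolerated, whereas the second would give 2/150 and ends the walk down the ranking.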
Protein assembly followed the principle of parsimony, producing the smallest set of proteins necessary to account for all observed peptides: where peptides were shared between proteins, they were assigned to the protein sequence with the greatest number of matching unique peptides.
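One common way to approximate a parsimonious protein set is a greedy set cover, sketched below in Python. This is an assumption made for illustration and may differ from the assembly algorithm in MassPike: at each step the protein explaining the most unassigned peptides claims them, and ties are broken alphabetically by protein name. The function name and the input structure are hypothetical.

```python
def assemble_proteins(protein_to_peptides):
    """Greedy parsimony: assign each observed peptide to exactly one protein.

    protein_to_peptides: dict mapping a protein name to the set of observed
    peptides that match it.
    Returns a dict of the retained proteins and the peptides assigned to each.
    """
    remaining = set().union(*protein_to_peptides.values())
    assignment = {}
    while remaining:
        # Choose the protein that explains the most unassigned peptides;
        # sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & remaining),
        )
        claimed = protein_to_peptides[best] & remaining
        assignment[best] = claimed
        remaining -= claimed
    return assignment
```

For the input {"A": {"p1", "p2", "p3"}, "B": {"p3", "p4"}, "C": {"p1"}}, protein A claims p1, p2 and p3, protein B retains only p4, and protein C is dropped because all of its peptides are already explained.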
Following fragmentation, each TMT tag produces reporter ions of a specific mass, which were surveyed in the low m/z region of the MS3 spectrum. For each reporter ion, the maximum intensity nearest to its theoretical m/z was used. Proteins were quantified by summing TMT reporter ion counts across all matching PSMs. For an experiment using n TMT tags, an MS3 spectrum was considered to be of poor quality if more than n-1 TMT channels were missing and/or the combined signal-to-noise ratio across all TMT reporter ions was less than 25n. PSMs with poor-quality or no MS3 spectra were excluded from quantitation. Protein quantitation values were exported for further analysis in Excel. The method of significance A, implemented in Perseus version 1.5.1.6, was used to estimate the p-value that each ratio was significantly different from 1. Values were adjusted for multiple hypothesis testing using the method of Benjamini-Hochberg (Cox and Mann, 2008).
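The PSM quality filter and the protein-level summation described above can be sketched in Python as follows. The sketch makes several assumptions: a missing channel is represented by None or 0, the same per-channel signal-to-noise values are used both for filtering and for summation (a simplification of summing reporter ion counts), and the function names are hypothetical. The thresholds are applied exactly as worded in the text.

```python
def is_poor_quality(channels):
    """Apply the stated MS3 quality criteria to one PSM.

    channels: per-channel reporter signal-to-noise values for n TMT channels,
    with None or 0 marking a missing channel (an assumption of this sketch).
    """
    n = len(channels)
    missing = sum(1 for value in channels if not value)
    combined = sum(value for value in channels if value)
    return missing > n - 1 or combined < 25 * n


def quantify_proteins(psms):
    """Sum reporter values per protein and channel over PSMs that pass the filter.

    psms: iterable of (protein, channels) pairs; channels is None when the
    PSM has no MS3 spectrum.
    """
    totals = {}
    for protein, channels in psms:
        if channels is None or is_poor_quality(channels):
            continue  # PSMs with poor-quality or no MS3 spectra are excluded
        summed = totals.setdefault(protein, [0.0] * len(channels))
        for i, value in enumerate(channels):
            summed[i] += value or 0.0
    return totals
```

For a 10-channel experiment, a PSM with a signal-to-noise ratio of 30 in every channel has a combined value of 300 and passes the 25n = 250 threshold, whereas a PSM with a value of 1 in every channel is excluded.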