2.4 Mass Spectra Data Processing for Principal Component Analysis
Bruker MicrOTOF output files were parsed and converted to the “mzML” file format24 using the open-source data conversion software ProteoWizard.25 Principal component analysis (PCA) was performed using two open-source utilities: ms-alone, a Python-based utility for preprocessing and peak extraction, and multiMS-toolbox, an R-based package for PCA.26 The raw mzML files were first preprocessed in ms-alone, which included baseline subtraction and peak smoothing, before the data were imported into multiMS-toolbox. Within the ms-alone setup, the signal-to-noise ratio threshold was set to 0.5, and Savitzky-Golay smoothing was used to reduce noise.27 Intensity-based PCA (i.e., without data normalization) was first run on the entire spectrum (m/z 0-2500), and the data window was then winnowed according to the raw loadings plots: the range of m/z values with the highest contribution to variance was chosen as the reduced data window. Without winnowing, the large amount of low-intensity noise at the low (m/z 0-500) and high (m/z 1500-2500) ends of the spectra masked the contribution of features with higher signal-to-noise ratios in the middle range (Fig. S3). The mass ranges removed during winnowing are unlikely to have contained critical biomarkers: the low m/z range is not probed with DSP because small molecules are removed during sample treatment, while the high m/z range would contain contributions from high-molecular-weight media additives that are not part of the cell secretome.19,20
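The loadings-based winnowing described above can be illustrated with a minimal sketch. It is not the multiMS-toolbox implementation, and it makes several assumptions. The function names (`mean_center`, `first_pc_loadings`, `winnow_window`, `crop`) are hypothetical. The five-bin spectra are synthetic. The cutoff of 0.5 times the largest absolute loading is an illustrative stand-in for inspecting the raw loadings plots by eye. To keep the example dependency-free, the first principal component is estimated by power iteration rather than by a full PCA decomposition.

```python
import math


def mean_center(rows):
    """Subtract each m/z bin's mean intensity. No scaling is applied,
    so the analysis stays intensity-based (no normalization)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc_loadings(rows, n_iter=100):
    """Unit-length PC1 loadings via power iteration on X^T X, where X is
    the mean-centered samples-by-bins intensity matrix."""
    X = mean_center(rows)
    p = len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary non-zero starting direction
    for _ in range(n_iter):
        scores = [sum(x * vj for x, vj in zip(row, v)) for row in X]  # X v
        nxt = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm == 0.0:  # no variance in the data; keep the current vector
            break
        v = [c / norm for c in nxt]
    return v


def winnow_window(mz, loadings, fraction=0.5):
    """Smallest contiguous m/z range covering every bin whose |loading| is
    at least `fraction` of the largest |loading| (hypothetical heuristic)."""
    cutoff = fraction * max(abs(l) for l in loadings)
    keep = [m for m, l in zip(mz, loadings) if abs(l) >= cutoff]
    return min(keep), max(keep)


def crop(mz, rows, lo, hi):
    """Restrict the m/z axis and every spectrum to the window [lo, hi]."""
    idx = [j for j, m in enumerate(mz) if lo <= m <= hi]
    return [mz[j] for j in idx], [[row[j] for j in idx] for row in rows]


# Synthetic example: two sample groups that differ in the mid-range bins,
# with only low-intensity noise at the low and high ends of the axis.
mz = [250, 750, 1000, 1250, 2000]
spectra = [
    [2, 90, 10, 60, 1],
    [1, 85, 12, 65, 2],
    [2, 20, 70, 15, 1],
    [1, 25, 75, 10, 2],
]

loadings = first_pc_loadings(spectra)
window = winnow_window(mz, loadings)
reduced_mz, reduced_spectra = crop(mz, spectra, *window)
print(window, reduced_mz)
```

In this toy data set the end bins receive near-zero PC1 loadings, so the selected window covers only the mid-range bins, and PCA would then be repeated on `reduced_spectra`. The sign of a loading vector is arbitrary, which is why the cutoff is applied to absolute values.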