Results and Discussions

Current CAMEO results

Since 2012, CAMEO has been leveraging the pre-release of structures to be published in the upcoming release of the Protein Data Bank (PDB) to conduct weekly, blind, fully automated benchmarking experiments. Every Saturday, we download the pre-release data, which contains the sequences of polymer entities as well as the InChI codes of non-polymer entities in the PDB structures to be released on the following Wednesday. We select a set of 20 interesting protein modeling targets and submit them to registered participants, who have 4 days to predict the 3D structure of those targets. We collect the predictions and, upon release of the structures by the PDB on Wednesday, compare them with the experimental ground truth.
The CAMEO evaluation provides a wide variety of scores measuring different aspects of protein structure prediction accuracy, and accordingly does not establish a single unique ranking of the methods. However, some of the scores are featured more prominently on the web site, as we consider them more useful estimates of model quality. The focus of CAMEO has always been on all-atom scores, which capture the ability of participants to model proteins accurately, including biologically relevant side chain conformations. In addition, as CAMEO is a fully automated workflow without human intervention, we have been focusing on superposition-free scores, which alleviate the need to manually split proteins into evaluation units (Kinch et al. 2019, 2011) to account for domain movements. Therefore, CAMEO has been showcasing scores like lDDT (Mariani et al. 2013) and CAD-score (Olechnovič, Kulberkytė, and Venclovas 2013), both of which are all-atom and superposition-independent. Our server summary page also features the lDDT-BS score, which measures the accuracy of predictions in the region of ligand binding sites, as well as a measure of model confidence, which evaluates the ability of participants to estimate the accuracy of their own predictions. Additional scores are displayed on the target details page and are available in the downloads.
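To illustrate why a superposition-free score does not require splitting a target into domains, the following is a minimal sketch of an lDDT-style calculation, simplified to one atom per residue. The published score is all-atom and includes stereochemical checks that are omitted here. The function name `lddt_sketch` and the coordinate-list input are illustrative assumptions and do not reflect the CAMEO implementation; the 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerances follow the defaults described by Mariani et al. 2013.

```python
import math


def lddt_sketch(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT-style score using one atom per residue.

    ref and model are equally long lists of (x, y, z) tuples in the same
    residue order. A model entry may be None when the residue is missing
    from the prediction.
    """
    preserved = 0
    total = 0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local distances in the reference are scored
            total += len(thresholds)
            if model[i] is None or model[j] is None:
                continue  # missing residues count as not preserved
            diff = abs(d_ref - math.dist(model[i], model[j]))
            # a distance is checked once per tolerance threshold
            preserved += sum(1 for t in thresholds if diff <= t)
    return preserved / total if total else 0.0


# Toy example (made-up coordinates): two rigid "domains" more than 15 Å apart.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
             (30.0, 0.0, 0.0), (33.8, 0.0, 0.0), (37.6, 0.0, 0.0)]
# In the model, the second domain has moved as a rigid body.
hinged_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
                (0.0, 30.0, 0.0), (3.8, 30.0, 0.0), (7.6, 30.0, 0.0)]
hinge_score = lddt_sketch(reference, hinged_model)
```

In this toy case the domain movement leaves the score unaffected, because only distances that are local in the reference are compared and no global superposition is involved.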
Since 2016, CAMEO (Haas et al. 2018) has been evaluating the ability of modeling servers to correctly predict the oligomeric state of a target protein and model the correct assembly, based solely on the amino acid sequence. As targets are submitted as a single protein sequence, participants need to predict whether the protein is likely to assemble into a homo-oligomer and, if so, to predict the exact stoichiometry as well as the correct interfaces. The complex models are evaluated with three types of scores: the oligo-lDDT score (Haas et al. 2019), a modified version of lDDT that considers the whole complex and accounts for missing or extra chains; the MM-align-based (Mukherjee and Zhang 2009) TM-score and RMSD, which are superposition-dependent; and the QS-score (Bertoni et al. 2017), which looks specifically at the conservation of interface residues.
In 2020, we performed 52 prediction rounds and provided targets to 15 public modeling servers (from 9 groups) and 25 development servers (from a total of 18 groups). After filtering out problematic targets of low or uncertain quality, as well as targets whose formatting caused technical issues for the scoring tools, we evaluated and scored 812 targets, 453 of which were oligomeric. Compared with the 84 3D modeling targets of CASP14, this allows participants to assess the accuracy of their prediction servers on a wide variety of targets at much shorter time intervals.

Protein Complexes

With the new version of CAMEO, we are extending the scope of the assessment to structures and complexes. Instead of considering every protein sequence separately, a prediction target is now defined as a complete experimental structure with all the chemical entities it contains. For monomeric and homo-oligomeric protein entries, such a target is identical to the current CAMEO-3D target and contains only one unique protein sequence. For hetero-oligomeric targets, however, evaluation is performed only in the context of the whole complex, and no longer on individual isolated protein chains taken out of context. Methods registered to receive hetero-oligomeric complexes as targets thus receive the sequences of all proteins that form a complex, and are expected to predict the oligomeric structure of that complex. All participating methods receive the sequences of monomeric and homo-oligomeric targets. This establishes a common baseline on which all participating servers can be compared with each other on a shared subset of targets.
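The distribution rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `Target` and `Server` classes, the `accepts_hetero` flag, and the `recipients` function are hypothetical names introduced here, and the sequences are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    target_id: str
    protein_sequences: tuple  # protein sequences found in the structure

    @property
    def is_hetero(self):
        # more than one unique protein sequence means a hetero-oligomer
        return len(set(self.protein_sequences)) > 1


@dataclass(frozen=True)
class Server:
    name: str
    accepts_hetero: bool = False  # hypothetical registration flag


def recipients(target, servers):
    """Return the names of the servers that receive the given target."""
    if target.is_hetero:
        # hetero-oligomeric targets go only to servers registered for them
        return [s.name for s in servers if s.accepts_hetero]
    # monomeric and homo-oligomeric targets form the common baseline
    return [s.name for s in servers]


servers = [Server("server_a"), Server("server_b", accepts_hetero=True)]
homo_target = Target("example_1", ("MKTAYIAKQR",))
hetero_target = Target("example_2", ("MKTAYIAKQR", "GSHMLEDPVA"))
```

With these example inputs, `recipients(homo_target, servers)` lists both servers, whereas `recipients(hetero_target, servers)` lists only `server_b`.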
In order to select interesting targets for this category, we search for the presence of homologous complexes (Figure 1). Closely related homologs are first identified with BLAST, separately for every protein sequence with 30 or more amino acid residues. Complexes containing DNA, RNA, or peptide sequences shorter than 30 amino acids are excluded at this stage and handled separately (see following sections). For every target, we consider the complete set of proteins that compose it, and search for a homologous template that covers all the protein entities. We ignore templates that cover only some of the target sequences, or that contain extra polymer entities (proteins, peptides, DNA or RNA). We consider a target to be interesting if no such closely related homologous complex can be found. This includes novel complexes (where all the proteins can easily be modeled separately, but where the complex has never been observed experimentally in its entirety, so that the interfaces are unknown) as well as complexes in which at least one of the protein sequences is a non-trivial modeling target on its own.
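The selection logic described above can be sketched as follows, assuming the homology search has already been run for each protein entity. The data layout, the function names `covering_templates` and `is_interesting`, and the identifiers in the example are hypothetical and chosen for illustration. The sketch also ignores stoichiometry and does not enforce a one-to-one mapping between target and template entities, which a real implementation may need to consider.

```python
def covering_templates(hits, template_entities):
    """Return templates that match every target protein and nothing else.

    hits: dict mapping each target protein entity to a set of
        (template_id, template_entity_id) pairs from the homology search.
    template_entities: dict mapping each template_id to the set of all
        polymer entity IDs (proteins, peptides, DNA, RNA) it contains.
    """
    result = []
    for template_id, entities in template_entities.items():
        matched = set()
        covers_all = True
        for entity_hits in hits.values():
            in_template = {e for t, e in entity_hits if t == template_id}
            if not in_template:
                covers_all = False  # template lacks this target protein
                break
            matched |= in_template
        # reject templates that contain extra, unmatched polymer entities
        if covers_all and matched == entities:
            result.append(template_id)
    return sorted(result)


def is_interesting(hits, template_entities):
    """A target is interesting if no covering template exists."""
    return not covering_templates(hits, template_entities)


# Hypothetical example: target with proteins A and B, three candidate templates.
template_entities = {
    "T1": {"1", "2"},        # homologs of A and B, nothing else
    "T2": {"1"},             # homolog of A only
    "T3": {"1", "2", "3"},   # homologs of A and B plus an extra entity
}
hits_known = {"A": {("T1", "1"), ("T2", "1"), ("T3", "1")},
              "B": {("T1", "2"), ("T3", "2")}}
hits_novel = {"A": {("T2", "1"), ("T3", "1")},
              "B": {("T3", "2")}}
```

In this example, `hits_known` is covered by template `T1` and the target would not be selected, whereas `hits_novel` has no template that covers both proteins without extra polymer entities, so the target would be considered interesting.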