Results and Discussions
Current CAMEO results
Since 2012, CAMEO has been leveraging the pre-release of structures to
be published in the upcoming release of the Protein Data Bank (PDB) to
conduct weekly, blind, fully automated benchmarking experiments. Every
Saturday, we download the pre-release data, which contains the sequences
of polymer entities, as well as InChI codes of non-polymer entities
contained in the PDB structures to be released on the following
Wednesday. We select a set of 20 interesting protein modeling targets,
which are submitted to registered participants, who have 4 days to
predict the 3D structure of those targets. We collect those predictions
and, upon release of the structures by the PDB on Wednesday, compare
them with the experimental ground truth.
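The weekly timing described above can be sketched as a small date calculation. This is only an illustration: the function name `release_date` is hypothetical and not part of the CAMEO code base, and the sketch assumes nothing beyond the Saturday download and the Wednesday release four days later.

```python
from datetime import date, timedelta

# date.weekday() counts from Monday = 0, so Saturday is 5 and Wednesday is 2.
SATURDAY = 5
WEDNESDAY = 2
PREDICTION_WINDOW_DAYS = 4  # time participants have to submit predictions


def release_date(prerelease_day: date) -> date:
    """Return the Wednesday on which a Saturday pre-release is published.

    Hypothetical helper, for illustration only.
    """
    if prerelease_day.weekday() != SATURDAY:
        raise ValueError("pre-release data is downloaded on Saturdays")
    release = prerelease_day + timedelta(days=PREDICTION_WINDOW_DAYS)
    # Saturday + 4 days always falls on the following Wednesday.
    assert release.weekday() == WEDNESDAY
    return release


# Example: the pre-release downloaded on Saturday, 4 January 2020
print(release_date(date(2020, 1, 4)))  # 2020-01-08
```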
The CAMEO evaluation provides a wide variety of scores measuring
different aspects of protein structure prediction accuracy, and
accordingly does not establish a single ranking of the methods.
However, some of the scores are featured more prominently on the web
site, as we consider them more useful estimates of model quality. The
focus of CAMEO has always been on all-atom scores to
capture the ability of participants to accurately model proteins
including biologically relevant protein side chain conformations. In
addition, as CAMEO is a fully automated workflow without human
intervention, we have been focusing on superposition-free scores which
alleviate the need to manually split proteins into evaluation units
(Kinch et al. 2019, 2011) to account for domain movements. Therefore,
CAMEO has been showcasing scores like lDDT (Mariani et al. 2013) and
CAD-score (Olechnovič, Kulberkytė, and Venclovas 2013), both of which
are all-atom and superposition-independent. In addition, our server
summary page features the lDDT-BS score, which measures the accuracy of
predictions in the region of ligand binding sites, as well as a measure
for model
confidence, which evaluates the ability of participants to estimate the
accuracy of their own predictions. Additional scores are displayed on
the target details page and available in the downloads.
Since 2016, CAMEO (Haas et al. 2018) has been evaluating the ability of
modeling servers
to correctly predict the oligomeric state of a target protein and model
the correct assembly, based solely on the amino acid sequence. As
targets are submitted as a single protein sequence, participants need to
predict whether the protein is likely to assemble into a homo-oligomer
and, if so, to predict the exact stoichiometry as well as
the correct interfaces. The complex models are evaluated with the
oligo-lDDT score (Haas et al. 2019), which is a modified version of lDDT
that looks at the whole complex and accounts for missing or extra
chains; the MM-align-based (Mukherjee and Zhang 2009) TM-score and RMSD,
which are superposition-dependent; and the QS-score (Bertoni et al.
2017), which looks specifically at the conservation of interface
residues.
In 2020, we performed 52 prediction rounds and provided targets to 15
public modeling servers (from 9 groups) and 25 development servers (from
a total of 18 groups). After filtering out problematic targets of low or
uncertain quality, and targets causing technical issues for scoring
tools for formatting reasons, we evaluated and scored 812 targets, 453
of which were oligomeric. Compared with the 84 3D modeling targets of
CASP14, CAMEO thus enables participants to assess the accuracy of their
prediction servers on a wider variety of targets and at much shorter
time intervals.
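To put these counts in proportion, the following short calculation derives a few ratios from the figures quoted above. The variable names are illustrative, and the derived values are simple arithmetic on the reported numbers, not additional results.

```python
# Figures quoted in the text for the 2020 CAMEO rounds and for CASP14.
rounds = 52
scored_targets = 812
oligomeric_targets = 453
casp14_targets = 84

# Derived ratios (plain arithmetic on the numbers above).
oligomeric_fraction = oligomeric_targets / scored_targets
targets_per_round = scored_targets / rounds
ratio_to_casp14 = scored_targets / casp14_targets

print(f"oligomeric fraction: {oligomeric_fraction:.1%}")      # 55.8%
print(f"scored targets per round: {targets_per_round:.1f}")   # 15.6
print(f"scored targets relative to CASP14: {ratio_to_casp14:.1f}x")  # 9.7x
```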
Protein Complexes
With the new version of CAMEO we are extending the scope of the
assessment to structures and complexes. Instead of considering every
protein sequence separately, a prediction target is now defined as a
complete experimental structure with all the chemical entities it
contains. In the case of monomeric and homo-oligomeric protein entries,
this would be identical to the current CAMEO-3D targets and contain only
one unique protein sequence. However, for hetero-oligomeric targets,
evaluation is only performed in the context of the whole complex, and no
longer on individual isolated protein chains taken out of context.
Methods registered to receive hetero-oligomeric complexes as targets
thus receive all sequences of the proteins that form a complex, and are
expected to predict the oligomeric structure of the complex. All
participating methods receive the sequences of monomeric or
homo-oligomeric targets. This allows establishing a common baseline
where all participating servers can be compared with each other on a
subset of common targets.
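The distribution rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Target` and `Server` classes, their fields, and the `recipients` function are hypothetical names introduced here and do not reflect the actual CAMEO implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical target record: one entry per protein sequence."""
    target_id: str
    sequences: tuple

    @property
    def is_hetero_oligomer(self) -> bool:
        # Monomers and homo-oligomers contain only one unique sequence.
        return len(set(self.sequences)) > 1


@dataclass(frozen=True)
class Server:
    """Hypothetical server record with its registration choice."""
    name: str
    accepts_hetero: bool


def recipients(target: Target, servers: list) -> list:
    """Names of the servers that receive the given target.

    Every server receives monomeric and homo-oligomeric targets; only
    servers registered for hetero-oligomers receive the remaining ones.
    """
    return [
        s.name
        for s in servers
        if s.accepts_hetero or not target.is_hetero_oligomer
    ]


servers = [Server("server_a", True), Server("server_b", False)]
print(recipients(Target("t1", ("MKV",)), servers))        # both servers
print(recipients(Target("t2", ("MKV", "GSH")), servers))  # server_a only
```

The targets with a single unique sequence form the common subset on which all servers can be compared.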
In order to select interesting targets for this category, we search for
the presence of homologous complexes (Figure 1). Closely related
homologs are first identified with BLAST, separately for every protein
sequence with 30 or more amino acid residues. Complexes containing DNA,
RNA, or peptide sequences shorter than 30 amino acids are excluded at
this stage, and handled separately (see following sections). For every
target, we consider the complete set of proteins that compose it, and
search for a homologous template that covers all the protein entities.
We ignore templates that only cover some of the target sequences, or
that contain extra polymer entities (proteins, peptides, DNA or RNA). We
consider targets to be interesting if such a closely related homologous
complex cannot be found. This includes novel complexes (where all the
proteins can easily be modeled separately, but where the complex has
never been observed experimentally in its entirety, and the interface(s)
are therefore unknown) as well as complexes in which at least one of the
protein sequences is a non-trivial modeling target on its own.
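The selection rule described above can be sketched as a simple filter over precomputed homology hits. This is an illustration under stated assumptions, not the CAMEO implementation: the function names, the dictionary layout of `hits` and `template_polymers`, and the example identifiers are all hypothetical, and the BLAST search itself is assumed to have been run beforehand. The sketch also does not enforce a one-to-one mapping between target and template entities.

```python
def covering_templates(target_entities, hits, template_polymers):
    """Return the templates that cover the whole target complex.

    target_entities:   IDs of the target's protein entities (>= 30 residues).
    hits:              {target_entity: {template_id: set of matched
                       template entity IDs}} for closely related homologs
                       (assumed to be precomputed, e.g. from BLAST output).
    template_polymers: {template_id: set of all polymer entity IDs in that
                       template (proteins, peptides, DNA or RNA)}.
    """
    covering = []
    for template_id, polymers in template_polymers.items():
        matched = set()
        covers_all = True
        for entity in target_entities:
            entity_hits = hits.get(entity, {}).get(template_id, set())
            if not entity_hits:
                covers_all = False  # template covers only part of the target
                break
            matched |= entity_hits
        has_extra = bool(set(polymers) - matched)  # unmatched polymer entities
        if covers_all and not has_extra:
            covering.append(template_id)
    return covering


def is_interesting(target_entities, hits, template_polymers):
    """A target is interesting if no closely related complex covers it."""
    return not covering_templates(target_entities, hits, template_polymers)


# Hypothetical example: a two-protein target and three candidate templates.
target = ["A", "B"]
hits = {
    "A": {"tmpl1": {"1"}, "tmpl2": {"1"}},
    "B": {"tmpl2": {"2"}, "tmpl3": {"1"}},
}
template_polymers = {
    "tmpl1": {"1"},            # covers A only
    "tmpl2": {"1", "2", "3"},  # covers A and B, but has an extra entity
    "tmpl3": {"1"},            # covers B only
}
print(is_interesting(target, hits, template_polymers))  # True
```

In this sketch, a protein without any close homolog has no hits at all, so no template can cover the complex and the target is retained, which corresponds to the case where one sequence is a non-trivial modeling target on its own.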