Introduction
The 2020 CASP14 experiment saw an unprecedented improvement in the
performance of 3D protein structure prediction. One method (AlphaFold2)
was able to generate highly accurate predictions even for the most
challenging de novo targets. Beyond the CASP community, this
breakthrough has implications for the entire field of structural
biology: accurately predicting the structure of a single protein chain
has never been closer to being considered a solved problem. But far from
being the end of structure prediction, this might instead be the
beginning of a new era in the 3D modeling of biomolecular structures.
Areas that have so far been held back by the inability to produce
sufficiently accurate de novo protein models in the first place,
such as the prediction of protein-ligand interactions, of large
macromolecular complexes and assemblies, or of variant effects, might
now be within reach of the next generation of structure prediction
methods.
Independent blind assessment of these techniques will be needed more
than ever to support the development of reliable and reproducible
methods. To help the community tackle these challenges, we introduce
an extension of CAMEO (available at beta.cameo3d.org) that shifts the
focus from the prediction of individual protein chains to the
prediction of macromolecular complexes, as determined experimentally by
X-ray crystallography or, increasingly, cryo-EM and deposited in the
PDB (wwPDB consortium et al. 2018).
In this new CAMEO category, participating methods receive as
prediction targets the sequences of all unique polymer chains and the
InChI codes of all non-polymer entities that make up the complex. The
modeling task poses three challenges: 1) predicting the stoichiometry
of the complex; 2) predicting the 3D structure of all components
(proteins, peptides, DNA, RNA and ligands), including their relative
orientation and interfaces; and 3) providing per-residue confidence
estimates for the model. The category follows an opt-in model:
participants receive only the target types their method is able to
model. For example, a method that predicts only single protein chains
can still participate and will receive targets composed of a single
unique protein sequence, which may be either monomers or
homo-oligomers, while another method from the same group might be
designed to predict, e.g., complexes of proteins with drug-like small
molecules.
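The opt-in routing of targets can be illustrated with a minimal Python
sketch. The data model below is entirely hypothetical. This covers the
PolymerEntity and Target classes, the category labels such as
"single-protein-sequence", and the targets_for filter. It does not
reflect CAMEO's actual target format or API. It only shows how a
target, described by its unique polymer sequences and ligand InChI
codes, could be matched against the target types a method has opted
in to.

```python
from dataclasses import dataclass

# Hypothetical data model: names and category labels are illustrative
# assumptions, not the CAMEO target format.
@dataclass(frozen=True)
class PolymerEntity:
    kind: str      # assumed labels: "protein", "dna" or "rna"
    sequence: str  # sequence of one unique polymer chain


@dataclass(frozen=True)
class Target:
    # Unique chains only; the copy number (stoichiometry) is not given
    # and is part of what the participant must predict.
    polymers: tuple[PolymerEntity, ...]
    ligand_inchis: tuple[str, ...] = ()  # InChI codes of non-polymer entities


def target_type(target: Target) -> str:
    """Assign a target to a coarse, hypothetical category."""
    kinds = {p.kind for p in target.polymers}
    if target.ligand_inchis:
        return "protein-ligand" if kinds == {"protein"} else "mixed"
    if len(target.polymers) == 1 and kinds == {"protein"}:
        # One unique protein sequence: a monomer or a homo-oligomer.
        return "single-protein-sequence"
    if kinds == {"protein"}:
        return "protein-complex"
    return "mixed"


def targets_for(capabilities: set[str], targets: list[Target]) -> list[Target]:
    """Opt-in filter: keep only targets whose type the method declared."""
    return [t for t in targets if target_type(t) in capabilities]


monomer = Target(polymers=(PolymerEntity("protein", "MKTAYIAK"),))
protein_ligand = Target(
    polymers=(PolymerEntity("protein", "MKTAYIAK"),),
    ligand_inchis=("InChI=1S/H2O/h1H2",),  # water, as a placeholder ligand
)
print([target_type(t) for t in targets_for({"single-protein-sequence"},
                                           [monomer, protein_ligand])])
```

In this sketch, the method declares only "single-protein-sequence".
It receives the first target but not the one containing a ligand. The
first target may still turn out to be a homo-oligomer, because the
stoichiometry is part of the prediction task rather than part of the
input.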
In this manuscript, we describe the different types of prediction
targets that CAMEO supports in the new category and estimate the
expected number of validation targets for each based on PDB statistics
observed in 2020. One major challenge will be scoring these new types
of predictions against the experimental structures. Where appropriate,
we comment on the scores we plan to apply to the various prediction
types. We welcome feedback from the community on complementary scoring
approaches.