Figure 2: Target 2020-05-09_00000305 (PDB ID 7BRP) is a structure of the SARS-CoV-2 main protease in complex with Boceprevir (Fu et al. 2020). At the time of pre-release, the structure of the protease had already been solved, and was therefore a trivial modeling target on its own. However, it had not been observed in complex with Boceprevir, and therefore the complex was deemed interesting for ligand modeling.
To score these predictions, we will follow the procedure developed by the CELPP community, and evaluate ligand poses with a symmetry-corrected RMSD (Wagner et al. 2019).
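The idea behind a symmetry-corrected RMSD can be illustrated with a minimal sketch. It is not the CELPP implementation: it assumes that the reference and predicted poses are already expressed in the same coordinate frame, and that the list of symmetry-equivalent atom mappings (`automorphisms`, a hypothetical input) has been computed beforehand, for instance from the ligand's chemical graph. The function names are ours.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], model: Sequence[Coord]) -> float:
    """Plain RMSD between two equally ordered lists of atom coordinates."""
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same number of atoms")
    squared = sum(
        (a - b) ** 2 for p, q in zip(reference, model) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def symmetry_corrected_rmsd(
    reference: Sequence[Coord],
    model: Sequence[Coord],
    automorphisms: Sequence[Sequence[int]],
) -> float:
    """Lowest RMSD over all symmetry-equivalent atom mappings.

    Each mapping is a permutation: reference atom i is compared with
    model atom mapping[i]. The mappings are assumed to be given.
    """
    return min(
        rmsd(reference, [model[j] for j in mapping]) for mapping in automorphisms
    )


# Toy carboxylate-like group: one carbon and two chemically equivalent oxygens.
reference_pose: List[Coord] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
# Same geometry, but the two oxygens are listed in the opposite order.
predicted_pose: List[Coord] = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
# Identity mapping plus the mapping that swaps the two oxygens.
mappings = [(0, 1, 2), (0, 2, 1)]

naive_value = rmsd(reference_pose, predicted_pose)
corrected_value = symmetry_corrected_rmsd(reference_pose, predicted_pose, mappings)
print(naive_value, corrected_value)
```

In this toy case the naive RMSD is about 1.63 Å although both poses are physically identical, whereas the symmetry-corrected value is 0. This is why atom-order artifacts should not be allowed to penalize a correct ligand pose.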

Peptides

Accurately predicting the structures of short proteins or peptides has always been challenging for comparative modeling. As a consequence, many protein prediction servers impose a minimum length on the protein sequences that they attempt to predict. CAMEO has so far taken a conservative approach and submitted only targets containing at least 30 amino acids to the participants. In the future, participants will be able to opt in to also receive peptides with fewer than 30 residues as targets. These targets are relevant in areas of research such as host-pathogen interactions.
To identify interesting novel targets, we applied a conservative cut-off of 100% sequence identity to a template. In 2020, the PDB released 536 novel structures containing at least one amino acid sequence of fewer than 30 residues. In 453 of these structures, the peptides were in complex with a protein or DNA/RNA, making them suitable, for instance, for peptide-protein docking methods. In the remaining 83 structures, the peptides were observed in monomeric or homo-oligomeric form, mainly by NMR. Advances in AI and de novo modeling technologies may well make it feasible to predict the structure of those peptides.
The interface (QS-score) and complex (oligo-lDDT) scores can be used to score protein-peptide complexes. However, additional scores like those used in the CAPRI experiment (Lensink et al. 2020), and others geared towards protein-peptide docking, will also be considered.
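The selection described above can be sketched as follows. This is an illustration under simplifying assumptions, not the CAMEO pipeline: we read the 100% cut-off as "a peptide is novel unless an identical sequence is already known", we approximate identity by exact string matching against a hypothetical set `known_sequences`, and every chain that is not a peptide counts as a binding partner. The data layout and all names are ours.

```python
from typing import List, Set, Tuple

# A chain is (kind, sequence), where kind is "protein", "dna" or "rna".
Chain = Tuple[str, str]

MAX_PEPTIDE_LENGTH = 30  # peptides have fewer than 30 residues, as in the text


def is_peptide(chain: Chain) -> bool:
    """True for amino acid chains shorter than the peptide length limit."""
    kind, sequence = chain
    return kind == "protein" and len(sequence) < MAX_PEPTIDE_LENGTH


def classify_entry(chains: List[Chain], known_sequences: Set[str]) -> str:
    """Assign an entry to one of three illustrative categories."""
    novel_peptides = [
        c for c in chains if is_peptide(c) and c[1] not in known_sequences
    ]
    if not novel_peptides:
        return "not_a_target"
    # Any longer protein chain or nucleic acid chain makes this a complex.
    if any(not is_peptide(c) for c in chains):
        return "peptide_complex"
    return "peptide_only"


# Hypothetical example data.
known_sequences = {"GSHMKK"}
complex_entry = [("protein", "ACDEFGHIKL"), ("protein", "M" * 120)]
dimer_entry = [("protein", "ACDEFGHIKL"), ("protein", "ACDEFGHIKL")]
known_entry = [("protein", "GSHMKK")]

for entry in (complex_entry, dimer_entry, known_entry):
    print(classify_entry(entry, known_sequences))
```

A production version would replace the exact string match with a sequence search against existing templates, and would need to handle entries that combine several different peptides.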

DNA and RNA

Predicting the 3D structure of nucleic acids remains a challenge. To the best of our knowledge, no fully automated prediction server is publicly available, although several standalone approaches have been published (Wirecki et al. 2020; Orengo et al. 2020; Miao et al. 2020).
Applying the same conservative cut-off of 100% sequence identity with previously known structures to identify interesting novel targets, the PDB released 323 new structures containing RNA and 390 containing DNA in 2020. Most of them were in complex with proteins: only 42 of the RNA-containing and 57 of the DNA-containing structures consisted solely of nucleic acids. This low number of modeling targets might prove a challenge for blind benchmarking of nucleic acid structure prediction methods.
The CAD-score was reported to be an appropriate score to evaluate DNA and RNA predictions (Olechnovič and Venclovas 2014). Other all-atom scores are also being considered.

Mixed Complexes

Finally, CAMEO can submit targets containing a mixture of all of the above: complexes with proteins, peptides, nucleic acids and ligands (Figure 3). While this prediction task is to date extremely challenging for most methods, we believe it should be the ultimate goal in 3D structure prediction: the ability to predict any biologically relevant macromolecular structure, regardless of its composition.