Figure 2: Target 2020-05-09_00000305 (PDB ID 7BRP) is a structure of
the SARS-CoV-2 main protease in complex with Boceprevir
(Fu et al. 2020). At the
time of pre-release, the structure of the protease had already been
solved, and was therefore a trivial modeling target on its own. However,
it had not been observed in complex with Boceprevir, and therefore the
complex was deemed interesting for ligand modeling.
To score these predictions, we will follow the procedure developed by
the CELPP community, and evaluate ligand poses with a symmetry-corrected
RMSD (Wagner et al. 2019).
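The idea behind a symmetry-corrected RMSD can be illustrated with a minimal sketch. It is not the CELPP implementation: the function names are hypothetical, the list of symmetry-equivalent atom mappings is assumed to be supplied by the caller (in practice it would come from a graph automorphism search on the ligand), and no superposition is performed because the pose is assumed to be already expressed in the reference frame of the receptor.

```python
import math


def rmsd(ref, model):
    """Plain RMSD between two equally sized lists of (x, y, z) coordinates."""
    if len(ref) != len(model):
        raise ValueError("coordinate lists differ in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(ref, model)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(ref))


def symmetry_corrected_rmsd(ref, model, automorphisms):
    """Minimum RMSD over all symmetry-equivalent atom mappings.

    Each automorphism is a permutation: perm[i] is the index of the model
    atom that is matched to reference atom i. The identity permutation
    should be part of the list (assumption of this sketch).
    """
    best = math.inf
    for perm in automorphisms:
        permuted = [model[j] for j in perm]
        best = min(best, rmsd(ref, permuted))
    return best


# Toy ligand: atom 0 is central, atoms 1 and 2 are chemically equivalent.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
# Same pose, but the two equivalent atoms are listed in swapped order.
model = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
automorphisms = [(0, 1, 2), (0, 2, 1)]

print(round(rmsd(ref, model), 2))                         # 1.63
print(symmetry_corrected_rmsd(ref, model, automorphisms)) # 0.0
```

In this toy case the plain RMSD penalizes a pose that is physically identical to the reference, whereas the symmetry-corrected value is zero.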
Peptides
Accurately predicting the structures of short proteins or peptides has
always been challenging for comparative modeling. As a consequence, many
protein prediction servers have limits on the minimal length of protein
sequences that they attempt to predict. CAMEO has so far taken a
conservative approach and submitted targets containing at least 30 amino
acids to the participants. In the future, participants will be able to
opt in to also receive peptides with fewer than 30 residues as targets.
These targets are relevant in research areas such as host-pathogen
interactions.
In order to identify interesting novel targets, we considered a
conservative cut-off of 100% sequence identity to a template. In 2020,
the PDB released 536 novel structures containing at least one amino acid
sequence of fewer than 30 residues. In 453 structures, such
peptides were in complex with a protein or DNA/RNA, making those
structures suitable, for instance, for peptide-protein docking methods. In
83 structures, the peptides were observed in monomeric or
homo-oligomeric forms, mainly determined by NMR. Advances in AI and de
novo modeling technologies may very well make it feasible to predict
the structure of those peptides.
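The split of peptide-containing structures into the two groups described above can be sketched as follows. This is only an illustration under stated assumptions: the `Chain` record, the `classify` function and the category labels are hypothetical, the novelty filter based on sequence identity is omitted, and any structure containing only peptide chains is treated as belonging to the second group.

```python
from dataclasses import dataclass

# Chains shorter than this are treated as peptides (threshold from the text).
PEPTIDE_MAX_LENGTH = 30


@dataclass
class Chain:
    kind: str      # "protein", "dna" or "rna" (hypothetical labels)
    sequence: str  # one-letter sequence


def is_peptide(chain):
    """True for amino acid chains with fewer than 30 residues."""
    return chain.kind == "protein" and len(chain.sequence) < PEPTIDE_MAX_LENGTH


def classify(chains):
    """Return None, "complex" or "peptide_only" for one structure."""
    if not any(is_peptide(c) for c in chains):
        return None  # no peptide: not a peptide target
    # Any longer protein chain or any nucleic acid chain counts as a partner.
    has_partner = any(not is_peptide(c) for c in chains)
    return "complex" if has_partner else "peptide_only"


# A 12-residue peptide bound to a 200-residue protein.
print(classify([Chain("protein", "A" * 12), Chain("protein", "G" * 200)]))
# Two copies of the same peptide, with no other partner.
print(classify([Chain("protein", "A" * 12), Chain("protein", "A" * 12)]))
```

The first call falls into the group suited to peptide-protein docking, the second into the group of isolated monomeric or homo-oligomeric peptides.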
The interface (QS-score) and complex (oligo-lDDT) scores can be used to
score protein-peptide complexes. However, additional scores, such as
those used in the CAPRI experiment (Lensink et al. 2020) and
others geared towards protein-peptide docking, will also be considered.
DNA and RNA
Predicting the 3D structure of nucleic acids remains a challenge. To the
best of our knowledge, no fully automated prediction server is publicly
available, although several standalone approaches have been published
(Wirecki et al. 2020; Orengo et al. 2020; Miao et al. 2020).
Considering a conservative cut-off of 100% sequence identity with
previously known structures to identify interesting novel targets, 323
new structures containing RNA were released by the PDB in 2020, and 390
containing DNA. Most of them were in complex with proteins, and only 42
and 57 targets, respectively, contained only nucleic acids. This low number of
modeling targets might prove a challenge for blind benchmarking of
nucleic acid structure prediction methods.
The CAD-score was reported to be an appropriate score to evaluate DNA
and RNA predictions (Olechnovič and Venclovas 2014). Other all-atom
scores are also being considered.
Mixed Complexes
Finally, CAMEO can submit targets containing a mixture of all of the
above: complexes with proteins, peptides, nucleic acids and ligands
(Figure 3). While this prediction task is to date extremely challenging
for most methods, we believe it should be the ultimate goal in 3D
structure prediction: the ability to predict any biologically relevant
macromolecular structure, regardless of its composition.