Figure 3: Target 2020-05-30_00000276 (PDB ID 6LQF) is an ARID-PHD protein cassette in complex with a peptide, DNA and zinc ions (Tan et al. 2020). The protein only has remote similarity (< 30% sequence identity) to known structures, and none of them are in complex with DNA or the H3K4me3 peptide, making it an extremely challenging target. We are not aware of any methods that would currently be able to model this type of complex with acceptable accuracy. It should be noted that the peptide contains a non-canonical residue (N-Trimethyllysine, derived from Lysine).
In 2020, following the criteria outlined in the previous sections, we observed 983 structures containing more than one type of polymer entity. All of them were proteins in complex with peptides (421), DNA (279), RNA (199), DNA and RNA (52) or both peptides and nucleic acids (32).
With appropriate extensions, we believe that some of the scores selected for the individual target types, such as the oligo-lDDT and the CAD-score, will be applicable to evaluating all these targets in a consistent manner.

Non-canonical amino acids and bases

Macromolecular structures frequently contain amino acid (or nucleic acid) residues which are not among the 20 (respectively 8) standard residues. Traditionally, for modeling purposes, the target sequences are canonicalized, that is, modified residues are represented by their “parent” or closest canonical residue. However, this may result in suboptimal models which do not accurately represent the region containing the modification. Post-translational modifications such as phosphorylation can cause significant conformational changes in the protein structure, which would be impossible to model correctly without knowledge of the modification.
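As a minimal sketch of what canonicalization means in practice, the following Python snippet maps a list of residue names onto a one-letter sequence, replacing modified residues with their parent. The lookup tables and the function name `canonicalize` are illustrative assumptions and deliberately incomplete; a real pipeline would take parent assignments from the PDB Chemical Component Dictionary. The example peptide is an H3-like sequence chosen for illustration, not necessarily the exact peptide of any CAMEO target.

```python
# Hypothetical, deliberately small lookup tables (assumptions for illustration).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "GLN": "Q", "LYS": "K",
    "SER": "S", "THR": "T", "TYR": "Y", "MET": "M",
}
# Modified residue -> parent residue (PDB component identifiers).
PARENT = {
    "M3L": "LYS",  # N-trimethyllysine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
    "MSE": "MET",  # selenomethionine
}

def canonicalize(residues):
    """Return a one-letter sequence with modified residues replaced by their parent.

    Residues that are unknown to both tables are written as 'X'.
    """
    letters = []
    for name in residues:
        parent = PARENT.get(name, name)
        letters.append(THREE_TO_ONE.get(parent, "X"))
    return "".join(letters)

peptide = ["ALA", "ARG", "THR", "M3L", "GLN", "THR", "ALA", "ARG", "LYS", "SER"]
print(canonicalize(peptide))  # ARTKQTARKS
```

After this step the trimethylated lysine is indistinguishable from an ordinary lysine, which is exactly the information loss described above.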
As this information is available at the time of pre-release, CAMEO can provide sequences containing non-canonical residues on an opt-in basis (Figure 3). In this case, sequences will contain the PDB component identifier (typically 3 letters) enclosed in round brackets, in place of the parent amino acids. Models correctly representing those residues are expected to obtain higher scores for the all-atom measures such as the lDDT or the CAD-score.
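A prediction server receiving such sequences needs to separate one-letter codes from bracketed component identifiers. The following Python sketch shows one way this could be done; the function name `parse_sequence`, the token representation and the assumed identifier length (1 to 5 alphanumeric characters) are assumptions made for illustration, not part of the CAMEO specification. The example sequence is illustrative.

```python
import re

# Either a component identifier in round brackets, or a single one-letter code.
# The identifier length limit is an assumption, not a CAMEO rule.
TOKEN = re.compile(r"\(([A-Za-z0-9]{1,5})\)|([A-Za-z])")

def parse_sequence(seq):
    """Split a sequence into (residue, is_non_canonical) tuples.

    Raises ValueError on characters that fit neither pattern,
    for example an unclosed bracket.
    """
    tokens = []
    pos = 0
    while pos < len(seq):
        match = TOKEN.match(seq, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {seq[pos]!r}")
        if match.group(1) is not None:
            tokens.append((match.group(1).upper(), True))   # bracketed component ID
        else:
            tokens.append((match.group(2).upper(), False))  # canonical one-letter code
        pos = match.end()
    return tokens

for residue, modified in parse_sequence("ART(M3L)QTARKS"):
    print(residue, "non-canonical" if modified else "canonical")
```

Keeping the flag alongside each residue lets a modeling pipeline decide per position whether to build the modified residue explicitly or to fall back to its parent.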
In 2020, 444 of the 4323 protein, DNA, RNA and mixed structures and complexes we observed contained non-canonical residues. These residues occurred in proteins (286), peptides (112), DNA (35) and RNA (27). Sixteen of them were observed in mixed complexes.

Current implementation status of CAMEO

At the time of writing, the CAMEO “Structures & Complexes” functionality is available as a beta version at https://beta.cameo3d.org/ and is open for registrations. It has been providing targets containing proteins, DNA and RNA to registered servers on a weekly basis since October 2020. Participants can currently choose to receive the non-polymer ligands contained in these targets as InChI codes or PDB component IDs, as well as non-canonicalized sequences including modified residues. Predictions can be returned in PDB or mmCIF format, and are assessed with a fully automated pipeline including the oligo-lDDT and QS-scores. A weekly download of models, reference structures and assessment results is made available for offline analysis.
Our next steps will be to refine the target selection process, especially with respect to selecting relevant ligand targets as described in the previous sections. We are exploring ways to increase the diversity of the target selection while ensuring that as many participants as possible receive a common subset of targets, so that servers can be compared on at least some aspects of the evaluation. We aim to improve the scoring by providing more diverse scores as described in the previous sections. Most groups developing novel methods have implemented their own scoring workflows locally. At this point, we therefore consider the raw data downloads of the prediction results a crucial service to the community developing specialized prediction methods, as they allow independent blind prediction data to be included in publications describing a new method.