According to the comparison of the model evaluation results presented in
Fig. 4(a), the backbone model with the GINE aggregation method and the
AIR residual shows the best performance among the directly trained
models, with an accuracy, F1-score and ROC-AUC of 0.897, 0.898 and
0.951, respectively. This indicates that the backbone model, equipped
with the GINE layers and the AIR residual, is effective in extracting
more robust features for distinguishing reaction superiority. The
test-set ROC curves of the different models are shown in Fig. 4(b),
which indicate that the generalization ability and prediction accuracy
of the model pre-trained using the contrastive learning method are
significantly better than those of the directly trained ones. The
accuracy, F1-score and ROC-AUC of the model with contrastive learning
pre-training are 0.903, 0.903 and 0.965, respectively, which shows that
pre-training via the contrastive learning method can effectively improve
the generalization ability of the model.
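For readers who want to reproduce this kind of comparison, the three reported metrics can be computed from test-set labels and predicted scores roughly as sketched below. This is only an illustrative sketch, not the evaluation code used in this work: the function names, the 0.5 decision threshold, the choice of binary F1 with "superior" as the positive class (label 1), and the toy data are all assumptions. ROC-AUC is computed here through its rank interpretation, i.e. the probability that a randomly chosen superior reaction receives a higher score than a randomly chosen inferior one.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with label 1 (assumed here to mean 'superior') as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. This O(n_pos * n_neg) form is
    fine for a sketch; large test sets would call for a sort-based variant.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")
print(f"ROC-AUC  = {roc_auc(y_true, scores):.3f}")
```

Accuracy and F1-score depend on the chosen threshold, whereas ROC-AUC is threshold-free, which is one reason the two kinds of metric can move by different amounts between models.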
To visualize the effect of feature extraction by the pre-trained
backbone model and the corresponding data space, the features extracted
from the reactions are projected into two dimensions using Uniform
Manifold Approximation and Projection (UMAP) [48], as plotted in
Fig. 4(c). The main distributions of the superior and inferior reaction
data points are clearly separated, indicating that the model pre-trained
with contrastive learning performs well in distinguishing reaction
superiority and can provide suitable advice for reaction selection.