where \(h_{i}^{(l+1)}\) denotes the feature of node i after \(l+1\) rounds of aggregation, \(f_{\theta}\) is the function used to update node features, \(e_{j,i}^{(l)}\) is the feature of the edge between nodes i and j after \(l\) rounds of aggregation, and \(\epsilon\) is a learnable parameter that controls the influence of the previous round's aggregated features during the message-passing operation.
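For illustration, a minimal sketch of one such message-passing layer is given below, consistent with the variables defined above. It assumes PyTorch, an MLP as the update function \(f_{\theta}\), sum aggregation over neighbors, and a ReLU applied to the combined neighbor and edge features; these design choices go beyond what the equation itself specifies.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GIN-style update with edge features: edge feature e_{j,i} is
    added to the neighbor feature h_j before aggregation, and epsilon
    weights the node's own features from the previous round."""

    def __init__(self, dim: int):
        super().__init__()
        # f_theta: the learnable update function (an MLP is assumed here)
        self.f_theta = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # epsilon: learnable scalar controlling the previous round's influence
        self.eps = nn.Parameter(torch.zeros(1))

    def forward(self, h, edge_index, e):
        # h: (num_nodes, dim) node features after l rounds of aggregation
        # edge_index: (2, num_edges) source/target node indices
        # e: (num_edges, dim) edge features e_{j,i}^{(l)}
        src, dst = edge_index
        # message from neighbor j to node i: combine h_j with e_{j,i}
        msg = torch.relu(h[src] + e)
        # sum the messages arriving at each target node i
        agg = torch.zeros_like(h).index_add_(0, dst, msg)
        # h_i^{(l+1)} = f_theta((1 + eps) * h_i^{(l)} + sum_j msg_{j,i})
        return self.f_theta((1 + self.eps) * h + agg)
```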
The decoder part uses an MLP to transform the reaction information extracted by the backbone model. In this framework, data augmented from the same sample are treated as positive samples, while data from different samples are treated as negative samples. To maximize the difference between positive and negative samples, and minimize the difference between positive ones, the normalized temperature-scaled cross-entropy (NT-Xent) loss [42] is used as the contrastive learning loss. The formula of the NT-Xent loss is shown in Eqs. (2) and (3).
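As a complement to the formal definition in Eqs. (2) and (3), the sketch below shows a standard NT-Xent implementation under the usual two-view setup: a batch of N samples is augmented into 2N views, views derived from the same sample form positive pairs, and all other views in the batch serve as negatives. The cosine-similarity measure and the temperature hyperparameter tau are conventional choices and are assumed here.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau: float = 0.5):
    """NT-Xent loss for a batch of N samples with two augmented views.
    z1, z2: (N, dim) projections of the two views; rows with the same
    index are positive pairs, all other rows act as negatives."""
    n = z1.size(0)
    # L2-normalize and stack both views: (2N, dim)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    # pairwise cosine similarities scaled by the temperature tau
    sim = z @ z.t() / tau                                # (2N, 2N)
    # exclude each view's self-similarity from the denominator
    sim.fill_diagonal_(float("-inf"))
    # for view i the positive is view i+N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    # cross-entropy over similarity rows implements the NT-Xent form
    return F.cross_entropy(sim, targets)
```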