where \(h_{i}^{(l+1)}\) denotes the feature of node i after \(l+1\) rounds of aggregation, \(f_{\theta}\) denotes the function used to update node features, \(e_{j,i}^{(l)}\) is the feature of the edge between nodes i and j after \(l\) rounds of aggregation, and \(\epsilon\) is a learnable parameter that controls the influence of the previous round's aggregated features during the message passing operation.
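A minimal sketch of this aggregation step in NumPy, assuming a GIN-style update in which each neighbor message combines the neighbor's node features with the connecting edge features (the exact combination in the original model may differ; `gin_layer` and its arguments are illustrative names):

```python
import numpy as np

def gin_layer(h, e, edges, eps, f_theta):
    """One GIN-style aggregation step (illustrative sketch of Eq. (1)).

    h:       (N, d) array of node features h_i^{(l)}
    e:       dict mapping a directed edge (j, i) -> (d,) edge feature e_{j,i}^{(l)}
    edges:   list of directed edges (j, i); messages flow from j to i
    eps:     learnable scalar weighting the node's own previous-layer features
    f_theta: update function (in practice an MLP; any callable here)
    """
    n, d = h.shape
    agg = np.zeros((n, d))
    for (j, i) in edges:
        # Neighbor message: node features plus edge features (an assumption
        # about how e_{j,i}^{(l)} enters the sum).
        agg[i] += h[j] + e[(j, i)]
    # (1 + eps) scales the node's own features from the previous round
    # before the learnable update function is applied.
    return f_theta((1.0 + eps) * h + agg)
```

With `f_theta` set to the identity and `eps = 0`, the layer reduces to summing each node's own features with its incoming neighbor-plus-edge messages.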
The decoder utilizes an MLP to transform the reaction information extracted by the backbone model. In this framework, views augmented from the same sample are treated as positive pairs, while those from different samples are treated as negatives. To maximize the difference between positive and negative examples while minimizing the difference between positive ones, the normalized temperature-scaled cross-entropy (NT-Xent) loss42 is used as the contrastive learning loss. The formula of the NT-Xent loss is shown in Eqs. (2) and (3).
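The NT-Xent loss can be sketched as follows, assuming the standard SimCLR-style formulation: embeddings are L2-normalized, similarities are scaled by a temperature \(\tau\), and for a batch of N samples the two augmented views of sample k sit at rows k and k+N (the function name and batch layout are assumptions, not taken from the paper):

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent contrastive loss (illustrative sketch).

    z:   (2N, d) array of embeddings; rows k and k+N are the two
         augmented views of sample k (the positive pair), and all
         other rows in the batch serve as negatives.
    tau: temperature scaling the cosine similarities.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    n2 = z.shape[0]
    np.fill_diagonal(sim, -np.inf)  # exclude each row's self-similarity
    # Index of each row's positive partner: k <-> k + N.
    pos = (np.arange(n2) + n2 // 2) % n2
    # Cross-entropy of the positive pair against all other pairs in the batch.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(n2), pos] - logsumexp)
    return loss.mean()
```

Lowering `tau` sharpens the softmax, so the loss penalizes hard negatives (near-duplicates of the anchor) more strongly.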