where \(\alpha_{t}\) is the balancing factor that mitigates class imbalance, \(\gamma\) is the focusing factor that controls how strongly the model concentrates on hard negative samples, and \(p_{t}\) is the positive-class probability predicted by the model.
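For concreteness, a minimal PyTorch sketch of this binary focal loss is shown below; the function name and the default values of \(\alpha\) and \(\gamma\) are illustrative assumptions, not values reported in this work.

```python
import torch

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    `targets` is a float tensor of 0/1 labels; `alpha` and `gamma`
    defaults are illustrative assumptions.
    """
    p = torch.sigmoid(logits)                    # predicted positive probability
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (-alpha_t * (1 - p_t) ** gamma * p_t.clamp_min(1e-8).log()).mean()
```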
The dataset is divided into training, validation, and test sets in an 8:1:1 ratio for model fine-tuning. The parameters of the backbone model and the newly constructed MLP layers are then optimized with supervised learning. A Sigmoid activation function maps the output values to the interval (0, 1), producing the reaction-superiority probability used to evaluate the reactions.
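A minimal sketch of the 8:1:1 split with torch.utils.data.random_split follows; the placeholder dataset and the fixed seed are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset; in the paper this is the labelled reaction dataset.
dataset = TensorDataset(torch.randn(1000, 16),
                        torch.randint(0, 2, (1000, 1)).float())

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)      # 8:1:1 split
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed for reproducibility
```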
2.3.4 Details of model implementation
The pre-training model (Fig. 1(c)) uses the SGD [47] optimizer to update the parameters of the backbone encoder and the projection head. The initial learning rate for backbone pre-training is set to 0.01 with cosine learning-rate decay and one warm-up epoch. The weight decay is set to 0.0005 and the momentum to 0.9, which improves prediction accuracy and learning efficiency. Pre-training runs for 8 epochs in total with a batch size of 512, providing the initial parameters for the backbone model.
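A sketch of this optimizer setup in PyTorch is given below, assuming the warm-up is a linear ramp over the first epoch; the warm-up start factor and the placeholder model are assumptions not fixed by the text.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Placeholder standing in for the GINE encoder plus projection head.
model = nn.Sequential(nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, 512))

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# One linear warm-up epoch (start_factor is an assumption), then cosine
# decay over the remaining 7 of the 8 pre-training epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.1, total_iters=1),
                CosineAnnealingLR(optimizer, T_max=7)],
    milestones=[1])
```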
The fine-tuning model (Fig. 1(d)) uses the Adam [47] optimizer for gradient-descent optimization. The initial learning rate for fine-tuning is set to 0.001 with cosine learning-rate decay, and the weight decay is set to 0.00001. The total number of epochs is controlled by an early-stopping strategy: training terminates, with a batch size of 256, when validation accuracy fails to improve for 20 consecutive checks.
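The loop below sketches this configuration with a patience-based early stop; the placeholder model and data, the BCE loss standing in for the focal loss above, and the cosine schedule's T_max are all assumptions.

```python
import torch
from torch import nn

# Placeholder model and data; in the paper these are the pre-trained GINE
# backbone with its new MLP head and the 8:1:1 reaction splits.
model = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                           # stand-in for the focal loss above
x_tr, y_tr = torch.randn(256, 16), torch.randint(0, 2, (256, 1)).float()
x_va, y_va = torch.randn(64, 16), torch.randint(0, 2, (64, 1)).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # T_max assumed

best_acc, patience, bad = 0.0, 20, 0
for epoch in range(10_000):                      # upper bound; early stopping ends it
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    optimizer.step()
    scheduler.step()

    model.eval()
    with torch.no_grad():                        # validation accuracy
        acc = ((model(x_va) > 0.5).float() == y_va).float().mean().item()
    if acc > best_acc:
        best_acc, bad = acc, 0
    else:
        bad += 1
        if bad >= patience:                      # 20 checks without improvement
            break
```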
The backbone model is constructed with a depth of 5 GINE layers and a dropout probability of 0.1. Its hidden dimension is set to 300 and its readout dimension to 512. Fully connected layers added during fine-tuning, with a hidden dimension of 256 and a dropout rate of 0.5, allow the model to be trained to predict the reaction-superiority probability.
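A sketch of such a backbone with PyTorch Geometric's GINEConv is shown below; the edge-feature dimension, the mean-pool readout, and the exact layout of the fine-tuning head are assumptions not fixed by the text.

```python
import torch
from torch import nn
from torch_geometric.nn import GINEConv, global_mean_pool

class GINEBackbone(nn.Module):
    """5 GINE layers, hidden dim 300, dropout 0.1, readout dim 512.

    Node features are assumed pre-embedded to `hidden`; `edge_dim`
    is an assumption.
    """
    def __init__(self, hidden: int = 300, readout_dim: int = 512,
                 edge_dim: int = 300, depth: int = 5, p_drop: float = 0.1):
        super().__init__()
        self.convs = nn.ModuleList(
            GINEConv(nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden)),
                     edge_dim=edge_dim)
            for _ in range(depth))
        self.dropout = nn.Dropout(p_drop)
        self.readout = nn.Linear(hidden, readout_dim)

    def forward(self, x, edge_index, edge_attr, batch):
        for conv in self.convs:
            x = self.dropout(torch.relu(conv(x, edge_index, edge_attr)))
        return self.readout(global_mean_pool(x, batch))  # graph-level embedding

# Fine-tuning head: hidden dim 256, dropout 0.5, Sigmoid output in (0, 1).
head = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                     nn.Dropout(0.5), nn.Linear(256, 1), nn.Sigmoid())
```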