2. Method
2.1 Overview
This section develops a model framework for generating the probability of
reaction superiority (RSscore). As shown in Fig. 1, the proposed
framework consists of four parts: (a) the Reaction Total
Atom-Atom Mapping algorithm; (b) reaction condensed hypergraph
generation; (c) contrastive learning pre-training; and (d)
supervised learning fine-tuning.
Because some unimportant products and some atom-mapping relationships are not
recorded in the Open Reaction Database (ORD) [39], the
Reaction Total Atom-Atom Mapping (RTAAM) algorithm is developed to
supply the missing products and atom-mapping relationships
(Fig. 1(a)). This provides more
reaction features for the model and improves prediction accuracy. In part
(b), a new condensed hypergraph descriptor is proposed that describes
chemical reactions by connecting the reaction graph and the agent graph
through a molecular/reaction node (Fig. 1(b)). With this
descriptor, the information on reactions and agents is integrated into
the summary node, and the mutual influence between reactions and
agents can be taken into account. In part (c), the contrastive learning model is
used to pre-train the initial parameters of the backbone model on
unlabeled reaction data. After data augmentation and contrastive
training, the initial parameters of the backbone model
are optimized so that features of similar reactions are pulled closer
together and features of dissimilar reactions are pushed further apart
(Fig. 1(c)). In
part (d), the parameters of the backbone model and the multilayer perceptron
(MLP) layers are fine-tuned by supervised learning. A sigmoid activation
function maps the output values to the interval from 0 to 1, yielding
the superiority probability used to evaluate reactions (Fig. 1(d)).
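
To make parts (c) and (d) concrete, the following is a minimal sketch of the two numerical steps described above. The overview does not specify the contrastive objective, so an InfoNCE-style loss over cosine similarities is assumed here purely for illustration; the function names, the temperature value, and the toy feature vectors are likewise hypothetical and are not taken from the proposed framework.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (assumed objective, not confirmed by the text).

    The loss is low when the anchor is similar to its augmented view
    (positive) and dissimilar to the other reactions (negatives).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def sigmoid(x):
    """Numerically stable sigmoid mapping a real-valued output to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Toy 2-D features standing in for backbone outputs (hypothetical values).
anchor = [1.0, 0.0]
aligned = contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0]])
misaligned = contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1]])

# A raw MLP output (logit) converted to a superiority probability.
score = sigmoid(1.2)
```

In this sketch, `aligned` is smaller than `misaligned`, which is the behavior pre-training aims for: minimizing the loss pulls an anchor toward its augmented view and away from other reactions. The `sigmoid` call corresponds to the final mapping in part (d), where the MLP output becomes a value between 0 and 1.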