2. Method

2.1 Overview

In this section, we develop a model framework that generates the probability of reaction superiority (RSscore). As shown in Fig. 1, the framework consists of four parts: (a) the Reaction Total Atom-Atom Mapping algorithm; (b) reaction condensed hypergraph generation; (c) contrastive learning pre-training; and (d) supervised learning fine-tuning.
Because some minor products and some atom mapping relationships are not recorded in the Open Reaction Database (ORD)39, the Reaction Total Atom-Atom Mapping (RTAAM) algorithm is developed to complete the missing products and atom mappings (Fig. 1(a)). This supplies the model with additional reaction features and improves prediction accuracy. In part (b), a new condensed hypergraph descriptor is proposed that describes a chemical reaction by connecting the reaction graph and the agent graph through molecular/reaction nodes (Fig. 1(b)). With this descriptor, the information of the reaction and its agents is integrated into a summary node, so that the mutual influence between reactions and agents is also taken into account. In part (c), a contrastive learning model is used to pre-train the initial parameters of the backbone model on unlabeled reaction data. Through data augmentation and contrastive training, these initial parameters are optimized to pull the features of similar reactions closer together and push the features of dissimilar reactions further apart (Fig. 1(c)). In part (d), the parameters of the backbone model and the multilayer perceptron (MLP) layers are fine-tuned by supervised learning. The output value is mapped to the interval from 0 to 1 with a Sigmoid activation function, yielding the superiority probability that is used to evaluate the reaction (Fig. 1(d)).
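The internals of RTAAM are not given here. As a hedged illustration of one check such an algorithm has to make, the Python sketch below compares element counts on the two sides of a reaction written as plain molecular formulas; atoms left over on the reactant side point to a product that ORD did not record. The helpers `element_counts` and `missing_atoms` are hypothetical, this is not the RTAAM algorithm itself, and the parser only accepts simple formulas such as `C2H4O2` (no brackets, charges or isotopes).

```python
import re
from collections import Counter


def element_counts(formula: str) -> Counter:
    """Count elements in a plain formula such as 'C2H4O2' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def missing_atoms(reactants: list[str], products: list[str]) -> Counter:
    """Atoms present on the reactant side but absent from the recorded products."""
    left = sum((element_counts(f) for f in reactants), Counter())
    right = sum((element_counts(f) for f in products), Counter())
    return left - right  # Counter subtraction keeps only positive surpluses
```

For an esterification recorded as acetic acid plus methanol giving only methyl acetate, `missing_atoms(["C2H4O2", "CH4O"], ["C3H6O2"])` returns `Counter({'H': 2, 'O': 1})`, which corresponds to the unrecorded water.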
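One plausible layout of the condensed hypergraph, assumed for illustration rather than taken from the authors' implementation, treats every atom as a node, adds one molecular node per reactant, product and agent molecule, and links all molecular nodes to a single reaction (summary) node. Atom-mapping pairs, such as those an atom-mapping step would supply, become edges between reactant and product atoms. The classes `Molecule` and `CondensedGraph` and the function `build_condensed_graph` are hypothetical, and hydrogens are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    name: str
    atoms: list[str]              # element symbols (hydrogens omitted)
    bonds: list[tuple[int, int]]  # atom-index pairs within this molecule


@dataclass
class CondensedGraph:
    node_types: list[str] = field(default_factory=list)  # "atom", "molecule" or "reaction"
    node_labels: list[str] = field(default_factory=list)
    edges: list[tuple[int, int]] = field(default_factory=list)

    def add_node(self, node_type: str, label: str) -> int:
        self.node_types.append(node_type)
        self.node_labels.append(label)
        return len(self.node_types) - 1


def build_condensed_graph(reactants, products, agents, atom_map=()):
    """atom_map holds (reactant_mol, reactant_atom, product_mol, product_atom) tuples."""
    g = CondensedGraph()
    summary = g.add_node("reaction", "summary")
    atom_ids = {}  # (role, molecule index, atom index) -> node id
    for role, mols in (("reactant", reactants), ("product", products), ("agent", agents)):
        for m_idx, mol in enumerate(mols):
            mol_node = g.add_node("molecule", f"{role}:{mol.name}")
            g.edges.append((summary, mol_node))  # summary node sees every molecule
            for a_idx, symbol in enumerate(mol.atoms):
                atom_node = g.add_node("atom", symbol)
                atom_ids[(role, m_idx, a_idx)] = atom_node
                g.edges.append((mol_node, atom_node))  # molecule node pools its atoms
            for i, j in mol.bonds:
                g.edges.append((atom_ids[(role, m_idx, i)], atom_ids[(role, m_idx, j)]))
    for r_mol, r_atom, p_mol, p_atom in atom_map:
        # mapped atoms link the reactant side to the product side
        g.edges.append((atom_ids[("reactant", r_mol, r_atom)],
                        atom_ids[("product", p_mol, p_atom)]))
    return g


# Esterification of acetic acid with methanol, sulfuric acid as agent (heavy atoms only).
acetic = Molecule("acetic acid", ["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
methanol = Molecule("methanol", ["C", "O"], [(0, 1)])
ester = Molecule("methyl acetate", ["C", "C", "O", "O", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)])
water = Molecule("water", ["O"], [])
h2so4 = Molecule("sulfuric acid", ["S", "O", "O", "O", "O"], [(0, 1), (0, 2), (0, 3), (0, 4)])
mapping = [(0, 0, 0, 0), (0, 1, 0, 1), (0, 2, 0, 2), (1, 1, 0, 3), (1, 0, 0, 4), (0, 3, 1, 0)]
graph = build_condensed_graph([acetic, methanol], [ester, water], [h2so4], mapping)
```

In this layout, message passing from the atoms through the molecular nodes reaches the summary node from both the reaction graph and the agent graph, which is one way the reaction and agent information could be combined in a single representation.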
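The section does not name the contrastive objective. A common choice, assumed here, is the SimCLR-style NT-Xent loss: two augmented views of the same reaction form a positive pair, and every other view in the batch serves as a negative. The NumPy sketch below computes this loss for a batch of backbone embeddings; the name `nt_xent_loss` and the temperature of 0.1 are illustrative assumptions.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """z1[i] and z2[i] are embeddings of two augmented views of reaction i."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> dot product = cosine
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never compared with itself
    # row k's positive partner: k + n for the first view, k - n for the second
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # numerically stable log-softmax over each row
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

Minimizing this loss raises the cosine similarity of each positive pair relative to all negatives, which matches the stated goal of pulling similar reaction features together and pushing different ones apart.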
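The fine-tuning head can be sketched as an MLP whose single output logit is passed through the Sigmoid function, with binary cross-entropy as the supervised loss. The layer sizes, the ReLU hidden activation, the loss choice and the names `rs_score` and `bce_loss` are assumptions for illustration. The sigmoid is written as 0.5 * (1 + tanh(x / 2)), which equals 1 / (1 + e^(-x)) but avoids overflow for large |x|.

```python
import numpy as np


def sigmoid(x):
    # identical to 1 / (1 + exp(-x)) but never overflows
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def rs_score(h, w1, b1, w2, b2):
    """Map backbone reaction embeddings h (batch x dim) to probabilities in [0, 1]."""
    hidden = np.maximum(0.0, h @ w1 + b1)  # one ReLU hidden layer (assumed)
    logit = hidden @ w2 + b2               # shape (batch, 1)
    return sigmoid(logit).ravel()


def bce_loss(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

During fine-tuning, gradients of `bce_loss` would flow through both the MLP weights and the backbone that produces `h`, which is what distinguishes this stage from training a frozen classifier on fixed features.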