Fig. 2: Atom remapping step in the RTAAM algorithm
Step 3: Rationality checking. After the missing products have been
supplemented and the atoms remapped, the atomic connections and the
rationality of the reaction are checked to prevent the generation of
unreasonable molecules. For the atomic connection check, because some
atoms may end up with unreasonable connections during atom remapping,
the connections of every atom are checked to make sure that all atoms
are reasonably connected. When only two remaining atoms each require an
additional single bond, the algorithm directly connects the two atoms
to complete the remapping relationship.
In other cases, if the algorithm fails to complete the connections, the
reaction is abandoned. The algorithm also detects whether a redox agent
is involved in the reaction, to ensure that the atom mapping
relationship in the reaction is constructed reasonably.
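The atomic connection check can be illustrated with a minimal sketch. This is not the RTAAM implementation. It assumes that "reasonably connected" means the sum of bond orders on each atom equals an expected valence, that hydrogens are explicit, and that molecules are given as plain element and bond lists. The names `EXPECTED_VALENCE`, `missing_valence` and `check_connections` are hypothetical, and the redox agent detection is not covered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical target valences, for illustration only (assumption, not from the paper).
EXPECTED_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "Cl": 1}

Bond = Tuple[int, int, int]  # (atom index a, atom index b, bond order)


def missing_valence(elements: List[str], bonds: List[Bond]) -> Dict[int, int]:
    """Map each atom with an unsatisfied valence to its deficit.

    A positive value means bonds are missing; a negative value means the
    atom is over-connected.
    """
    used = [0] * len(elements)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    deficits = {}
    for i, element in enumerate(elements):
        gap = EXPECTED_VALENCE[element] - used[i]
        if gap != 0:
            deficits[i] = gap
    return deficits


def check_connections(elements: List[str], bonds: List[Bond]) -> Optional[List[Bond]]:
    """Return a completed bond list, or None if the reaction should be abandoned."""
    deficits = missing_valence(elements, bonds)
    if not deficits:
        # Every atom is already reasonably connected.
        return list(bonds)
    if len(deficits) == 2 and all(gap == 1 for gap in deficits.values()):
        # Exactly two atoms each lack one single bond: connect them directly.
        a, b = sorted(deficits)
        return list(bonds) + [(a, b, 1)]
    # Any other situation cannot be completed in this sketch.
    return None


# A CH3 fragment and a lone Cl atom: both lack exactly one single bond.
methyl_chloride = check_connections(
    ["C", "H", "H", "H", "Cl"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)]
)
# A C-H fragment and a lone O atom: the deficits are 3 and 2, so no simple fix exists.
abandoned = check_connections(["C", "H", "O"], [(0, 1, 1)])
```

In this sketch a return value of `None` corresponds to abandoning the reaction. A real implementation would operate on mapped reaction SMILES or SMARTS rather than on hand-built lists.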
2.2.2 Label generation
After removing the reactions that lack SMARTS information and/or for
which the atom mapping complementation algorithm failed, 2,397,092
reactions remain for generating the modelling dataset. Because labels
are necessary for model construction in supervised learning methods,
labels are assigned to reactions based on the thresholds in Table 2 to
classify superior and inferior reactions. Here, 20,000 reactions are
selected for label assignment and 1,400 reactions for external testing.
Reactions with high yields, short reaction times and mild reaction
temperatures are regarded as positive examples, while reactions with
low yields, long reaction times and harsh reaction temperatures are
regarded as negative ones.
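The labelling rule can be sketched as a simple threshold function. The cut-off values below are placeholders chosen only for illustration; the actual thresholds are those given in Table 2. The sketch also assumes that all three criteria must hold jointly and that a reaction meeting neither set of criteria is left unlabelled. The names `Thresholds` and `assign_label` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values only (assumption); see Table 2 for the real cut-offs.
    high_yield: float = 80.0   # percent
    low_yield: float = 30.0    # percent
    short_time: float = 6.0    # hours
    long_time: float = 24.0    # hours
    mild_temp: float = 60.0    # degrees Celsius
    harsh_temp: float = 120.0  # degrees Celsius


def assign_label(
    yield_pct: float,
    time_h: float,
    temp_c: float,
    t: Thresholds = Thresholds(),
) -> Optional[int]:
    """Return 1 for a positive example, 0 for a negative one, None otherwise."""
    if (
        yield_pct >= t.high_yield
        and time_h <= t.short_time
        and temp_c <= t.mild_temp
    ):
        return 1  # high yield, short time, mild temperature
    if (
        yield_pct <= t.low_yield
        and time_h >= t.long_time
        and temp_c >= t.harsh_temp
    ):
        return 0  # low yield, long time, harsh temperature
    return None  # mixed conditions: not labelled in this sketch


superior = assign_label(yield_pct=92.0, time_h=2.0, temp_c=25.0)
inferior = assign_label(yield_pct=10.0, time_h=48.0, temp_c=150.0)
mixed = assign_label(yield_pct=92.0, time_h=48.0, temp_c=25.0)
```

Returning `None` for mixed cases is a design choice of this sketch. It keeps ambiguous reactions out of both classes rather than forcing them into one.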