Fig. 2: Atom remapping step in the RTAAM algorithm
Step 3: Rationality checking. After the missing products have been supplemented and the atoms remapped, the atom connections and the overall rationality of the reaction are checked to prevent the generation of unreasonable molecules. For the atom connection check, because the connectivity of some atoms may become invalid during atom remapping, the connections of every atom are checked to ensure that all atoms are reasonably connected. When exactly two atoms remain that each require one additional single bond, the algorithm directly connects these two atoms to complete the remapping relationship. In other cases, if the algorithm fails to complete the missing connections, the reaction is abandoned. The algorithm also detects whether a redox agent is involved in the reaction, to ensure that the atom mapping of the reaction is constructed reasonably.
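The atom connection check described above can be illustrated with a minimal sketch. It is not the RTAAM implementation: the function name `repair_connections`, the `TARGET_VALENCE` table and the representation of a molecule as an element list plus a bond list are all assumptions made for illustration, and hydrogens are treated as explicit atoms. The redox agent check is not covered.

```python
from typing import List, Optional, Tuple

# Hypothetical target valences for a few elements (illustrative only).
TARGET_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "Cl": 1}

Bond = Tuple[int, int, int]  # (atom index a, atom index b, bond order)


def repair_connections(elements: List[str], bonds: List[Bond]) -> Optional[List[Bond]]:
    """Return a valid bond list, or None if the reaction should be abandoned."""
    # Sum the bond orders attached to each atom.
    used = [0] * len(elements)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order

    # Record how many bonds each atom is still missing.
    deficits = {}
    for i, element in enumerate(elements):
        gap = TARGET_VALENCE[element] - used[i]
        if gap < 0:
            return None  # over-bonded atom: cannot be repaired here
        if gap > 0:
            deficits[i] = gap

    if not deficits:
        return list(bonds)  # every atom is already reasonably connected

    # Exactly two atoms, each missing one single bond: connect them directly.
    # (This sketch does not check whether the two atoms are already bonded.)
    if len(deficits) == 2 and all(gap == 1 for gap in deficits.values()):
        a, b = sorted(deficits)
        return list(bonds) + [(a, b, 1)]

    return None  # any other case: abandon the reaction
```

For example, a methanol fragment whose C-O bond was lost during remapping (elements `["C", "H", "H", "H", "O", "H"]` with only the C-H and O-H bonds present) would have the single bond between atoms 0 and 4 restored, while a fragment with larger or more numerous deficits would be rejected.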

2.2.2 Labels generation

After removing the reactions without SMARTS information and/or the reactions for which the atom mapping complementation algorithm failed, 2,397,092 reactions remain for generating the modelling dataset. Since supervised learning methods require labels for model construction, labels are assigned to reactions based on the thresholds in Table 2 to distinguish superior from inferior reactions. Here, 20,000 reactions are selected for label assignment and 1,400 reactions for external testing. Reactions with high yields, short reaction times and mild reaction temperatures are regarded as positive examples, while reactions with low yields, long reaction times and harsh reaction temperatures are regarded as negative ones.
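The labelling rule can be sketched as a simple threshold test. The numeric values below are placeholders and are not the thresholds of Table 2; the names `Thresholds` and `label_reaction` are hypothetical. The sketch also assumes that all three criteria must hold together for a label to be assigned, and that only high temperatures count as harsh.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only; NOT taken from Table 2.
    high_yield: float = 80.0     # percent
    low_yield: float = 30.0      # percent
    short_time_h: float = 4.0    # hours
    long_time_h: float = 24.0    # hours
    mild_temp_c: float = 60.0    # degrees Celsius
    harsh_temp_c: float = 120.0  # degrees Celsius


def label_reaction(yield_pct: float, time_h: float, temp_c: float,
                   t: Thresholds = Thresholds()) -> Optional[int]:
    """Return 1 (superior), 0 (inferior), or None (no label assigned)."""
    # Positive: high yield, short reaction time, mild temperature.
    if (yield_pct >= t.high_yield and time_h <= t.short_time_h
            and temp_c <= t.mild_temp_c):
        return 1
    # Negative: low yield, long reaction time, harsh temperature.
    if (yield_pct <= t.low_yield and time_h >= t.long_time_h
            and temp_c >= t.harsh_temp_c):
        return 0
    # Reactions between the two regimes are left unlabelled in this sketch.
    return None
```

Under these placeholder thresholds, `label_reaction(92, 2, 25)` yields a positive label and `label_reaction(15, 48, 150)` a negative one, while a reaction with intermediate conditions receives no label.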