4. Data Availability and Reproducibility Statement

All code needed to compute RSscore is available at https://gitee.com/xu-chenyang/rsscore. The RTAAM results in Fig. 2 can be reproduced with the program in the /Data_Prepare&Augmentation/RTAAM directory of the RSscore code; details of the RTAAM algorithm are given in Section S1 of the Supplementary Material. Further details on the reaction representation features in Table 1 are given in Section S2 of the Supplementary Material. The reaction yield distribution in Fig. 3 was obtained from the USPTO data in the ORD39 database, using only records with complete yield, temperature and time information. In Fig. 4(a) and 4(b), the bar plot and ROC curve are drawn from the model evaluation results in Table 4 and illustrate the performance differences between the models; the model construction methods and model performance are described in Section S3 of the Supplementary Material. In Fig. 4(c), the distribution of superior and inferior reaction data is drawn with the trained UMAP model. The reaction feature arrays clustered by UMAP are provided in the /UMAP directory.
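To make the selection criterion behind Fig. 3 concrete, the following is a minimal sketch of how records with complete yield, temperature and time information could be filtered and binned into a yield distribution. It is not taken from the RSscore repository: the field names (`yield_percent`, `temperature_c`, `time_h`), the helper functions and the sample records are all hypothetical, and the actual ORD schema and the authors' preprocessing may differ.

```python
from typing import Dict, List, Optional

# Hypothetical record layout; the real ORD schema is richer and named differently.
Record = Dict[str, Optional[float]]

REQUIRED_FIELDS = ("yield_percent", "temperature_c", "time_h")


def has_complete_conditions(record: Record) -> bool:
    """Return True only if yield, temperature and time are all present.

    The check uses `is not None` so that legitimate zero values
    (e.g. a reaction run at 0 degrees C) are not discarded.
    """
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def filter_complete(records: List[Record]) -> List[Record]:
    """Keep only the records that pass the completeness check."""
    return [r for r in records if has_complete_conditions(r)]


def yield_histogram(records: List[Record], bin_width: float = 10.0) -> List[int]:
    """Count yields in fixed-width bins over 0-100 %.

    A yield of exactly 100 % is assigned to the last bin;
    values outside 0-100 % are skipped as invalid.
    """
    n_bins = int(100.0 / bin_width)
    counts = [0] * n_bins
    for record in records:
        y = record["yield_percent"]
        if y is None or y < 0.0 or y > 100.0:
            continue
        index = min(int(y // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Illustrative, made-up records (not real USPTO/ORD data).
sample_records: List[Record] = [
    {"yield_percent": 85.0, "temperature_c": 25.0, "time_h": 2.0},
    {"yield_percent": None, "temperature_c": 80.0, "time_h": 12.0},
    {"yield_percent": 100.0, "temperature_c": 0.0, "time_h": 0.5},
    {"yield_percent": 40.0, "temperature_c": None, "time_h": 1.0},
]

complete_records = filter_complete(sample_records)
histogram = yield_histogram(complete_records)
```

In this sketch the second and fourth records are dropped because one required field is missing, while the record run at 0 degrees C is retained.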
The reactions evaluated in Fig. 5 were retrieved from Reaxys22, and the corresponding RSscore calculation is provided in the /reaction_evaluation directory of the RSscore code. The synthetic routes analysed in Fig. 6 and Fig. 7 were taken from patents51-53 and articles49,50; the corresponding RSscore calculation is provided in the /synthetic_route_analysis directory of the RSscore code.