4. Data Availability and Reproducibility Statement
All code needed to compute RSscore is available at
https://gitee.com/xu-chenyang/rsscore. The RTAAM results shown in Fig. 2 can be
reproduced with the program in the /Data_Prepare&Augmentation/RTAAM directory of
the RSscore code, and the RTAAM algorithm is described in Section S1 of the
Supplementary Material. Details of the reaction representation features listed
in Table 1 are given in Section S2 of the Supplementary Material. The reaction
yield distribution in Fig. 3 was computed from the USPTO reactions in the ORD
database [39] that have complete yield, temperature and time information. The bar
plot and ROC curve in Figs. 4(a) and 4(b) are drawn from the model evaluation
results in Table 4 and compare the performance of the different models; the
construction of each model and its performance are described in Section S3 of the
Supplementary Material. Fig. 4(c) shows the distribution of superior and inferior
reactions produced by the trained UMAP model, and the reaction feature arrays
clustered with UMAP are provided in the /UMAP directory. The reactions evaluated
in Fig. 5 were retrieved from Reaxys [22], and the code for their RSscore
calculation is in the /reaction_evaluation directory of the RSscore code. The
synthetic routes analysed in Figs. 6 and 7 were taken from patents [51-53] and
articles [49, 50], and the corresponding RSscore calculations are in the
/synthetic_route_analysis directory of the RSscore code.