We find that, consistent with common assumptions, even recent methods roughly follow the expected trade-off: significant increases in computational (time) cost are required to gain thermochemical accuracy, as illustrated with R2 in Figure \ref{444292}. Similar trends are found for the MARE and Spearman ρ metrics. Since multiple studies have demonstrated that accurate treatment of noncovalent interactions, including intramolecular electrostatic and dispersion effects, is needed to understand conformer relative energies, it is not surprising that this benchmark illustrates the significant accuracy advantage of modern dispersion-corrected density functional and wavefunction methods.
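For concreteness, the following is a minimal sketch of how such per-molecule metrics could be computed, assuming R2 denotes the squared Pearson correlation and MARE the mean absolute relative error of relative conformer energies; both definitions and the function itself are illustrative rather than the exact analysis code used here.

\begin{verbatim}
import numpy as np
from scipy import stats

def conformer_metrics(e_method, e_ref):
    """Per-molecule metrics for relative conformer energies (kcal/mol)."""
    # Shift both sets so the lowest conformer defines the zero of energy.
    e_method = np.asarray(e_method) - np.min(e_method)
    e_ref = np.asarray(e_ref) - np.min(e_ref)
    r2 = stats.pearsonr(e_method, e_ref)[0] ** 2
    rho = stats.spearmanr(e_method, e_ref).correlation
    mask = e_ref > 0.0  # skip the reference minimum to avoid division by zero
    mare = float(np.mean(np.abs(e_method[mask] - e_ref[mask]) / e_ref[mask]))
    return r2, rho, mare
\end{verbatim}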

Use of Machine Learning Methods as Surrogates: ANI and Bag-of-Features

One possible solution to the trade-off between accuracy and computational cost is the growing use of machine learning (ML) methods in chemistry, particularly as surrogates for thermochemical parameters such as atomization energies.\cite{Rupp_2012,Hansen_2013,Faber_2017} Typically, these ML methods use deep neural networks (DNN) and have been trained on density functional calculations, particularly hybrid B3LYP or ωB97X atomization energies,\cite{Ramakrishnan_2014,Smith_2017} although recent efforts have included training on coupled-cluster quality data as well.\cite{S_Smith_2019}
In principle, since the evaluation of the DNN is fast, the time required for an ML prediction is dominated by the time to generate the input descriptors, which is still only a small fraction of that required for a quantum calculation. Therefore, if an ML method could reproduce density functional or coupled-cluster energies at semiempirical or force field computational cost, it would dramatically change the conventional accuracy/time trade-off.
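As an illustration, the following is a minimal sketch of a single-point prediction with a pretrained ANI model, assuming the open-source TorchANI package and its ANI2x model; this is a toy example, not necessarily the setup used for the timings reported here.

\begin{verbatim}
import torch
import torchani

# Load the pretrained ANI-2x model; atoms are given as atomic numbers.
model = torchani.models.ANI2x(periodic_table_index=True)

# Water as a toy input: atomic numbers and coordinates in Angstrom.
species = torch.tensor([[8, 1, 1]])
coordinates = torch.tensor([[[0.000,  0.000,  0.119],
                             [0.000,  0.763, -0.477],
                             [0.000, -0.763, -0.477]]])

# The network forward pass is fast; most of the wall time goes into
# building the atomic-environment descriptors from the coordinates.
energy = model((species, coordinates)).energies  # Hartree
print(float(energy))
\end{verbatim}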
While DNN methods evaluate significantly faster on graphics processing units (GPUs) and may not be optimized for CPU execution, we note that many quantum chemistry methods are also GPU-accelerated. We therefore retain the single-core CPU timings in Table \ref{310223} and Figure \ref{444292}, but note that ML methods such as ANI would in practice be faster when evaluated on a modern GPU.

ANI methods

Table \ref{310223} and Figure \ref{444292} show that the ANI family of ML methods, ANI-1x, ANI-1ccx, and ANI-2x, performs similarly to the GFN tight-binding semiempirical methods in both accuracy and speed. ANI-1ccx outperforms the ANI-1x model, which does not contain dispersion corrections, while performing slightly better than the ANI-2x model. The inclusion of dispersion corrections in DFT methods is clearly beneficial, as dispersion-corrected functionals improve upon their uncorrected counterparts, as seen in Table \ref{498669}.
In principle, it is possible to perform post hoc addition of a D3 dispersion correction to both ANI-1x and ANI-2x. Table \ref{201193} shows potentially improved performance over the non-dispersion-corrected models, although the differences are not statistically significant. Moreover, since the D3 dispersion correction for ωB97X-D3 cannot be calculated by standard tools, applying such a post hoc correction is challenging. For our set, one could extract the dispersion correction from the ωB97X-D3 calculations performed on the same molecule, but without such density functional calculations, applying the correction would not be possible.
While the newer D4 correction\cite{Caldeweyher_2019,Caldeweyher_2017} can be calculated using the DFTD4 program,\cite{dftd4dftd4} we find that adding D4 corrections worsens the median R2 and Spearman metrics, although again the differences are not statistically significant. The variable effect of applying D3 and D4 corrections to the ANI models illustrates a challenge for current machine learning methods: since they inherently add some error on top of the underlying training data, coupled-cluster or other highly accurate dispersion-corrected data are needed for training.
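For reference, the following is a minimal sketch of such a post hoc correction using the dftd4 Python API; the toy geometry, the B3LYP parameter lookup (chosen only because it is known to be parameterized in dftd4), and the placeholder ANI energy are illustrative assumptions, not the workflow behind Table \ref{201193}.

\begin{verbatim}
import numpy as np
from dftd4.interface import DampingParam, DispersionModel

# Toy geometry: atomic numbers and Cartesian coordinates in Bohr.
numbers = np.array([8, 1, 1])
positions = np.array([[0.000,  0.000,  0.225],
                      [0.000,  1.442, -0.901],
                      [0.000, -1.442, -0.901]])

model = DispersionModel(numbers, positions)
# Damping parameters are looked up by functional name.
res = model.get_dispersion(DampingParam(method="b3lyp"), grad=False)
e_d4 = res.get("energy")  # Hartree

e_ani = -76.38  # placeholder: an ANI single-point energy in Hartree
e_corrected = e_ani + e_d4
\end{verbatim}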