the atoms and pairwise interactions are sorted into bags, Bond Angle Torsion (BAT)\cite{Huang_2016}, and Bond Angle Torsion Typed (BATTY).
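As a minimal illustration of the pairwise bagging idea, the Python sketch below groups atoms and atom pairs by element (or element pair) and sorts each bag in descending order, in the spirit of a Bag-of-Bonds representation. It is a sketch under stated assumptions, not the BAT or BATTY implementation: the function name `bag_of_bonds`, the use of Coulomb-matrix terms (0.5 Z^2.4 for atoms, Z_i Z_j / r_ij for pairs), and the omission of the angle and torsion bags that BAT adds are all illustrative choices.

```python
import math
from collections import defaultdict
from itertools import combinations


def bag_of_bonds(atomic_numbers, coords):
    """Hypothetical Bag-of-Bonds-style sketch (atom and pair terms only).

    atomic_numbers: list of ints; coords: list of (x, y, z) tuples.
    Returns a dict mapping an element key to a descending-sorted bag.
    """
    bags = defaultdict(list)
    # Atom bags: Coulomb-matrix diagonal term 0.5 * Z**2.4, keyed by (Z,)
    for z in atomic_numbers:
        bags[(z,)].append(0.5 * z ** 2.4)
    # Pair bags: Z_i * Z_j / r_ij, keyed by the sorted element pair
    for i, j in combinations(range(len(atomic_numbers)), 2):
        zi, zj = atomic_numbers[i], atomic_numbers[j]
        r = math.dist(coords[i], coords[j])
        bags[tuple(sorted((zi, zj)))].append(zi * zj / r)
    # Sorting within each bag removes the dependence on atom ordering
    return {key: sorted(vals, reverse=True) for key, vals in bags.items()}


# Usage: a toy water-like geometry (coordinates are illustrative only)
bags = bag_of_bonds([8, 1, 1], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

In a full representation, each bag would typically be zero-padded to the largest size observed across the dataset and the bags concatenated in a fixed key order, so that every molecule maps to a vector of the same length.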
To our knowledge, this is the most extensive computational validation set for studying low-energy molecular conformers, in terms of the number of compounds, the number of geometries, and the number of computational methods. We provide all data and analysis scripts as open data and open-source code to allow future reuse.[cite GitHub repo]

Results and Discussion

In this work, we focus on evaluating single-point atomization energies for a subset of 700 organic molecules. For each molecule, a set of 250 diverse poses with maximal heavy-atom root-mean-square deviation (RMSD) was first generated using Open Babel; at most 10 of these poses, those with the lowest PM7 heats of formation, were then selected and fully optimized with B3LYP-D3BJ and the def2-SVP basis set.
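The pose-selection step can be sketched in Python with NumPy, assuming each pose is available as an array of Cartesian coordinates together with its PM7 heat of formation. The helper names `heavy_atom_rmsd` and `select_lowest_energy` are hypothetical; the RMSD uses a Kabsch superposition over non-hydrogen atoms, whereas the actual pose generation was performed by Open Babel, whose internal diversity search is not reproduced here.

```python
import numpy as np


def heavy_atom_rmsd(coords_a, coords_b, atomic_numbers):
    """RMSD over non-hydrogen atoms after optimal (Kabsch) superposition."""
    mask = np.asarray(atomic_numbers) > 1
    P = np.asarray(coords_a, dtype=float)[mask]
    Q = np.asarray(coords_b, dtype=float)[mask]
    # Remove translation by centering both structures
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def select_lowest_energy(heats_of_formation, n_max=10):
    """Indices of up to n_max poses with the lowest PM7 heat of formation."""
    order = np.argsort(heats_of_formation, kind="stable")
    return order[:n_max].tolist()
```

Ranking by a cheap semiempirical energy before the DFT optimization limits the number of expensive calculations per molecule, while the RMSD criterion is what makes the initial pool structurally diverse.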
Using these DFT-optimized minima, single-point atomization energies were computed with DLPNO-CCSD(T)\cite{Liakos_2015,Guo_2018} and the cc-pVTZ basis set (Dunning 1989, Kendall 1992). DLPNO-CCSD(T) has been found to be highly accurate for calculating thermochemical properties[r] at a significantly lower computational cost than canonical CCSD(T) for medium to large organic molecules. Because some molecules in the original test set contained iodine, which cc-pVTZ does not support, and some DLPNO-CCSD(T) calculations did not converge, the final reference set comprises 6,756 single-point calculations across 700 molecules.
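The atomization energy of a conformer is the sum of the free-atom energies minus the molecular energy, with all terms computed at the same level of theory. A minimal Python sketch follows; the function name and the atomic energies in the usage line are hypothetical placeholders, not values from this study.

```python
from collections import Counter

HARTREE_TO_KCAL_MOL = 627.5094740631  # CODATA 2018 hartree in kcal/mol


def atomization_energy(e_molecule, elements, atomic_energies):
    """Return the atomization energy in kcal/mol (positive for a bound molecule).

    e_molecule: total molecular energy (hartree).
    elements: element symbol of each atom in the molecule.
    atomic_energies: free-atom energy (hartree) per element, same method/basis.
    """
    counts = Counter(elements)
    e_atoms = sum(n * atomic_energies[el] for el, n in counts.items())
    return (e_atoms - e_molecule) * HARTREE_TO_KCAL_MOL


# Hypothetical placeholder energies (hartree), for illustration only
toy_atoms = {"H": -0.5, "O": -75.0}
e_at = atomization_energy(-76.2, ["O", "H", "H"], toy_atoms)
```

Because the free-atom terms are identical for every conformer of a molecule, differences in atomization energy between conformers equal the negative of the differences in total energy, so relative conformer energies do not depend on the atomic reference values.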
By considering a large number of diverse organic molecules with many poses per molecule, we seek to sample a wide variety of conformer energy preferences (e.g., intramolecular hydrogen and halogen bonding, and electrostatic interactions). While restricting the comparison to optimized low-energy conformers may underestimate the degree of correlation that would be found if high-energy structures were included,\cite{Sharapa_2018} we believe the current metric is a difficult but useful comparison. Moreover, even without high-energy geometries, many computational predictions rely on Boltzmann-weighted averages over multiple thermally accessible conformers, from NMR prediction to understanding the effects of dipole moments on solvent viscosity.\cite{Vo_2019}
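The Boltzmann weighting mentioned above can be sketched as follows, assuming relative conformer energies in kcal/mol and a default temperature of 298.15 K; the function names are hypothetical and no particular software package is implied.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant in kcal/(mol K)


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations for a list of conformer energies."""
    e_min = min(energies_kcal)
    # Shift by the minimum energy so the largest exponent is exp(0) = 1
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    z = sum(factors)
    return [f / z for f in factors]


def boltzmann_average(values, energies_kcal, temperature=298.15):
    """Population-weighted average of a per-conformer property."""
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))
```

Subtracting the minimum energy before exponentiating avoids overflow. At 298.15 K, a 1 kcal/mol error in a relative conformer energy changes the population ratio of two conformers by roughly a factor of five, which is why accurate relative energies matter for such averages.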

Comparison of Single-Point Energies with DLPNO-CCSD(T)

For comparison, we considered a wide variety of currently available computational methods:
For the dispersion-corrected B3LYP and PBE functionals, we also considered both the commonly used double-zeta def2-SVP and the triple-zeta def2-TZVP basis sets to assess the effect of basis-set size.
Because some basis sets (i.e., cc-pVTZ) do not support iodine and some calculations failed to converge, we kept only the entries for which every method completed, leaving 6,511 entries from 690 molecules. Of those molecules, 9 had two or fewer poses and were also removed, leaving 681 unique molecules and ~6,500 entries for comparison.
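The two filtering steps can be sketched in Python, assuming the energies are stored in a dictionary keyed by (molecule, pose) and that an entry is kept only if every method completed for it; the data layout and the function name `filter_entries` are hypothetical.

```python
from collections import Counter


def filter_entries(energies, methods, min_poses=3):
    """Keep complete entries, then drop molecules with too few poses.

    energies: dict mapping (molecule, pose) -> {method_name: energy}.
    methods: every method that must be present for an entry to be kept.
    min_poses: minimum number of complete poses per molecule
        (3 drops molecules with 2 or fewer poses).
    """
    # Step 1: drop entries where any method is missing (unsupported or failed)
    complete = {
        key: e for key, e in energies.items() if all(m in e for m in methods)
    }
    # Step 2: count the surviving poses per molecule and apply the threshold
    poses_per_molecule = Counter(mol for mol, _ in complete)
    return {
        key: e for key, e in complete.items()
        if poses_per_molecule[key[0]] >= min_poses
    }
```

Applying the completeness filter first matters: a molecule that starts with several poses can fall below the pose threshold only after its incomplete entries are removed.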
[Table of correlations, R²]
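The per-molecule correlations summarized in the table can be computed as sketched below. This assumes, for illustration, that R² is the squared Pearson correlation between a method's conformer energies and the DLPNO-CCSD(T) reference for each molecule; the function names and data layout are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (ZeroDivisionError) if either sequence has zero variance
    return sxy * sxy / (sxx * syy)


def per_molecule_r2(method_energies, reference_energies):
    """Map each molecule to R^2 between method and reference conformer energies.

    Both arguments: dict molecule -> list of energies, poses in the same order.
    """
    return {
        mol: pearson_r2(method_energies[mol], ref)
        for mol, ref in reference_energies.items()
    }
```

Because the squared Pearson correlation is unchanged by adding a constant to either set of energies or by flipping their sign, the same value is obtained from total energies, relative conformer energies, or atomization energies. With only two poses, R² is 1 whenever it is defined, which is why the sketch requires at least three points per molecule.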