Getting more information when the conformer generation doesn’t work
Published
May 17, 2023
When the RDKit generates conformers for a molecule it performs a number of checks along the way to make sure that the atomic coordinates are reasonable/correct. When one of these checks fails, the current conformer is abandoned and the coordinate generation process starts from scratch. The maximum number of times this restart can happen is controlled by the maxIterations parameter: after maxIterations restarts the conformer generation fails. If you’ve called EmbedMolecule(), asking for a single conformer, you’ll get a conformation ID of -1 to mark this failure. If you’ve asked for multiple conformers, you won’t get as many as you requested.
One of the changes for the 2023.03.1 release of the RDKit was the addition of an option to track which checks are failing during conformer generation. This can help understand what problems were encountered and to understand why conformer generation is not generating the expected results. Since, as is often the case, the RDKit’s capabilities are ahead of its documentation, this short blog post will explain how to use this functionality and interpret the results.
A general overview of the conformer generation process can be found in the documentation.
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension: jupyter labextension install jupyterlab_3dmol
Now invert one of the stereocenters to create the stereoisomer which can’t actually exist as a stable structure. Imagine inverting one of the chiral centers in the above molecule and you will see the problem: we’d have to point the attached halogen into the center of the cage.
Here’s a quick loop over all the failure causes to see where the problems were:
def print_failure_causes(counts):for i,k inenumerate(rdDistGeom.EmbedFailureCauses.names):print(k,counts[i])# in v2022.03.1 two names are missing from `rdDistGeom.EmbedFailureCauses`:print('LINEAR_DOUBLE_BOND',counts[i+1])print('BAD_DOUBLE_BOND_STEREO',counts[i+2])
INITIAL_COORDS: generation of the initial coordinates from the random distance matrix (default) or from a set of random coordinates (when using random coordinate embedding) failed.
FIRST_MINIMIZATION: the initial optimization of the atom positions using the distance-geometry force field failed to produce a low-enough energy conformer. The check here has thresholds for both average energy per atom and the individual atom energies. I’m not providing the threshold values here since the energies from the distance-geometry force field are not physically meaningful - the threshold values are not interpretable.
CHECK_TETRAHEDRAL_CENTERS: at least one tetrahedral C and N centers either has a volume around it which is too small or is outside the volume defined by its neighbors
MINIMIZE_FOURTH_DIMENSION: the minmization to force the values of the fourth-dimensional component of each atom position failed
ETK_MINIMIZATION: after the minimization with the ET and/or K terms, at least one atom which should have been planar was not
FINAL_CHIRAL_BOUNDS: the neighborhood of an atom with specified chirality was too distorted (it violated distance constraints)
FINAL_CENTER_IN_VOLUME: an atom with specified chirality was outside of the volume defined by its neighbors
LINEAR_DOUBLE_BOND: one of the end atoms of a double bond had a linear geometry
BAD_DOUBLE_BOND_STEREO: the stereochemistry of a double bond with specified stereochemistry was wrong in the generated conformer
Just for comparison, the starting molecule had no failures:
Here we did get a conformer, but there were some rejected conformers along the way. The double-bond-related failures indicate that the code had some problems with the ring double bond with specified stereochemistry.