Interrupting calculations

tutorial
technical
Sometimes things just take too long…
Published

April 6, 2025

This short post introduces a couple of features that make it easier to work with some of the longer-running calculations in the RDKit: the ability to interrupt them and the ability to set a timeout.

from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdSynthonSpaceSearch
from rdkit.Chem import rdFingerprintGenerator

import time

import rdkit
print(rdkit.__version__)
2025.03.1

Conformer generation

m = Chem.AddHs(Chem.MolFromSmiles('CCCCCCCCCCCCCCCC'))
params = rdDistGeom.ETKDGv3()
params.randomSeed = 0xbad5eed

t1 = time.time()
cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
t2 = time.time()
print(f'{len(cids)} conformers in {t2-t1:.2f}s')
1000 conformers in 11.72s

What’s new is that you can stop the calculation by pressing the “interrupt the kernel” button in the notebook or by hitting ^C if you are running on the command line:

params = rdDistGeom.ETKDGv3()
params.randomSeed = 0xbad5eed

cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
[16:34:13] Interrupted, cancelling conformer generation
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[4], line 4
      1 params = rdDistGeom.ETKDGv3()
      2 params.randomSeed = 0xbad5eed
----> 4 cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)

KeyboardInterrupt: Embedding cancelled

Obviously in this case it doesn’t make that big of a difference, but for a molecule which takes longer to embed, for example one with a lot of chiral centers and/or a complex fused-ring system, it’s nice to be able to stop things without having to completely restart the kernel.
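
When looping over many molecules, it can be handy to keep whatever finished before the interrupt. Here is a minimal sketch of that idea. The helper `run_until_interrupted` and the stand-in worker `fake_embed` are hypothetical names made up for this illustration, not part of the RDKit. The only thing taken from the output above is that an interrupted embedding surfaces as a `KeyboardInterrupt`.

```python
def run_until_interrupted(work, items):
    """Apply work() to each item, stopping cleanly on KeyboardInterrupt.

    Returns (results, interrupted), where results holds the values
    computed before any interrupt arrived.
    """
    results = []
    for item in items:
        try:
            results.append(work(item))
        except KeyboardInterrupt:
            # Keep what we have instead of losing the whole batch.
            return results, True
    return results, False


# Hypothetical stand-in for a slow call; in a real session this could wrap
# something like rdDistGeom.EmbedMultipleConfs(mol, 1000, params).
def fake_embed(n):
    if n == 3:
        raise KeyboardInterrupt("Embedding cancelled")
    return n * 10


results, interrupted = run_until_interrupted(fake_embed, [1, 2, 3, 4])
print(results, interrupted)
```

The item being processed when the interrupt arrives is dropped, and everything before it is returned.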

It’s now also possible to specify a timeout for the conformer generation. If you provide a timeout, the conformer generation will be gracefully cancelled if it runs for longer than the specified number of seconds.

Thanks to Nikitas Rontsis, Akvilė Žemgulytė, and Charlie Beattie at Google DeepMind for contributing the timeout support.

params.timeout = 3
t1 = time.time()
cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
t2 = time.time()
print(f'{len(cids)} conformers in {t2-t1:.2f}s')
1 conformers in 3.01s

The one conformer ID we get is -1, to indicate that the calculation failed:

cids[0]
-1
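
Since a timed-out run hands back -1 rather than raising an exception, downstream code may want to drop those entries before using the IDs. The sketch below uses a hypothetical helper, `successful_conformer_ids`, and assumes that -1 is the only failure marker, as in the output above. It works on any sequence of integers, so plain lists stand in for the RDKit return value here.

```python
def successful_conformer_ids(cids):
    """Return only the conformer IDs that are not the -1 failure marker."""
    return [cid for cid in cids if cid != -1]


# What the timed-out run above returned: a single -1.
timed_out = successful_conformer_ids([-1])
# A made-up example of a run where every embedding succeeded.
all_ok = successful_conformer_ids([0, 1, 2])
print(timed_out, all_ok)
```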

Synthon searches

Inspired by the changes to the conformer generator, Dave Cosgrove added support for both timeouts and interrupting calculations to the code for doing synthon space searches.

We can demonstrate this using a synthon search in Chemspace’s FreedomSpace:

spc = rdSynthonSpaceSearch.SynthonSpace()
spc.ReadDBFile('/scratch/RDKit_git/Data/2023-05_Freedom_synthons.spc')
spc.GetNumProducts()
9360696185
qry = Chem.MolFromSmiles('FC1=CC=CC2=C1NN=C2')
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000

t1 = time.time()
res = spc.SubstructureSearch(qry,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
5000 results in 1.84s

Interrupting the same search raises a RuntimeError:

ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000

t1 = time.time()
res = spc.SubstructureSearch(qry,params=ps)
t2 = time.time()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 5
      2 ps.maxHits = 5000
      4 t1 = time.time()
----> 5 res = spc.SubstructureSearch(qry,params=ps)
      6 t2 = time.time()

RuntimeError: SubstructureSearch cancelled

Substructure searching is really too fast to sensibly demonstrate the timeout, so let’s do a similarity search.

vemurafenib = Chem.MolFromSmiles('CCCS(=O)(=O)Nc1ccc(F)c(c1F)C(=O)c2c[nH]c3c2cc(cn3)c4ccc(Cl)cc4')
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.numThreads = 8 

fpg = rdFingerprintGenerator.GetMorganGenerator()

t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
[16:34:46] Building the fingerprints may take some time.
0 results in 33.32s

That search finds no hits with the default parameters, so let’s set the similarity cutoff to 0.4:

ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.similarityCutoff = 0.4

t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
448 results in 4.88s

With a one-second timeout the search stops early and returns no hits:

ps.timeOut = 1
t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
0 results in 1.03s
[16:35:27] Timed out.
[16:35:27] Timed out.
[16:35:27] Timed out.

We can also cancel the calculation with the “interrupt the kernel” button in the notebook or ^C in the terminal:

ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.similarityCutoff = 0.4

t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 6
      3 ps.similarityCutoff = 0.4
      5 t1 = time.time()
----> 6 res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
      7 t2 = time.time()
      8 print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')

RuntimeError: FingerprintSearch cancelled
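
Note that the tracebacks in this post show the conformer generator raising a `KeyboardInterrupt`, while the synthon searches raise a `RuntimeError` whose message ends in "cancelled". If you want a script to treat a cancelled search as "no result" while still surfacing genuine errors, something like the following sketch might work. The helper `call_unless_cancelled` is hypothetical, and matching on the word "cancelled" in the message is an assumption based only on the messages shown above.

```python
def call_unless_cancelled(func, *args, **kwargs):
    """Call func; return None if it raises a 'cancelled' RuntimeError.

    Any other RuntimeError is re-raised unchanged.
    """
    try:
        return func(*args, **kwargs)
    except RuntimeError as exc:
        # Assumption: cancelled searches say so in the message,
        # e.g. "FingerprintSearch cancelled".
        if "cancelled" in str(exc):
            return None
        raise


# Hypothetical stand-in for an interrupted
# spc.FingerprintSearch(vemurafenib, fpg, params=ps) call.
def fake_search():
    raise RuntimeError("FingerprintSearch cancelled")


hits = call_unless_cancelled(fake_search)
print(hits)
```

Matching on message text is fragile, so it is worth checking the message your RDKit version actually produces before relying on this.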

Other RDKit functions supporting a timeout

Some other RDKit functions have supported providing a timeout for a while:

  • rdFMCS.FindMCS() via rdFMCS.MCSParameters.Timeout
  • The code in rdRascalMCES via rdRascalMCES.RascalOptions.timeout
  • The code in rdRGroupDecomposition via rdRGroupDecomposition.RGroupDecompositionParameters.timeout

If you have ideas for other long-running functions which should support a timeout and/or being interruptible, please leave a comment or open an RDKit feature request on GitHub.