April 6, 2025
This is a short post to introduce a couple of features which make working with some of the longer-running calculations in the RDKit a bit easier: the ability to interrupt them and support for timeouts.
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdSynthonSpaceSearch
from rdkit.Chem import rdFingerprintGenerator
import time
import rdkit
print(rdkit.__version__)
2025.03.1
params = rdDistGeom.ETKDGv3()
params.randomSeed = 0xbad5eed
t1 = time.time()
cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
t2 = time.time()
print(f'{len(cids)} conformers in {t2-t1:.2f}s')
1000 conformers in 11.72s
What’s new is that you can stop the calculation by pressing the “interrupt the kernel” button in the notebook or by hitting ^C if you are running on the command line:
params = rdDistGeom.ETKDGv3()
params.randomSeed = 0xbad5eed
cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
[16:34:13] Interrupted, cancelling conformer generation
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[4], line 4
      1 params = rdDistGeom.ETKDGv3()
      2 params.randomSeed = 0xbad5eed
----> 4 cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)

KeyboardInterrupt: Embedding cancelled
Obviously in this case it doesn’t make that big of a difference, but for a molecule which takes longer to embed, for example one with a lot of chiral centers and/or a complex fused-ring system, it’s nice to be able to stop things without having to completely restart the kernel.
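In a script, as opposed to a notebook, it can be handy to catch the interrupt so that the rest of the program keeps running. The following is a minimal sketch and not part of the RDKit: `embed_or_none` and its `embed` argument are hypothetical names introduced here, and `fake_embed` is a stand-in which just raises the same exception shown in the traceback above so that the example runs without a molecule.

```python
def embed_or_none(mol, num_confs, params, embed=None):
    """Run conformer generation; return None if the user interrupts it.

    `embed` defaults to rdDistGeom.EmbedMultipleConfs. It is only a
    parameter (an assumption of this sketch) so that the wrapper can be
    tried out with a stand-in function.
    """
    if embed is None:
        # imported lazily so that the stand-in demo below works without RDKit
        from rdkit.Chem import rdDistGeom
        embed = rdDistGeom.EmbedMultipleConfs
    try:
        return list(embed(mol, num_confs, params))
    except KeyboardInterrupt as exc:
        # ^C or "interrupt the kernel" ends up here
        print(f'stopped early: {exc}')
        return None


def fake_embed(mol, num_confs, params):
    # stand-in which behaves like an interrupted embedding
    raise KeyboardInterrupt('Embedding cancelled')


cids = embed_or_none(None, 1000, None, embed=fake_embed)
print(cids)
```

With a real molecule the call would simply be `embed_or_none(m, 1000, params)`, and a return value of `None` tells the caller that the run was interrupted.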
It’s now also possible to specify a timeout for the conformer generation. If you provide one, the conformer generation will be gracefully cancelled if it runs for longer than the specified number of seconds.
Thanks to Nikitas Rontsis, Akvilė Žemgulytė, and Charlie Beattie at Google DeepMind for contributing the timeout support.
params.timeout = 3
t1 = time.time()
cids = rdDistGeom.EmbedMultipleConfs(m,1000,params)
t2 = time.time()
print(f'{len(cids)} conformers in {t2-t1:.2f}s')
1 conformers in 3.01s
The one conformer ID we get back is -1, which indicates that the calculation failed.
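Since a timed-out run reports its failure through the returned IDs rather than by raising an exception, downstream code probably wants to check for the -1 before using the results. Here's a minimal sketch, assuming that -1 is the only failure marker (that's all that is shown above); `successful_conformer_ids` is a hypothetical helper, not an RDKit function:

```python
def successful_conformer_ids(cids):
    """Return the conformer IDs with the -1 failure marker removed."""
    return [cid for cid in cids if cid != -1]


# the result from the timed-out run above: a single ID of -1
timed_out = [-1]
print(successful_conformer_ids(timed_out))

# a made-up run in which three conformers were generated
print(successful_conformer_ids([0, 1, 2]))
```

With the real thing this would be `successful_conformer_ids(rdDistGeom.EmbedMultipleConfs(m,1000,params))`; an empty list then means that there is nothing to work with.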
Inspired by the changes to the conformer generator, Dave Cosgrove added support for both timeouts and interrupting calculations to the code for doing synthon space searches.
Here’s a demonstration using a synthon search in Chemspace’s FreedomSpace:
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
t1 = time.time()
res = spc.SubstructureSearch(qry,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
5000 results in 1.84s
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
t1 = time.time()
res = spc.SubstructureSearch(qry,params=ps)
t2 = time.time()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 5
      2 ps.maxHits = 5000
      4 t1 = time.time()
----> 5 res = spc.SubstructureSearch(qry,params=ps)
      6 t2 = time.time()

RuntimeError: SubstructureSearch cancelled
Substructure searching is really too fast to sensibly demonstrate the timeout, so let’s do a similarity search.
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.numThreads = 8
fpg = rdFingerprintGenerator.GetMorganGenerator()
t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
[16:34:46] Building the fingerprints may take some time.
0 results in 33.32s
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.similarityCutoff = 0.4
t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
448 results in 4.88s
ps.timeOut = 1
t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
0 results in 1.03s
[16:35:27] Timed out.
[16:35:27] Timed out.
[16:35:27] Timed out.
We can also cancel the calculation with the “interrupt the kernel” button in the notebook or ^C in the terminal:
ps = rdSynthonSpaceSearch.SynthonSpaceSearchParams()
ps.maxHits = 5000
ps.similarityCutoff = 0.4
t1 = time.time()
res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
t2 = time.time()
print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 6
      3 ps.similarityCutoff = 0.4
      5 t1 = time.time()
----> 6 res = spc.FingerprintSearch(vemurafenib,fpg,params=ps)
      7 t2 = time.time()
      8 print(f'{len(res.GetHitMolecules())} results in {t2-t1:.2f}s')

RuntimeError: FingerprintSearch cancelled
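Note that the synthon search signals cancellation with a `RuntimeError` instead of the `KeyboardInterrupt` which the conformer generator raises. Here's a sketch of one way to handle that in a script. `search_or_none` is a hypothetical helper, `fake_search` stands in for `spc.FingerprintSearch` so that the example runs without a synthon space, and matching on the word "cancelled" in the message is an assumption based on the two tracebacks shown in this post:

```python
def search_or_none(search, *args, **kwargs):
    """Call a synthon-space search; return None if it was cancelled.

    Any RuntimeError whose message does not mention 'cancelled' is
    re-raised, since it presumably indicates a real problem.
    """
    try:
        return search(*args, **kwargs)
    except RuntimeError as exc:
        if 'cancelled' not in str(exc):
            raise
        print(f'stopped early: {exc}')
        return None


def fake_search(*args, **kwargs):
    # stand-in which behaves like an interrupted FingerprintSearch
    raise RuntimeError('FingerprintSearch cancelled')


res = search_or_none(fake_search)
print(res)
```

The real call would look like `search_or_none(spc.FingerprintSearch, vemurafenib, fpg, params=ps)`.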
Some other RDKit functions have supported providing a timeout for a while:
rdFMCS.FindMCS() via rdFMCS.MCSParameters.Timeout
rdRascalMCES via rdRascalMCES.RascalOptions.timeout
rdRGroupDecomposition via rdRGroupDecomposition.RGroupDecompositionParameters.timeout
If you have ideas for other long-running functions which should support a timeout and/or being interruptible, please leave a comment or open an RDKit feature request on GitHub.