How long does it take to do common tasks in the RDKit?

optimization
reference
Setting things up to track performance over time.
Published: October 31, 2025

Years ago (back in the SourceForge days), I used to maintain a page with info about how long it took to do common operations in the RDKit. This was useful both for reference and for tracking the evolution of RDKit performance over time. At some point I stopped doing this, but a recently merged PR got me thinking about it again (@Andrew: thanks for that contribution!).

I’m going to use this notebook to explain and run some new benchmarks (these are different from the PR mentioned above, which is meant to run as part of the RDKit build process). The results, including historical results, are tabulated in the wiki. I will update this post as I add more benchmarks.

Let me know if you have ideas for interesting and useful benchmarks I should add!

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

%load_ext sql

Get the SMILES we will be working with:

Get 10000 random compounds from ChEMBL that have a pchembl_value >= 9 and don’t have multiple components.

This stuff doesn’t need to be run every time, so I’m not saving the cells as code.

d = %sql postgresql://localhost/chembl_36 \
  select distinct(canonical_smiles) canonical_smiles,chembl_id from compound_structures tablesample bernoulli(20) repeatable (123892) \
    join chembl_id_lookup on (molregno=entity_id and entity_type='COMPOUND') \
    join activities using (molregno) \
    where activities.pchembl_value>=9 and \
    position('.' in canonical_smiles)=0 \
    limit 10000;
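
The same selection rules can be applied in plain Python to SMILES that are already in hand. This is only a minimal sketch, not the query used for this post: `pick_single_component` and the sample records are hypothetical, duplicates are dropped per (SMILES, id) row (an assumption about how the `distinct` behaves), and no random sampling is done.

```python
def pick_single_component(records, limit=10000):
    """Keep unique, single-component (smiles, chembl_id) rows, up to `limit`.

    A '.' in a SMILES separates disconnected components, so its absence is
    used as the single-component test, mirroring
    position('.' in canonical_smiles)=0 in the SQL.
    """
    seen = set()
    kept = []
    for smi, cid in records:
        row = (smi, cid)
        if '.' in smi or row in seen:
            continue
        seen.add(row)
        kept.append(row)
        if len(kept) >= limit:
            break
    return kept


# made-up records, for illustration only
sample = [
    ('CCO', 'ID1'),
    ('CC(=O)[O-].[Na+]', 'ID2'),  # salt: two components, dropped
    ('CCO', 'ID1'),               # repeated row, dropped
    ('c1ccccc1', 'ID4'),
]
picked = pick_single_component(sample)
print(picked)
```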

The distinct(canonical_smiles) query I did sorts the results, so the first rows make it look like all of the compounds have isotopes specified. This is not actually the case.

Make sure all of those convert cleanly into molecules:

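# count the SMILES that the RDKit can parse
sum(1 for x,y in d if Chem.MolFromSmiles(x) is not None)
# save the compounds, one "chembl_id SMILES" pair per line after a header
with open('../data/chembl36_very_active.txt','w') as outf:
    outf.write('chembl_id canonical_smiles\n')
    for smi,cid in d:
        outf.write(f'{cid} {smi}\n')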

Run the benchmarks

import rdkit
print(rdkit.__version__)
2025.09.1
with open('../data/chembl36_very_active.txt','r') as inf:
    ls = [x.strip().split() for x in inf]
    ls.pop(0)
    data = [(smi,cid) for cid,smi in ls]
len(data)
10000
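
One detail that is easy to miss: the file stores the ChEMBL id first and the SMILES second, while `data` holds `(smi, cid)` tuples. The sketch below round-trips that layout through an in-memory buffer so the column swap is explicit. The helper names `write_table` and `read_table` and the two sample rows are hypothetical, not part of the notebook.

```python
import io


def write_table(rows, outf):
    """Write (smiles, chembl_id) pairs as 'chembl_id smiles' lines after a header."""
    outf.write('chembl_id canonical_smiles\n')
    for smi, cid in rows:
        outf.write(f'{cid} {smi}\n')


def read_table(inf):
    """Read the table back as (smiles, chembl_id) pairs, skipping the header."""
    next(inf)  # header line
    rows = []
    for line in inf:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        cid, smi = parts
        rows.append((smi, cid))
    return rows


rows = [('CCO', 'ID1'), ('c1ccccc1', 'ID2')]  # made-up entries
buf = io.StringIO()
write_table(rows, buf)
buf.seek(0)
roundtrip = read_table(buf)
print(roundtrip == rows)
```

Splitting on whitespace is safe here because SMILES strings do not contain spaces.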

Construct molecule from SMILES

%timeit ms = [Chem.MolFromSmiles(smi) for smi,cid in data]
1.2 s ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
ms = [Chem.MolFromSmiles(smi) for smi,cid in data]
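
For running the same kind of measurement outside a notebook, something like the following sketch could stand in for `%timeit`. It uses only the standard library; `time_call` and `describe` are hypothetical helper names. The use of the population standard deviation is an assumption made to match what `%timeit` appears to report.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=1):
    """Run func() `number` times per run, for `repeat` runs.

    Returns (mean, std) of the time per loop, in seconds.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [t / number for t in runs]
    # population standard deviation (assumed to match %timeit)
    return statistics.fmean(per_loop), statistics.pstdev(per_loop)


def describe(mean, std, repeat=7, number=1):
    """Format a result in a style similar to the %timeit output."""
    return (f'{mean:.3g} s +/- {std:.3g} s per loop '
            f'(mean +/- std. dev. of {repeat} runs, {number} loop each)')


# stand-in workload; with the RDKit this would be a lambda wrapping the
# list comprehension being timed
mean, std = time_call(lambda: sorted(range(10000), reverse=True))
print(describe(mean, std))
```

With the data loaded as above, the call would look like `time_call(lambda: [Chem.MolFromSmiles(smi) for smi, cid in data])`. Unlike `%timeit`, this helper does not do an extra calibration pass to pick the loop count, so the statement runs exactly `repeat * number` times.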

Generate canonical SMILES

%timeit [Chem.MolToSmiles(m) for m in ms]
658 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
from rdkit.Chem import rdDepictor

Generating 2D coordinates

rdDepictor.SetPreferCoordGen(False)
%timeit [rdDepictor.Compute2DCoords(m) for m in ms]
47.4 s ± 482 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
rdDepictor.SetPreferCoordGen(True)
%timeit [rdDepictor.Compute2DCoords(m) for m in ms]
1min 55s ± 334 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
ms2d = [Chem.Mol(m) for m in ms]
_ = [rdDepictor.Compute2DCoords(m) for m in ms2d]

Writing mol blocks

%timeit [Chem.MolToMolBlock(m) for m in ms2d]
877 ms ± 9.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [Chem.MolToV3KMolBlock(m) for m in ms2d]
1.08 s ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
mbs = [Chem.MolToMolBlock(m) for m in ms2d]
mbs3k = [Chem.MolToV3KMolBlock(m) for m in ms2d]

Parsing mol blocks

%timeit [Chem.MolFromMolBlock(m) for m in mbs]
[08:31:09] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:11] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:12] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:14] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:16] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:17] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:19] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:21] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
1.71 s ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [Chem.MolFromMolBlock(m) for m in mbs3k]
[08:31:23] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:24] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:26] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:28] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:30] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:32] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:34] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:36] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
1.87 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
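
If the repeated warnings get in the way, the warning channel can be switched off around the timed call. The sketch below wraps `RDLogger.DisableLog` and `RDLogger.EnableLog` in a context manager. `quiet_rdkit` is a hypothetical helper, the guarded import is only there so the snippet also runs where the RDKit is not installed, and silencing the log may change the timings slightly.

```python
from contextlib import contextmanager

try:
    from rdkit import RDLogger
except ImportError:
    RDLogger = None  # RDKit missing: the context manager becomes a no-op


@contextmanager
def quiet_rdkit(channel='rdApp.warning'):
    """Temporarily disable one RDKit log channel."""
    if RDLogger is not None:
        RDLogger.DisableLog(channel)
    try:
        yield
    finally:
        # note: this re-enables the channel even if it was off beforehand
        if RDLogger is not None:
            RDLogger.EnableLog(channel)


with quiet_rdkit():
    result = sum(range(10))  # stand-in for the RDKit call being timed
print(result)
```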

Adding/removing Hs

%timeit [Chem.AddHs(m) for m in ms]
456 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
mhs = [Chem.AddHs(m) for m in ms]
%timeit [Chem.RemoveHs(m) for m in mhs]
1.52 s ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Conformer generation

from rdkit.Chem import rdDistGeom
ps = rdDistGeom.EmbedParameters()
ps.randomSeed = 0xf00d
%timeit [rdDistGeom.EmbedMolecule(m,ps) for m in mhs[:1000]]
[08:14:36] UFFTYPER: Unrecognized charge state for atom: 38
[08:15:19] UFFTYPER: Unrecognized charge state for atom: 38
[08:16:03] UFFTYPER: Unrecognized charge state for atom: 38
[08:16:48] UFFTYPER: Unrecognized charge state for atom: 38
[08:17:32] UFFTYPER: Unrecognized charge state for atom: 38
[08:18:16] UFFTYPER: Unrecognized charge state for atom: 38
[08:19:00] UFFTYPER: Unrecognized charge state for atom: 38
[08:19:45] UFFTYPER: Unrecognized charge state for atom: 38
44.1 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 0xf00d
%timeit [rdDistGeom.EmbedMolecule(m,ps) for m in mhs[:1000]]
[08:36:46] UFFTYPER: Unrecognized charge state for atom: 38
[08:37:52] UFFTYPER: Unrecognized charge state for atom: 38
[08:38:58] UFFTYPER: Unrecognized charge state for atom: 38
[08:40:03] UFFTYPER: Unrecognized charge state for atom: 38
[08:41:09] UFFTYPER: Unrecognized charge state for atom: 38
[08:42:14] UFFTYPER: Unrecognized charge state for atom: 38
[08:43:20] UFFTYPER: Unrecognized charge state for atom: 38
[08:44:25] UFFTYPER: Unrecognized charge state for atom: 38
1min 5s ± 245 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
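
Note that the conformer-generation timings cover only the first 1000 molecules, while the earlier benchmarks use all 10000, so the loop times are not directly comparable. Converting to a per-molecule cost makes the comparison easier. The sketch below does that arithmetic for the mean values reported above; `per_molecule` and `pretty` are hypothetical helpers.

```python
def per_molecule(total_seconds, n_molecules):
    """Average time per molecule for a loop over n_molecules."""
    return total_seconds / n_molecules


def pretty(seconds):
    """Format a duration using s, ms, or µs."""
    if seconds >= 1:
        return f'{seconds:.3g} s'
    if seconds >= 1e-3:
        return f'{seconds * 1e3:.3g} ms'
    return f'{seconds * 1e6:.3g} µs'


# (label, mean loop time in seconds, molecules per loop), copied from the outputs above
reported = [
    ('MolFromSmiles', 1.2, 10000),
    ('MolToSmiles', 0.658, 10000),
    ('Compute2DCoords, PreferCoordGen False', 47.4, 10000),
    ('Compute2DCoords, PreferCoordGen True', 115.0, 10000),
    ('EmbedMolecule, EmbedParameters()', 44.1, 1000),
    ('EmbedMolecule, ETKDGv3()', 65.0, 1000),
]
for label, total, n in reported:
    print(f'{label}: {pretty(per_molecule(total, n))} per molecule')
```

On these numbers, SMILES parsing costs about 120 µs per molecule, 2D coordinate generation about 4.74 ms (11.5 ms with CoordGen), and conformer generation about 44.1 ms (65 ms with ETKDGv3).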

Generate fingerprints

from rdkit.Chem import rdFingerprintGenerator
fpg = rdFingerprintGenerator.GetMorganGenerator(radius=3)
%timeit fpg.GetFingerprints(ms)
497 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
fpg = rdFingerprintGenerator.GetMorganGenerator(radius=2)
%timeit fpg.GetFingerprints(ms)
379 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
fpg = rdFingerprintGenerator.GetRDKitFPGenerator()
%timeit fpg.GetFingerprints(ms)
13.1 s ± 71.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
fpg = rdFingerprintGenerator.GetRDKitFPGenerator(maxPath=5)
%timeit fpg.GetFingerprints(ms)
4.48 s ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
fpg = rdFingerprintGenerator.GetAtomPairGenerator()
%timeit fpg.GetFingerprints(ms)
1.65 s ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
fpg = rdFingerprintGenerator.GetTopologicalTorsionGenerator()
%timeit fpg.GetFingerprints(ms)
1.36 s ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [Chem.PatternFingerprint(m) for m in ms]
2.49 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
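
To track results like these across releases, each timing needs to be stored together with the RDKit version that produced it. Here is a minimal sketch of one way to do that with a CSV table. The column layout and the `append_result` helper are hypothetical and are not the format used in the wiki.

```python
import csv
import io

FIELDS = ['rdkit_version', 'benchmark', 'n_molecules', 'mean_s', 'std_s']


def append_result(outf, version, benchmark, n_molecules, mean_s, std_s,
                  write_header=False):
    """Append one benchmark result as a CSV row to an open text file."""
    writer = csv.DictWriter(outf, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow({
        'rdkit_version': version,
        'benchmark': benchmark,
        'n_molecules': n_molecules,
        'mean_s': mean_s,
        'std_s': std_s,
    })


# two of the results from this post, written to an in-memory buffer
buf = io.StringIO()
append_result(buf, '2025.09.1', 'MolFromSmiles', 10000, 1.2, 0.0116,
              write_header=True)
append_result(buf, '2025.09.1', 'MolToSmiles', 10000, 0.658, 0.00292)

rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
print(rows[0]['benchmark'], rows[0]['mean_s'])
```

In practice the buffer would be a file opened in append mode, with `rdkit.__version__` supplying the version string.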