from rdkit import Chem
from rdkit.Chem import rdDepictor
True)
rdDepictor.SetPreferCoordGen(from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import BRICS
import rdkit
print(rdkit.__version__)
2025.03.5
August 15, 2025
This post is a short tutorial on using the RDKit’s BRICS implementation, expanding a bit on what’s in the documentation.
BRICS is a method for fragmenting molecules into smaller pieces along bonds which are likely to be synthetically accessible. The original paper describing the method is:
Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A.; Rarey, M. On the Art of Compiling and Using “Drug-Like” Chemical Fragment Spaces. ChemMedChem 2008, 3 (10), 1503–1507. https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/cmdc.200800178
from rdkit import Chem
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import BRICS
import rdkit
print(rdkit.__version__)
2025.03.5
esomeprazole = Chem.MolFromSmiles('COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1')
rdDepictor.Compute2DCoords(esomeprazole)
esomeprazole
{'[14*]c1ncc(C)c([16*])c1C',
'[3*]OC',
'[8*]C[S+]([O-])c1nc2cc([16*])ccc2[nH]1'}
The atom environments and connection rules can be found in Scheme 2 of the BRICS paper:
Note that the RDKit BRICS implementation does not contain a definition of L2, a three connected N with a lone pair. We incorporated that with L5 in a more general definition of amine. In the RDKit implementation L5 can connect with L1, L4, L12, L13, L14, L15, and L16. We also added a few additional connection definitions. The actual SMARTS used in the RDKit can be found in MolFragmenter.cpp along with the and the connection rules. There is also a Python version of the definitions in BRICS.py, but this is no longer used and may not exactly the match the C++ implementation.
The RDKit’s BRICS implementation includes the ability to create new molecules by stitching fragments together according to the connection rules above. Here’s an illustration of that showing all the molecules that can be formed by combining the three fragments from esomeprazole using the BRICS connection rules with a maximum enumeration depth of 2:
Here I grab the ChEMBL molecules that were most similar to esomeprazole, fragment them together with esomeprazole, and then explore some of the resulting molecules.
Here’s the query I executed and the results returned:
chembl_35=# select chembl_id,m,similarity from get_mfp2_neighbors('COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1') join chembl_id_lookup on (molregno=entity_id and entity_type='COMPOUND') join molecule_hierarchy using (molregno) where molregno=parent_molregno and similarity<1 order by similarity desc limit 20;
chembl_id | m | similarity
---------------+-------------------------------------------------------------+--------------------
CHEMBL9890 | COc1c(C)cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C | 0.8333333333333334
CHEMBL5089043 | COc1c(C)cnc(C[S@@+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C | 0.8333333333333334
CHEMBL5076667 | COc1c(C)cnc(C[S@+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C | 0.8333333333333334
CHEMBL3527071 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(CO)c(OC)c3C)nc2c1 | 0.8148148148148148
CHEMBL4525760 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C(=O)O)c(OC)c3C)nc2c1 | 0.8
CHEMBL10184 | COc1c(C)cnc(C[S+]([O-])c2nc3ccccc3[nH]2)c1C | 0.7692307692307693
CHEMBL9430 | COc1cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c(C)c1OC | 0.7288135593220338
CHEMBL59068 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N(C)C)c3Cl)nc2c1 | 0.7192982456140351
CHEMBL138250 | COc1c(C)cnc(C[S+]([O-])c2nc3cscc3[nH]2)c1C | 0.7169811320754716
CHEMBL5070031 | COc1ccc2[nH]c([S@@+]([O-])Cc3nccc(OC)c3C)nc2c1 | 0.7090909090909091
CHEMBL10061 | COc1cnc(C[S+]([O-])c2nc3cc(OC(F)(F)C(F)F)ccc3[nH]2)c(C)c1OC | 0.6935483870967742
CHEMBL1475252 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2n1 | 0.6909090909090909
CHEMBL9992 | COc1cnc(C[S+]([O-])c2nc3cc(OCC(F)(F)F)ccc3[nH]2)c(C)c1OC | 0.6885245901639344
CHEMBL440676 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCC4)c3Cl)nc2c1 | 0.6721311475409836
CHEMBL20315 | COc1ccc2[nH]c([S+]([O-])Cc3c(N)cc(C)c(OC)c3C)nc2c1 | 0.6666666666666666
CHEMBL59784 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCCC4)c3Cl)nc2c1 | 0.6612903225806451
CHEMBL144285 | COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC(C)C)c3C)nc2c1 | 0.65
CHEMBL341550 | COc1ccc(-c2scc3[nH]c([S+]([O-])Cc4ncc(C)c(OC)c4C)nc23)cc1 | 0.65
CHEMBL1796802 | COc1cc2nc([S+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2cc1NC(C)=O | 0.6451612903225806
CHEMBL58833 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCOCC4)c3Cl)nc2c1 | 0.640625
(20 rows)
the very useful get_mfp2_neighbors()
function is from the cartridge documentation.
Here’s the set of molecules we’ll use:
smis = '''COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1
COc1c(C)cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C
COc1ccc2[nH]c([S+]([O-])Cc3ncc(CO)c(OC)c3C)nc2c1
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C(=O)O)c(OC)c3C)nc2c1
COc1c(C)cnc(C[S+]([O-])c2nc3ccccc3[nH]2)c1C
COc1cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c(C)c1OC
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N(C)C)c3Cl)nc2c1
COc1c(C)cnc(C[S+]([O-])c2nc3cscc3[nH]2)c1C
COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC)c3C)nc2c1
COc1cnc(C[S+]([O-])c2nc3cc(OC(F)(F)C(F)F)ccc3[nH]2)c(C)c1OC
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2n1
COc1cnc(C[S+]([O-])c2nc3cc(OCC(F)(F)F)ccc3[nH]2)c(C)c1OC
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCC4)c3Cl)nc2c1
COc1ccc2[nH]c([S+]([O-])Cc3c(N)cc(C)c(OC)c3C)nc2c1
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCCC4)c3Cl)nc2c1
COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC(C)C)c3C)nc2c1
COc1ccc(-c2scc3[nH]c([S+]([O-])Cc4ncc(C)c(OC)c4C)nc23)cc1
COc1cc2nc([S+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2cc1NC(C)=O
COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCOCC4)c3Cl)nc2c1'''.split('\n')
mols = [Chem.MolFromSmiles(smi) for smi in smis]
Draw.MolsToGridImage(mols,molsPerRow=4)
Fragment all the molecules and keep the unique fragments:
{'[1*]C(C)=O',
'[14*]c1ncc(C)c([16*])c1C',
'[14*]c1ncc(C)c([16*])c1Cl',
'[14*]c1ncc([16*])c([16*])c1C',
'[14*]c1nccc([16*])c1C',
'[16*]c1c(C)cc(N)c([16*])c1C',
'[16*]c1ccc([16*])cc1',
'[3*]OC',
'[3*]OC(F)F',
'[3*]O[3*]',
'[4*]C(C)C',
'[4*]C(F)(F)C(F)F',
'[4*]CC(F)(F)F',
'[5*]N(C)C',
'[5*]N1CCCC1',
'[5*]N1CCCCC1',
'[5*]N1CCOCC1',
'[5*]N[5*]',
'[6*]C(=O)O',
'[8*]CO',
'[8*]C[S+]([O-])c1nc2c([14*])scc2[nH]1',
'[8*]C[S+]([O-])c1nc2cc([16*])c([16*])cc2[nH]1',
'[8*]C[S+]([O-])c1nc2cc([16*])ccc2[nH]1',
'[8*]C[S+]([O-])c1nc2ccccc2[nH]1',
'[8*]C[S+]([O-])c1nc2cscc2[nH]1',
'[8*]C[S+]([O-])c1nc2nc([14*])ccc2[nH]1'}
We can then recombine fragments to produce new molecules using BRICS.BRICSBuild
:
The result is a generator:
Here’s what the first 16 results look like:
Let’s filter out the everything that doesn’t have exactly one sulfoxide:
Another option in the BRICS builder is to provide one or more seeds that must be present in every output molecule. Here’s an example of how to do that using the scaffold of esomeprazole along with a version of the scaffold where the N in the second ring is replaced by a C:
seeds = [Chem.MolFromSmiles(x) for x in ('c19ncc(C)c([16*])c1C.C9[S+]([O-])c1nc2cc([16*])ccc2[nH]1',
'c19ccc(C)c([16*])c1C.C9[S+]([O-])c1nc2cc([16*])ccc2[nH]1',)]
Draw.MolsToGridImage(seeds)
Now do the enumeration using those seeds:
random.seed(127)
builder = BRICS.BRICSBuild(fragms,seeds=seeds)
newMols = [next(filter(builder)) for i in range(16)]
Draw.MolsToGridImage(newMols,molsPerRow=4)
Doing enumeration providing seeds that are this specific is similar in ways to doing enumeration using the results of R-group decomposition