R-group decomposition and molzip
Generating molecules from all possible combinations of R groups
- Reading in the dataset
- Doing the R-group decomposition
- Enumerating all possible molecules from the R groups
Recently a couple of customers have asked questions along the lines of: "How do I do an R-group decomposition and then recombine the cores and R groups to create new molecules?" That's an interesting and useful task which the RDKit has some built-in tools to help with, so I figured I'd do a blog post.
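To make the "recombine" half of that question concrete before getting into the RDKit specifics, here is a minimal sketch of the enumeration idea using only the standard library. The core and R-group SMILES below are made-up placeholders (not taken from the dataset used in this post), and `enumerate_combinations` is a hypothetical helper name. Each resulting tuple is just the set of fragments that would then need to be joined into a single molecule, for example with a tool like molzip.

```python
import itertools

# Hypothetical fragments, for illustration only: the [*:n] dummy atoms mark
# the attachment points that pair a core position with an R group.
core = 'c1cc([*:1])ccc1[*:2]'
r_groups = {
    'R1': ['C[*:1]', 'CC[*:1]', 'F[*:1]'],
    'R2': ['O[*:2]', 'N[*:2]'],
}

def enumerate_combinations(core, r_groups):
    """Yield (core, r1, r2, ...) tuples, one per combination of R groups."""
    labels = sorted(r_groups)  # fixed label order: R1, R2, ...
    for combo in itertools.product(*(r_groups[lbl] for lbl in labels)):
        yield (core,) + combo

combos = list(enumerate_combinations(core, r_groups))
# The number of combinations is the product of the per-position counts (3 * 2).
print(len(combos))
```

The count grows multiplicatively with each R-group position, so for real datasets it is worth checking the expected number of products before enumerating everything.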
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem import rdqueries
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdRGroupDecomposition
from rdkit.Chem import rdDepictor
from rdkit import Geometry
rdDepictor.SetPreferCoordGen(True)
import itertools
import rdkit
print(rdkit.__version__)
Note: Though I'm doing this blog post using a local build from RDKit master, all of the functionality I demonstrate here is already available in the 2021.09 release series.
Read in a bunch of molecules from a J Med Chem paper (https://doi.org/10.1021/acs.jmedchem.7b00306; the paper is open access). I downloaded the SMILES from the supporting information, sketched the scaffold manually, and then saved the scaffold + molecules to an SD file. The scaffold is the first molecule in the SDF.
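# Load every record from the SD file; the first record is the hand-drawn scaffold
ms = [mol for mol in Chem.SDMolSupplier('../data/jm7b00306.sdf')]
# Remove the scaffold from the list and keep it as the core
core = ms.pop(0)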
print(f'There are {len(ms)} molecules')
core