I was chatting with Pat Walters at the CADD GRC and he brought up point 7 from his famous 14 points LinkedIn post. The usual justification given for using mol2 files is that they are the only file format we have that allows including partial charges. This is a widely held opinion, but it’s not really true. It’s actually quite easy to store arbitrary atom (or bond) properties in SD files and the RDKit has a simple mechanism for doing this. This post is a short tutorial on how to use this functionality.
The handling of atomic and bond properties in SD files is described in more detail in the docs.
Rather than doing the usual thing and generating Gasteiger-Marsili charges, here I’m going to use the DASH-props tree that we developed in the Riniker lab to rapidly generate approximate AM1BCC charges. DASH uses a hierarchical tree of substructures to compute high-quality estimates of AM1BCC charges for a molecule. Because it’s based on substructures, the method is fast and conformation independent. If you want to learn more about the algorithm, take a look at the open-access paper.
If you want to run the code in this notebook, you’ll need to install DASH props. This requires adding some packages to your conda environment. There’s an environment file in the DASH-tree repo that you can use, but if you already have a working RDKit environment, I think this is sufficient:
An aside: the way these properties are stored in the SD file is very easy to parse, so adding support for this format to other software or toolkits that can already read SD files wouldn’t be much of a lift. (The format is actually the result of some discussions I had with a couple of cheminformatics software vendors back in 2019; I think I ended up being the only one to implement what we had discussed.) If you’re a developer and have questions, please just ask.
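To give a sense of how little work is involved, here is a minimal sketch of a reader in plain Python, with no RDKit required. It assumes, based on my reading of the RDKit docs, that the values of an `atom.dprop.<name>` data item are whitespace-separated, one per atom, possibly wrapped across several lines, and that `n/a` marks a missing value unless a different marker is given in square brackets at the start of the list. The function name `read_atom_double_props` and the example record (including its charge values) are made up for illustration.

```python
import re

# Matches SD data headers such as:  >  <atom.dprop.DASH_AM1BCC_CHARGES>  (1)
HEADER = re.compile(r"^>.*<atom\.dprop\.([^>]+)>")


def read_atom_double_props(sd_text):
    """Collect atom.dprop.* lists from the first record of an SD string."""
    props = {}
    lines = iter(sd_text.splitlines())
    for line in lines:
        if line.startswith("$$$$"):
            break  # end of the first record
        match = HEADER.match(line)
        if not match:
            continue
        # Gather every value line up to the blank line that ends the data item.
        tokens = []
        for value_line in lines:
            if not value_line.strip():
                break
            tokens.extend(value_line.split())
        # Assumption: an optional leading [marker] overrides the default "n/a".
        missing = "n/a"
        if tokens and tokens[0].startswith("[") and tokens[0].endswith("]"):
            missing = tokens.pop(0)[1:-1]
        props[match.group(1)] = [None if t == missing else float(t) for t in tokens]
    return props


# Hypothetical record: the mol block is omitted and the values are invented.
example = """
  hypothetical record: mol block omitted
M  END
>  <atom.dprop.DASH_AM1BCC_CHARGES>  (1)
-0.1234 0.0567 n/a
0.25

$$$$
"""
charges = read_atom_double_props(example)["DASH_AM1BCC_CHARGES"]
print(charges)
```

The i-th entry of the returned list belongs to the i-th atom of the mol block, so mapping the values back onto atoms is just an index lookup. A real implementation would also want to handle the integer, boolean, and string variants and loop over every record in the file.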
from rdkit import Chem
from rdkit.Chem import Draw
import rdkit
print(rdkit.__version__)

# import the required stuff from DASH:
from serenityff.charge.tree.dash_tree import DASHTree, TreeType
from serenityff.charge.data import dash_props_tree_path
2025.03.4
# Load the property tree.
# Note that the files will be automatically downloaded from the ETHZ Research Collection
# the first time the tree is loaded.
tree = DASHTree(tree_folder_path=dash_props_tree_path, tree_type=TreeType.FULL)
Loading DASH tree data
Loaded 122 trees and data
Construct a sample molecule and compute the charges:
for atom in esomep_h.GetAtoms():
    atom.SetDoubleProp("DASH_AM1BCC_CHARGES", charges[atom.GetIdx()])
# this is the one extra call you have to make in order to
# convert the atom properties to a molecule property which will
# be written to the SD file:
Chem.CreateAtomDoublePropertyList(esomep_h, 'DASH_AM1BCC_CHARGES')

# write the SD data to a string so that we can look at it:
from io import StringIO
sio = StringIO()
with Chem.SDWriter(sio) as w:
    w.write(esomep_h)
sdf = sio.getvalue()
print(sdf)
You can see that the property atom.dprop.DASH_AM1BCC_CHARGES has been added to the SD string. You can also see that we’re writing way too many significant figures… it’s probably worth adding an option to control that.
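Until such an option exists, one possible workaround is to format the values yourself. The sketch below is plain Python: `format_double_list` is a hypothetical helper, not part of the RDKit, that renders a list of floats with a fixed number of decimals and wraps the result into short lines. The default of 190 characters per line mirrors what I believe is the RDKit's own default line size for property lists; treat that as an assumption.

```python
def format_double_list(values, ndigits=4, line_size=190):
    """Space-separated fixed-precision values, wrapped to short lines."""
    lines, current = [], ""
    for value in values:
        token = f"{value:.{ndigits}f}"
        # Start a new line if adding this token would exceed the limit.
        if current and len(current) + 1 + len(token) > line_size:
            lines.append(current)
            current = token
        else:
            current = f"{current} {token}" if current else token
    if current:
        lines.append(current)
    return "\n".join(lines)


# Invented values, just to show the rounding:
text = format_double_list([0.12356826349445114, -0.5, 0.0], ndigits=3)
print(text)  # 0.124 -0.500 0.000
```

Since the property list is stored as an ordinary molecule property, it should be possible to assign this text directly, for example with `esomep_h.SetProp("atom.dprop.DASH_AM1BCC_CHARGES", text)`, in place of the call to `Chem.CreateAtomDoublePropertyList`. I haven't verified that here, so check the round trip before relying on it.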
When that SD string is parsed, the atom properties are extracted automatically and assigned to the atoms:
suppl = Chem.SDMolSupplier()
suppl.SetData(sdf, removeHs=False)
m = next(suppl)
for atom in m.GetAtoms():
    print(atom.GetIdx(), atom.GetDoubleProp("DASH_AM1BCC_CHARGES"))