# ETKDG and distance constraints

Biasing the conformational search

The RDKit's conformer generator allows you to provide distance "constraints" to bias the conformers which it produces. Last week I wondered how those constraints interact with the terms which the ETKDG algorithm adds to the "distance geometry force field".

This post uses a simple example to explore that interaction

See another recent blog post for an overview of how the conformer generator works.

```
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdMolTransforms
IPythonConsole.ipython_3d = True
from rdkit.Chem import Draw
import rdkit
print(rdkit.__version__)
%pylab inline
```

Here's the molecule we'll use:

```
m = Chem.AddHs(Chem.MolFromSmiles('OCCCCCCCN'))
```

```
from rdkit.Chem import rdDepictor
m2d = Chem.Mol(m)
rdDepictor.Compute2DCoords(m2d)
IPythonConsole.drawOptions.addAtomIndices = True
m2d
```

Get the bounds matrix for the molecule and look at the min/max values allowed for the O-N distance:

```
bounds = rdDistGeom.GetMoleculeBoundsMatrix(m)
bounds[8,0],bounds[0,8]
```

Let's generate a bunch of conformers and look at the distribution of O-N distances:

```
figsize(6,6)
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 0xf00d
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists_etkdg = [rdMolTransforms.GetBondLength(conf,0,8) for conf in m.GetConformers()]
hist(dists_etkdg,bins=20);
title('ETKDG');
xlabel('O--N distance');
```

Look at one conformer:

```
print(rdMolTransforms.GetBondLength(m.GetConformer(cids[0]),0,8) )
IPythonConsole.drawMol3D(m,confId=cids[1])
```

Compare the distribution we get doing plain DG:

```
figsize(6,6)
ps = rdDistGeom.EmbedParameters()
ps.useExpTorsionAnglePrefs = False
ps.useBasicKnowledge = False
ps.randomSeed = 0xf00d
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,8) for conf in m.GetConformers()]
hist(dists,bins=20);
title('DG');
xlabel('O--N distance');
```

There's not a giant difference, but it does look like the DG conformers for this molecule tend to be more extended: the O and N tend to be farther away from each other.

Here's how we can modify the bounds matrix to bring the O and N closer together:

```
bounds[0,8] = 4.1
bounds[8,0] = 4.0
from rdkit import DistanceGeometry
DistanceGeometry.DoTriangleSmoothing(bounds)
```

Start with using this bounds matrix together with plain DG:

```
figsize(6,6)
ps = rdDistGeom.EmbedParameters()
ps.useExpTorsionAnglePrefs = False
ps.useBasicKnowledge = False
ps.randomSeed = 0xf00d
ps.SetBoundsMat(bounds)
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,8) for conf in m.GetConformers()]
hist(dists,bins=20);
title('DG, distance constraints');
xlabel('O--N distance');
```

```
print(rdMolTransforms.GetBondLength(m.GetConformer(cids[0]),0,8) )
IPythonConsole.drawMol3D(m,confId=cids[0])
```

When we do ETKDG we add additional terms to the force field that's used to optimize the structure. Do these override our distance constraints?

```
figsize(6,6)
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 0xf00d
ps.SetBoundsMat(bounds)
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,8) for conf in m.GetConformers()]
hist(dists,bins=20);
title('ETKDG with constraints');
xlabel('O--N distance');
```

Most of the distances are longer than what we were looking for, but they are still considerably shorter than what we saw before:

```
figsize(9,6)
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 0xf00d
ps.SetBoundsMat(bounds)
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,8) for conf in m.GetConformers()]
hist(dists,bins=20,label='constraints');
title('ETKDG');
hist(dists_etkdg,bins=20,label='no constraints');
legend();
xlabel('O--N distance');
```

So that answers our original question: the "constraints" we place on the conformers by modifying the bounds matrix aren't strict, so the additional terms added by ETKDG can result in them being violated. But the results are still significant biased towards the region of conformer space we wanted to explore.

Let's try forcing conformations which have distances consistent with an intra-molecular hydrogen bond. Here we need to modify the bounds matrix elements between both the O and the N as well as the O and one of the Hs attached to the N. If we don't adjust the O-N distance bounds too we end up with a bounds matrix which cannot be smoothed.

```
bounds = rdDistGeom.GetMoleculeBoundsMatrix(m)
bounds[0,25] = 1.9
bounds[25,0] = 1.8
bounds[0,8] = 3.2
bounds[8,0] = 2.9
from rdkit import DistanceGeometry
DistanceGeometry.DoTriangleSmoothing(bounds)
```

```
figsize(9,6)
ps = rdDistGeom.EmbedParameters()
ps.useExpTorsionAnglePrefs = False
ps.useBasicKnowledge = False
ps.randomSeed = 0xf00d
ps.SetBoundsMat(bounds)
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,25) for conf in m.GetConformers()]
hist(dists,bins=20);
xlabel('O--H-N distance');
```

```
print(rdMolTransforms.GetBondLength(m.GetConformer(cids[0]),0,25) )
IPythonConsole.drawMol3D(m,confId=cids[0])
```

Try using ETKDG:

```
figsize(6,6)
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 0xf00d
ps.SetBoundsMat(bounds)
cids = rdDistGeom.EmbedMultipleConfs(m,500,ps)
dists = [rdMolTransforms.GetBondLength(conf,0,25) for conf in m.GetConformers()]
hist(dists,bins=20);
title('ETKDG, with constraints');
xlabel('O--H-N distance');
```

```
print(rdMolTransforms.GetBondLength(m.GetConformer(cids[0]),0,25) )
IPythonConsole.drawMol3D(m,confId=cids[0])
```

The O--H-N distances here aren't completely obeying the 1.8-1.9 distance bounds we imposed, but they seem to match a bit better than what we saw above when we constrained the O--N distance. I think that's likely because now we have an additional constraining term - the O--H distance as well as the O--N distance - to help override the ETKDG preferences.

So to repeat the conclusion: modifying the distance bounds matrix doesn't act as a hard constraint when we include ETKDG terms in the conformer generation process, but it definitely biases the results towards the areas of conformer space which we were trying to access.