<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>RDKit blog</title>
<link>https://greglandrum.github.io/rdkit-blog/</link>
<atom:link href="https://greglandrum.github.io/rdkit-blog/index.xml" rel="self" type="application/rss+xml"/>
<description>RDKit experiments, tips, and tutorials</description>
<generator>quarto-1.6.43</generator>
<lastBuildDate>Fri, 27 Feb 2026 23:00:00 GMT</lastBuildDate>
<item>
  <title>GPU-Accelerated Clustering with nvMolKit</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2026-02-28-nvmolkit-clustering.html</link>
  <description><![CDATA[ 




<section id="gpu-accelerated-molecular-clustering-with-nvmolkit" class="level1">
<h1>GPU-Accelerated Molecular Clustering with nvMolKit</h1>
<p><strong>Note</strong>: This is a guest post written by Kevin Boyd at NVIDIA.</p>
<p><a href="https://github.com/nvidia-digital-bio/nvMolKit">nvMolKit</a> is an open-source library that provides GPU-accelerated implementations of common RDKit cheminformatics operations. The APIs closely mirror RDKit’s, with batching as needed for GPU efficiency, making it easy to drop nvMolKit into existing workflows while achieving significant speedups on large datasets.</p>
<p>As of January 2026, nvMolKit v0.3 supports:</p>
<ul>
<li>Batch Morgan fingerprint calculation</li>
<li>Many-to-many Tanimoto and cosine similarity calculations</li>
<li>Butina clustering</li>
<li>Batch ETKDG conformer generation</li>
<li>Batch MMFF force field optimization</li>
</ul>
<p>In this post, we’ll demonstrate nvMolKit’s capabilities through a molecular clustering workflow. We’ll compute Morgan fingerprints, calculate pairwise Tanimoto similarities, and perform Butina clustering, with both the RDKit and nvMolKit APIs.</p>
<section id="installation" class="level2">
<h2 class="anchored" data-anchor-id="installation">Installation</h2>
<p>nvMolKit is available on conda-forge alongside RDKit. It requires an NVIDIA GPU with compute capability 7.0 (V100) or higher. Some nvMolKit results are returned on the GPU; a CUDA-compatible torch installation can be used to interpret them.</p>
<div id="cell-3" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>conda install <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>c conda<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>forge rdkit nvmolkit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>q</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.
</code></pre>
</div>
</div>
<div id="cell-4" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> torch</span>
<span id="cb3-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> torch.cuda.is_available(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CUDA is required for nvMolKit"</span></span></code></pre></div>
</div>
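<p>The assert above only confirms that a CUDA device is visible. As a minimal sketch, the compute capability requirement could also be checked with <code>torch.cuda.get_device_capability()</code>. The helper names <code>meets_min_capability</code> and <code>current_gpu_is_supported</code> are hypothetical and are not part of nvMolKit.</p>

```python
def meets_min_capability(capability, minimum=(7, 0)):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    return tuple(capability) >= tuple(minimum)


def current_gpu_is_supported(minimum=(7, 0)):
    """Check the default CUDA device; returns False if torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # torch.cuda.get_device_capability() returns a (major, minor) tuple
    return meets_min_capability(torch.cuda.get_device_capability(), minimum)
```

<p>Calling <code>current_gpu_is_supported()</code> before the rest of the notebook gives an early, readable failure on older GPUs.</p>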
</section>
<section id="configuration" class="level2">
<h2 class="anchored" data-anchor-id="configuration">Configuration</h2>
<p>We’ll process 20,000 molecules from Enamine REAL for this comparison.</p>
<div id="cell-6" class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number of molecules to process</span></span>
<span id="cb4-2">n_mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20000</span></span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># CPU threads for multithreaded RDKit operations</span></span>
<span id="cb4-5">n_cpu_threads <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span></span>
<span id="cb4-6"></span>
<span id="cb4-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Morgan fingerprint parameters</span></span>
<span id="cb4-8">fp_radius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb4-9">fp_nbits <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1024</span></span>
<span id="cb4-10"></span>
<span id="cb4-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Butina clustering threshold</span></span>
<span id="cb4-12">distance_threshold <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span></span></code></pre></div>
</div>
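<p>It helps to know how large the pairwise data gets at this scale. The sketch below counts the unique pairs for 20,000 molecules and estimates storage. The 8 bytes per value is an assumption: it matches the float64 array that NumPy builds from a list of Python floats, while the element type nvMolKit uses on the GPU may differ.</p>

```python
def pairwise_storage(n_items, bytes_per_value):
    """Return (n_pairs, condensed_bytes, full_matrix_bytes) for n_items."""
    n_pairs = n_items * (n_items - 1) // 2
    return n_pairs, n_pairs * bytes_per_value, n_items * n_items * bytes_per_value


# Assumption: 8-byte (float64) values
n_pairs, condensed, full = pairwise_storage(20000, 8)
print(f"{n_pairs:,} unique pairs")               # 199,990,000 unique pairs
print(f"condensed: {condensed / 1e9:.2f} GB")    # condensed: 1.60 GB
print(f"full matrix: {full / 1e9:.2f} GB")       # full matrix: 3.20 GB
```

<p>Because the pair count grows quadratically, doubling <code>n_mols</code> roughly quadruples both the memory and the similarity work.</p>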
</section>
<section id="loading-molecules" class="level2">
<h2 class="anchored" data-anchor-id="loading-molecules">Loading Molecules</h2>
<p>First, we’ll parse molecules from SMILES using RDKit. We’ll take the first <code>n_mols</code> molecules from the <a href="https://enamine.net/compound-collections/real-compounds/real-database-subsets">Enamine REAL 10.4M sample</a>.</p>
<div id="cell-8" class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> MolFromSmiles</span>
<span id="cb5-3"></span>
<span id="cb5-4">smi_file <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"enamine_real_10M.csxmiles"</span></span>
<span id="cb5-5"></span>
<span id="cb5-6">smis <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(smi_file, nrows<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n_mols, usecols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>], sep<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>).iloc[:, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].to_list()</span>
<span id="cb5-7">mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> smis]</span>
<span id="cb5-8">mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [mol <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> mol <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mols <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mol]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove any parse failures</span></span>
<span id="cb5-9">n_mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(mols)</span>
<span id="cb5-10"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Parsed </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>n_mols<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> molecules"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Parsed 20000 molecules</code></pre>
</div>
</div>
</section>
<section id="workflow-description" class="level2">
<h2 class="anchored" data-anchor-id="workflow-description">Workflow Description</h2>
<p>This example workflow consists of 3 steps:</p>
<ol type="1">
<li><strong>Fingerprinting</strong>: Generate Morgan fingerprints</li>
<li><strong>Similarity</strong>: Compute pairwise Tanimoto distances</li>
<li><strong>Clustering</strong>: Run Butina algorithm on the distance matrix</li>
</ol>
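<p>To make the last two steps concrete, here is a small pure-Python sketch on toy data, with fingerprints represented as sets of "on" bit indices. It is a simplified Butina-style procedure for illustration only: it does not re-sort candidates after each cluster is formed, and it is not the RDKit or nvMolKit implementation. The function names are hypothetical.</p>

```python
def tanimoto_distance(a, b):
    """1 - |intersection| / |union| for two sets of 'on' bit indices."""
    union = len(a.union(b))
    if union == 0:
        return 0.0
    return 1.0 - len(a.intersection(b)) / union


def butina_sketch(fingerprints, threshold):
    """Greedy Butina-style clustering; returns a list of clusters (index lists).

    The first index in each cluster is its centroid.
    """
    n = len(fingerprints)
    # Neighbor lists: every other item within the distance threshold
    neighbors = [
        [j for j in range(n)
         if j != i and threshold >= tanimoto_distance(fingerprints[i], fingerprints[j])]
        for i in range(n)
    ]
    # Visit candidates from most to fewest neighbors (ties broken by index)
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The centroid claims every neighbor that is still unassigned
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
print(butina_sketch(toy_fps, threshold=0.6))  # [[0, 1, 2], [3, 4], [5]]
```

<p>The neighbor-list construction is the quadratic part of the algorithm, which is why the similarity and clustering steps dominate the runtime on larger sets.</p>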
</section>
<section id="rdkit-workflow" class="level2">
<h2 class="anchored" data-anchor-id="rdkit-workflow">RDKit Workflow</h2>
<div id="cell-11" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> time</span>
<span id="cb7-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdFingerprintGenerator</span>
<span id="cb7-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.DataStructs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> BulkTanimotoSimilarity</span>
<span id="cb7-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.ML.Cluster.Butina <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ClusterData</span>
<span id="cb7-5"></span>
<span id="cb7-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Fingerprinting</span></span>
<span id="cb7-7">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb7-8">generator <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetMorganGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>fp_radius, fpSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>fp_nbits)</span>
<span id="cb7-9">fps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> generator.GetFingerprints(mols, numThreads<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n_cpu_threads)</span>
<span id="cb7-10">t_fp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb7-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Fingerprinting: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_fp<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb7-12"></span>
<span id="cb7-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Pairwise Tanimoto distances</span></span>
<span id="cb7-14">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb7-15">distances <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb7-16"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(mols)):</span>
<span id="cb7-17">    distances.extend(BulkTanimotoSimilarity(fps[i], fps[:i], returnDistance<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>))</span>
<span id="cb7-18">t_sim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb7-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Similarity matrix: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_sim<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb7-20"></span>
<span id="cb7-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Butina clustering</span></span>
<span id="cb7-22">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb7-23">rdkit_clusters <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ClusterData(</span>
<span id="cb7-24">    np.array(distances),</span>
<span id="cb7-25">    n_mols,</span>
<span id="cb7-26">    distance_threshold,</span>
<span id="cb7-27">    isDistData<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb7-28">    distFunc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb7-29">    reordering<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb7-30">)</span>
<span id="cb7-31">t_clust <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb7-32"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Butina clustering: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_clust<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb7-33"></span>
<span id="cb7-34">rdkit_total_time <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> t_fp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> t_sim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> t_clust</span>
<span id="cb7-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Total RDKit time: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rdkit_total_time<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Fingerprinting: 0.04s
Similarity matrix: 11.17s
Butina clustering: 7.65s

Total RDKit time: 18.85s</code></pre>
</div>
</div>
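<p>The loop in Step 2 stores only the lower triangle of the distance matrix as one flat list, row by row. The sketch below shows where a given pair lands in that list; <code>condensed_index</code> is a hypothetical helper written for this illustration, not an RDKit function.</p>

```python
def condensed_index(i, j):
    """Position of the (i, j) pair in the flat list built row by row."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    hi, lo = max(i, j), min(i, j)
    # Rows 0..hi-1 contribute 0 + 1 + ... + (hi - 1) entries before row hi
    return hi * (hi - 1) // 2 + lo


# Rebuild the flat layout for 4 items the same way the loop above does,
# storing the pair itself instead of a distance so the order is visible.
n = 4
flat = []
for i in range(n):
    flat.extend((i, j) for j in range(i))

print(flat)  # [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
print(condensed_index(3, 1), flat[condensed_index(3, 1)])  # 4 (3, 1)
```

<p>With the real data, <code>distances[condensed_index(i, j)]</code> would give the Tanimoto distance between molecules <code>i</code> and <code>j</code>.</p>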
</section>
<section id="nvmolkit-workflow" class="level2">
<h2 class="anchored" data-anchor-id="nvmolkit-workflow">nvMolKit Workflow</h2>
<p>Note that we add <code>torch.cuda.synchronize()</code> calls to delimit the individual timings, but the entire workflow can be run asynchronously with a single synchronization at the end.</p>
<div id="cell-13" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> nvmolkit.fingerprints <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> MorganFingerprintGenerator</span>
<span id="cb9-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> nvmolkit.similarity <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> crossTanimotoSimilarity</span>
<span id="cb9-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> nvmolkit.clustering <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> butina</span>
<span id="cb9-4"></span>
<span id="cb9-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Fingerprinting</span></span>
<span id="cb9-6">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb9-7">nvmolkit_fpgen <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> MorganFingerprintGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>fp_radius, fpSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>fp_nbits)</span>
<span id="cb9-8">nvmolkit_fps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> nvmolkit_fpgen.GetFingerprints(mols, num_threads<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>n_cpu_threads)</span>
<span id="cb9-9">torch.cuda.synchronize()</span>
<span id="cb9-10">t_fp_nv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb9-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Fingerprinting: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_fp_nv<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb9-12"></span>
<span id="cb9-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Pairwise Tanimoto similarity -&gt; distances</span></span>
<span id="cb9-14">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb9-15">nvmolkit_similarities <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> crossTanimotoSimilarity(nvmolkit_fps)</span>
<span id="cb9-16">torch.cuda.synchronize()</span>
<span id="cb9-17">t_sim_nv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb9-18"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Similarity matrix: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_sim_nv<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb9-19"></span>
<span id="cb9-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Butina clustering</span></span>
<span id="cb9-21">t_start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb9-22">nvmolkit_distances <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> nvmolkit_similarities.torch()</span>
<span id="cb9-23">cluster_ids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> butina(nvmolkit_distances, distance_threshold).torch()</span>
<span id="cb9-24">torch.cuda.synchronize()</span>
<span id="cb9-25">t_clust_nv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> t_start</span>
<span id="cb9-26"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Butina clustering: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>t_clust_nv<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb9-27"></span>
<span id="cb9-28">nvmolkit_total_time <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> t_fp_nv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> t_sim_nv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> t_clust_nv</span>
<span id="cb9-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Total nvMolKit time: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nvmolkit_total_time<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">s"</span>)</span>
<span id="cb9-30"></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Fingerprinting: 0.05s
Similarity matrix: 0.02s
Butina clustering: 0.02s

Total nvMolKit time: 0.09s</code></pre>
</div>
</div>
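<p>RDKit returns clusters as tuples of molecule indices, so comparing the two results requires grouping the nvMolKit output the same way. The sketch below assumes that <code>cluster_ids</code> holds one integer label per molecule, which the variable name suggests but which should be checked against the nvMolKit documentation. <code>group_by_label</code> is a hypothetical helper.</p>

```python
def group_by_label(labels):
    """Turn per-item cluster labels into a list of index lists, largest first."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(int(label), []).append(idx)
    # Largest clusters first; ties broken by the smallest member index
    return sorted(groups.values(), key=lambda members: (-len(members), members[0]))


example_labels = [0, 1, 0, 2, 1, 0]
print(group_by_label(example_labels))  # [[0, 2, 5], [1, 4], [3]]
```

<p>For a tensor on the GPU, something like <code>group_by_label(cluster_ids.cpu().tolist())</code> would move the labels to the host first. Cluster membership may still differ between the two implementations when several candidates tie.</p>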
</section>
<section id="performance-comparison---rtx-5080-gpu-vs-ryzen-9-9950x" class="level2">
<h2 class="anchored" data-anchor-id="performance-comparison---rtx-5080-gpu-vs-ryzen-9-9950x">Performance Comparison - RTX 5080 GPU vs Ryzen 9 9950X</h2>
<p>Let’s visualize the speedup for each step of the workflow.</p>
<div id="cell-15" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb11-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb11-3"></span>
<span id="cb11-4">steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Fingerprinting'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Similarity'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Clustering'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Total'</span>]</span>
<span id="cb11-5">rdkit_times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [t_fp, t_sim, t_clust, rdkit_total_time]</span>
<span id="cb11-6">nvmolkit_times <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [t_fp_nv, t_sim_nv, t_clust_nv, nvmolkit_total_time]</span>
<span id="cb11-7"></span>
<span id="cb11-8">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.arange(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(steps))</span>
<span id="cb11-9">width <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.35</span></span>
<span id="cb11-10"></span>
<span id="cb11-11">fig, (ax1, ax2) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplots(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb11-12"></span>
<span id="cb11-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Time comparison</span></span>
<span id="cb11-14">bars1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax1.bar(x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> width<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, rdkit_times, width, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'RDKit'</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#1f77b4'</span>)</span>
<span id="cb11-15">bars2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax1.bar(x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> width<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, nvmolkit_times, width, label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nvMolKit'</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#2ca02c'</span>)</span>
<span id="cb11-16">ax1.set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Time (seconds)'</span>)</span>
<span id="cb11-17">ax1.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Execution Time by Step'</span>)</span>
<span id="cb11-18">ax1.set_xticks(x)</span>
<span id="cb11-19">ax1.set_xticklabels(steps)</span>
<span id="cb11-20">ax1.legend()</span>
<span id="cb11-21">ax1.set_yscale(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'log'</span>)</span>
<span id="cb11-22">ax1.grid(axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb11-23"></span>
<span id="cb11-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Speedup</span></span>
<span id="cb11-25">speedups <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [r<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>n <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r, n <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(rdkit_times, nvmolkit_times)]</span>
<span id="cb11-26">colors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#ff7f0e'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#d62728'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> s <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> speedups]</span>
<span id="cb11-27">bars3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax2.bar(steps, speedups, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>colors)</span>
<span id="cb11-28">ax2.axhline(y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'gray'</span>, linestyle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'--'</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>)</span>
<span id="cb11-29">ax2.set_ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Speedup (RDKit time / nvMolKit time)'</span>)</span>
<span id="cb11-30">ax2.set_title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nvMolKit Speedup vs RDKit'</span>)</span>
<span id="cb11-31">ax2.grid(axis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'y'</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)</span>
<span id="cb11-32"></span>
<span id="cb11-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add speedup labels</span></span>
<span id="cb11-34"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> bar, speedup <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(bars3, speedups):</span>
<span id="cb11-35">    ax2.text(bar.get_x() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bar.get_width()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, bar.get_height() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, </span>
<span id="cb11-36">             <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>speedup<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">x'</span>, ha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'center'</span>, va<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bottom'</span>, fontweight<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bold'</span>)</span>
<span id="cb11-37"></span>
<span id="cb11-38">plt.tight_layout()</span>
<span id="cb11-39">plt.show()</span>
<span id="cb11-40"></span>
<span id="cb11-41"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">Overall speedup: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rdkit_total_time <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> nvmolkit_total_time<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">x"</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2026-02-28-nvmolkit-clustering_files/figure-html/cell-8-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>
Overall speedup: 218.8x</code></pre>
</div>
</div>
<p>RDKit fingerprinting benefits from being the lightest compute workload of the three steps and from RDKit’s native multithreading, so on this hardware the two implementations perform about the same for that step. The large speedups come from moving the similarity and clustering steps to the GPU. On NVIDIA datacenter GPUs, speedups can be well over 1000x.</p>
</section>
<section id="comparing-clustering-results" class="level2">
<h2 class="anchored" data-anchor-id="comparing-clustering-results">Comparing Clustering Results</h2>
<p>nvMolKit’s clusters may not be identical to RDKit’s, but they are valid Butina clusters. Differences can arise when multiple candidate clusters have the same size, because either candidate may be selected first.</p>
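<p>To see how ties can lead to different but equally valid results, here is a toy greedy Butina-style sketch in plain Python. It is a simplified illustration, not the RDKit or nvMolKit implementation; the <code>butina_toy</code> function, its <code>tie_rank</code> argument, and the five-item chain are all made up for this example.</p>

```python
def butina_toy(neighbors, tie_rank):
    """Greedy Butina-style clustering on a precomputed neighbor table.

    neighbors: dict mapping each item to the set of items within the cutoff
               (each item counts as its own neighbor).
    tie_rank:  dict mapping each item to a rank that decides which of two
               equally sized candidates is visited first (hypothetical knob,
               used only to mimic implementation-dependent ordering).
    """
    # Visit candidates from most to fewest neighbors; ties follow tie_rank.
    order = sorted(neighbors, key=lambda i: (-len(neighbors[i]), tie_rank[i]))
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # A cluster is the centroid plus its not-yet-assigned neighbors.
        members = [centroid] + sorted(neighbors[centroid] - assigned - {centroid})
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


# Five items on a chain 0-1-2-3-4: each item is within the cutoff of its
# direct neighbors only, so items 1, 2, and 3 tie with three neighbors each.
chain = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}, 4: {3, 4}}

low_index_first = butina_toy(chain, tie_rank={i: i for i in chain})
middle_first = butina_toy(chain, tie_rank={2: 0, 1: 1, 3: 2, 0: 3, 4: 4})

print(low_index_first)
print(middle_first)
```

<p>Both runs use the same neighbor table, yet visiting item 1 first yields two clusters while visiting item 2 first yields three, so cluster counts can differ slightly between implementations without either being wrong.</p>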
<div id="cell-18" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert cluster IDs to cluster tuples for comparison</span></span>
<span id="cb13-2">n_clusters <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cluster_ids.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>().item() <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb13-3">nvmolkit_clusters <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>(torch.where(cluster_ids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> i)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>].tolist()) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n_clusters)]</span>
<span id="cb13-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Number of clusters: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(nvmolkit_clusters)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb13-5"></span>
<span id="cb13-6">rdkit_cluster_sizes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(c) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> rdkit_clusters], reverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb13-7">nvmolkit_cluster_sizes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>([<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(c) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> nvmolkit_clusters], reverse<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb13-8"></span>
<span id="cb13-9">plt.figure()</span>
<span id="cb13-10">plt.plot(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(rdkit_cluster_sizes)), rdkit_cluster_sizes, </span>
<span id="cb13-11">         label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'RDKit'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>)</span>
<span id="cb13-12">plt.plot(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(nvmolkit_cluster_sizes)), nvmolkit_cluster_sizes, </span>
<span id="cb13-13">         label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nvMolKit'</span>, linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.5</span>, linestyle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'--'</span>, alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>)</span>
<span id="cb13-14">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Cluster Rank'</span>)</span>
<span id="cb13-15">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Cluster Size (molecules)'</span>)</span>
<span id="cb13-16">plt.title(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Cluster Size Distribution'</span>)</span>
<span id="cb13-17">plt.legend()</span>
<span id="cb13-18">plt.ylim(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(rdkit_cluster_sizes) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.05</span>)</span>
<span id="cb13-19"></span>
<span id="cb13-20">plt.tight_layout()</span>
<span id="cb13-21">plt.show()</span>
<span id="cb13-22"></span>
<span id="cb13-23"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"RDKit: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(rdkit_clusters)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> clusters, largest has </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(rdkit_cluster_sizes)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> molecules"</span>)</span>
<span id="cb13-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"nvMolKit: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(nvmolkit_clusters)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> clusters, largest has </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(nvmolkit_cluster_sizes)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> molecules"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Number of clusters: 14164</code></pre>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2026-02-28-nvmolkit-clustering_files/figure-html/cell-9-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>RDKit: 14162 clusters, largest has 14 molecules
nvMolKit: 14164 clusters, largest has 14 molecules</code></pre>
</div>
</div>
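<p>Sorted cluster sizes only compare the two size distributions. To check how many clusters have exactly the same members, one option is to compare the clusters as sets. The sketch below uses a hypothetical helper, <code>compare_clusterings</code>, on made-up toy data; it assumes each clustering is a sequence of index tuples, which is the shape <code>rdkit_clusters</code> and <code>nvmolkit_clusters</code> appear to have in the cell above.</p>

```python
def compare_clusterings(clusters_a, clusters_b):
    """Summarize how two clusterings of the same items relate.

    Each clustering is an iterable of tuples of item indices. Member order
    inside a cluster is ignored.
    """
    sets_a = {frozenset(c) for c in clusters_a}
    sets_b = {frozenset(c) for c in clusters_b}
    items_a = set().union(*sets_a)
    items_b = set().union(*sets_b)
    return {
        "n_clusters_a": len(sets_a),
        "n_clusters_b": len(sets_b),
        # Clusters whose membership is exactly the same in both results.
        "identical_clusters": len(sets_a.intersection(sets_b)),
        # Sanity check: both clusterings should cover the same items.
        "same_items": items_a == items_b,
    }


# Toy clusterings of six items (made-up data, not output from this post).
toy_a = [(0, 1, 2), (3, 4), (5,)]
toy_b = [(1, 0, 2), (3,), (4,), (5,)]
summary = compare_clusterings(toy_a, toy_b)
print(summary)
```

<p>With the real results, a call such as <code>compare_clusterings(rdkit_clusters, nvmolkit_clusters)</code> would report how many clusters match exactly, under the assumption stated above.</p>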
</section>
<section id="links" class="level2">
<h2 class="anchored" data-anchor-id="links">Links</h2>
<p>For more information, see the <a href="https://nvidia-digital-bio.github.io/nvMolKit/">nvMolKit documentation</a> and <a href="https://github.com/nvidia-digital-bio/nvMolKit">GitHub repository</a>.</p>


</section>
</section>

 ]]></description>
  <category>clustering</category>
  <category>guest post</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2026-02-28-nvmolkit-clustering.html</guid>
  <pubDate>Fri, 27 Feb 2026 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/nvmolkit-clustering-1.png" medium="image" type="image/png" height="51" width="144"/>
</item>
<item>
  <title>Creating a patent data set</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2026-01-24-creating-a-patent-dataset.html</link>
  <description><![CDATA[ 




<p>For a while I’ve been meaning to put together a collection of sets of related compounds for use in teaching and some other experiments. I’ll do that by taking compounds from patent data found in ChEMBL.</p>
<blockquote class="blockquote">
<p>I could also have worked with SureChEMBL for this, but ChEMBL is easier for me to work with (since I always have a copy of ChEMBL around) and contains enough patent data for my purposes. At some point in the future I will probably also take advantage of the fact that there is bioactivity data associated with these compounds in ChEMBL.</p>
</blockquote>
<p>In this post I put together the queries, grab the data, and do a bit of initial exploration. I’ll do more with these in the future.</p>
<p>If you’re interested in the data sets themselves, you can find them <a href="https://github.com/greglandrum/rdkit_blog/tree/master/data/patent_datasets">in the source blog repo</a>.</p>
<div id="dd1778d8-3622-484a-b83b-9179fa0a82ae" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> matplotlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-7">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tableau-colorblind10'</span>)</span>
<span id="cb1-8"></span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb1-10"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span>
<span id="cb1-11"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
</div>
<section id="creating-the-data-sets" class="level1">
<h1>Creating the data sets</h1>
<p>Look at the number of compounds in the ChEMBL patents:</p>
<div id="d4a20979-3d5a-4fd8-9e2d-31dd4069e558" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-2">  select patent_id,chembl_id,count(distinct molregno) mrn_count <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> docs join compound_records using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-3">   where patent_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> null <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-4">   group by (patent_id,chembl_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb2-5">   order by mrn_count desc limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> </span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">patent_id</th>
<th data-quarto-table-cell-role="th">chembl_id</th>
<th data-quarto-table-cell-role="th">mrn_count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>US-9718790-B2</td>
<td>CHEMBL5725546</td>
<td>2236</td>
</tr>
<tr class="even">
<td>US-20130252896-A1</td>
<td>CHEMBL4420062</td>
<td>1901</td>
</tr>
<tr class="odd">
<td>US-11286268-B1</td>
<td>CHEMBL5729040</td>
<td>1775</td>
</tr>
<tr class="even">
<td>US-8466108-B2</td>
<td>CHEMBL4419170</td>
<td>1689</td>
</tr>
<tr class="odd">
<td>US-11618753-B2</td>
<td>CHEMBL5729657</td>
<td>1530</td>
</tr>
<tr class="even">
<td>US-9302989-B2</td>
<td>CHEMBL3886617</td>
<td>1518</td>
</tr>
<tr class="odd">
<td>US-9796708-B2</td>
<td>CHEMBL5726550</td>
<td>1352</td>
</tr>
<tr class="even">
<td>US-11124486-B2</td>
<td>CHEMBL5728711</td>
<td>1227</td>
</tr>
<tr class="odd">
<td>US-10774051-B2</td>
<td>CHEMBL5727936</td>
<td>1204</td>
</tr>
<tr class="even">
<td>US-9303033-B2</td>
<td>CHEMBL3886195</td>
<td>1175</td>
</tr>
</tbody>
</table>

<span style="font-style:italic;text-align:center;">Truncated to <a href="https://jupysql.ploomber.io/en/latest/api/configuration.html#displaylimit">displaylimit</a> of 10.</span>
</div>
</div>
<p>I’m not looking for patents with that many compounds… let’s limit it to between 50 and 300:</p>
<div id="a3c80337-ae7e-4783-8612-67deba46f495" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-2">  select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> (select patent_id,chembl_id,count(distinct molregno) mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-3">                        <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> docs join compound_records using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-4">                        where patent_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> null <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-5">                        group by (patent_id,chembl_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-6">                        order by mrn_count desc) tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb3-7">    where mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> </span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2721</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>That’s a lot.</p>
<p>Create a temporary table holding the number of unique compounds for each patent that contains more than 50 and fewer than 300 compounds.</p>
<div id="93182bbd-0e5d-498c-8b86-d1536f33460b" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-2">  drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists patents</span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-5">  select <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> into temporary table patents <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-6">    (select patent_id,chembl_id,doc_id,count(distinct molregno) mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-7">     <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> docs join compound_records using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-8">     where patent_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> null <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-9">     group by (patent_id,chembl_id,doc_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-10">     order by mrn_count desc) tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-11">  where mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> mrn_count<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<table class="caption-top" data-quarto-postprocess="true">
<tbody>
</tbody>
</table>
</div>
</div>
<div id="56d10d21-b7c9-4857-b3af-be2ab6cf6fe2" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb5-2">   select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2721</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="d49004bb-f55d-47c4-9f93-5f31ae1195fd" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb6-2">   select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> (select distinct on (doc_id) doc_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents) tmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2721</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>For each of those patents, count the compounds with measured activity values (a non-null <code>pchembl_value</code>) in each single-protein assay in the patent. The query also pulls information about the assay itself.</p>
<div id="2a3477cf-5d0b-4b5b-b0d0-056b974a9916" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-2">  drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists patent_all_targets</span>
<span id="cb7-3"></span>
<span id="cb7-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-5">   select assays.chembl_id assay_chembl_id,target_dictionary.chembl_id target_chembl_id,<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-6">          pref_name,target_type,count(distinct molregno) act_count, tid, assay_id,assays.doc_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-7">   into temporary table patent_all_targets <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-8">   <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents join assays using (doc_id) join activities using (assay_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-9">       join target_dictionary using (tid) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-10">   where pchembl_value <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> null <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> target_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'SINGLE PROTEIN'</span>\</span>
<span id="cb7-11">   group by (assays.chembl_id,target_dictionary.chembl_id,pref_name,target_type,tid, assay_id,assays.doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-12">   order by act_count desc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<table class="caption-top" data-quarto-postprocess="true">
<tbody>
</tbody>
</table>
</div>
</div>
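<p>The counting in this query can be sketched in plain Python on a few made-up rows. The row values and variable names here are illustrative assumptions, not ChEMBL data: rows without a <code>pchembl_value</code> are skipped, and each compound is counted once per assay, as with <code>count(distinct molregno)</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">from collections import defaultdict

# Hypothetical activity rows: (doc_id, assay_id, molregno, pchembl_value).
activities = [
    (1, 10, 100, 6.5),
    (1, 10, 100, 7.0),   # same compound measured twice in the same assay
    (1, 10, 101, None),  # no pchembl_value, so it is ignored
    (1, 10, 104, 6.0),
    (1, 11, 102, 5.2),
    (2, 20, 103, 8.1),
]

# Collect the distinct compounds seen in each (doc_id, assay_id) pair.
compounds = defaultdict(set)
for doc_id, assay_id, molregno, pchembl in activities:
    if pchembl is not None:
        compounds[(doc_id, assay_id)].add(molregno)

# act_count per (doc_id, assay_id), largest first, like "order by act_count desc".
act_counts = sorted(
    ((key, len(mols)) for key, mols in compounds.items()),
    key=lambda item: -item[1],
)
print(act_counts)</code></pre></div>
<p>Using a set per assay is what makes a compound with repeated measurements count only once.</p>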
<div id="71941a37-252c-4932-9796-e4080c7f2824" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb8-2">   select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> (select distinct on (doc_id) doc_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_all_targets) tmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2334</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>For each patent, pick the target of the assay with the most compounds. We will call this the target for the patent (though this is clearly a crude assumption).</p>
<p>Combining <code>distinct on (doc_id)</code> with <code>order by doc_id, act_count desc</code> is the PostgreSQL idiom for getting, for each <code>doc_id</code>, the row with the highest <code>act_count</code>.</p>
<div id="1b7f04b2-cad8-4d12-98fe-c70c36b910d3" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb9-2">  drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists patent_targets</span>
<span id="cb9-3"></span>
<span id="cb9-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb9-5">   select distinct on (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb9-6">      assay_chembl_id,target_chembl_id,pref_name,act_count,tid,assay_id,doc_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb9-7">    into temporary table patent_targets <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb9-8">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_all_targets order by doc_id, act_count desc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="13">
<table class="caption-top" data-quarto-postprocess="true">
<tbody>
</tbody>
</table>
</div>
</div>
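<p>To make the <code>distinct on</code> idiom concrete, here is a minimal pure-Python sketch of the same selection logic. The rows and the helper name <code>top_row_per_doc</code> are made up for illustration: sort by <code>doc_id</code>, then by descending <code>act_count</code>, and keep the first row seen for each <code>doc_id</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">from itertools import groupby
from operator import itemgetter

# Hypothetical (doc_id, target_chembl_id, act_count) rows; not real ChEMBL data.
rows = [
    (1, "T_A", 12),
    (1, "T_B", 40),
    (2, "T_C", 7),
    (2, "T_A", 3),
    (3, "T_B", 25),
]


def top_row_per_doc(rows):
    """Mimic `select distinct on (doc_id) ... order by doc_id, act_count desc`."""
    # Sort by doc_id ascending, then act_count descending.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    # groupby yields the rows of each doc_id in sorted order; keep the first one.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


best = top_row_per_doc(rows)
print(best)</code></pre></div>
<p>One difference worth keeping in mind: when two rows tie on <code>act_count</code>, PostgreSQL may return either of them unless the <code>order by</code> includes a tie-breaker, whereas this sketch keeps whichever tied row came first in the input.</p>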
<div id="ab047c38-2608-4342-bdbf-2edd2ca34eeb" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql select count(distinct target_chembl_id) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_targets<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>641</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Find the most popular targets and the number of patents assigned to each:</p>
<div id="0272f75f-49c7-4ecd-a85f-9b04535b330c" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-2"> select target_chembl_id,pref_name,count(distinct doc_id) num_patents,tid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-3">  <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_targets <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-4">  group by (target_chembl_id,pref_name,tid) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-5">  order by num_patents desc limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">target_chembl_id</th>
<th data-quarto-table-cell-role="th">pref_name</th>
<th data-quarto-table-cell-role="th">num_patents</th>
<th data-quarto-table-cell-role="th">tid</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>CHEMBL5251</td>
<td>Tyrosine-protein kinase BTK</td>
<td>61</td>
<td>100097</td>
</tr>
<tr class="even">
<td>CHEMBL6136</td>
<td>Lysine-specific histone demethylase 1A</td>
<td>56</td>
<td>102439</td>
</tr>
<tr class="odd">
<td>CHEMBL1741186</td>
<td>Nuclear receptor ROR-gamma</td>
<td>40</td>
<td>103982</td>
</tr>
<tr class="even">
<td>CHEMBL2000</td>
<td>Plasma kallikrein</td>
<td>32</td>
<td>23</td>
</tr>
<tr class="odd">
<td>CHEMBL1163125</td>
<td>Bromodomain-containing protein 4</td>
<td>30</td>
<td>103454</td>
</tr>
<tr class="even">
<td>CHEMBL4409</td>
<td>cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A</td>
<td>29</td>
<td>100010</td>
</tr>
<tr class="odd">
<td>CHEMBL2815</td>
<td>High affinity nerve growth factor receptor</td>
<td>28</td>
<td>11902</td>
</tr>
<tr class="even">
<td>CHEMBL3130</td>
<td>Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform</td>
<td>26</td>
<td>11177</td>
</tr>
<tr class="odd">
<td>CHEMBL6175</td>
<td>Lysine-specific demethylase 4C</td>
<td>25</td>
<td>102452</td>
</tr>
<tr class="even">
<td>CHEMBL2973</td>
<td>Rho-associated protein kinase 2</td>
<td>24</td>
<td>11149</td>
</tr>
</tbody>
</table>

<span style="font-style:italic;text-align:center;">Truncated to <a href="https://jupysql.ploomber.io/en/latest/api/configuration.html#displaylimit">displaylimit</a> of 10.</span>
</div>
</div>
<div id="52419a11-5ec1-4c18-aaad-571a3cbf38c1" class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-2"> select target_chembl_id,num_patents <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-3">    (select target_chembl_id,pref_name,count(distinct doc_id) num_patents,tid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-4">     <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_targets <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-5">     group by (target_chembl_id,pref_name,tid) ) tmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-6">  where num_patents <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-7">  order by num_patents desc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb12-8">d</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">target_chembl_id</th>
<th data-quarto-table-cell-role="th">num_patents</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>CHEMBL5251</td>
<td>61</td>
</tr>
<tr class="even">
<td>CHEMBL6136</td>
<td>56</td>
</tr>
<tr class="odd">
<td>CHEMBL1741186</td>
<td>40</td>
</tr>
<tr class="even">
<td>CHEMBL2000</td>
<td>32</td>
</tr>
<tr class="odd">
<td>CHEMBL1163125</td>
<td>30</td>
</tr>
<tr class="even">
<td>CHEMBL4409</td>
<td>29</td>
</tr>
<tr class="odd">
<td>CHEMBL2815</td>
<td>28</td>
</tr>
<tr class="even">
<td>CHEMBL3130</td>
<td>26</td>
</tr>
<tr class="odd">
<td>CHEMBL6175</td>
<td>25</td>
</tr>
<tr class="even">
<td>CHEMBL2973</td>
<td>24</td>
</tr>
</tbody>
</table>

<span style="font-style:italic;text-align:center;">Truncated to <a href="https://jupysql.ploomber.io/en/latest/api/configuration.html#displaylimit">displaylimit</a> of 10.</span>
</div>
</div>
<div id="67a56398" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(d)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<pre><code>13</code></pre>
</div>
</div>
<p>Write out the compounds for those 13 targets, one CSV file per target:</p>
<div id="f0897e38-8968-4253-a04a-c509e7613888" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.named_parameters<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"enabled"</span> </span>
<span id="cb15-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> cid,_ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d:</span>
<span id="cb15-3">    targetd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-4">      select patent_id, patents.chembl_id patent_chembl_id, chembl_id_lookup.chembl_id compound_chembl_id,<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-5">           canonical_smiles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-6">        <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents join patent_targets using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-7">         join compound_records using (doc_id) join molecule_hierarchy using (molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-8">         join compound_structures cs on (cs.molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>parent_molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-9">         join chembl_id_lookup on (parent_molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>entity_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> entity_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'COMPOUND'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-10">        where target_chembl_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:cid</span>
<span id="cb15-11">    </span>
<span id="cb15-12">    targetd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> targetd.DataFrame()</span>
<span id="cb15-13">    targetd.to_csv(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'../data/patent_datasets/target_</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>cid<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">.csv'</span>,index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
</div>
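<p>The loop relies on the <code>:cid</code> placeholder being filled from the Python variable <code>cid</code>. The same pattern can be sketched without a notebook or a ChEMBL install. The example below is an assumption-laden stand-in: it uses an in-memory SQLite database in place of PostgreSQL, a made-up <code>compounds</code> table, and a temporary output directory, but it shows the same steps of binding a named parameter per target and writing one CSV file for each.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import csv
import sqlite3
import tempfile
from pathlib import Path

# In-memory SQLite stands in for the PostgreSQL ChEMBL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table compounds (target_chembl_id text, smiles text)")
conn.executemany(
    "insert into compounds values (?, ?)",
    [("T_A", "CCO"), ("T_A", "CCN"), ("T_B", "c1ccccc1")],  # made-up rows
)

out_dir = Path(tempfile.mkdtemp())
written = {}
for cid in ["T_A", "T_B"]:
    # `:cid` is a named placeholder; the driver binds the value for us.
    cur = conn.execute(
        "select target_chembl_id, smiles from compounds "
        "where target_chembl_id = :cid order by smiles",
        {"cid": cid},
    )
    out_path = out_dir / f"target_{cid}.csv"
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([col[0] for col in cur.description])  # header row
        rows = cur.fetchall()
        writer.writerows(rows)
    written[cid] = len(rows)

print(written)</code></pre></div>
<p>Binding the value through a placeholder, rather than formatting it into the SQL string, avoids quoting problems if an identifier ever contains unexpected characters.</p>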
<div id="30b89512-899c-423a-91b5-d3b3ecebae79" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head ..<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>patent_datasets<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>target_CHEMBL1163125.csv</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>patent_id,patent_chembl_id,compound_chembl_id,canonical_smiles
US-8975417-B2,CHEMBL3638496,CHEMBL3650854,COc1ccc(Cn2nc3c(c2C)C(c2ccc(Cl)cc2)N(c2ccc(=O)n(C)c2)C3=O)cc1
US-8975417-B2,CHEMBL3638496,CHEMBL3650855,Cc1[nH]nc2c1C(c1ccc(Cl)cc1)N(c1ccc(=O)n(C)c1)C2=O
US-8975417-B2,CHEMBL3638496,CHEMBL3650856,Cc1c2c(nn1C)C(=O)N(c1ccc(=O)n(C)c1)C2c1ccc(Cl)cc1
US-8975417-B2,CHEMBL3638496,CHEMBL3650857,CC1=NN(C)C2C(=O)N(c3ccc(=O)n(C)c3)C(c3ccc(Cl)cc3)C12
US-8975417-B2,CHEMBL3638496,CHEMBL3650858,Cc1[nH]nc2c1C(c1ccc(Cl)cc1)N(c1cc(Cl)c(=O)n(C)c1)C2=O
US-8975417-B2,CHEMBL3638496,CHEMBL3650859,Cc1c2c(nn1C)C(=O)N(c1cc(Cl)c(=O)n(C)c1)C2c1ccc(Cl)cc1
US-8975417-B2,CHEMBL3638496,CHEMBL3650860,Cc1c2c(nn1C)C(=O)N(c1cc(Cl)c(=O)n(C)c1)[C@@H]2c1ccc(Cl)cc1
US-8975417-B2,CHEMBL3638496,CHEMBL3650861,Cc1c2c(nn1C)C(=O)N(c1cc(Cl)c(=O)n(C)c1)[C@H]2c1ccc(Cl)cc1
US-8975417-B2,CHEMBL3638496,CHEMBL3650862,Cn1cc(N2C(=O)c3n[nH]cc3C2c2ccc(Cl)cc2)cc(Cl)c1=O</code></pre>
</div>
</div>
</section>
<section id="looking-at-the-some-results-with-umap" class="level1">
<h1>Looking at some results with UMAP</h1>
<p>Let’s do a bit of visualization work by grabbing the data for a single target:</p>
<div id="b66661d6-1202-48dd-a17c-59299ede1574" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">target_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CHEMBL5251'</span></span>
<span id="cb18-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-3">  drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists patent_compounds</span>
<span id="cb18-4"></span>
<span id="cb18-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-6">  select patent_id,patents.chembl_id patent_chembl_id,chembl_id_lookup.chembl_id compound_chembl_id,canonical_smiles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-7">    into temporary table patent_compounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-8">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents join patent_targets using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-9">     join compound_records using (doc_id) join molecule_hierarchy using (molregno) join compound_structures cs on (cs.molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>parent_molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-10">     join chembl_id_lookup on (parent_molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>entity_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> entity_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'COMPOUND'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb18-11">    where target_chembl_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:target_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<table class="caption-top" data-quarto-postprocess="true">
<tbody>
</tbody>
</table>
</div>
</div>
<div id="b0f878ad-d721-4a59-86b4-011e6286c373" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>8450</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="ee898d7d-bff2-4c62-9912-2846a1c16802" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql select <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">patent_id</th>
<th data-quarto-table-cell-role="th">patent_chembl_id</th>
<th data-quarto-table-cell-role="th">compound_chembl_id</th>
<th data-quarto-table-cell-role="th">canonical_smiles</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670078</td>
<td>Cc1c(-c2cc(Nc3cc4n(n3)CCN(C)C4)c(=O)[nH]n2)cccc1N1CCn2c(cc3c2CCCC3)C1=O</td>
</tr>
<tr class="even">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670079</td>
<td>Cc1c(-c2cc(Nc3cc4n(n3)CCN(C)C4)c(=O)[nH]n2)cccc1N1CCn2c(cc3ccccc32)C1=O</td>
</tr>
<tr class="odd">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670080</td>
<td>Cc1c(-c2cc(Nc3cc4n(n3)CCN(C)C4)c(=O)n(C)n2)cccc1N1Cc2c(sc3ccccc23)C1=O</td>
</tr>
<tr class="even">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670081</td>
<td>Cn1cc(-c2cccc(N3CCc4c(sc5c4CCCC5)C3=O)c2CO)cc(Nc2ccncn2)c1=O</td>
</tr>
<tr class="odd">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670082</td>
<td>Cn1cc(-c2cccc(N3CCc4c(sc5c4CC(C)(C)C5)C3=O)c2CO)cc(Nc2ccncn2)c1=O</td>
</tr>
<tr class="even">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670083</td>
<td>Cn1cc(-c2cccc(N3CCn4c(cc5c4CCCC5)C3=O)c2CO)cc(Nc2cc(C3CC3)n[nH]2)c1=O</td>
</tr>
<tr class="odd">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670084</td>
<td>Cn1cc(-c2cccc(N3CCn4c(cc5c4CCCC5)C3=O)c2CO)cc(Nc2ccc(N3CCOCC3)cn2)c1=O</td>
</tr>
<tr class="even">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670085</td>
<td>CC(=O)N1CCn2nc(Nc3cc(-c4cccc(N5CCn6c(cc7c6CCCC7)C5=O)c4CO)cn(C)c3=O)cc2C1</td>
</tr>
<tr class="odd">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670086</td>
<td>Cn1cc(-c2cccc(N3CCn4c(cc5c4CCCC5)C3=O)c2CO)cc(Nc2ccncn2)c1=O</td>
</tr>
<tr class="even">
<td>US-8618107-B2</td>
<td>CHEMBL3639120</td>
<td>CHEMBL3670087</td>
<td>Cn1cc(-c2cccc(N3CCn4c(cc5c4CCCC5)C3=O)c2CO)cc(Nc2cc3n(n2)CCOC3)c1=O</td>
</tr>
</tbody>
</table>

<span style="font-style:italic;text-align:center;">Truncated to <a href="https://jupysql.ploomber.io/en/latest/api/configuration.html#displaylimit">displaylimit</a> of 10.</span>
</div>
</div>
<p>Start with a subset: the 12 patents that contain the most compounds.</p>
<p>Some compounds appear in more than one patent, so this subset contains duplicates. For this exercise, keep each compound only once:</p>
<div id="aaa3a574-cb85-4800-95d5-699d83877d81" class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-2">  select distinct on (compound_chembl_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-3">   join (select patent_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds group by patent_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-4">         order by count(distinct compound_chembl_id) desc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-5">         limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>) tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-6">   using (patent_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-7">   order by compound_chembl_id, patent_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
</div>
<div id="c165a996-148a-477a-98e4-6990d2d7b8d8" class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> d.DataFrame()</span>
<span id="cb22-2">df.head()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="25">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-quarto-postprocess="true" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">patent_id</th>
<th data-quarto-table-cell-role="th">patent_chembl_id</th>
<th data-quarto-table-cell-role="th">compound_chembl_id</th>
<th data-quarto-table-cell-role="th">canonical_smiles</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">0</td>
<td>US-10828300-B2</td>
<td>CHEMBL5728085</td>
<td>CHEMBL2216827</td>
<td>C=CC(=O)Nc1cccc(Nc2nc(Nc3ccc(Oc4ccnc(C(=O)NC)c...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">1</td>
<td>US-10828300-B2</td>
<td>CHEMBL5728085</td>
<td>CHEMBL3301625</td>
<td>C=CC(=O)Nc1cccc(Nc2nc(Nc3ccc(OCCOC)cc3)ncc2F)c1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">2</td>
<td>US-9447106-B2</td>
<td>CHEMBL3886959</td>
<td>CHEMBL3889791</td>
<td>C=CC(=O)Nc1cc(C2CCNc3c(C(N)=O)c(-c4ccc(Oc5cccc...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">3</td>
<td>US-9447106-B2</td>
<td>CHEMBL3886959</td>
<td>CHEMBL3890068</td>
<td>C=CC(=O)Nc1ccccc1C1CCNc2c(C(N)=O)c(-c3ccc(OCCO...</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">4</td>
<td>US-9447106-B2</td>
<td>CHEMBL3886959</td>
<td>CHEMBL3890197</td>
<td>NC(=O)c1c(-c2ccc(Oc3ccccc3)cc2)nn2c1NC(=O)C21C...</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<div id="db55f06c-655d-4155-973a-13abbb4d6366" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1">df.shape,<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'canonical_smiles'</span>]))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="26">
<pre><code>((1656, 4), 1656)</code></pre>
</div>
</div>
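<p>To make the deduplication step concrete, here is a minimal pure-Python sketch of what <code>select distinct on (compound_chembl_id) ... order by compound_chembl_id, patent_id</code> does: sort the rows, then keep the first row for each compound. The toy rows and the helper name <code>distinct_on_compound</code> are made up for illustration and are not taken from ChEMBL.</p>

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy rows: (patent_id, compound_chembl_id); not real ChEMBL data
rows = [
    ("US-2", "CHEMBL_B"),
    ("US-1", "CHEMBL_A"),
    ("US-2", "CHEMBL_A"),
    ("US-1", "CHEMBL_B"),
    ("US-3", "CHEMBL_C"),
]


def distinct_on_compound(rows):
    """Keep one row per compound: the one with the smallest patent_id."""
    # Mirrors `order by compound_chembl_id, patent_id`
    ordered = sorted(rows, key=itemgetter(1, 0))
    # Mirrors `distinct on (compound_chembl_id)`: first row of each group
    return [next(group) for _, group in groupby(ordered, key=itemgetter(1))]


unique_rows = distinct_on_compound(rows)
print(unique_rows)
```

<p>Which patent a shared compound ends up assigned to depends only on the sort order, so changing the <code>order by</code> clause changes the patent labels but not the number of rows.</p>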
<p>Build molecules from the SMILES and add count-based Morgan fingerprints (radius 3, 2048 bits):</p>
<div id="c96eef5f-73a6-431f-848e-d7cdd1ffe648" class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mol'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'canonical_smiles'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(Chem.MolFromSmiles)</span></code></pre></div>
</div>
<div id="9cd2793a-8b00-4bfc-948f-1c6a94ab66aa" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdFingerprintGenerator</span>
<span id="cb26-2">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetMorganGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,fpSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2048</span>)</span>
<span id="cb26-3">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cmfp3'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mol'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(fpg.GetCountFingerprintAsNumPy)</span></code></pre></div>
</div>
<p>Now run UMAP with the Dice metric. The values of <code>n_neighbors</code> and <code>min_dist</code> used here came from a bit of manual exploration.</p>
<div id="f49dad34-0a1c-47f6-8af7-11a297484cc4" class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> umap</span>
<span id="cb27-2">fitter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> umap.UMAP(n_neighbors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,min_dist<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>,metric<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dice'</span>,n_jobs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb27-3">pts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fitter.fit_transform(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cmfp3'</span>].tolist())</span>
<span id="cb27-4">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pts[:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb27-5">df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pts[:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/home/glandrum/mambaforge/envs/rdkit_blog/lib/python3.12/site-packages/sklearn/utils/deprecation.py:151: FutureWarning: 'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.
  warnings.warn(
/home/glandrum/mambaforge/envs/rdkit_blog/lib/python3.12/site-packages/umap/umap_.py:1887: UserWarning: gradient function is not yet implemented for dice distance metric; inverse_transform will be unavailable
  warn(</code></pre>
</div>
</div>
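<p>For reference, here is a plain-Python sketch of a Dice distance computed on the on/off pattern of two fingerprint vectors. As far as I know, umap-learn's built-in <code>dice</code> metric treats each element as on (non-zero) or off (zero), so with count fingerprints only the set of occupied positions would matter; treat that as an assumption to check against your umap-learn version. The function name and the toy vectors are hypothetical.</p>

```python
def binary_dice_distance(x, y):
    """Dice distance between two vectors, treating non-zero entries as 'on'."""
    both_on = 0  # positions that are on in both vectors
    differ = 0   # positions that are on in exactly one vector
    for a, b in zip(x, y):
        a_on, b_on = a != 0, b != 0
        both_on += a_on and b_on
        differ += a_on != b_on
    if differ == 0:
        # Same on/off pattern: distance is zero
        return 0.0
    return differ / (2.0 * both_on + differ)


# Toy count vectors (hypothetical values)
fp1 = [2, 0, 1, 0, 3]
fp2 = [1, 0, 0, 4, 1]
dist = binary_dice_distance(fp1, fp2)
print(dist)
```

<p>Here two positions are on in both vectors and two differ, so the distance is 2 / (2*2 + 2) = 1/3, which is one minus the usual Dice similarity of 2/3.</p>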
<p>Then make a scatter plot of the results, drawing each patent as its own series:</p>
<div id="f8a3ffad-f6a2-4790-aad1-cb17fab8f422" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb29-2">markers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ovs*+d'</span></span>
<span id="cb29-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,pid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>])):</span>
<span id="cb29-4">    tdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>pid]</span>
<span id="cb29-5">    plt.scatter(tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>],tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>],label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pid,marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>markers[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(markers)],s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2026-01-24-creating-a-patent-dataset_files/figure-html/cell-28-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
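<p>One detail worth knowing about the plotting loop: iterating over a <code>set</code> of strings does not guarantee a stable order between Python sessions, so the marker a given patent gets can change from run to run. A minimal sketch of one way to make the assignment reproducible is to sort the ids first. The helper name <code>assign_markers</code> and the example ids are hypothetical.</p>

```python
def assign_markers(ids, markers="ovs*+d"):
    """Map each distinct id to a marker, cycling when ids outnumber markers."""
    # Sorting the distinct ids gives the same assignment on every run
    return {pid: markers[i % len(markers)]
            for i, pid in enumerate(sorted(set(ids)))}


# Hypothetical patent ids, with a repeat, in arbitrary order
patent_ids = ["CHEMBL3", "CHEMBL1", "CHEMBL2", "CHEMBL1",
              "CHEMBL7", "CHEMBL5", "CHEMBL4", "CHEMBL6"]
marker_for = assign_markers(patent_ids)
print(marker_for)
```

<p>With six markers and seven distinct ids, the seventh id wraps around and reuses the first marker, so matplotlib's color cycle is still needed to tell those two series apart.</p>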
<p>Now repeat the query for three different targets and combine the results:</p>
<div id="651ad469" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1">dfs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb30-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> target_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CHEMBL3864'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CHEMBL3130'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CHEMBL4409'</span>):</span>
<span id="cb30-3">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-4">      drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists patent_compounds</span>
<span id="cb30-5"></span>
<span id="cb30-6">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-7">      select patent_id,patents.chembl_id patent_chembl_id,chembl_id_lookup.chembl_id compound_chembl_id,canonical_smiles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-8">        into temporary table patent_compounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-9">        <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patents join patent_targets using (doc_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-10">         join compound_records using (doc_id) join molecule_hierarchy using (molregno) join compound_structures cs on (cs.molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>parent_molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-11">         join chembl_id_lookup on (parent_molregno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>entity_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> entity_type<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'COMPOUND'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-12">        where target_chembl_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:target_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb30-13">    d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-14">  select distinct on (compound_chembl_id) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-15">   join (select patent_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> patent_compounds group by patent_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-16">         order by count(distinct compound_chembl_id) desc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-17">         limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>) tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-18">   using (patent_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb30-19">   order by compound_chembl_id, patent_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb30-20">    df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> d.DataFrame()</span>
<span id="cb30-21">    df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'target_chembl_id'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> target_id</span>
<span id="cb30-22">    dfs.append(df)</span>
<span id="cb30-23">hybrid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.concat(dfs)</span></code></pre></div>
</div>
<div id="25f75684" class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1">hybrid.shape</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="32">
<pre><code>(4642, 5)</code></pre>
</div>
</div>
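<p>Because the deduplication is done separately for each target, the combined table can still contain the same compound under more than one target. A minimal sketch of how one might check for that, using made-up (compound, target) pairs in place of the real rows; the helper name is hypothetical:</p>

```python
from collections import defaultdict

# Hypothetical (compound_chembl_id, target_chembl_id) pairs standing in
# for the rows of the combined table
pairs = [
    ("CHEMBL_A", "T1"),
    ("CHEMBL_B", "T1"),
    ("CHEMBL_A", "T2"),
    ("CHEMBL_C", "T3"),
]


def compounds_shared_across_targets(pairs):
    """Return {compound: sorted targets} for compounds seen with 2+ targets."""
    targets = defaultdict(set)
    for compound, target in pairs:
        targets[compound].add(target)
    return {c: sorted(ts) for c, ts in targets.items() if len(ts) >= 2}


shared = compounds_shared_across_targets(pairs)
print(shared)
```

<p>Any compound reported here would be embedded more than once by UMAP, landing at essentially the same place with different target labels.</p>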
<div id="7a3c1926" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1">hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mol'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'canonical_smiles'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(Chem.MolFromSmiles)</span>
<span id="cb33-2">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetMorganGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,fpSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2048</span>)</span>
<span id="cb33-3">hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cmfp3'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Mol'</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">apply</span>(fpg.GetCountFingerprintAsNumPy)</span>
<span id="cb33-4"></span>
<span id="cb33-5">fitter <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> umap.UMAP(n_neighbors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,min_dist<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>,metric<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dice'</span>,n_jobs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb33-6">pts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fitter.fit_transform(hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cmfp3'</span>].tolist())</span>
<span id="cb33-7">hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pts[:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb33-8">hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pts[:,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>/home/glandrum/mambaforge/envs/rdkit_blog/lib/python3.12/site-packages/sklearn/utils/deprecation.py:151: FutureWarning: 'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8.
  warnings.warn(
/home/glandrum/mambaforge/envs/rdkit_blog/lib/python3.12/site-packages/umap/umap_.py:1887: UserWarning: gradient function is not yet implemented for dice distance metric; inverse_transform will be unavailable
  warn(
/home/glandrum/mambaforge/envs/rdkit_blog/lib/python3.12/site-packages/umap/spectral.py:548: UserWarning: Spectral initialisation failed! The eigenvector solver
failed. This is likely due to too small an eigengap. Consider
adding some noise or jitter to your data.

Falling back to random initialisation!
  warn(</code></pre>
</div>
</div>
<div id="85c4c619" class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb35-2">markers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ovs*+d'</span></span>
<span id="cb35-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,pid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>])):</span>
<span id="cb35-4">    tdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hybrid[hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>pid]</span>
<span id="cb35-5">    plt.scatter(tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>],tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>],label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pid,marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>markers[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(markers)],s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2026-01-24-creating-a-patent-dataset_files/figure-html/cell-32-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Color by target id instead of patent:</p>
<div id="1b692936" class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>))</span>
<span id="cb36-2">markers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ovs*+d'</span></span>
<span id="cb36-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,tid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'target_chembl_id'</span>])):</span>
<span id="cb36-4">    d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>chembl_36 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb36-5">      select pref_name <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> target_dictionary where chembl_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:tid</span>
<span id="cb36-6">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tid<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>d[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb36-7">    tdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hybrid[hybrid[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'target_chembl_id'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>tid]</span>
<span id="cb36-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> j,pid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>])):</span>
<span id="cb36-9">        pdf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tdf[tdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'patent_chembl_id'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>pid]</span>
<span id="cb36-10">        plt.scatter(pdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'X'</span>],pdf[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Y'</span>],label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pid,marker<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>markers[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(markers)],s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,linewidth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'C</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>i<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>CHEMBL4409: cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A
CHEMBL3130: Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
CHEMBL3864: Tyrosine-protein phosphatase non-receptor type 11</code></pre>
</div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2026-01-24-creating-a-patent-dataset_files/figure-html/cell-33-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>That’s enough for now… these compound sets are sure to show up in future blog posts.</p>


</section>

 ]]></description>
  <category>datasets</category>
  <category>exploration</category>
  <category>patents</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2026-01-24-creating-a-patent-dataset.html</guid>
  <pubDate>Fri, 23 Jan 2026 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Wrapping up 2025</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-12-27-2025-wrapup.html</link>
  <description><![CDATA[ 




<p>For some random reason, at the beginning of 2025 I decided that I was going to do an RDKit blog post every week in 2025. It was a struggle at times, but if you include updates of old blog posts (which I definitely do!), I made it! 🎉 For the last post of the year, I’m going to do a short look back at 2025.</p>
<section id="the-rdkit" class="level1">
<h1>The RDKit</h1>
<p>For me the biggest RDKit thing of each year is the UGM, and in 2025 there were actually two. The first North American edition of the UGM took place in April in Cambridge, MA. I didn’t manage the travel for that, but it sounded like it went really well and was a great success. Here’s hoping that we can make that a regular thing! The European UGM took place in September in Prague. I did a <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-09-14-UGM-recap.html">recap post</a> back in September, but the UGM was great. I love the chance to spend a couple of days surrounded by the RDKit community. Registration for the 2026 European UGM, which will take place from 16-18 September in Darmstadt, Germany, will open early next year.</p>
<section id="some-numbers" class="level2">
<h2 class="anchored" data-anchor-id="some-numbers">Some numbers</h2>
<p>(GitHub could make this easier…):</p>
<ul>
<li>two major releases</li>
<li>ten minor releases (well, nine so far, the tenth comes early next week, before the year ends)</li>
<li>348 commits</li>
<li>The two largest contributions (in terms of lines of code) were the SCSR parser from Tad Hurst at CDD and the CDX/CDXML parser from Brian Kelley at Glysade (using code contributed by Revvity)</li>
<li>83 contributors (counting pull requests and resolved issues and enhancements)</li>
<li>147 closed bugs</li>
<li>122 completed enhancements</li>
<li>39 cleanup PRs</li>
<li>17 documentation PRs</li>
<li>I’d include something here about the number of discussions posts, but I’m not willing to invest the time to figure out the GraphQL API for doing that.</li>
</ul>
</section>
<section id="things-that-should-go-better" class="level2">
<h2 class="anchored" data-anchor-id="things-that-should-go-better">Things that should go better</h2>
<p>Reviewing pull requests, particularly larger ones, can take much longer than it really should. We need to figure out a better way to handle these.</p>
<p>Next year I would like to be a bit better at paying attention to the Discussions tab in GitHub. I’m not 100% sure that GitHub Discussions is the best option for questions and general discussion, so I’m also trying an experiment with <a href="https://rdkit.zulipchat.com/">zulip</a>. If you’re interested, <a href="https://rdkit.zulipchat.com/join/3fpjkfmilku2y6gz4qx5bwld/">this link</a> should work to allow you to join automatically for the next 30 days. There’s a chicken and egg problem with this (we need participants for it to be useful, but it needs to be useful to attract participants), but we’ll see how it goes.</p>
<p>Because of all the other stuff going on, I didn’t manage to complete (or even make a lot of progress on) some of my longer-term RDKit projects this year; let’s see what 2026 brings there.</p>
</section>
<section id="related-stuff" class="level2">
<h2 class="anchored" data-anchor-id="related-stuff">Related stuff</h2>
<p>We continued to use and improve our lightweight registration system <a href="https://github.com/rinikerlab/lightweight-registration">lwreg</a> (the Application Note is <a href="https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c01133">here</a>). One of the big things this year is that Jessica cleaned up the docs and figured out how to get things working with readthedocs, so now we have <a href="https://lightweight-registration.readthedocs.io/en/latest/">really nice looking documentation</a>.</p>
<p>Sereina and I taught a new course on Cheminformatics and Computer-Aided Drug Design in the fall semester this year. Of course we used a lot of RDKit for that. :-) As you would expect, putting together a new set of lectures and associated exercises (we did hands-on stuff in Jupyter) was a lot of work, but I really enjoyed both the preparation and doing the lectures. We’re not doing student evaluations this year, so we won’t get feedback from the students, but hopefully they found the material useful.</p>
</section>
</section>
<section id="personal-things" class="level1">
<h1>Personal things</h1>
<p>It’s my blog, so I’ll go ahead and include some non-RDKit stuff too.</p>
<p>According to Garmin I have (so far, there are still a few days left in the year) run 2563km with about 71km of elevation gain (that doesn’t count elevation when doing uphill treadmill workouts). Over the summer I ran two ultra-marathons: a 77km race that I finished and a 170km race that I DNF’ed after 100km (it turns out that running that distance is even more complicated than I thought it would be). Getting ready for those took a lot of time, but much of it was fun days moving “fast” in the mountains, so I enjoyed it. I also ran my usual stage for the research group’s team in the Zurich SOLA relay race and managed a PR there. Finally, at the end of the year I set myself the random goal of running a fast 5km (not a race, just a fast time on a stretch I run a few times a week). I didn’t manage to hit my time goal of 20 minutes, but I only missed by 16 seconds, so I’m still pretty happy with the time. It was definitely entertaining to change things up and do some focused training for going faster.</p>
<p>I hiked 653km with about 40km of elevation gain; this is down from a normal year because of the amount of running and because our summer vacation didn’t end up going as planned.</p>
<p>The climbing doesn’t lend itself to quantification (at least not the way we do it) but we ended up doing a fair amount this year. I did considerably less bouldering than last year since I mainly went running on the days I normally would have gone to the bouldering gym, so I didn’t make as much progress on technique as last year, but it looks like I didn’t lose too much.</p>
<p>Ok, it’s time to wrap this up and get back to my holiday project.</p>


</section>

 ]]></description>
  <category>general</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-12-27-2025-wrapup.html</guid>
  <pubDate>Fri, 26 Dec 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>About tetrahedral chirality in the RDKit</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-12-21-Chiral-atoms.html</link>
  <description><![CDATA[ 




<p>This week, to try and find inspiration for a short, or at least quick, blog post, I did an “AI Search” and asked for some frequently un-answered RDKit questions. One of those was related to understanding stereochemistry. Then, this morning, I noticed a question in GitHub Discussions asking why the chirality on a tertiary amine wasn’t being preserved. My post topic was clear!</p>
<p>The contents of this post will end up, in an edited form, in the <a href="https://www.rdkit.org/docs/RDKit_Book.html">RDKit Book</a>.</p>
<div id="bc25d8a8-88c1-4cc7-bf29-cb01006f0f43" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-3">IPythonConsole.molSize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span></span></code></pre></div>
</div>
<section id="intro" class="level1">
<h1>Intro</h1>
<p>In this post I am going to focus solely on stereochemistry/chirality around tetrahedral atoms. I am aware that there are non-tetrahedral forms of stereochemistry for three- and four-coordinate atoms (and the RDKit supports some of them). Non-tetrahedral stereochemistry, double-bond stereochemistry, and atropisomerism are topics for a possible future post.</p>
</section>
<section id="what-can-be-chiral" class="level1">
<h1>What can be chiral?</h1>
<p>I will borrow the language used in the <a href="https://www.inchi-trust.org/download/104/InChI_TechMan.pdf">InChI technical manual</a> and use the term “stereogenic” to refer to atoms which can be chiral if all of their substituents are different. The RDKit uses the same conditions to determine which atoms can be involved in ring stereochemistry (see below).</p>
<p>If you ever want to go straight to the source to see what could be a potential stereocenter, the function to look for is <code>isAtomPotentialChiralCenter()</code>.</p>
<p>In general, the following conditions have to be met for an atom to be considered a potential stereocenter:</p>
<ul>
<li>It must have a total nonzero-degree of three or four, i.e. three or four neighbors not connected via zero-order bonds.</li>
<li>It cannot have two attached hydrogen atoms (explicit or implicit). For the purposes of this count, only H atoms with unspecified isotopes count.</li>
<li>Three-coordinate atoms cannot have an H atom attached, except for phosphines, arsines, S and Se with an explicit valence of four, and S+ and Se+ with an explicit valence of three.</li>
<li>Degree-three N atoms cannot have an H atom attached and must either be in a three-membered ring or be a bridgehead.</li>
</ul>
<p>Because arbitrary decisions have to be made when dealing with this topic - for example, are tertiary amines stereogenic? - we choose to (try and) follow InChI and use Table 8 from the <a href="https://www.inchi-trust.org/download/104/InChI_TechMan.pdf">InChI technical manual</a> here. If an element is not present in that InChI table, we use the general rules above. One notable exception for main-group atoms is that the RDKit allows trivalent N atoms that are in a bridgehead to be chiral centers.</p>
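<p>Here’s a minimal sketch of how these rules show up in practice. It assumes that <code>Chem.FindMolChiralCenters()</code> with <code>includeUnassigned=True</code> is an acceptable stand-in for asking which atoms are possible stereocenters, and it uses whichever stereo perception implementation is the default in your RDKit version. The helper name <code>possible_centers()</code> and the test molecules are invented for illustration.</p>

```python
from rdkit import Chem


def possible_centers(smiles):
    """Return the sorted indices of atoms flagged as possible stereocenters."""
    mol = Chem.MolFromSmiles(smiles)
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    return sorted(idx for idx, label in centers)


# four different substituents on atom 1: a possible stereocenter
four_different = possible_centers('CC(F)(Cl)Br')
# two identical methyl groups on atom 1: not a possible stereocenter
duplicate_nbrs = possible_centers('CC(C)(F)Cl')
# atom 1 carries two H atoms: not a possible stereocenter
two_hydrogens = possible_centers('FCCl')
# acyclic tertiary amine: the N is not treated as stereogenic
tertiary_amine = possible_centers('CN(CC)CCC')

print(four_different, duplicate_nbrs, two_hydrogens, tertiary_amine)
```

<p>Only the first molecule should report a center; the other three each violate one of the conditions listed above.</p>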
<p>An example chiral phosphine (invented):</p>
<div id="9adb5d8a-55f2-45de-b1e1-313623ce1af6" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C12C[N@](C)C(C2)C[P@H]1'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-21-Chiral-atoms_files/figure-html/cell-3-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<blockquote class="blockquote">
<p>That really should have the <code>[H]</code> drawn as a separate atom</p>
</blockquote>
<p>Notice that, though the chirality on the N was specified in the input, it was removed when the molecule was parsed: the N is neither in a three-membered ring nor a bridgehead.</p>
<p>Here’s an example of a ChEMBL molecule with a chiral bridgehead N:</p>
<div id="e67eb19b-e39b-4fe4-bab5-19452f532eec" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'COC(=O)[C@@H]1C[N@@]2CCC[C@@H]1C2'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-21-Chiral-atoms_files/figure-html/cell-4-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
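<p>The two examples above can also be checked without looking at the drawings. This is a small sketch that reads the chiral tags back after parsing; the helper <code>chiral_tags()</code> is a name made up for this example, and the expectations in the comments simply restate what is described above.</p>

```python
from rdkit import Chem


def chiral_tags(smiles, symbol):
    """Return the chiral tags of all atoms with the given element symbol."""
    mol = Chem.MolFromSmiles(smiles)
    return [a.GetChiralTag() for a in mol.GetAtoms() if a.GetSymbol() == symbol]


phosphine = 'C12C[N@](C)C(C2)C[P@H]1'
bridgehead = 'COC(=O)[C@@H]1C[N@@]2CCC[C@@H]1C2'

# the N here is neither in a three-membered ring nor a bridgehead,
# so its tag is expected to be cleared during parsing
n_in_phosphine = chiral_tags(phosphine, 'N')
# the phosphine P is expected to keep its tag
p_in_phosphine = chiral_tags(phosphine, 'P')
# the bridgehead N is expected to keep its tag
n_in_bridgehead = chiral_tags(bridgehead, 'N')

print(n_in_phosphine, p_in_phosphine, n_in_bridgehead)
```

<p>An atom whose stereo was removed reports <code>CHI_UNSPECIFIED</code>; the others report one of the tetrahedral tags.</p>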
</section>
<section id="the-rdkits-representation-of-atomic-stereochemistry" class="level1">
<h1>The RDKit’s representation of atomic stereochemistry</h1>
<div id="356bb9e8-7649-46d2-9c02-c1f5a21c7c9f" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">IPythonConsole.drawOptions.addAtomIndices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb4-2">m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C[C@](C(=O)O)(CCC)CC'</span>)</span>
<span id="cb4-3">m</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-21-Chiral-atoms_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>We can see the internal representation in the output of <code>Debug()</code> for the chiral center, atom 1:</p>
<div id="aa58cc35-f803-42be-b7d8-1410902c12ad" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">m.Debug()</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>Atoms:
    0 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: SP3
    1 6 C chg: 0  deg: 4 exp: 4 imp: 0 hyb: SP3 chi: CCW nbrs:[0 2 5 8]
    2 6 C chg: 0  deg: 3 exp: 4 imp: 0 hyb: SP2
    3 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: SP2
    4 8 O chg: 0  deg: 1 exp: 1 imp: 1 hyb: SP2
    5 6 C chg: 0  deg: 2 exp: 2 imp: 2 hyb: SP3
    6 6 C chg: 0  deg: 2 exp: 2 imp: 2 hyb: SP3
    7 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: SP3
    8 6 C chg: 0  deg: 2 exp: 2 imp: 2 hyb: SP3
    9 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: SP3
Bonds:
    0 0-&gt;1 order: 1
    1 1-&gt;2 order: 1
    2 2-&gt;3 order: 2 conj?: 1
    3 2-&gt;4 order: 1 conj?: 1
    4 1-&gt;5 order: 1
    5 5-&gt;6 order: 1
    6 6-&gt;7 order: 1
    7 1-&gt;8 order: 1
    8 8-&gt;9 order: 1</code></pre>
</div>
</div>
<p>The important information here is <code>chi</code> and <code>nbrs</code>, which tell us that if we look from atom 0 to atom 1, we need to rotate counter-clockwise to go from atom 2 to atom 5 (or from atom 5 to atom 8).</p>
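<p>The same two pieces of information are available from Python. The sketch below reads the chiral tag and the neighbor order for atom 1, then parses a second SMILES in which two neighbors are swapped and <code>@</code> is changed to <code>@@</code>. The second SMILES is my own rewrite of the first for illustration: swapping two neighbors inverts the sense of rotation, so the two inputs should describe the same molecule.</p>

```python
from rdkit import Chem

m = Chem.MolFromSmiles('C[C@](C(=O)O)(CCC)CC')
center = m.GetAtomWithIdx(1)

# the tag corresponds to the "chi" field in the Debug() output
tag = center.GetChiralTag()
# the neighbor order follows the order of the bonds on the atom ("nbrs")
nbrs = [b.GetOtherAtomIdx(center.GetIdx()) for b in center.GetBonds()]
print(tag, nbrs)

# same molecule, written with the acid and propyl groups swapped and @@ instead of @
swapped = Chem.MolFromSmiles('C[C@@](CCC)(C(=O)O)CC')
swapped_tag = swapped.GetAtomWithIdx(1).GetChiralTag()
same_molecule = Chem.MolToSmiles(m) == Chem.MolToSmiles(swapped)
print(swapped_tag, same_molecule)
```

<p>The tag only has meaning together with the neighbor order, which is why both need to be read at the same time.</p>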


</section>

 ]]></description>
  <category>tutorial</category>
  <category>documentation</category>
  <category>stereochemistry</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-12-21-Chiral-atoms.html</guid>
  <pubDate>Sat, 20 Dec 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Building synthon spaces with combinatorial reactions</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons.html</link>
  <description><![CDATA[ 




<p>Last week’s <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons.html">blog post</a> looked at using <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial.html">BRICS</a> to build a <a href="https://www.rdkit.org/docs/GettingStartedInPython.html#searching-synthon-spaces">synthon search space</a>. This post builds on that and creates a search space using some combichem reactions from a set published by <a href="http://pubs.acs.org/doi/abs/10.1021/ci200379p">Hartenfeller et al</a>.</p>
<div id="872abcd6-8e5b-44ee-8e83-33b9d77adcd2" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdChemReactions</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdSynthonSpaceSearch</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.3</code></pre>
</div>
</div>
<p>I’m going to use a set of Enamine building blocks that I have on my machine. I loaded these into a <a href="https://greglandrum.github.io/rdkit-blog/posts/2021-12-20-substructlibrary-search-order.html">SubstructLibrary</a> to make them fast and easy to search.</p>
<div id="b0cacab6" class="cell" data-execution_count="251">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdSubstructLibrary</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
<span id="cb3-3">sslib <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pickle.load(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/scratch/Data/Enamine/real_reagents.sslib.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rb'</span>))</span></code></pre></div>
</div>
<section id="the-first-reaction" class="level1">
<h1>The first reaction</h1>
<p>I’ll start with the Pictet-Spengler reaction from the <a href="http://pubs.acs.org/doi/abs/10.1021/ci200379p">Hartenfeller paper</a>. It’s the first reaction in the SI for that paper and there’s something about the name that I really like.</p>
<p>Here’s the definition of the reaction from the paper:</p>
<div id="60d638fc" class="cell" data-execution_count="207">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">sma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1.[#6:11]-[CH1;R0:10]=[OD1]&gt;&gt;[c:1]12:[c:2](-[CH2:7]-[CH2:8]-[NH1:9]-[C:10]-2(-[#6:11])):[c:3]:[c:4]:[c:5]:[c:6]:1'</span></span>
<span id="cb4-2"></span>
<span id="cb4-3">rxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(sma)</span>
<span id="cb4-4">rxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="207">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-4-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>These are the two prototype educts from the SI:</p>
<div id="02076fd2" class="cell" data-scrolled="true" data-execution_count="208">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">r1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cc(CCN)ccc1'</span>)</span>
<span id="cb5-2">r2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CC(=O)'</span>)</span>
<span id="cb5-3">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="208">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
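<p>As a quick sanity check, the reaction can be applied directly to the two prototype educts. This is a sketch, not part of the synthon workflow that follows: it calls <code>RunReactants()</code>, sanitizes each product, and collects the product SMILES in a set so that duplicates from symmetry-equivalent matches collapse. The name <code>product_smiles</code> is just a label for this example.</p>

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# the same reaction SMARTS as above, split across lines for readability
sma = ('[cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1'
       '.[#6:11]-[CH1;R0:10]=[OD1]'
       '>>[c:1]12:[c:2](-[CH2:7]-[CH2:8]-[NH1:9]-[C:10]-2(-[#6:11]))'
       ':[c:3]:[c:4]:[c:5]:[c:6]:1')
rxn = rdChemReactions.ReactionFromSmarts(sma)

r1 = Chem.MolFromSmiles('c1cc(CCN)ccc1')
r2 = Chem.MolFromSmiles('CC(=O)')

product_smiles = set()
for products in rxn.RunReactants((r1, r2)):
    for p in products:
        # reaction products are not sanitized automatically
        Chem.SanitizeMol(p)
        product_smiles.add(Chem.MolToSmiles(p))

print(product_smiles)
```

<p>For these educts the expected product is the tetrahydroisoquinoline with a methyl group on the newly formed ring carbon.</p>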
<p>To make what’s going on in the reaction a bit easier to identify, here I add the atom map numbers from the reaction to the sample educts:</p>
<div id="e392add8" class="cell" data-execution_count="209">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">r_queries <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [rxn.GetReactantTemplate(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(rxn.GetNumReactantTemplates())]</span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r,q <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>([r1,r2],r_queries):</span>
<span id="cb6-4">    match <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r.GetSubstructMatch(q)</span>
<span id="cb6-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> match</span>
<span id="cb6-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,midx <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(match):</span>
<span id="cb6-7">        mnum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> q.GetAtomWithIdx(i).GetAtomMapNum()</span>
<span id="cb6-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mnum:</span>
<span id="cb6-9">            r.GetAtomWithIdx(midx).SetAtomMapNum(mnum)</span>
<span id="cb6-10">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="209">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-6-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="4865d47c" class="cell" data-execution_count="206">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">r</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="206">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-7-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>I’m going to encode the reaction as three synthons: a core and two “sidechains”</p>
<div id="c91b7376" class="cell" data-scrolled="true" data-execution_count="195">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">core <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[7*]CCNC([11*])[1*]'</span>)</span>
<span id="cb8-2">core</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="195">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-8-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here’s what I get for the two sample educts:</p>
<div id="53a79ef2" class="cell" data-execution_count="196">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">chains <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[7*]c1ccccc1[1*]'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C[11*]'</span>)]</span>
<span id="cb9-2">Draw.MolsToGridImage([core,chains[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],chains[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]],legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'core'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'educt1'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'educt2'</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="196">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-9-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>We can put these together using molzip (which is what the synthon search code uses) to get the product:</p>
<div id="883f16e5" class="cell" data-execution_count="197">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">tm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(core)</span>
<span id="cb10-2">tm.InsertMol(chains[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb10-3">tm.InsertMol(chains[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb10-4"></span>
<span id="cb10-5">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipParams()</span>
<span id="cb10-6">ps.label <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipLabel.Isotope</span>
<span id="cb10-7">tm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.molzip(tm,ps)</span>
<span id="cb10-8">tm</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="197">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-10-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
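<p>To show the isotope-based matching in isolation, here is a minimal sketch that zips two small fragments sharing the label 1. The fragments are illustrative inputs that I chose for brevity; they are not the core and educts used above.</p>

```python
from rdkit import Chem

# Two fragments whose dummy atoms carry the same isotope label (1).
# These are illustrative inputs, not the molecules from the post.
ring = Chem.MolFromSmiles('[1*]c1ccccc1')
methyl = Chem.MolFromSmiles('C[1*]')

ps = Chem.MolzipParams()
# Pair up dummy atoms by isotope instead of by atom-map number.
ps.label = Chem.MolzipLabel.Isotope

# The two-molecule form of molzip joins the fragments at the matching dummies.
product = Chem.molzip(ring, methyl, ps)
product_smiles = Chem.MolToSmiles(product)
print(product_smiles)
```

<p>The two dummy atoms should disappear and be replaced by a bond between their neighbors, which gives toluene here. The cell above does the same thing after first combining the core and both educts into one molecule with <code>InsertMol</code>.</p>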
<p>Now we need to transform the building blocks that you would find in a chemical catalog into the synthons we need for the two educts.</p>
<p>I will do this using reactions: one preparation reaction for each educt.</p>
<div id="e8142722" class="cell" data-execution_count="198">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">r1_prep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[cH1:1]1:[c:2](-[CH2:7]-[CH2:8]-[NH2:9]):[c:3]:[c:4]:[c:5]:[c:6]:1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb11-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[1*]-[c:1]1:[c:2](-[7*])[c:3]:[c:4]:[c:5]:[c:6]1'</span>)</span>
<span id="cb11-3">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r1_prep.RunReactant(r1,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[15:59:19] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 7 8 9 </code></pre>
</div>
</div>
<div id="eadce555" class="cell" data-execution_count="199">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">Draw.MolsToGridImage([r1,p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]],legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'building block'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synthon'</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="199">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="2ceebe11" class="cell" data-execution_count="200">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">r2_prep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[#6:11]-[CH1;R0:10]=[OD1]&gt;&gt;[#6:11]-[11*]'</span>)</span>
<span id="cb14-2">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r2_prep.RunReactant(r2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb14-3"></span>
<span id="cb14-4">Draw.MolsToGridImage([r2,p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]],legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'building block'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synthon'</span>])</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[15:59:41] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 10 </code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="200">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-13-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
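<p>As a small self-contained check of what a preparation reaction produces, the sketch below applies the aldehyde preparation reaction from above to acetaldehyde. Acetaldehyde is an input I picked for illustration; it is not one of the building blocks used in the post.</p>

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Preparation reaction for the aldehyde educt, copied from the post:
# the CHO group is replaced by a dummy atom carrying isotope label 11.
prep = rdChemReactions.ReactionFromSmarts('[#6:11]-[CH1;R0:10]=[OD1]>>[#6:11]-[11*]')

# Acetaldehyde: an illustrative input, not a molecule from the post.
block = Chem.MolFromSmiles('CC=O')

# RunReactant returns a tuple of product tuples, one per match of the template.
products = prep.RunReactant(block, 0)
synthon = products[0][0]

# Collect the isotope labels of the dummy atoms in the synthon.
labels = [a.GetIsotope() for a in synthon.GetAtoms() if a.GetAtomicNum() == 0]
print(Chem.MolToSmiles(synthon), labels)
```

<p>The “mapped atoms in the reactants were not mapped in the products” warning is expected with these reactions: the atoms with those map numbers are deliberately dropped when the building block is turned into a synthon.</p>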
<p>More complex reactions (see the next example) may have multiple cores. I want the code for creating the space to be reasonably generic, so I’ll put each core and its preparation reactions into a list of (core, preparation reactions) tuples:</p>
<div id="a9799540" class="cell" data-execution_count="215">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1">r_queries <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [rxn.GetReactantTemplate(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(rxn.GetNumReactantTemplates())]</span>
<span id="cb16-2">possibles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(core,(r1_prep,r2_prep))]</span></code></pre></div>
</div>
<div id="6a98280b" class="cell" data-execution_count="220">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> enumerate_synthons(possibles,outf,writeHeader<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,startIdx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>):</span>
<span id="cb17-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> writeHeader:</span>
<span id="cb17-3">        outf.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'SMILES'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synton_id'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synton#'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'reaction_id'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'release'</span>])<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb17-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> rxnidx,poss <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(possibles,start<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>startIdx):</span>
<span id="cb17-5">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get the core and preparation reactions</span></span>
<span id="cb17-6">        core,preps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> poss</span>
<span id="cb17-7">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># write out the core:</span></span>
<span id="cb17-8">        outf.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>Chem<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>MolToSmiles(core)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nWritten<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">r</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rxnidx<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb17-9"></span>
<span id="cb17-10">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># now prepare each of the sidechain synthons:</span></span>
<span id="cb17-11">        nWritten <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb17-12">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> ridx,prep <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(preps):</span>
<span id="cb17-13">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> prep.GetNumProductTemplates():</span>
<span id="cb17-14">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb17-15">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># find possible building blocks that could work here:</span></span>
<span id="cb17-16">            poss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sslib.GetMatches(r_queries[ridx],maxResults<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span>
<span id="cb17-17">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rxnidx,ridx,<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(poss),nWritten)</span>
<span id="cb17-18">            seen <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb17-19">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># now loop over those, prepare them, and write them to the output:</span></span>
<span id="cb17-20">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> mol <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (sslib.GetMol(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> poss):</span>
<span id="cb17-21">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># we use an R0 primitive in the query, so make sure</span></span>
<span id="cb17-22">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ring information has been computed for the molecule</span></span>
<span id="cb17-23">                Chem.FastFindRings(mol)</span>
<span id="cb17-24">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># run our prep reaction:</span></span>
<span id="cb17-25">                ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> prep.RunReactant(mol,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb17-26">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># write each product (in case there's more than one)</span></span>
<span id="cb17-27">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ps:</span>
<span id="cb17-28">                    smi<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolToSmiles(p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb17-29">                    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># don't write duplicates:</span></span>
<span id="cb17-30">                    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> seen:</span>
<span id="cb17-31">                        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb17-32">                    outf.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>smi<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nWritten<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ridx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">r</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rxnidx<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb17-33">                    seen.add(smi)</span>
<span id="cb17-34">                    nWritten <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb17-35">            <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,nWritten)</span>
<span id="cb17-36">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> rxnidx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,nWritten</span>
<span id="cb17-37"></span>
<span id="cb17-38"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./space1.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb17-39">    nextRxn,nextIdx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> enumerate_synthons(possibles,outf)</span>
<span id="cb17-40">            </span>
<span id="cb17-41">                </span>
<span id="cb17-42"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head space1.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1 0 264 2
     333
1 1 4456 333
     4778
SMILES  synton_id   synton# reaction_id release
[1*]C([11*])NCC[7*] 1   1   r1  1
[1*]c1ccccc1[7*]    2   2   r1  1
[1*]c1cc(OC)ccc1[7*]    3   2   r1  1
[1*]c1cc(S(N)(=O)=O)ccc1[7*]    4   2   r1  1
[1*]c1cc(Cl)ccc1[7*]    5   2   r1  1
[1*]c1cc(OC)c(OC)cc1[7*]    6   2   r1  1
[1*]c1c([7*])ccc(OC)c1OC    7   2   r1  1
[1*]c1cc(Cl)cc(Cl)c1[7*]    8   2   r1  1
[1*]c1cc(F)ccc1[7*] 9   2   r1  1</code></pre>
</div>
</div>
<div id="915fca67" class="cell" data-execution_count="213">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb19-2">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'space1.txt'</span>)</span>
<span id="cb19-3">spc.GetNumProducts()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="213">
<pre><code>1471295</code></pre>
</div>
</div>
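<p>The product count reported above can be cross-checked by hand from the numbers printed during enumeration. The ID counter goes from 2 to 333 for the first educt (331 synthons) and from 333 to 4778 for the second (4445 synthons), so with a single core there are 1 × 331 × 4445 = 1471295 combinations. The sketch below does the same bookkeeping for a small tab-separated synthon file using only the standard library. The helper name <code>count_products</code> and the toy synthons are mine, and the counting rule (one synthon from each set per product) is my assumption about how the space is enumerated, although it does reproduce the number above.</p>

```python
import csv
import io
from collections import Counter
from math import prod


def count_products(text):
    """Number of products encoded by a tab-separated synthon file.

    Assumption (hypothetical helper, not RDKit code): each product combines
    exactly one synthon from every synthon set of a reaction.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter='\t')
    # Size of each synthon set, keyed by (reaction_id, synthon set number).
    set_sizes = Counter((row['reaction_id'], row['synton#']) for row in reader)
    reactions = {rxn_id for rxn_id, _ in set_sizes}
    # Multiply set sizes within a reaction, then add up over reactions.
    return sum(
        prod(n for (rid, _), n in set_sizes.items() if rid == rxn_id)
        for rxn_id in reactions
    )


# Toy file: 1 core, 2 synthons in set 2, 3 made-up synthons in set 3.
toy = '\n'.join([
    'SMILES\tsynton_id\tsynton#\treaction_id\trelease',
    '[1*]C([11*])NCC[7*]\t1\t1\tr1\t1',
    '[1*]c1ccccc1[7*]\t2\t2\tr1\t1',
    '[1*]c1cc(Cl)ccc1[7*]\t3\t2\tr1\t1',
    'C[11*]\t4\t3\tr1\t1',
    'CC[11*]\t5\t3\tr1\t1',
    'CCC[11*]\t6\t3\tr1\t1',
]) + '\n'

toy_count = count_products(toy)
print(toy_count)        # 1 * 2 * 3 combinations for the toy file
print(1 * 331 * 4445)   # the same arithmetic for the space built above
```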
<div id="7666a191" class="cell" data-scrolled="false" data-execution_count="214">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span>)</span>
<span id="cb21-2">q1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cccc(C(F)(F)F)c1C'</span>)</span>
<span id="cb21-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#q = Chem.MolFromSmarts('O=c1ncncc1')</span></span>
<span id="cb21-4"></span>
<span id="cb21-5">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb21-6">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb21-7">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb21-8"></span>
<span id="cb21-9">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q1,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb21-10">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb21-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb21-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb21-13">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1000 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="214">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-17-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="second-reaction" class="level1">
<h1>Second reaction</h1>
<p>The second reaction is <code>Niementowski_quinazoline</code>, also from the <a href="http://pubs.acs.org/doi/abs/10.1021/ci200379p">Hartenfeller paper</a>. I’ve used this reaction before in a <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-06-12-using-reaction-info.html">blog post showing how to extract information from reaction products</a>.</p>
<p>This one is a bit trickier to encode.</p>
<p>Start with the reaction definition:</p>
<div id="cccd2e00" class="cell" data-execution_count="252">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1">sma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[c:1](-[C;$(C-c1ccccc1):2](=[OD1:3])-[OH1]):[c:4](-[NH2:5]).[N;!H0;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[C;H1,$(C-[#6]):7]=[OD1]&gt;&gt;[c:4]2:[c:1]-[C:2](=[O:3])-[N:6]-[C:7]=[N:5]-2'</span></span>
<span id="cb23-2"></span>
<span id="cb23-3">rxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(sma)</span>
<span id="cb23-4">rxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="252">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-20-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>These are the two educts from the SI:</p>
<div id="6a8c623d" class="cell" data-scrolled="true" data-execution_count="253">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1">r1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1c(C(=O)O)c(N)ccc1'</span>)</span>
<span id="cb24-2">r2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C(=O)N'</span>)</span>
<span id="cb24-3">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="253">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-21-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here are the educts with atom map info:</p>
<div id="c495cb86" class="cell" data-execution_count="254">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">r_queries <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [rxn.GetReactantTemplate(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(rxn.GetNumReactantTemplates())]</span>
<span id="cb25-2"></span>
<span id="cb25-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r,q <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>([r1,r2],r_queries):</span>
<span id="cb25-4">    match <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r.GetSubstructMatch(q)</span>
<span id="cb25-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> match</span>
<span id="cb25-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,midx <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(match):</span>
<span id="cb25-7">        mnum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> q.GetAtomWithIdx(i).GetAtomMapNum()</span>
<span id="cb25-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mnum:</span>
<span id="cb25-9">            r.GetAtomWithIdx(midx).SetAtomMapNum(mnum)</span>
<span id="cb25-10">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="254">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-22-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="f13e33ba" class="cell" data-execution_count="288">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#-----</span></span>
<span id="cb26-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Educt 2 is a primary amine</span></span>
<span id="cb26-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   core1: no substituent on educt 2 carbon</span></span>
<span id="cb26-4">core1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=C1-N-C=N-C([1*]):C1[2*]'</span>,sanitize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb26-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   core3: educt2 has a substituent on the carbon</span></span>
<span id="cb26-6">core3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=C1-N-C([3*])=N-C([1*]):C1[2*]'</span>,sanitize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb26-7"></span>
<span id="cb26-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Educt 2 is a secondary amine</span></span>
<span id="cb26-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   core2: no substituent on educt 2 carbon</span></span>
<span id="cb26-10">core2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=C1-N([4*])-C=N-C([1*]):C1[2*]'</span>,sanitize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb26-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   core4: educt2 has a substituent on the carbon</span></span>
<span id="cb26-12">core4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=C1-N([4*])-C([3*])=N-C([1*]):C1[2*]'</span>,sanitize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb26-13"></span>
<span id="cb26-14">Draw.MolsToGridImage([core1,core2,core3,core4],legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'core1'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'core2'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'core3'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'core4'</span>],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="288">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-23-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="01c2c2b3" class="cell" data-execution_count="257">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">r1_prep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[c:1]1(-[C:2](=[OD1:3])-[OH1]):[c:4](-[NH2:5])[c:6][c:7][c:8][c:9]1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb27-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[2*][c:6][c:7][c:8][c:9][1*]'</span>)</span>
<span id="cb27-3">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r1_prep.RunReactant(r1,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[05:23:12] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 1 2 3 4 5 </code></pre>
</div>
</div>
<div id="1a50ccad" class="cell" data-execution_count="258">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="258">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-25-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Make sure we form the correct ring:</p>
<div id="ffef8551" class="cell" data-execution_count="227">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipParams()</span>
<span id="cb30-2">ps.label <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipLabel.Isotope</span>
<span id="cb30-3">tm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.molzip(core1,p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],ps)</span>
<span id="cb30-4">tm</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="227">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-26-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Now do the preparation reactions for R2, taking the four scenarios into account:</p>
<div id="ef7e7721" class="cell" data-execution_count="294">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1">r2_prep1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[NH2;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[CH1:7]=[OD1]</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb31-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;'</span>)</span>
<span id="cb31-3">r2_prep3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[NH2;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[C:7](-[#6:3])=[OD1]</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb31-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[3*][#6:3]'</span>)</span>
<span id="cb31-5">r2_prep2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[*:8][NH1;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[CH1:7]=[OD1]</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb31-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[4*][*:8]'</span>)</span>
<span id="cb31-7">r2_prep4 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[*:8][NH1;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[C;R0:7](-[#6:3])=[OD1]</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb31-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;([4*][*:8].[3*][*:7])'</span>)</span></code></pre></div>
</div>
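<p>The preparation reactions all follow the same pattern: match the reactive group, keep only the atoms that survive into the product, and mark the attachment point with an isotope-labelled dummy atom. The sketch below shows that pattern on a deliberately simple, hypothetical transformation (removing the OH from an alcohol). It is not one of the reactions used in this post.</p>

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions

# Hypothetical preparation reaction: drop the OH of an alcohol and mark the
# attachment carbon with a dummy atom carrying isotope label 1.
prep = rdChemReactions.ReactionFromSmarts('[C:1]-[OH1]>>[1*]-[C:1]')

educt = Chem.MolFromSmiles('CCO')
products = prep.RunReactants((educt,))
synthon = products[0][0]

# Collect the isotope labels of the dummy atoms in the resulting synthon.
dummy_isotopes = [a.GetIsotope() for a in synthon.GetAtoms()
                  if a.GetAtomicNum() == 0]
print(Chem.MolToSmiles(synthon), dummy_isotopes)
```

<p>The unmapped oxygen is dropped, while the methyl group attached to the mapped carbon is carried over unchanged.</p>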
<div id="9ea7b4fa" class="cell" data-execution_count="295">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1">possibles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(core1,(r1_prep,r2_prep1)),(core2,(r1_prep,r2_prep2)),</span>
<span id="cb32-2">             (core3,(r1_prep,r2_prep3)),(core4,(r1_prep,r2_prep4))]</span></code></pre></div>
</div>
<p>Create the synthon space:</p>
<div id="737d10c1" class="cell" data-execution_count="296">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./space2.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb33-2">    nextRxn,nextIdx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> enumerate_synthons(possibles,outf)</span>
<span id="cb33-3">            </span>
<span id="cb33-4">                </span>
<span id="cb33-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head space2.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1 0 189 2
     188
2 0 189 189
     375
2 1 8173 375</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>[05:38:56] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 6 7 </code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>     383
3 0 189 384
     570
3 1 8173 570</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>[05:38:57] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 6 7 </code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>     1690
4 0 189 1691
     1877
4 1 8173 1877</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>[05:38:57] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 6 3 </code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>     3769
SMILES  synton_id   synton# reaction_id release
[1*]c1nc[nH]c(=O)c1[2*] 1   1   r1  1
[1*]ccc1c([2*])C(=O)c2ccccc2C1=O    2   2   r1  1
[1*]cc(Cl)cc([2*])Cl    3   2   r1  1
[1*]ccc(c[2*])C(F)(F)F  4   2   r1  1
[1*]ccc(Cl)c[2*]    5   2   r1  1
[1*]cc1ccccc1c[2*]  6   2   r1  1
[1*]cc(OC)c(c[2*])OC    7   2   r1  1
[1*]cc(cc[2*])S(=O)(=O)Nc1ccccc1OC  8   2   r1  1
[1*]cc(cc[2*])S(=O)(=O)Nc1ccc(OC)cc1    9   2   r1  1</code></pre>
</div>
</div>
<p>Read the synthon space back in and count the products it encodes:</p>
<div id="8c666ccf" class="cell" data-scrolled="true" data-execution_count="297">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb41-2">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'space2.txt'</span>)</span>
<span id="cb41-3">spc.GetNumProducts()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="297">
<pre><code>561906</code></pre>
</div>
</div>
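<p>As a rough cross-check on a product count like this, the synthon file can be tallied directly. The sketch below assumes the whitespace-separated five-column layout shown in the <code>head</code> output above. It also assumes that each reaction contributes the product of its per-position synthon counts, with no further deduplication. The function name <code>count_products</code> and the toy lines are made up for illustration.</p>

```python
from collections import defaultdict
from math import prod

def count_products(lines):
    """Return {reaction_id: number of synthon combinations}."""
    per_rxn = defaultdict(lambda: defaultdict(set))
    for line in lines:
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a synthon row.
        if len(fields) != 5 or fields[0] == 'SMILES':
            continue
        smiles, _synthon_id, position, reaction_id, _release = fields
        per_rxn[reaction_id][position].add(smiles)
    # One product per choice of one synthon at every position.
    return {rxn: prod(len(s) for s in positions.values())
            for rxn, positions in per_rxn.items()}

# Toy data: r1 has 1 x 3 combinations, r2 has 2 x 2.
example_lines = [
    'SMILES\tsynton_id\tsynton#\treaction_id\trelease',
    '[1*]C\t1\t1\tr1\t1',
    '[1*]N\t2\t2\tr1\t1',
    '[1*]O\t3\t2\tr1\t1',
    '[1*]F\t4\t2\tr1\t1',
    '[1*]C\t5\t1\tr2\t1',
    '[1*]CC\t6\t1\tr2\t1',
    '[1*]N\t7\t2\tr2\t1',
    '[1*]O\t8\t2\tr2\t1',
]
counts = count_products(example_lines)
print(counts, sum(counts.values()))
```

<p>On a real file, this could be called as <code>count_products(open('space2.txt'))</code>, and the summed total compared with <code>spc.GetNumProducts()</code>.</p>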
<p>Search the space for products containing a trifluoromethyl group on an aromatic carbon:</p>
<div id="96ca2242" class="cell" data-execution_count="286">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cC(F)(F)F'</span>)</span>
<span id="cb43-2">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb43-3">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb43-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb43-5"></span>
<span id="cb43-6">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q2,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb43-7">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb43-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb43-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb43-10">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1000 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="286">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-31-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>We have results here for cores 2-4, but none for core 1. Let’s confirm that the space does contain core 1 products:</p>
<div id="604129de" class="cell" data-execution_count="298">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[#6](=O)@[#7H1]@[#6H1]@[#7]'</span>)</span>
<span id="cb45-2">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb45-3">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb45-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb45-5"></span>
<span id="cb45-6">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q2,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb45-7">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb45-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb45-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb45-10">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[05:41:48] Complex queries can be slow.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>186 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="298">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-32-output-3.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="combine-the-two-reaction-spaces" class="level1">
<h1>Combine the two reaction spaces</h1>
<p>We now combine these synthons with the space built for the first reaction. We could do this by enumerating both spaces sequentially into the same output file, but it’s quicker to copy the output file from the first space and then re-enumerate the second space, appending to the end of that copy:</p>
<div id="a6bc6a51" class="cell" data-execution_count="289">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>cp space1.txt combined.txt</span>
<span id="cb48-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>tail combined.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[11*]c1ccnn1-c1cccnc1   4768    3   r1  1
[11*]C1CC2(C1)CC(OC)C2  4769    3   r1  1
[11*]c1c(Cl)cnc(F)c1Cl  4770    3   r1  1
[11*]c1cc(Cl)c(C(=O)O)s1    4771    3   r1  1
[11*]C1COC2CC1C2    4772    3   r1  1
[11*]C12C3CCC(CC31)C2(F)F   4773    3   r1  1
[11*]c1ccc(C(F)(F)C(F)(F)F)s1   4774    3   r1  1
[11*]C12C3CC(CC31)C2NC(=O)OC(C)(C)C 4775    3   r1  1
[11*]C1OCCOC1(C)C   4776    3   r1  1
[11*]c1cc(C)cc(C(F)F)c1 4777    3   r1  1</code></pre>
</div>
</div>
<p>Now re-enumerate the second space, starting the synthon IDs at 4778 and the reaction IDs at 2:</p>
<div id="47526c7a" class="cell" data-execution_count="299">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'combined.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'a'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb50-2">    enumerate_synthons(possibles,outf,writeHeader<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4778</span>,startIdx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb50-3">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb50-4">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'combined.txt'</span>)</span>
<span id="cb50-5">spc.GetNumProducts()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2 0 189 4779
     4965
3 0 189 4966
     5152
3 1 8173 5152
     5160
4 0 189 5161
     5347
4 1 8173 5347
     6467
5 0 189 6468
     6654
5 1 8173 6654
     8546</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="299">
<pre><code>5965799</code></pre>
</div>
</div>
<div id="da47a957" class="cell" data-scrolled="true" data-execution_count="300">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb53-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>tail combined.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[3*]C.[4*]CC1(N)CCC1    8536    3   r5  1
[3*]C.[4*]C1CCCCC1CN    8537    3   r5  1
[3*]C.[4*]CC1(N)CCCCC1  8538    3   r5  1
[3*]C.[4*]CC1(N)CCCC1   8539    3   r5  1
[3*]C.[4*]C1CC(N)C12CCC2    8540    3   r5  1
[3*]C.[4*]C1CCOCC1N 8541    3   r5  1
[3*]C.[4*]CC(N)c1ccccc1OC   8542    3   r5  1
[3*]C.[4*]C(CN)C(C)(C)C 8543    3   r5  1
[3*]C.[4*]C1Cc2ccccc2C1N    8544    3   r5  1
[3*]C.[4*]C1CC2(C1)CC(N)C2  8545    3   r5  1</code></pre>
</div>
</div>
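<p>Re-enumerating is what this post does. If the second space has already been written to its own file, a different route would be to renumber its lines as text before appending them. The sketch below is only an illustration of that alternative. The function name <code>renumber</code> is hypothetical, and it assumes the five-column layout shown above with reaction IDs of the form <code>rN</code>.</p>

```python
def renumber(lines, first_synthon_id, reaction_shift):
    """Return synthon rows with fresh synthon IDs and shifted reaction IDs."""
    out = []
    next_id = first_synthon_id
    for line in lines:
        fields = line.split()
        # Drop the header and malformed rows; the combined file keeps
        # only the header it already has.
        if len(fields) != 5 or fields[0] == 'SMILES':
            continue
        smiles, _old_id, position, reaction_id, release = fields
        # 'r1' becomes 'r2' when reaction_shift is 1, and so on.
        new_rxn = 'r' + str(int(reaction_id[1:]) + reaction_shift)
        out.append('\t'.join([smiles, str(next_id), position, new_rxn, release]))
        next_id += 1
    return out

second_space = [
    'SMILES\tsynton_id\tsynton#\treaction_id\trelease',
    '[1*]C\t1\t1\tr1\t1',
    '[1*]N\t2\t2\tr1\t1',
]
renumbered = renumber(second_space, first_synthon_id=4778, reaction_shift=1)
print('\n'.join(renumbered))
```

<p>The renumbered rows could then be appended to <code>combined.txt</code> in <code>'a'</code> mode, in the same way as the re-enumerated output above.</p>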
<p>Verify that searches return results from both spaces:</p>
<div id="495fe668" class="cell" data-execution_count="301">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb55-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cC(F)(F)F'</span>)</span>
<span id="cb55-2">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb55-3">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb55-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb55-5"></span>
<span id="cb55-6">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q2,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb55-7">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb55-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb55-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb55-10">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1000 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="301">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-36-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="reaction-3" class="level1">
<h1>Reaction 3</h1>
<p>For the last example, I’ll add the spiro-chromanone reaction definition from the Hartenfeller paper. This one is fun because it forms a spiro linkage.</p>
<div id="ff4b7794" class="cell" data-execution_count="302">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb57-1">sma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[c:1](-[C;$(C-c1ccccc1):2](=[OD1:3])-[CH3:4]):[c:5](-[OH1:6]).[C;$(C1-[CH2]-[CH2]-[N,C]-[CH2]-[CH2]-1):7](=[OD1])&gt;&gt;[O:6]1-[c:5]:[c:1]-[C:2](=[OD1:3])-[C:4]-[C:7]-1'</span></span>
<span id="cb57-2"></span>
<span id="cb57-3">rxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(sma)</span>
<span id="cb57-4">rxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="302">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-38-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>These are the two educts from the SI:</p>
<div id="5197ad25" class="cell" data-scrolled="true" data-execution_count="303">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1">r1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cc(C(=O)C)c(O)cc1'</span>)</span>
<span id="cb58-2">r2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1(=O)CCNCC1'</span>)</span>
<span id="cb58-3">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="303">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-39-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here’s what the prototype product looks like:</p>
<div id="aaba9e83" class="cell" data-execution_count="304">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb59-1">p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rxn.RunReactants([r1,r2])</span>
<span id="cb59-2">p[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="304">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-40-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here are the educts with atom map info:</p>
<div id="7a4c0b4f" class="cell" data-execution_count="305">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb60-1">r_queries <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [rxn.GetReactantTemplate(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(rxn.GetNumReactantTemplates())]</span>
<span id="cb60-2"></span>
<span id="cb60-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r,q <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>([r1,r2],r_queries):</span>
<span id="cb60-4">    match <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r.GetSubstructMatch(q)</span>
<span id="cb60-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> match</span>
<span id="cb60-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,midx <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(match):</span>
<span id="cb60-7">        mnum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> q.GetAtomWithIdx(i).GetAtomMapNum()</span>
<span id="cb60-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mnum:</span>
<span id="cb60-9">            r.GetAtomWithIdx(midx).SetAtomMapNum(mnum)</span>
<span id="cb60-10">Draw.MolsToGridImage((r1,r2))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="305">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-41-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The core synthon is built from mapped atoms 1 to 6 of the first reactant template. Each dummy atom carries an isotope label equal to the map number of the core atom it is attached to:</p>
<div id="ae71849f" class="cell" data-execution_count="309">
<div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb61-1">core <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=C(-[C][4*])c([1*]):c([5*])[O][6*]'</span>,sanitize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb61-2">core</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="309">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-42-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The first preparation reaction reduces an educt to its four remaining aromatic atoms, capped with dummy atoms labeled 1 and 5. Mapped atoms 1 to 6 are left out of the product because the core already supplies them, so the RDKit logs a warning:</p>
<div id="962786d1" class="cell" data-execution_count="317">
<div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb62-1">r1_prep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[c:1]1(-[C:2](=[OD1:3])-[CH3:4]):[c:5](-[OH1:6]):[c:7]:[c:8]:[c:9]:[c:10]1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb62-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[*1]:[c:7]:[c:8]:[c:9]:[c:10]:[5*]'</span>)</span>
<span id="cb62-3">p1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r1_prep.RunReactant(r1,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb62-4">p1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[06:13:20] mapped atoms in the reactants were not mapped in the products.
  unmapped numbers are: 1 2 3 4 5 6 </code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="317">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-43-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The second preparation reaction keeps the six-membered ring of the ketone and replaces the carbonyl oxygen with dummy atoms labeled 4 and 6 on what becomes the spiro carbon:</p>
<div id="81c1e13d" class="cell" data-execution_count="318">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb64-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># second reactant template in the reaction: [C;$(C1-[CH2]-[CH2]-[N,C]-[CH2]-[CH2]-1):7](=[OD1])</span></span>
<span id="cb64-2">r2_prep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=[C:11]1-[CH2:12]-[CH2:13]-[N,C:14]-[CH2:15]-[CH2:16]-1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb64-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&gt;&gt;[4*][C:11]1([6*])-[CH2:12]-[CH2:13]-[N,C:14]-[CH2:15]-[CH2:16]-1'</span>)</span>
<span id="cb64-4">p2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r2_prep.RunReactant(r2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb64-5">p2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="318">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-44-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Make sure we form the correct ring:</p>
<div id="b44bb490" class="cell" data-execution_count="321">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb65-1">tm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(core)</span>
<span id="cb65-2">tm.InsertMol(p1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb65-3">tm.InsertMol(p2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb65-4">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipParams()</span>
<span id="cb65-5">ps.label <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolzipLabel.Isotope</span>
<span id="cb65-6">tm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.molzip(tm,ps)</span>
<span id="cb65-7">tm</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="321">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-45-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
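<p>With <code>MolzipLabel.Isotope</code>, <code>molzip</code> joins dummy atoms that carry the same isotope label, so for the ring to close as intended every label should appear exactly twice across the synthons being combined. Here is a minimal pure-Python sanity check of that pairing, assuming the labels are written as <code>[n*]</code> in SMILES. The helper <code>dummy_labels</code> and the two fragment SMILES are illustrative stand-ins, not notebook output:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import re
from collections import Counter

def dummy_labels(smiles):
    """Return the isotope labels of dummy atoms written as [n*]."""
    return [int(n) for n in re.findall(r"\[(\d+)\*\]", smiles)]

# Core synthon from the post; the two fragments are hypothetical stand-ins
# for the synthons produced by r1_prep and r2_prep.
core = "O=C(-[C][4*])c([1*]):c([5*])[O][6*]"
frag1 = "[1*]cccc[5*]"
frag2 = "[4*]C1([6*])CCNCC1"

counts = Counter()
for smi in (core, frag1, frag2):
    counts.update(dummy_labels(smi))

# Any label that does not occur exactly twice has no unique partner.
unpaired = sorted(lab for lab, n in counts.items() if n != 2)
print(sorted(counts), unpaired)  # [1, 4, 5, 6] []</code></pre></div>
</div>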
<div id="b3520b30" class="cell" data-execution_count="334">
<div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb66-1">Draw.MolsToGridImage([core,p1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],p2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],tm],legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synthon1'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synthon2'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synthon3'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'product'</span>],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="334">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-46-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="4fecb38b" class="cell" data-execution_count="322">
<div class="sourceCode cell-code" id="cb67" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb67-1">possibles <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(core,(r1_prep,r2_prep))]</span></code></pre></div>
</div>
<p>Create the synthon space:</p>
<div id="99cc6ea4" class="cell" data-execution_count="323">
<div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb68-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./space3.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb68-2">    nextRxn,nextIdx <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> enumerate_synthons(possibles,outf)</span>
<span id="cb68-3"></span>
<span id="cb68-4"></span>
<span id="cb68-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head space2.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1 0 58 2
     60
1 1 241 60
     301
SMILES  synton_id   synton# reaction_id release
[1*]c1nc[nH]c(=O)c1[2*] 1   1   r1  1
[1*]ccc1c([2*])C(=O)c2ccccc2C1=O    2   2   r1  1
[1*]cc(Cl)cc([2*])Cl    3   2   r1  1
[1*]ccc(c[2*])C(F)(F)F  4   2   r1  1
[1*]ccc(Cl)c[2*]    5   2   r1  1
[1*]cc1ccccc1c[2*]  6   2   r1  1
[1*]cc(OC)c(c[2*])OC    7   2   r1  1
[1*]cc(cc[2*])S(=O)(=O)Nc1ccccc1OC  8   2   r1  1
[1*]cc(cc[2*])S(=O)(=O)Nc1ccc(OC)cc1    9   2   r1  1</code></pre>
</div>
</div>
<p>Read the synthon space back in:</p>
<div id="e6920d21" class="cell" data-scrolled="true" data-execution_count="324">
<div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb70-1">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb70-2">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'space3.txt'</span>)</span>
<span id="cb70-3">spc.GetNumProducts()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="324">
<pre><code>13978</code></pre>
</div>
</div>
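<p>That count can be cross-checked by hand: a synthon space contains one product for every combination of one synthon per position. The check below assumes that the third number on each line printed by <code>enumerate_synthons</code> above is the number of synthons written for that reactant (58 and 241), and that the core is the only member of its own set:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">import math

# Synthon counts per position: the single core plus the counts read off
# the enumerate_synthons output (an interpretation of that printout).
synthons_per_position = [1, 58, 241]

n_products = math.prod(synthons_per_position)
print(n_products)  # 13978</code></pre></div>
</div>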
<p>Search it for a trifluoromethyl group attached to an aromatic atom:</p>
<div id="3f3ee541" class="cell" data-execution_count="325">
<div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb72-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cC(F)(F)F'</span>)</span>
<span id="cb72-2">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb72-3">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb72-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb72-5"></span>
<span id="cb72-6">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q2,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb72-7">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb72-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb72-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb72-10">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>778 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="325">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-50-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>And combine it with the other two:</p>
<div id="75d1f4b5" class="cell" data-scrolled="true" data-execution_count="326">
<div class="sourceCode cell-code" id="cb74" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb74-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>tail combined.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[3*]C.[4*]CC1(N)CCC1    8536    3   r5  1
[3*]C.[4*]C1CCCCC1CN    8537    3   r5  1
[3*]C.[4*]CC1(N)CCCCC1  8538    3   r5  1
[3*]C.[4*]CC1(N)CCCC1   8539    3   r5  1
[3*]C.[4*]C1CC(N)C12CCC2    8540    3   r5  1
[3*]C.[4*]C1CCOCC1N 8541    3   r5  1
[3*]C.[4*]CC(N)c1ccccc1OC   8542    3   r5  1
[3*]C.[4*]C(CN)C(C)(C)C 8543    3   r5  1
[3*]C.[4*]C1Cc2ccccc2C1N    8544    3   r5  1
[3*]C.[4*]C1CC2(C1)CC(N)C2  8545    3   r5  1</code></pre>
</div>
</div>
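<p>Appending to an existing file only works if the new synthon ids and reaction id continue where the file ends. The last line above has synthon id 8545 and reaction <code>r5</code>, which is presumably where the <code>nWritten=8546</code> and <code>startIdx=6</code> arguments in the next cell come from. Here is a small sketch of deriving both values from the final line, assuming whitespace-separated columns and reaction ids made of <code>r</code> followed by a number. <code>next_ids</code> is a hypothetical helper, not part of the RDKit:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def next_ids(last_line):
    """Return (next synthon id, next reaction index) from a synthon line."""
    # Columns: SMILES, synton_id, synton#, reaction_id, release
    _smiles, synthon_id, _position, reaction_id, _release = last_line.split()
    return int(synthon_id) + 1, int(reaction_id.lstrip("r")) + 1

# Last line of combined.txt as shown in the output above
last = "[3*]C.[4*]C1CC2(C1)CC(N)C2  8545    3   r5  1"
print(next_ids(last))  # (8546, 6)</code></pre></div>
</div>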
<div id="a25a69ca" class="cell" data-execution_count="328">
<div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb76-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'combined.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'a'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb76-2">    enumerate_synthons(possibles,outf,writeHeader<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8546</span>,startIdx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span>
<span id="cb76-3">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb76-4">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'combined.txt'</span>)</span>
<span id="cb76-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(spc.GetNumProducts())</span>
<span id="cb76-6"></span>
<span id="cb76-7"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>tail combined.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>6 0 58 8547
     8605
6 1 241 8605
     8846
6077623
[4*]C1([6*])CCC(=C(F)F)CC1  8836    3   r6  1
[4*]C1([6*])CCN(S(=O)(=O)F)CC1  8837    3   r6  1
[4*]C1([6*])CCN(c2cc(=O)[nH]cn2)CC1 8838    3   r6  1
[4*]C1([6*])CCN(OCc2ccc(Cl)cc2)CC1  8839    3   r6  1
[4*]C1([6*])CCC(N2CCCC2)CC1 8840    3   r6  1
[4*]C1([6*])CCC(N2CCOCC2)CC1    8841    3   r6  1
[4*]C1([6*])CCN(c2cc[nH]c(=O)c2)CC1 8842    3   r6  1
[4*]C1([6*])CCC([N+](=O)[O-])CC1    8843    3   r6  1
[4*]C1([6*])CCC2(CC1)CC2C(=O)OC 8844    3   r6  1
[4*]C1([6*])CCC2(CCC(CBr)O2)CC1 8845    3   r6  1</code></pre>
</div>
</div>
<p>Do a search to be sure we get results from multiple reactions:</p>
<div id="d5be75ad" class="cell" data-execution_count="332">
<div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb78-1">q2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cC(=O)C'</span>)</span>
<span id="cb78-2">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpaceSearchParams()</span>
<span id="cb78-3">params.randomSample <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb78-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb78-5"></span>
<span id="cb78-6">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(q2,params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>params)</span>
<span id="cb78-7">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(res.GetHitMolecules(),key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb78-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># resMols = res.GetHitMolecules()</span></span>
<span id="cb78-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb78-10">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1000 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="332">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons_files/figure-html/cell-53-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>There are many more reactions in the Hartenfeller paper, but I’m going to stop here.</p>


</section>

 ]]></description>
  <category>tutorial</category>
  <category>documentation</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-12-14-Reaction-synthons.html</guid>
  <pubDate>Sat, 13 Dec 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/reaction-synthons-1.png" medium="image" type="image/png" height="36" width="144"/>
</item>
<item>
  <title>Building synthon spaces with BRICS fragments</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons.html</link>
  <description><![CDATA[ 




<p>Last year we added functionality to the RDKit to allow searching in synthon, or combinatorial library, spaces. Dave Cosgrove <a href="https://greglandrum.github.io/rdkit-blog/posts/2024-12-03-introducing-synthon-search.html">did a blog post</a> on this a while ago and there is also a <a href="https://www.rdkit.org/docs/GettingStartedInPython.html#searching-synthon-spaces">tutorial in the docs</a>.</p>
<p>I’ve been asked a few times how one can create a synthon space from either a set of compounds or a set of reactions and building blocks. This post will focus on the first use case: creating a synthon space from the fragments created by applying BRICS fragmentation to a set of ChEMBL compounds. I will try to do a post covering the other use case in the not-too-distant future.</p>
<div id="872abcd6-8e5b-44ee-8e83-33b9d77adcd2" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> BRICS</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdSynthonSpaceSearch</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.3</code></pre>
</div>
</div>
<section id="the-synthon-space-input-file" class="level1">
<h1>The synthon space input file</h1>
<p>Let’s start by looking at what the synthon space input file looks like:</p>
<div id="fed730d1-e1fb-49c2-a3ca-4c29819ad5cb" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># snipped from the freedom space input</span></span>
<span id="cb3-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./blah.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>).write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''SMILES synton_id   synton# reaction_id release</span></span>
<span id="cb3-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Clc1ccc(-c2cnc(N[U])s2)cc1  6   1   a1  3</span></span>
<span id="cb3-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">O=[N+]([O-])c1cccc(N[U])c1O 10  1   a1  3</span></span>
<span id="cb3-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">O=C([U])c1sc2cc(Cl)ccc2c1Cl 31  2   a1  3</span></span>
<span id="cb3-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1cccc(O)c1C(=O)[U]   86  2   a1  3</span></span>
<span id="cb3-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>189</code></pre>
</div>
</div>
<p>There’s more information about the format and how searches work <a href="https://www.rdkit.org/docs/GettingStartedInPython.html#how-it-works">in the docs</a>.</p>
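<p>To make the column layout concrete, here’s a minimal sketch in plain Python (no RDKit) that groups the rows by reaction and synthon number. It assumes whitespace-separated columns and that the size of a reaction is just the product of the sizes of its synthon sets; <code>count_products</code> is a hypothetical helper written for this illustration, not part of the RDKit API.</p>

```python
from collections import defaultdict
from math import prod

# the same four synthons as in the demo file above
SPACE_TEXT = """SMILES synton_id synton# reaction_id release
Clc1ccc(-c2cnc(N[U])s2)cc1 6 1 a1 3
O=[N+]([O-])c1cccc(N[U])c1O 10 1 a1 3
O=C([U])c1sc2cc(Cl)ccc2c1Cl 31 2 a1 3
COc1cccc(O)c1C(=O)[U] 86 2 a1 3
"""

def count_products(text):
    ''' hypothetical helper: number of products per reaction, assuming every
      synthon in one set can be combined with every synthon in the other sets
    '''
    # maps reaction_id to {synthon number: list of synthon SMILES}
    reactions = defaultdict(lambda: defaultdict(list))
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        smiles, synthon_id, synthon_num, reaction_id, release = line.split()
        reactions[reaction_id][synthon_num].append(smiles)
    return {rid: prod(len(smis) for smis in sets.values())
            for rid, sets in reactions.items()}

counts = count_products(SPACE_TEXT)
print(counts)
```

<p>For this file there are two synthons in each of the two sets of reaction <code>a1</code>, so the sketch reports four products.</p>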
<p>Here’s a quick demo on using the file:</p>
<div id="a8602b25-fe60-4eec-bd4e-4f583b50f65b" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb5-2">spc.ReadTextFile(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blah.txt'</span>)</span>
<span id="cb5-3">spc.GetNumProducts()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>4</code></pre>
</div>
</div>
<div id="7d981a24-bee5-4718-9129-dbf297a664f3" class="cell" data-scrolled="false" data-execution_count="4">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CN'</span>))</span>
<span id="cb7-2">Draw.MolsToGridImage(res.GetHitMolecules())</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="brics-decomposition" class="level1">
<h1>BRICS decomposition</h1>
<p>I did a post earlier this year with a <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial.html">tutorial on BRICS decomposition</a>, so I won’t get into a lot of detail here.</p>
<p>Here’s a ChEMBL molecule we’ll work with:</p>
<div id="d610df36-0b11-4099-a428-f0dedfafa30d" class="cell" data-scrolled="true" data-execution_count="5">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1=C(c2ccc(CCN3CCCCC3)cc2)C2CN(Cc3ccccc3)CC2C1 CHEMBL256225'</span>)</span>
<span id="cb8-2">m</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-6-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>And this is what we get when we do a BRICS decomposition:</p>
<div id="bd232c92-073e-412d-822e-9213bc34e22d" class="cell" data-scrolled="true" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">BRICS.BRICSDecompose(m)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>{'[16*]c1ccccc1',
 '[4*]CC[8*]',
 '[4*]C[8*]',
 '[5*]N1CC2CC=C(c3ccc([16*])cc3)C2C1',
 '[5*]N1CCCCC1'}</code></pre>
</div>
</div>
<p>The synthon space search code works by combining multiple building blocks to form a compound in a single “reaction” step, and it’s unhappy if you have dummy atoms/attachment points left over after that step, so these fragments are too small to be useful in synthon searching.</p>
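<p>A quick way to see how many attachment points each fragment carries is to count the <code>*</code> characters in its SMILES. This is only a sketch in plain Python, with the fragment list copied by hand from the output above rather than recomputed:</p>

```python
# leaf fragments copied from the BRICSDecompose output above
leaf_fragments = [
    '[16*]c1ccccc1',
    '[4*]CC[8*]',
    '[4*]C[8*]',
    '[5*]N1CC2CC=C(c3ccc([16*])cc3)C2C1',
    '[5*]N1CCCCC1',
]

# each dummy atom shows up as exactly one '*' in the SMILES
attachment_counts = {smi: smi.count('*') for smi in leaf_fragments}
single = sorted(smi for smi, n in attachment_counts.items() if n == 1)
double = sorted(smi for smi, n in attachment_counts.items() if n == 2)

print(single)
print(double)
```

<p>Only the phenyl and piperidine fragments have a single attachment point; the other three have two each.</p>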
<p>Fortunately, we can have the BRICS decomposition give us partial decomposition results along with the final results:</p>
<div id="7ec211eb" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">BRICS.BRICSDecompose(m,keepNonLeafNodes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,minFragmentSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>{'C1=C(c2ccc(CCN3CCCCC3)cc2)C2CN(Cc3ccccc3)CC2C1',
 '[16*]c1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1',
 '[16*]c1ccccc1',
 '[4*]CC[8*]',
 '[4*]CCc1ccc(C2=CCC3CN(C[8*])CC23)cc1',
 '[4*]CCc1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1',
 '[4*]CCc1ccc(C2=CCC3CN([5*])CC23)cc1',
 '[4*]Cc1ccccc1',
 '[5*]N1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1',
 '[5*]N1CC2CC=C(c3ccc([16*])cc3)C2C1',
 '[5*]N1CCCCC1',
 '[8*]CCN1CCCCC1',
 '[8*]CN1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1',
 '[8*]CN1CC2CC=C(c3ccc([16*])cc3)C2C1'}</code></pre>
</div>
</div>
<p>What we’ll do here is pick out all of the fragments that have two attachment points to use as core synthons for a reaction and then provide all the single-attachment point fragments that are compatible with those two attachment points as the other partners in a reaction.</p>
<p>So in this case one “reaction” would be built around the core <code>[4*]CCc1ccc(C2=CCC3CN(C[8*])CC23)cc1</code> as synthon 1. We’d provide all of the single-attachment fragments that can connect to attachment point <code>[4*]</code> (in this set of fragments that’s only <code>[5*]</code>) as synthon 2, and all of the single-attachment fragments that can connect to attachment point <code>[8*]</code> (here that’s <code>[16*]</code>) as synthon 3.</p>
<p>Because the synthon code connects attachment points with the same label, we need to transform all of the <code>[5*]</code>s into <code>[4*]</code>s and all of the <code>[16*]</code>s into <code>[8*]</code>s to give this:</p>
<pre><code>SMILES  synton_id   synton# reaction_id release
[4*]CCc1ccc(C2=CCC3CN(C[8*])CC23)cc1    1   1   r1  1
[4*]N1CCCCC1    2   2   r1  1
[4*]Cc1ccccc1   3   2   r1  1
[4*]N1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1 4   2   r1  1
[4*]CCc1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1    5   2   r1  1
[8*]c1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1  6   3   r1  1
[8*]CN1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1    7   3   r1  1
[8*]CCN1CCCCC1  8   3   r1  1
[8*]c1ccccc1    9   3   r1  1</code></pre>
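<p>The relabelling step on its own can be sketched with simple string replacement. The compatibility table here is hard-coded to the two pairs mentioned above (an assumption that only holds for this fragment set), and <code>relabel</code> is a hypothetical helper used just for this illustration:</p>

```python
# compatibility for this fragment set only, as described in the text:
# [4*] can connect to [5*], and [8*] can connect to [16*]
COMPATIBLE = {'4*': ['5*'], '8*': ['16*']}

def relabel(core_label, fragments, compatible):
    ''' hypothetical helper: return the fragments usable at core_label,
      rewritten so that they carry the same label as the core
    '''
    target = '[' + core_label + ']'
    res = []
    for smi in fragments:
        if target in smi:
            # already carries the core's label, keep it as-is
            res.append(smi)
            continue
        for lbl in compatible[core_label]:
            source = '[' + lbl + ']'
            if source in smi:
                res.append(smi.replace(source, target))
                break
    return res

fragments = ['[5*]N1CCCCC1', '[4*]Cc1ccccc1', '[16*]c1ccccc1', '[8*]CCN1CCCCC1']
synthon2 = relabel('4*', fragments, COMPATIBLE)
synthon3 = relabel('8*', fragments, COMPATIBLE)
print(synthon2)
print(synthon3)
```

<p>With these four fragments the sketch reproduces the corresponding rows of the table: <code>[4*]N1CCCCC1</code> and <code>[4*]Cc1ccccc1</code> for synthon 2, and <code>[8*]c1ccccc1</code> and <code>[8*]CCN1CCCCC1</code> for synthon 3.</p>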
<p>Automating this to handle a set of molecules doesn’t require a massive amount of code:</p>
<div id="53c93237" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> collections <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> defaultdict</span>
<span id="cb14-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> re</span>
<span id="cb14-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> BRICS</span>
<span id="cb14-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> copy</span>
<span id="cb14-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> logging</span>
<span id="cb14-6"></span>
<span id="cb14-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_connectors():</span>
<span id="cb14-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">''' read all the BRICS "reaction" definitions and build a dict mapping</span></span>
<span id="cb14-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">      connector type -&gt; types it can connect to</span></span>
<span id="cb14-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    '''</span></span>
<span id="cb14-11">    connectors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)</span>
<span id="cb14-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> defs <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> BRICS.reactionDefs:</span>
<span id="cb14-13">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,j,b <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> defs:</span>
<span id="cb14-14">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'-'</span>:</span>
<span id="cb14-15">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb14-16">            connectors[i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>].append(j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>)</span>
<span id="cb14-17">            connectors[j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>].append(i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>)</span>
<span id="cb14-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove duplicates and convert to a standard dict</span></span>
<span id="cb14-19">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb14-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k,v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> connectors.items():</span>
<span id="cb14-21">        res[k] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(v))</span>
<span id="cb14-22">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> res</span>
<span id="cb14-23"></span>
<span id="cb14-24"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_synthons(mol):</span>
<span id="cb14-25">    frags <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSDecompose(mol,keepNonLeafNodes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,minFragmentSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb14-26">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># remove anything with 7 in it since those are attached with double bonds</span></span>
<span id="cb14-27">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># and we don't handle those:</span></span>
<span id="cb14-28">    frags <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> frags <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[7*'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x]</span>
<span id="cb14-29">    cnts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(x,x.count(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> frags]</span>
<span id="cb14-30">    ones <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>([x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x,cnt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> cnts <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x)))</span>
<span id="cb14-31">    twos <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>([x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x,cnt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> cnts <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(x)))</span>
<span id="cb14-32">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> twos,ones</span>
<span id="cb14-33"></span>
<span id="cb14-34"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_possibles(rlabel,frags,connectors):</span>
<span id="cb14-35">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb14-36">    poss <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> connectors[rlabel]</span>
<span id="cb14-37">    rlabel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'['</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>rlabel</span>
<span id="cb14-38">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> frags:</span>
<span id="cb14-39">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> rlabel <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> smi:</span>
<span id="cb14-40">            res.append(smi)</span>
<span id="cb14-41">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb14-42">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> lbl <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> poss:</span>
<span id="cb14-43">            lbl <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'['</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>lbl</span>
<span id="cb14-44">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> lbl <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> smi:</span>
<span id="cb14-45">                nsmi <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> smi.replace(lbl,rlabel)</span>
<span id="cb14-46">                res.append(nsmi)</span>
<span id="cb14-47">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">break</span></span>
<span id="cb14-48">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> res</span>
<span id="cb14-49"></span>
<span id="cb14-50"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_rs(core):</span>
<span id="cb14-51">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> re.findall(<span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">r'\[(.*?\*)\]'</span>,core)</span>
<span id="cb14-52"></span>
<span id="cb14-53"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> generate_space(mols,fname):</span>
<span id="cb14-54">    connectors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_connectors()</span>
<span id="cb14-55">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(fname,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb14-56">        nWritten <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb14-57">        nRxns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb14-58">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,mol <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(mols):</span>
<span id="cb14-59">            twos,ones <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_synthons(mol)</span>
<span id="cb14-60">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(twos):</span>
<span id="cb14-61">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb14-62">            nWritten,nRxns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> generate_reactions(outf,twos,ones,connectors,nWritten,nRxns,writeHeader<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> nWritten))</span>
<span id="cb14-63">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> nWritten,nRxns</span>
<span id="cb14-64"></span>
<span id="cb14-65"></span>
<span id="cb14-66"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> generate_reactions(outf,allTwos,allOnes,connectors,nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,nRxns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,writeHeader<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>):</span>
<span id="cb14-67">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> writeHeader:</span>
<span id="cb14-68">        outf.write(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'SMILES'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synton_id'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'synton#'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'reaction_id'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'release'</span>])<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb14-69">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> rs,twos <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> allTwos.items():</span>
<span id="cb14-70">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(rs)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb14-71">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># do we have the same label twice?</span></span>
<span id="cb14-72">        newrs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rs[:]</span>
<span id="cb14-73">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(rs)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>:</span>
<span id="cb14-74">            rs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(rs)</span>
<span id="cb14-75">            rlabel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb14-76">            newlabel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rlabel.replace(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'*'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'00*'</span>)</span>
<span id="cb14-77">            newrs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (newlabel,rs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb14-78">            connectors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> copy.deepcopy(connectors)</span>
<span id="cb14-79">            connectors[newlabel] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> connectors[rlabel]</span>
<span id="cb14-80">            rs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>(rs)</span>
<span id="cb14-81">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> core <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> twos:</span>
<span id="cb14-82">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> newrs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> rs:</span>
<span id="cb14-83">                core <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> core.replace(rs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],newrs[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb14-84">            outf.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>core<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">r</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nRxns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb14-85">            nWritten <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb14-86">        ones <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> allOnes[rs]</span>
<span id="cb14-87">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> j,r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(newrs):</span>
<span id="cb14-88">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> get_possibles(r,ones,connectors):</span>
<span id="cb14-89">                outf.write(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nWritten<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>j<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">r</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nRxns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">1</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb14-90">                nWritten <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb14-91">        nRxns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb14-92">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> nWritten,nRxns</span>
<span id="cb14-93"></span>
<span id="cb14-94"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> generate_space(mols,fname):</span>
<span id="cb14-95">    connectors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_connectors()</span>
<span id="cb14-96">    allOnes <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>)</span>
<span id="cb14-97">    allTwos <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>)</span>
<span id="cb14-98">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> mol <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mols:</span>
<span id="cb14-99">        twos,ones <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_synthons(mol)</span>
<span id="cb14-100">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> two <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> twos:</span>
<span id="cb14-101">            rs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(get_rs(two)))</span>
<span id="cb14-102">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(rs) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>:</span>
<span id="cb14-103">                logger.warning(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'core </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>two<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> does not have two attachment points'</span>)</span>
<span id="cb14-104">            allTwos[rs].add(two)</span>
<span id="cb14-105">            allOnes[rs] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> allOnes[rs].union(ones)</span>
<span id="cb14-106">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(fname,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'w+'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:</span>
<span id="cb14-107">        nWritten,nRxns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> generate_reactions(outf,allTwos,allOnes,connectors)</span>
<span id="cb14-108">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> nWritten,nRxns</span></code></pre></div>
</div>
<p>Try that out on two molecules:</p>
<div id="88ec3e88" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">sample1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1=C(c2ccc(CCN3CCCCC3)cc2)C2CN(Cc3ccccc3)CC2C1 CHEMBL256225'</span>)</span>
<span id="cb15-2">sample2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1=C(c2ccc(CCN3CCCCCC3)cc2)C2CN(Cc3ccccc3)CC2C1 madeup'</span>)</span>
<span id="cb15-3">nWritten,nRxns<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>generate_space([sample1,sample2],<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'blah2.txt'</span>)</span>
<span id="cb15-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(nWritten,nRxns)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>65 4</code></pre>
</div>
</div>
<div id="fad5d4ab" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head blah2.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>SMILES  synton_id   synton# reaction_id release
[4*]CCc1ccc(C2=CCC3CN(C[8*])CC23)cc1    1   1   r1  1
[4*]CC[8*]  2   1   r1  1
[4*]N1CCCCC1    3   2   r1  1
[4*]Cc1ccccc1   4   2   r1  1
[4*]N1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1 5   2   r1  1
[4*]N1CCCCCC1   6   2   r1  1
[4*]CCc1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1    7   2   r1  1
[4*]N1CC2CC=C(c3ccc(CCN4CCCCCC4)cc3)C2C1    8   2   r1  1
[8*]CCN1CCCCC1  9   3   r1  1</code></pre>
</div>
</div>
<p>Notice in the above that we have combined the two cores that have attachment points <code>[4*]</code> and <code>[8*]</code> into a single reaction, <code>r1</code>. This makes the resulting synthon space smaller and more efficient to search.</p>
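<p>To see how the rows of the file map onto reactions and synthon sets, here is a minimal standard-library sketch that is not part of the original notebook. It groups the nine rows printed above by <code>reaction_id</code> and <code>synton#</code>. The helper name <code>count_synthons</code> is made up for this example, and the product count assumes that a reaction contributes one product per combination of one synthon from each of its synthon sets, which may not be exactly how <code>GetNumProducts()</code> counts.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from collections import defaultdict
from math import prod

# The first rows of blah2.txt as printed above (whitespace separated here).
SAMPLE = """SMILES synton_id synton# reaction_id release
[4*]CCc1ccc(C2=CCC3CN(C[8*])CC23)cc1 1 1 r1 1
[4*]CC[8*] 2 1 r1 1
[4*]N1CCCCC1 3 2 r1 1
[4*]Cc1ccccc1 4 2 r1 1
[4*]N1CC2CC=C(c3ccc(CCN4CCCCC4)cc3)C2C1 5 2 r1 1
[4*]N1CCCCCC1 6 2 r1 1
[4*]CCc1ccc(C2=CCC3CN(Cc4ccccc4)CC23)cc1 7 2 r1 1
[4*]N1CC2CC=C(c3ccc(CCN4CCCCCC4)cc3)C2C1 8 2 r1 1
[8*]CCN1CCCCC1 9 3 r1 1
"""

def count_synthons(text):
    """Return {reaction_id: {synthon set number: number of synthons}}."""
    counts = defaultdict(lambda: defaultdict(int))
    rows = text.strip().splitlines()[1:]  # skip the header row
    for row in rows:
        smiles, synthon_id, synthon_set, reaction_id, release = row.split()
        counts[reaction_id][int(synthon_set)] += 1
    return {rxn: dict(sets) for rxn, sets in counts.items()}

counts = count_synthons(SAMPLE)
for rxn, sets in counts.items():
    # Assumed product count: one synthon from each set, all combinations.
    # Only nine rows are included here, so this undercounts the full file.
    print(rxn, sets, prod(sets.values()))</code></pre></div>
<p>Run on the complete file instead of the <code>head</code> output, the same grouping gives a quick sanity check on how many synthons ended up in each set of each reaction.</p>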
<p>Ok, let’s do more molecules. I’ll use the set of very active molecules from ChEMBL36 that I put together in an <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-10-31-how-long-does-it-take.html">earlier blog post</a>.</p>
<div id="0f7ef016" class="cell" data-scrolled="true" data-execution_count="11">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">lines <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../data/chembl36_very_active.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'r'</span>).readlines()</span>
<span id="cb19-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># header:</span></span>
<span id="cb19-3">lines.pop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb19-4">keep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb19-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> l <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> lines:</span>
<span id="cb19-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'['</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> l:</span>
<span id="cb19-7">        keep.append(l.strip().split()[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb19-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(keep)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<pre><code>3074</code></pre>
</div>
</div>
<div id="d15a2d0d" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1">Draw.MolsToGridImage([Chem.MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> keep[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>]],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-13-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Generate a synthon space for the first 300 of the ChEMBL compounds:</p>
<div id="24fb0066" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">first <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> keep[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>]]</span>
<span id="cb22-2"></span>
<span id="cb22-3">fname <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/actives_space.txt'</span></span>
<span id="cb22-4">nwritten,nrxns <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> generate_space(first,fname)</span>
<span id="cb22-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nwritten<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> synthons in </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>nrxns<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> reactions'</span>)</span>
<span id="cb22-6"></span>
<span id="cb22-7">spc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdSynthonSpaceSearch.SynthonSpace()</span>
<span id="cb22-8">spc.ReadTextFile(fname)</span>
<span id="cb22-9"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>rdSynthonSpaceSearch<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>FormattedIntegerString(spc.GetNumProducts())<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> products in space'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>23789 synthons in 79 reactions
253 702 889 products in space</code></pre>
</div>
</div>
<p>Here’s what the file looks like:</p>
<div id="2125691f" class="cell" data-scrolled="true" data-execution_count="14">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>head .<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>results<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>actives_space.txt</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>SMILES  synton_id   synton# reaction_id release
[500*]NC(=O)N[5*]   1   1   r1  1
[500*]NC(=O)NC1CCN(C(=O)c2ccc(C(=O)N[5*])cc2)CC1    2   1   r1  1
[500*]Nc1nc2c(nc1N1CCC(Oc3ccc(F)cc3F)CC1)C(C)N([5*])CC2 3   1   r1  1
[500*]Nc1cc2cc(C3CCN([5*])CC3)c(C)cc2cn1    4   1   r1  1
[500*]Nc1nc2c(nc1N1CCC(Oc3ccc(F)cc3F)CC1)CN([5*])C(C)C2 5   1   r1  1
[500*]NCCCN1CCN(CCCN[5*])CC1    6   1   r1  1
[500*]Nc1cccc(CCN2CCN([5*])CC2)c1   7   1   r1  1
[500*]Nc1ccc(S(=O)(=O)N[5*])cc1F    8   1   r1  1
[500*]NC(=O)c1ccc(-c2cnc3c(N[5*])cc(C(F)(F)c4cccc(F)c4)nn23)cc1C    9   1   r1  1</code></pre>
</div>
</div>
<p>Let’s do a few searches:</p>
<div id="3cfdcad3" class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cccs1'</span>))</span>
<span id="cb26-2">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> res.GetHitMolecules()</span>
<span id="cb26-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> results'</span>)</span>
<span id="cb26-4">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1000 results</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="16">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-16-output-2.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="287ec6ab" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> spc.SubstructureSearch(Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1ncncc1'</span>))</span>
<span id="cb28-2">resMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> res.GetHitMolecules()</span>
<span id="cb28-3"></span>
<span id="cb28-4">Draw.MolsToGridImage(resMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(resMols) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-17-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>It’s nicer to sort by increasing molecular size:</p>
<div id="966e397c" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">tms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>(resMols,key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:x.GetNumHeavyAtoms()))</span>
<span id="cb29-2">Draw.MolsToGridImage(tms[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(tms) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons_files/figure-html/cell-18-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>As we saw in the <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial.html">blog post on BRICS</a>, you can get some odd molecules by putting together BRICS fragments, but there are plenty of results in there that still look reasonable.</p>


</section>

 ]]></description>
  <category>tutorial</category>
  <category>documentation</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-12-07-BRICS-synthons.html</guid>
  <pubDate>Sat, 06 Dec 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/brics-synthons-1.png" medium="image" type="image/png" height="47" width="144"/>
</item>
<item>
  <title>Thresholds for “random” with 3D similarity methods</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d.html</link>
  <description><![CDATA[ 




<section id="intro" class="level1">
<h1>Intro</h1>
<p>In this one I’m generating some reference data for 3D similarity approaches. The idea is inspired by <a href="https://greglandrum.github.io/rdkit-blog/posts/2021-05-18-fingerprint-thresholds1.html">this blog post</a>, where I figured out noise thresholds for similarity calculations with a bunch of 2D fingerprints. Here I do basically the same thing: calculating a number of different 3D similarity (or distance) metrics on random pairs of molecules in order to establish noise thresholds for those metrics.</p>
<p>I compare results from two different data sets for this:</p>
<ol type="1">
<li>25000 random pairs of molecules with crystal structures from the LOBSTER data set. The <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1.html">last</a> <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2.html">three</a> <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3.html">blog posts</a> have looked at LOBSTER. I used the crystal structures from the LOBSTER data set for these molecules.</li>
<li>50000 random pairs of molecules from the ChEMBL set I used in the <a href="https://greglandrum.github.io/rdkit-blog/posts/2021-05-18-fingerprint-thresholds1.html">fingerprint thresholds post</a>. I used ETKDGv3 conformers for these molecules.</li>
</ol>
<p>The first data set is considerably less diverse (the pairs are formed from only 3583 unique molecules), but the 3D structures are from crystals, so I think it’s worth considering (even though we know that ETKDG does generally produce reasonable structures). For cases where the values disagree, I think it’s probably better to use the ChEMBL results since they come from a larger data set.</p>
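<p>To make the idea of a noise threshold concrete, here is a minimal sketch of how a percentile threshold can be read off a list of random-pair scores. This is not the code used to generate the results in this post: the function name <code>noise_threshold</code> is hypothetical, the scores are synthetic placeholders drawn from a seeded random number generator rather than real similarity values, and the nearest-rank percentile is only an assumption about how such thresholds could be computed.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import random

def noise_threshold(scores, percent, kind="similarity"):
    """Nearest-rank threshold that `percent` percent of random pairs fail to beat.

    For a similarity the threshold is a lower bound; for a distance the sort
    order is reversed, so the threshold is an upper bound.
    """
    ordered = sorted(scores, reverse=(kind == "distance"))
    # Nearest-rank percentile: ceil(percent / 100 * n), then a 0-based index.
    rank = -(-percent * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]

rng = random.Random(42)
# Synthetic placeholder values standing in for similarities of random pairs.
fake_similarities = [rng.betavariate(5, 5) for _ in range(25000)]
fake_distances = [1.0 - s for s in fake_similarities]

for pct in (70, 80, 90, 95, 99):
    sim = noise_threshold(fake_similarities, pct)
    dist = noise_threshold(fake_distances, pct, kind="distance")
    print(pct, round(sim, 2), round(dist, 2))</code></pre></div>
<p>Reversing the sort order for distances means the same function yields a lower bound for similarities and an upper bound for distances.</p>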
<p>Here’s the summary of the results.</p>
<section id="alignment-based-approaches" class="level2">
<h2 class="anchored" data-anchor-id="alignment-based-approaches">Alignment based approaches</h2>
<p>Some of the metrics here are similarity-based, so the thresholds are lower bounds, and some are distance-based, so the thresholds are upper bounds. Rather than transform the distances into similarities, I think it’s more useful to report the values that actually come from the RDKit functions. If this turns out to be wrong, I will update the blog post.</p>
<p>For example, if you do a shape-based alignment of two molecules to each other and get a shape Tanimoto score of 0.80, the LOBSTER data says that the value is larger (more significant) than 95% of the random pairs, while the ChEMBL data says it’s larger than 99% of the random pairs. Similarly, if the shape-based alignment produces a shape Tanimoto distance of 0.30, both data sets say that the distance is smaller (more significant) than 99% of the random pairs.</p>
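<p>The lookup described above can be sketched in a few lines of Python. The threshold values are copied from the <code>shape_align_ShapeTanimoto</code> and <code>shape_align_TanimotoDist</code> rows of the tables in this section; the function name <code>significance</code> and the dictionary layout are made up for this illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from operator import gt, lt

PERCENTILES = (70, 80, 90, 95, 99)

# (data set, metric): (type, thresholds at 70/80/90/95/99%), copied from the tables.
THRESHOLDS = {
    ("LOBSTER", "shape_align_ShapeTanimoto"): ("SIMILARITY", (0.62, 0.66, 0.71, 0.76, 0.85)),
    ("ChEMBL", "shape_align_ShapeTanimoto"): ("SIMILARITY", (0.58, 0.61, 0.65, 0.69, 0.76)),
    ("LOBSTER", "shape_align_TanimotoDist"): ("DISTANCE", (0.53, 0.50, 0.46, 0.42, 0.34)),
    ("ChEMBL", "shape_align_TanimotoDist"): ("DISTANCE", (0.55, 0.53, 0.50, 0.47, 0.42)),
}

def significance(dataset, metric, value):
    """Return the highest tabulated percentile that `value` beats, or None."""
    kind, cutoffs = THRESHOLDS[(dataset, metric)]
    # Similarities must exceed the cutoff; distances must fall below it.
    beats = gt if kind == "SIMILARITY" else lt
    best = None
    for pct, cutoff in zip(PERCENTILES, cutoffs):
        if beats(value, cutoff):
            best = pct
    return best

for dataset in ("LOBSTER", "ChEMBL"):
    print(dataset,
          significance(dataset, "shape_align_ShapeTanimoto", 0.80),
          significance(dataset, "shape_align_TanimotoDist", 0.30))</code></pre></div>
<p>A return value of <code>None</code> means the score does not clear even the 70% threshold, so it is indistinguishable from what random pairs produce.</p>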
<section id="lobster-set" class="level3">
<h3 class="anchored" data-anchor-id="lobster-set">LOBSTER Set</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>metric</th>
<th>70%</th>
<th>80%</th>
<th>90%</th>
<th>95%</th>
<th>99%</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>baseline_ShapeTanimoto</td>
<td>&gt;0.49</td>
<td>&gt;0.54</td>
<td>&gt;0.62</td>
<td>&gt;0.69</td>
<td>&gt;0.81</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>shape_align_ShapeTanimoto</td>
<td>&gt;0.62</td>
<td>&gt;0.66</td>
<td>&gt;0.71</td>
<td>&gt;0.76</td>
<td>&gt;0.85</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>baseline_TanimotoDist</td>
<td>&lt;0.61</td>
<td>&lt;0.57</td>
<td>&lt;0.52</td>
<td>&lt;0.48</td>
<td>&lt;0.39</td>
<td>DISTANCE</td>
</tr>
<tr class="even">
<td>shape_align_TanimotoDist</td>
<td>&lt;0.53</td>
<td>&lt;0.50</td>
<td>&lt;0.46</td>
<td>&lt;0.42</td>
<td>&lt;0.34</td>
<td>DISTANCE</td>
</tr>
<tr class="odd">
<td>shape_align_noc_TanimotoDist</td>
<td>&lt;0.52</td>
<td>&lt;0.49</td>
<td>&lt;0.45</td>
<td>&lt;0.42</td>
<td>&lt;0.34</td>
<td>DISTANCE</td>
</tr>
<tr class="even">
<td>o3a_align_TanimotoDist</td>
<td>&lt;0.58</td>
<td>&lt;0.55</td>
<td>&lt;0.51</td>
<td>&lt;0.46</td>
<td>&lt;0.36</td>
<td>DISTANCE</td>
</tr>
<tr class="odd">
<td>crippeno3a_align_TanimotoDist</td>
<td>&lt;0.59</td>
<td>&lt;0.56</td>
<td>&lt;0.51</td>
<td>&lt;0.47</td>
<td>&lt;0.37</td>
<td>DISTANCE</td>
</tr>
</tbody>
</table>
</section>
<section id="chembl-set" class="level3">
<h3 class="anchored" data-anchor-id="chembl-set">ChEMBL Set</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>metric</th>
<th>70%</th>
<th>80%</th>
<th>90%</th>
<th>95%</th>
<th>99%</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>baseline_ShapeTanimoto</td>
<td>&gt;0.44</td>
<td>&gt;0.48</td>
<td>&gt;0.54</td>
<td>&gt;0.59</td>
<td>&gt;0.69</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>shape_align_ShapeTanimoto</td>
<td>&gt;0.58</td>
<td>&gt;0.61</td>
<td>&gt;0.65</td>
<td>&gt;0.69</td>
<td>&gt;0.76</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>baseline_TanimotoDist</td>
<td>&lt;0.64</td>
<td>&lt;0.61</td>
<td>&lt;0.57</td>
<td>&lt;0.54</td>
<td>&lt;0.47</td>
<td>DISTANCE</td>
</tr>
<tr class="even">
<td>shape_align_TanimotoDist</td>
<td>&lt;0.55</td>
<td>&lt;0.53</td>
<td>&lt;0.50</td>
<td>&lt;0.47</td>
<td>&lt;0.42</td>
<td>DISTANCE</td>
</tr>
<tr class="odd">
<td>shape_align_noc_TanimotoDist</td>
<td>&lt;0.55</td>
<td>&lt;0.53</td>
<td>&lt;0.50</td>
<td>&lt;0.47</td>
<td>&lt;0.41</td>
<td>DISTANCE</td>
</tr>
<tr class="even">
<td>o3a_align_TanimotoDist</td>
<td>&lt;0.61</td>
<td>&lt;0.59</td>
<td>&lt;0.55</td>
<td>&lt;0.52</td>
<td>&lt;0.46</td>
<td>DISTANCE</td>
</tr>
<tr class="odd">
<td>crippeno3a_align_TanimotoDist</td>
<td>&lt;0.61</td>
<td>&lt;0.59</td>
<td>&lt;0.55</td>
<td>&lt;0.52</td>
<td>&lt;0.46</td>
<td>DISTANCE</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="non-alignment-based-approaches" class="level2">
<h2 class="anchored" data-anchor-id="non-alignment-based-approaches">Non-alignment based approaches</h2>
<section id="lobster-set-1" class="level3">
<h3 class="anchored" data-anchor-id="lobster-set-1">LOBSTER Set</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>metric</th>
<th>70%</th>
<th>80%</th>
<th>90%</th>
<th>95%</th>
<th>99%</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>USR_score</td>
<td>&gt;0.66</td>
<td>&gt;0.71</td>
<td>&gt;0.76</td>
<td>&gt;0.80</td>
<td>&gt;0.87</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_USR_score</td>
<td>&gt;0.66</td>
<td>&gt;0.71</td>
<td>&gt;0.76</td>
<td>&gt;0.80</td>
<td>&gt;0.86</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>AP3D_DiceSimilarity</td>
<td>&gt;0.49</td>
<td>&gt;0.54</td>
<td>&gt;0.60</td>
<td>&gt;0.63</td>
<td>&gt;0.70</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_AP3D_DiceSimilarity</td>
<td>&gt;0.29</td>
<td>&gt;0.33</td>
<td>&gt;0.38</td>
<td>&gt;0.43</td>
<td>&gt;0.51</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>E3FP_DiceSimilarity</td>
<td>&gt;0.28</td>
<td>&gt;0.31</td>
<td>&gt;0.34</td>
<td>&gt;0.37</td>
<td>&gt;0.43</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_E3FP_DiceSimilarity</td>
<td>&gt;0.23</td>
<td>&gt;0.26</td>
<td>&gt;0.29</td>
<td>&gt;0.32</td>
<td>&gt;0.38</td>
<td>SIMILARITY</td>
</tr>
</tbody>
</table>
</section>
<section id="chembl-set-1" class="level3">
<h3 class="anchored" data-anchor-id="chembl-set-1">ChEMBL Set</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 39%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>metric</th>
<th>70%</th>
<th>80%</th>
<th>90%</th>
<th>95%</th>
<th>99%</th>
<th>type</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>USR_score</td>
<td>&gt;0.65</td>
<td>&gt;0.69</td>
<td>&gt;0.74</td>
<td>&gt;0.78</td>
<td>&gt;0.84</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_USR_score</td>
<td>&gt;0.65</td>
<td>&gt;0.69</td>
<td>&gt;0.74</td>
<td>&gt;0.78</td>
<td>&gt;0.84</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>AP3D_DiceSimilarity</td>
<td>&gt;0.57</td>
<td>&gt;0.60</td>
<td>&gt;0.65</td>
<td>&gt;0.68</td>
<td>&gt;0.73</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_AP3D_DiceSimilarity</td>
<td>&gt;0.34</td>
<td>&gt;0.37</td>
<td>&gt;0.42</td>
<td>&gt;0.45</td>
<td>&gt;0.51</td>
<td>SIMILARITY</td>
</tr>
<tr class="odd">
<td>E3FP_DiceSimilarity</td>
<td>&gt;0.30</td>
<td>&gt;0.33</td>
<td>&gt;0.35</td>
<td>&gt;0.38</td>
<td>&gt;0.42</td>
<td>SIMILARITY</td>
</tr>
<tr class="even">
<td>noh_E3FP_DiceSimilarity</td>
<td>&gt;0.26</td>
<td>&gt;0.28</td>
<td>&gt;0.30</td>
<td>&gt;0.32</td>
<td>&gt;0.36</td>
<td>SIMILARITY</td>
</tr>
</tbody>
</table>
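<p>The thresholds in the tables above can be read as percentiles of the scores observed for random pairs. Here is a minimal sketch of that kind of calculation, assuming a simple nearest-rank percentile. The helper name <code>noise_thresholds</code> and the uniform toy data are illustrative only; this is not the code that generated the tables.</p>

```python
import random


def noise_thresholds(values, levels=(70, 80, 90, 95, 99), kind="SIMILARITY"):
    """Return {level: threshold} for scores computed on random pairs.

    For a similarity, the threshold at level p is the value that p% of the
    random pairs fall below. For a distance it is the value that p% of the
    random pairs fall above.
    """
    ordered = sorted(values)
    n = len(ordered)
    result = {}
    for level in levels:
        pct = level if kind == "SIMILARITY" else 100 - level
        # nearest-rank percentile, clamped to the valid index range
        idx = min(n - 1, max(0, round(pct / 100 * n) - 1))
        result[level] = ordered[idx]
    return result


# Toy data standing in for 10000 random-pair scores
random.seed(42)
scores = [random.random() for _ in range(10000)]
sim = noise_thresholds(scores, kind="SIMILARITY")
dist = noise_thresholds(scores, kind="DISTANCE")
print({k: round(v, 2) for k, v in sim.items()})
print({k: round(v, 2) for k, v in dist.items()})
```

<p>Similarity thresholds rise with the confidence level while distance thresholds fall, which matches the direction of the inequality signs in the tables.</p>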
<div id="209d81ae" class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4">IPythonConsole.ipython_3d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> matplotlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-7">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tableau-colorblind10'</span>)</span>
<span id="cb1-8">plt.rcParams[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'font.size'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'16'</span></span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb1-12"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
</div>
<div id="f3d421d6" class="cell" data-scrolled="true">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb2-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.3</code></pre>
</div>
</div>
</section>
</section>
</section>
<section id="getting-started" class="level1">
<h1>Getting started</h1>
<div id="dc28d683" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lwreg</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lwreg <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> utils</span></code></pre></div>
</div>
<p>Load our lwreg configuration from the database we created before:</p>
<div id="e184d2ce" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">config <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.configure_from_database(dbname<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lobster_112024'</span>,dbtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'postgresql'</span>)</span>
<span id="cb5-2">lwreg.set_default_config(config)</span>
<span id="cb5-3"></span>
<span id="cb5-4">config</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>{'dbname': 'lobster_112024',
 'dbtype': 'postgresql',
 'cacheConnection': True,
 'standardization': 'none',
 'removeHs': 1,
 'useTautomerHashv2': 0,
 'registerConformers': 1,
 'numConformerDigits': 3,
 'lwregSchema': ''}</code></pre>
</div>
</div>
</section>
<section id="random-pairs-from-lobster" class="level1">
<h1>Random pairs from LOBSTER</h1>
<p>Get a map from (molregno,conf_id) tuples to (ligname,pdb,molblock,mol,mol_noh):</p>
<div id="8ff64590" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-2">    select ligname,pdb,molregno,conf_id,molblock <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-3">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands join conformers using (molregno,conf_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb7-4">ligs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb7-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> nm,pdb,mrn,cid,mb <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d:</span>
<span id="cb7-6">    mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb7-7">    mol_noh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb)</span>
<span id="cb7-8">    ligs[(mrn,cid)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (nm,pdb,mb,mol,mol_noh)</span></code></pre></div>
</div>
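<p>The cell above parses each mol block twice, once with <code>removeHs=False</code> and once with the default settings. A small sketch of the difference, using a methanol mol block generated on the fly as a stand-in for a database record:</p>

```python
from rdkit import Chem

# Build a small mol block with explicit hydrogens to stand in for a database record
methanol = Chem.AddHs(Chem.MolFromSmiles("CO"))
mb = Chem.MolToMolBlock(methanol)

mol = Chem.MolFromMolBlock(mb, removeHs=False)  # keeps the hydrogens
mol_noh = Chem.MolFromMolBlock(mb)              # default removeHs=True strips them

print(mol.GetNumAtoms(), mol_noh.GetNumAtoms())
```

<p>Keeping both forms around makes it possible to compare descriptors calculated with and without hydrogens later on.</p>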
<p>Create a bunch of random pairs:</p>
<div id="5aab111d" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> random</span>
<span id="cb8-2">random.seed(<span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xa100f</span>)</span>
<span id="cb8-3"></span>
<span id="cb8-4">ks <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(ligs.keys())</span>
<span id="cb8-5">base <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb8-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">while</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(base)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25000</span>:</span>
<span id="cb8-7">    t <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ks[:]</span>
<span id="cb8-8">    random.shuffle(t)</span>
<span id="cb8-9">    base.extend(t)</span>
<span id="cb8-10">tbase <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> base[:]</span>
<span id="cb8-11">random.shuffle(tbase)</span>
<span id="cb8-12">pairs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>((<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(x,y),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(x,y)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x,y <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(base,tbase) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span>y)</span>
<span id="cb8-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(pairs)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>25029</code></pre>
</div>
</div>
<div id="41f0b1a6" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">pairs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(pairs)[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">25000</span>]</span>
<span id="cb10-2">pairs[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>[((453, 453), (925, 925)),
 ((1574, 1574), (2733, 2733)),
 ((175, 175), (3247, 3247)),
 ((1719, 1719), (3118, 3118)),
 ((3054, 3054), (3458, 3458))]</code></pre>
</div>
</div>
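<p>The pairing scheme used above (concatenate shuffled copies of the keys, shuffle a second copy, zip the two together, and drop self-pairs and duplicates) can be tried out on toy data. This is a sketch under a few assumptions: the function name <code>random_pairs</code> is made up for illustration, it uses a private <code>random.Random</code> instance instead of the global seed, and it sorts the pairs before truncating so that the result does not depend on set ordering.</p>

```python
import random


def random_pairs(keys, n_target, seed=0xa100f):
    """Return up to n_target unique, unordered pairs of distinct keys."""
    rng = random.Random(seed)
    keys = list(keys)
    rounds = -(-n_target // len(keys))  # ceiling division
    base = []
    for _ in range(rounds):
        shuffled = keys[:]
        rng.shuffle(shuffled)
        base.extend(shuffled)
    partner = base[:]
    rng.shuffle(partner)
    # min/max gives each pair a canonical order so (a, b) and (b, a) collapse
    pairs = {(min(x, y), max(x, y)) for x, y in zip(base, partner) if x != y}
    return sorted(pairs)[:n_target]


# Toy keys shaped like the (molregno, conf_id) tuples used in the post
toy_keys = [(i, i) for i in range(100)]
toy_pairs = random_pairs(toy_keys, 150)
print(len(toy_pairs), toy_pairs[:3])
```

<p>Because self-pairs and duplicates are discarded, the number of pairs can come out slightly above or below the target before truncation, which is why the notebook ends up with 25029 pairs and then keeps the first 25000.</p>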
<section id="shape-based-alignment" class="level2">
<h2 class="anchored" data-anchor-id="shape-based-alignment">Shape-based alignment</h2>
<p>Let’s see what we get when we perform shape-based alignment using the crystal conformers.</p>
<p>Start by aligning the crystal conformers.</p>
<div id="d2cc720a" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gzip,pickle</span>
<span id="cb12-2">res_accum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pickle.load(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/3d_random_distances.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rb'</span>))</span>
<span id="cb12-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeAlign</span>
<span id="cb12-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeHelpers</span>
<span id="cb12-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolTransforms</span></code></pre></div>
</div>
<div id="7f4e759e" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> collections <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> defaultdict</span>
<span id="cb13-2">res_accum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)</span></code></pre></div>
</div>
<div id="571327f7" class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeAlign</span>
<span id="cb14-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeHelpers</span>
<span id="cb14-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolTransforms</span>
<span id="cb14-4"></span>
<span id="cb14-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb14-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb14-7">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb14-8">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb14-9">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb14-10">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb14-11">    </span>
<span id="cb14-12">    st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1,m2,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb14-13">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>].append(st)</span>
<span id="cb14-14">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ColorTanimoto'</span>].append(ct)</span>
<span id="cb14-15">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb14-16"></span>
<span id="cb14-17">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb14-18">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb14-19">    st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1,m2,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,useColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb14-20">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_ShapeTanimoto'</span>].append(st)</span>
<span id="cb14-21">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb14-22">    </span>
<span id="cb14-23"></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|███████████████████████████████████████████████████████████████████████| 25000/25000 [00:47&lt;00:00, 530.84it/s]</code></pre>
</div>
</div>
</section>
<section id="open3dalign" class="level2">
<h2 class="anchored" data-anchor-id="open3dalign">Open3DAlign</h2>
<p>What about an alternative 3D alignment algorithm? Let’s try aligning with Paolo Tosco’s <a href="https://link.springer.com/article/10.1007/s10822-011-9462-9">Open3DAlign</a>:</p>
<div id="e6ecccef" class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolAlign</span>
<span id="cb16-2"></span>
<span id="cb16-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb16-4">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb16-5">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb16-6">    </span>
<span id="cb16-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb16-8">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetO3A(m2,m1)</span>
<span id="cb16-9">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb16-10">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb16-11">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_rmsd'</span>].append(rmsd)</span>
<span id="cb16-12">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_scpre'</span>].append(score)</span>
<span id="cb16-13">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb16-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb16-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span>
<span id="cb16-16">    </span>
<span id="cb16-17"></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|███████████████████████████████████████████████████████████████████████| 25000/25000 [02:06&lt;00:00, 198.35it/s]</code></pre>
</div>
</div>
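<p>One thing to keep in mind with the loop above: any pair that raises a <code>ValueError</code> is skipped, so the <code>o3a_align_*</code> lists can end up shorter than the list of pairs and no longer line up with it index by index. The output shown here does not say whether any pairs actually failed. If the alignment with the pairs matters, one option is to record a placeholder instead. A sketch with toy scoring functions standing in for the alignment calls (all names here are hypothetical):</p>

```python
from collections import defaultdict


def accumulate(pairs, scorers):
    """Score every pair with every scorer, keeping one entry per pair.

    `scorers` maps a result name to a callable taking (key1, key2). A scorer
    that raises ValueError contributes None for that pair.
    """
    res = defaultdict(list)
    for c1, c2 in pairs:
        for name, scorer in scorers.items():
            try:
                res[name].append(scorer(c1, c2))
            except ValueError:
                res[name].append(None)
    return res


# Toy scorers standing in for the alignment calls
def always_ok(a, b):
    return abs(a - b)


def sometimes_fails(a, b):
    if (a + b) % 2:
        raise ValueError("alignment failed")
    return a + b


toy_pairs = [(1, 2), (2, 4), (3, 5)]
res = accumulate(toy_pairs, {"ok": always_ok, "flaky": sometimes_fails})
n_failed = sum(v is None for v in res["flaky"])
print(res["flaky"], n_failed)
```

<p>The placeholders need to be filtered out before computing percentiles, but every list then has the same length as the list of pairs.</p>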
<p>The RDKit has a variation on Open3DAlign that uses atomic contributions to the MolLogP value instead of MMFF94 atom types:</p>
<div id="6e9ee47e" class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolAlign</span>
<span id="cb18-2"></span>
<span id="cb18-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb18-4">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb18-5">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb18-6">    </span>
<span id="cb18-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb18-8">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetCrippenO3A(m2,m1)</span>
<span id="cb18-9">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb18-10">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb18-11">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_rmsd'</span>].append(rmsd)</span>
<span id="cb18-12">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_scpre'</span>].append(score)</span>
<span id="cb18-13">        res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb18-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb18-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|███████████████████████████████████████████████████████████████████████| 25000/25000 [01:08&lt;00:00, 366.77it/s]</code></pre>
</div>
</div>
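<p>To see the two Open3DAlign variants side by side on a single pair, here is a small sketch that uses ETKDG conformers of two simple molecules instead of crystal conformers. The SMILES, the seed, and the helper name <code>embedded</code> are arbitrary choices for illustration.</p>

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign


def embedded(smiles, seed=42):
    """Return a molecule with explicit Hs and one ETKDG conformer."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    return mol


ref = embedded("c1ccccc1CCN")
results = {}
for name, factory in (("o3a", rdMolAlign.GetO3A),
                      ("crippeno3a", rdMolAlign.GetCrippenO3A)):
    probe = embedded("c1ccccc1CCO")  # fresh probe, since Align() moves it
    o3a = factory(probe, ref)        # argument order: probe first, then reference
    rmsd = o3a.Align()               # transforms the probe conformer in place
    results[name] = (rmsd, o3a.Score())

for name, (rmsd, score) in results.items():
    print(f"{name}: rmsd={rmsd:.2f} score={score:.1f}")
```

<p>The scores from the two variants are on different scales, so they should not be compared with each other directly. This is one reason the post evaluates every alignment with the same <code>ShapeTanimotoDist</code> calculation afterwards.</p>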
</section>
<section id="baseline-canonical-alignment" class="level2">
<h2 class="anchored" data-anchor-id="baseline-canonical-alignment">Baseline: canonical alignment</h2>
<p>Just put the molecules in their principal-axis frame:</p>
<div id="4b2bfa9f" class="cell" data-execution_count="93">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb20-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb20-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb20-4">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb20-5">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb20-6">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb20-7">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb20-8">    </span>
<span id="cb20-9">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|██████████████████████████████████████████████████████████████████████| 25000/25000 [00:07&lt;00:00, 3483.93it/s]</code></pre>
</div>
</div>
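<p>As a self-contained illustration of this baseline, the sketch below canonicalizes the conformers of two small molecules and computes the shape Tanimoto distance without any pairwise fitting. The molecules are arbitrary examples with ETKDG conformers, not structures from the datasets.</p>

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms, rdShapeHelpers


def embedded(smiles, seed=0xf00d):
    """Return a molecule with explicit Hs and one ETKDG conformer."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    return mol


m1 = embedded("c1ccccc1CCO")
m2 = embedded("CCCCCCO")

# Move each conformer into its own principal-axis frame; no pairwise fitting is done
for m in (m1, m2):
    rdMolTransforms.CanonicalizeConformer(m.GetConformer())

dist = rdShapeHelpers.ShapeTanimotoDist(m1, m2)
self_dist = rdShapeHelpers.ShapeTanimotoDist(m1, m1)
print(f"distance: {dist:.2f}, similarity: {1 - dist:.2f}, self-distance: {self_dist:.2f}")
```

<p>Since the result is a distance, smaller values mean more similar shapes, and a molecule compared with itself gives zero.</p>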
</section>
<section id="baseline-tanimoto-shape-score-in-the-canonical-alignment" class="level2">
<h2 class="anchored" data-anchor-id="baseline-tanimoto-shape-score-in-the-canonical-alignment">Baseline: Tanimoto shape score in the canonical alignment</h2>
<p>In the v2025.09.3 RDKit release (released the day before I wrote this post), Dave Cosgrove added a function that allows two molecules to be scored using the PubChem shape alignment code without doing the alignment first. This gives a score that is directly comparable to what you get when you do an alignment.</p>
<div id="c3454219" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb22-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb22-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb22-4">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb22-5">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb22-6">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb22-7">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb22-8">    opts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.ShapeInputOptions()</span>
<span id="cb22-9">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>].append(rdShapeAlign.ScoreMol(m1,m2,opts,opts))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|██████████████████████████████████████████████████████████████████████| 25000/25000 [00:05&lt;00:00, 4227.54it/s]</code></pre>
</div>
</div>
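<p>The plots that follow mix two conventions: a shape Tanimoto <em>score</em>, where higher means more similar, and a shape Tanimoto <em>distance</em>, where lower means more similar. As a reminder of how the two relate, here is a toy sketch on sets of occupied grid cells. It is an illustration only: <code>voxel_tanimoto</code> is a hypothetical helper, and the PubChem code works with Gaussian overlap volumes rather than sets of voxels.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">def voxel_tanimoto(cells_a, cells_b):
    """Tanimoto similarity of two sets of occupied grid cells (toy model)."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty shapes count as identical
    return len(a.intersection(b)) / len(union)


# Two three-cell "shapes" offset by one cell along x.
shape_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
shape_b = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}

sim = voxel_tanimoto(shape_a, shape_b)  # 2 shared cells out of 4 occupied in total
dist = 1.0 - sim                        # the corresponding Tanimoto distance
print(sim, dist)</code></pre></div>
<p>Because the distance is just one minus the similarity, a histogram of distances is the mirror image of the histogram of the corresponding scores.</p>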
<div id="252a2d60" class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
<span id="cb24-2">pickle.dump(res_accum,<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/3d_random_distances.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'wb+'</span>))</span></code></pre></div>
</div>
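<p>Passing a bare <code>open(...)</code> to <code>pickle.dump</code> leaves closing the file to the garbage collector. A <code>with</code> block closes it as soon as the dump finishes. The sketch below shows that pattern with a temporary directory and a toy dictionary standing in for <code>res_accum</code>; the values in it are made up for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import os
import pickle
import tempfile

# Toy stand-in for the res_accum dictionary; these numbers are not real results.
toy_results = {'baseline_TanimotoDist': [0.71, 0.64], 'USR_score': [0.33, 0.41]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, '3d_random_distances.pkl')
    # The with block closes the file as soon as the dump finishes.
    with open(path, 'wb') as outf:
        pickle.dump(toy_results, outf)
    # Read the file back to confirm that the round trip preserves the data.
    with open(path, 'rb') as inf:
        restored = pickle.load(inf)

print(restored == toy_results)</code></pre></div>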
<div id="2c65c817" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb25-2">plt.hist([[x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>]],res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>]],</span>
<span id="cb25-3">         bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'aligned'</span>])</span>
<span id="cb25-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto score'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb25-5">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-17-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="d915aac5" class="cell" data-execution_count="95">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb26-2">plt.hist([res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>],],bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,])</span>
<span id="cb26-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto distance'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb26-4">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-18-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="b32d77cc" class="cell" data-execution_count="96">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb27-2">plt.hist([res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>],res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>],res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>]],bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'color'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'no color'</span>])</span>
<span id="cb27-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto distance'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb27-4">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-19-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="97253f37" class="cell" data-execution_count="97">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb28-2">plt.hist([res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>],res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>],res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>]],</span>
<span id="cb28-3">         bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A'</span>])</span>
<span id="cb28-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto distance'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb28-5">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-20-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="non-alignment-methods" class="level2">
<h2 class="anchored" data-anchor-id="non-alignment-methods">Non-alignment methods</h2>
<div id="9264b870" class="cell" data-execution_count="79">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolDescriptors</span>
<span id="cb29-2">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb29-3">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb29-4"></span>
<span id="cb29-5"></span>
<span id="cb29-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb29-7">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb29-8">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb29-9">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb29-10">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb29-11">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>].append(rdMolDescriptors.GetUSRScore(usr1,usr2))</span>
<span id="cb29-12">    </span>
<span id="cb29-13">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb29-14">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb29-15">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb29-16">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb29-17">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>].append(rdMolDescriptors.GetUSRScore(usr1,usr2))</span>
<span id="cb29-18">   </span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|██████████████████████████████████████████████████████████████████████| 25000/25000 [00:03&lt;00:00, 6754.96it/s]</code></pre>
</div>
</div>
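<p>For orientation, the USR similarity defined by Ballester and Richards is an inverse function of the mean absolute difference between the two 12-element descriptors. The plain-Python sketch below shows that formula. It assumes unit weights for every descriptor element, and <code>usr_score</code> is a hypothetical helper written for this illustration, not the RDKit implementation.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">def usr_score(d1, d2):
    """USR-style similarity: 1 / (1 + mean absolute difference of the descriptors)."""
    if len(d1) != len(d2):
        raise ValueError('descriptors must have the same length')
    mean_abs_diff = sum(abs(x - y) for x, y in zip(d1, d2)) / len(d1)
    return 1.0 / (1.0 + mean_abs_diff)


# Made-up 12-element descriptors, chosen only to make the arithmetic easy to follow.
desc_a = [1.0] * 12
desc_b = [2.0] * 12

same = usr_score(desc_a, desc_a)   # identical descriptors give the maximum score
shifted = usr_score(desc_a, desc_b)  # every element differs by 1.0
print(same, shifted)</code></pre></div>
<p>The score is bounded between 0 and 1 and never reaches 0, so even pairs of unrelated molecules produce clearly non-zero values.</p>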
<div id="89177da4" class="cell" data-execution_count="80">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetAtomPairGenerator(use2D<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb31-2">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb31-3">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb31-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb31-5">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb31-6">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb31-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb31-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb31-9">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb31-10">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb31-11">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span>
<span id="cb31-12"></span>
<span id="cb31-13">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb31-14">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb31-15">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb31-16">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb31-17">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|██████████████████████████████████████████████████████████████████████| 25000/25000 [00:21&lt;00:00, 1148.58it/s]</code></pre>
</div>
</div>
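<p>The fingerprints here are count vectors, so the Dice similarity is computed from counts rather than from on/off bits: twice the sum of the element-wise minimum counts, divided by the total of all counts in both vectors. A minimal sketch with plain dictionaries follows. <code>dice_similarity</code> is a hypothetical helper, and returning 0.0 for two empty vectors is a convention chosen for this example.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">def dice_similarity(counts_a, counts_b):
    """Dice similarity of two sparse count vectors stored as {feature_id: count}."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0  # convention chosen here for two empty vectors
    # For each feature in A, the shared count is the smaller of the two counts.
    shared = sum(min(v, counts_b.get(k, 0)) for k, v in counts_a.items())
    return 2.0 * shared / total


# Toy fingerprints: feature ids and counts are made up.
fp_a = {1: 2, 5: 1, 9: 1}
fp_b = {1: 1, 5: 3}

dice = dice_similarity(fp_a, fp_b)  # shared = 1 + 1, total = 4 + 4
print(dice)</code></pre></div>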
<div id="73fd7228" class="cell" data-execution_count="83">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> logging</span>
<span id="cb33-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> e3fp.pipeline <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> fprints_from_mol</span>
<span id="cb33-3">logging.disable(logging.INFO)</span>
<span id="cb33-4"></span>
<span id="cb33-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_fp(m):</span>
<span id="cb33-6">    fp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fprints_from_mol(m,fprint_params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'counts'</span>:<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>})[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb33-7">    rdkfp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.ULongSparseIntVect(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb33-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k,v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> fp.counts.items():</span>
<span id="cb33-9">        rdkfp[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(k)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> v </span>
<span id="cb33-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> rdkfp</span>
<span id="cb33-11"></span>
<span id="cb33-12">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb33-13">res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb33-14"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c1,c2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairs):</span>
<span id="cb33-15">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb33-16">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb33-17">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb33-18">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb33-19">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span>
<span id="cb33-20">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c1][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb33-21">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[c2][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span>
<span id="cb33-22">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb33-23">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb33-24">    res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|████████████████████████████████████████████████████████████████████████| 25000/25000 [39:01&lt;00:00, 10.68it/s]</code></pre>
</div>
</div>
<div id="08bd0098" class="cell" data-execution_count="84">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
<span id="cb35-2">pickle.dump(res_accum,<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/3d_random_distances.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'wb+'</span>))</span></code></pre></div>
</div>
</section>
</section>
<section id="random-chembl-molecules" class="level1">
<h1>Random ChEMBL molecules</h1>
<p>This section uses the pairs of random molecules that I used for the <a href="https://greglandrum.github.io/rdkit-blog/posts/2021-05-18-fingerprint-thresholds1.html">fingerprint thresholds post</a> and other blog posts about similarity.</p>
<div id="15a5f21b" class="cell" data-execution_count="61">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gzip</span>
<span id="cb36-2">ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x.split(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">b'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> gzip.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../data/chembl35_50K.mfp0.pairs.txt.gz'</span>)]</span>
<span id="cb36-3">ms1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb36-4">ms2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb36-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,row <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(ind):</span>
<span id="cb36-6">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.AddHs(Chem.MolFromSmiles(row[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]))</span>
<span id="cb36-7">    ms1.append(m1)</span>
<span id="cb36-8">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.AddHs(Chem.MolFromSmiles(row[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]))</span>
<span id="cb36-9">    ms2.append(m2)</span></code></pre></div>
</div>
<div id="3ddc180e" class="cell" data-execution_count="62">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>)</span>
<span id="cb37-2">random.shuffle(ms2)</span></code></pre></div>
</div>
<div id="e6b792b6" class="cell" data-execution_count="63">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb38-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(ms1)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="63">
<pre><code>50000</code></pre>
</div>
</div>
<div id="97814a24" class="cell" data-execution_count="53">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb40-2">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> ipyparallel <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> ipp</span>
<span id="cb40-3">    rc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ipp.Client()</span>
<span id="cb40-4">    dview <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rc[:]</span>
<span id="cb40-5">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit import Chem'</span>)</span>
<span id="cb40-6">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdDistGeom'</span>)</span>
<span id="cb40-7">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdMolAlign'</span>)</span>
<span id="cb40-8">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdMolTransforms'</span>)</span>
<span id="cb40-9">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdShapeHelpers'</span>)</span>
<span id="cb40-10">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdShapeAlign'</span>)</span>
<span id="cb40-11">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit.Chem import rdFingerprintGenerator'</span>)</span>
<span id="cb40-12">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from rdkit import DataStructs'</span>)</span>
<span id="cb40-13">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'from e3fp.pipeline import fprints_from_mol'</span>)</span>
<span id="cb40-14">    dview.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'import logging;logging.disable(logging.INFO)'</span>)</span>
<span id="cb40-15"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
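font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span>:</span>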
<span id="cb40-16">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"could not use ipyparallel"</span>)</span>
<span id="cb40-17">    dview <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span></code></pre></div>
</div>
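<p>Generate one ETKDGv3 conformer per molecule. Each result is an <code>(EmbedMolecule return value, molecule)</code> tuple, which is why the later cells use <code>m1[1]</code> and skip any pair where a molecule has no conformer:</p>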
<div id="09612e14" class="cell" data-execution_count="64">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb41-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDistGeom</span>
<span id="cb41-2">ms1c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dview.map_sync(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:(rdDistGeom.EmbedMolecule(x,randomSeed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span>),x), ms1)</span>
<span id="cb41-3">ms2c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dview.map_sync(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> x:(rdDistGeom.EmbedMolecule(x,randomSeed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span>),x), ms2)</span></code></pre></div>
</div>
<div id="480e2214" class="cell" data-execution_count="65">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb42-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
<span id="cb42-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gzip</span>
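<span id="cb42-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> gzip.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/random_pairs_confs.pkl.gz'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'wb'</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">as</span> outf:</span>
<span id="cb42-4">    pickle.dump((ms1c,ms2c),outf)</span></code></pre></div>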
</div>
<div id="a2eaed5b" class="cell">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1">res_accum2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)</span>
<span id="cb43-2"></span>
<span id="cb43-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb43-4">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb43-5">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb43-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb43-7">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb43-8">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb43-9">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb43-10"></span>
<span id="cb43-11">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb43-12"></span>
<span id="cb43-13">    st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1,m2,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb43-14">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>].append(st)</span>
<span id="cb43-15">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ColorTanimoto'</span>].append(ct)</span>
<span id="cb43-16">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb43-17"></span>
<span id="cb43-18">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb43-19">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb43-20">    st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1,m2,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,useColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb43-21">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_ShapeTanimoto'</span>].append(st)</span>
<span id="cb43-22">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span></code></pre></div>
</div>
<div id="889e9ec8" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">res_accum2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pickle.load(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/3d_random_distances2.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rb'</span>))</span></code></pre></div>
</div>
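<p>The cell above reloads previously saved results instead of recomputing them. One way to make that pattern explicit is a small cache-or-compute helper. This is only a minimal sketch: <code>load_or_compute</code>, <code>fake_compute</code>, and the temporary path are hypothetical names that are not part of the original notebook, and the dictionary contents are placeholder values rather than real results.</p>

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the pickled object at `path`; if the file is missing,
    call `compute()`, save its result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as inf:
            return pickle.load(inf)
    result = compute()
    with open(path, "wb") as outf:
        pickle.dump(result, outf)
    return result


# Stand-in for the expensive alignment loop (hypothetical placeholder data).
calls = []


def fake_compute():
    calls.append(1)
    return {"baseline_TanimotoDist": [0.8, 0.9]}


cache_path = os.path.join(tempfile.mkdtemp(), "distances.pkl")
first = load_or_compute(cache_path, fake_compute)   # computes and saves
second = load_or_compute(cache_path, fake_compute)  # loads from disk
```

<p>With a helper like this, the expensive loop only runs when the results file does not exist yet, and both file handles are closed as soon as the read or write finishes.</p>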
<div id="69359a87" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb45-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb45-3">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb45-4">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb45-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb45-6">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb45-7">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb45-8">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb45-9">    opts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.ShapeInputOptions()</span>
<span id="cb45-10">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>].append(rdShapeAlign.ScoreMol(m1,m2,opts,opts))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [00:12, 3934.62it/s]</code></pre>
</div>
</div>
<div id="867c11c6" class="cell" data-execution_count="70">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb47-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb47-2">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb47-3">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb47-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb47-5">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb47-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb47-7">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetO3A(m2,m1)</span>
<span id="cb47-8">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb47-9">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb47-10">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_rmsd'</span>].append(rmsd)</span>
<span id="cb47-11">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_scpre'</span>].append(score)</span>
<span id="cb47-12">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb47-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb47-14">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [07:06, 117.30it/s]</code></pre>
</div>
</div>
<div id="8f4235e4" class="cell" data-execution_count="71">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb49-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb49-2">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb49-3">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb49-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb49-5">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb49-6">    </span>
<span id="cb49-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb49-8">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetCrippenO3A(m2,m1)</span>
<span id="cb49-9">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb49-10">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb49-11">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_rmsd'</span>].append(rmsd)</span>
<span id="cb49-12">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_scpre'</span>].append(score)</span>
<span id="cb49-13">        res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>].append(rdShapeHelpers.ShapeTanimotoDist(m1,m2))</span>
<span id="cb49-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb49-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [03:43, 223.62it/s]</code></pre>
</div>
</div>
<div id="3dfabd2e" class="cell" data-execution_count="85">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb51-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolDescriptors</span>
<span id="cb51-2"></span>
<span id="cb51-3">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb51-4">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb51-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb51-6">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb51-7">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb51-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb51-9">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb51-10">    </span>
<span id="cb51-11">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb51-12">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb51-13">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>].append(rdMolDescriptors.GetUSRScore(usr1,usr2))</span>
<span id="cb51-14">    </span>
<span id="cb51-15">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m1)</span>
<span id="cb51-16">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m2)</span>
<span id="cb51-17">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb51-18">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb51-19">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>].append(rdMolDescriptors.GetUSRScore(usr1,usr2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [00:18, 2744.31it/s]</code></pre>
</div>
</div>
<div id="bbcda974" class="cell" data-execution_count="86">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb53-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetAtomPairGenerator(use2D<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb53-2"></span>
<span id="cb53-3">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb53-4">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb53-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb53-6">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb53-7">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb53-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb53-9">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb53-10">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb53-11">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb53-12">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span>
<span id="cb53-13"></span>
<span id="cb53-14">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m1)</span>
<span id="cb53-15">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m2)</span>
<span id="cb53-16">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb53-17">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb53-18">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [01:12, 688.09it/s]</code></pre>
</div>
</div>
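<p>For readers unfamiliar with Dice similarity on count fingerprints, the idea can be sketched in plain Python. This is an illustration only, assuming the usual definition for non-negative counts (twice the summed per-feature minimum, divided by the total count of both vectors). The <code>dice_similarity</code> function and the feature labels are hypothetical; this is not RDKit's implementation.</p>

```python
from collections import Counter


def dice_similarity(counts1, counts2):
    """Dice similarity between two sparse count vectors, each given as a
    mapping from a feature key to a non-negative count."""
    total = sum(counts1.values()) + sum(counts2.values())
    if total == 0:
        # Assumption: two empty vectors are treated as having similarity 0.
        return 0.0
    shared = sum(min(c, counts2[k]) for k, c in counts1.items() if k in counts2)
    return 2.0 * shared / total


# Hypothetical 3D atom-pair style features (label: count).
fp_a = Counter({"C-C@1.5": 2, "C-N@2.4": 1})
fp_b = Counter({"C-C@1.5": 1, "C-O@3.1": 1})
sim = dice_similarity(fp_a, fp_b)
```

<p>Here the two vectors share one count of a single feature and hold five counts in total, so the similarity is 2 * 1 / 5 = 0.4.</p>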
<div id="d5b1beec" class="cell" data-execution_count="87">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb55-1">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb55-2">res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb55-3"></span>
<span id="cb55-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m1,m2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(ms1c,ms2c)):</span>
<span id="cb55-5">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb55-6">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb55-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m1.GetNumConformers() <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> m2.GetNumConformers():</span>
<span id="cb55-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb55-9">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb55-10">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb55-11">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span>
<span id="cb55-12"></span>
<span id="cb55-13">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m1)</span>
<span id="cb55-14">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RemoveHs(m2)</span>
<span id="cb55-15">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb55-16">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb55-17">    res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>].append(DataStructs.DiceSimilarity(fp1,fp2))</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>50000it [1:49:13,  7.63it/s]</code></pre>
</div>
</div>
<div id="05eb636b" class="cell" data-execution_count="88">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb57-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
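<span id="cb57-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/3d_random_distances2.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'wb'</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">as</span> outf:</span>
<span id="cb57-3">    pickle.dump(res_accum2,outf)</span></code></pre></div>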
</div>
<div id="0118af31" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb58-2">plt.hist([[x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>]],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>]],</span>
<span id="cb58-3">         bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'aligned'</span>])</span>
<span id="cb58-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto score'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb58-5">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-40-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="bacc03dc" class="cell" data-scrolled="false" data-execution_count="99">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb59-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb59-2">plt.hist([res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>]],bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'color'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'no color'</span>])</span>
<span id="cb59-3">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto distance'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb59-4">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-41-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="87d00871" class="cell" data-execution_count="98">
<div class="sourceCode cell-code" id="cb60" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb60-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb60-2">plt.hist([res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>]],</span>
<span id="cb60-3">         bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A'</span>])</span>
<span id="cb60-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape tanimoto distance'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb60-5">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-42-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="b1724537" class="cell" data-execution_count="58">
<div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb61-1">res_accum2.keys()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="58">
<pre><code>dict_keys(['baseline_TanimotoDist', 'shape_align_ShapeTanimoto', 'shape_align_ColorTanimoto', 'shape_align_TanimotoDist', 'shape_align_noc_ShapeTanimoto', 'shape_align_noc_TanimotoDist', 'o3a_align_rmsd', 'o3a_align_scpre', 'o3a_align_TanimotoDist', 'crippeno3a_align_rmsd', 'crippeno3a_align_scpre', 'crippeno3a_align_TanimotoDist', 'USR_score', 'noh_USR_score', 'AP3D_DiceSimilarity', 'noh_AP3D_DiceSimilarity', 'E3FP_DiceSimilarity', 'noh_E3FP_DiceSimilarity', 'baseline_ShapeTanimoto'])</code></pre>
</div>
</div>
<div id="24eda1e0" class="cell" data-execution_count="59">
<div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb63-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb63-2">plt.hist([res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_scpre'</span>],res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_scpre'</span>],],</span>
<span id="cb63-3">         bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A'</span>])</span>
<span id="cb63-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'alignment score'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb63-5">plt.legend()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d_files/figure-html/cell-44-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="summary-stats" class="level1">
<h1>Summary stats</h1>
<div id="bbf8c974" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb64-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span></code></pre></div>
</div>
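<p>The summary cell in this section repeats the same pattern for every metric: take a few quantiles of a score list and print them as one table row. A minimal sketch of that pattern as a helper follows. The function name <code>format_thresholds</code> and the evenly spaced demo values are hypothetical and are not part of the original notebook:</p>

```python
import numpy as np

def format_thresholds(name, values, quantiles, symbol):
    """Return one table row with the requested quantiles of `values`.

    `symbol` is '>' for similarities (upper quantiles) or '<' for
    distances (lower quantiles).
    """
    cutoffs = np.quantile(values, quantiles)
    cells = ' | '.join(f'{symbol}{c:.2f}' for c in cutoffs)
    return f'| {name} | {cells} |'

# Made-up, evenly spaced scores so that the quantiles are easy to check
demo_scores = np.linspace(0, 1, 101)

# Similarities: high scores are notable, so report the upper quantiles
sim_row = format_thresholds('demo_similarity', demo_scores, [0.7, 0.9], '>')
# Distances: low values are notable, so report the lower quantiles
dist_row = format_thresholds('demo_distance', demo_scores, [0.3, 0.1], '<')

print(sim_row)
print(dist_row)
```

<p>With a helper like this, each row of the table becomes a single call. Keys whose entries are tuples, such as <code>baseline_ShapeTanimoto</code>, would still need the first element extracted before the call.</p>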
<div id="b1863315" class="cell" data-execution_count="57">
<div class="sourceCode cell-code" id="cb65" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb65-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'****   LOBSTER Set ****'</span>)</span>
<span id="cb65-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'| baseline_ShapeTanimoto</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,</span>
<span id="cb65-3">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>]],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-4">     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'| shape_align_ShapeTanimoto</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,</span>
<span id="cb65-6">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile([x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>]],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-7">     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-8"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>,</span>
<span id="cb65-9">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>]:</span>
<span id="cb65-10">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'| </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>k<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile(res_accum[k],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>])]),</span>
<span id="cb65-11">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| DISTANCE |'</span>)</span>
<span id="cb65-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'****   ChEMBL Set ****'</span>)</span>
<span id="cb65-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'| baseline_ShapeTanimoto</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,</span>
<span id="cb65-14">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_ShapeTanimoto'</span>]],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-15">     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-16"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'| shape_align_ShapeTanimoto</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,</span>
<span id="cb65-17">      <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile([x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> res_accum2[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>]],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-18">     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-19"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'baseline_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_noc_TanimotoDist'</span>,</span>
<span id="cb65-20">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'o3a_align_TanimotoDist'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'crippeno3a_align_TanimotoDist'</span>]:</span>
<span id="cb65-21">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'| </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>k<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile(res_accum2[k],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>])]),</span>
<span id="cb65-22">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| DISTANCE |'</span>)</span>
<span id="cb65-23"></span>
<span id="cb65-24"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> ------------------   No alignment  --------------------'</span>)</span>
<span id="cb65-25"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'****   LOBSTER Set ****'</span>)</span>
<span id="cb65-26"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>,</span>
<span id="cb65-27">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>]:</span>
<span id="cb65-28">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'| </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>k<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile(res_accum[k],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-29">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-30">    </span>
<span id="cb65-31"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'****   ChEMBL Set ****'</span>)</span>
<span id="cb65-32"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_score'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_USR_score'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_DiceSimilarity'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_AP3D_DiceSimilarity'</span>,</span>
<span id="cb65-33">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_DiceSimilarity'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'noh_E3FP_DiceSimilarity'</span>]:</span>
<span id="cb65-34">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'| </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>k<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">|'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">' | '</span>.join([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&gt;</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%.2f</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> np.quantile(res_accum2[k],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.99</span>])]),</span>
<span id="cb65-35">         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">| SIMILARITY |'</span>)</span>
<span id="cb65-36">    </span>
<span id="cb65-37">    </span>
<span id="cb65-38">    </span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>****   LOBSTER Set ****
| baseline_ShapeTanimoto    | &gt;0.49 | &gt;0.54 | &gt;0.62 | &gt;0.69 | &gt;0.81     | SIMILARITY |
| shape_align_ShapeTanimoto | &gt;0.62 | &gt;0.66 | &gt;0.71 | &gt;0.76 | &gt;0.85     | SIMILARITY |
| baseline_TanimotoDist | &lt;0.61 | &lt;0.57 | &lt;0.52 | &lt;0.48 | &lt;0.39     | DISTANCE |
| shape_align_TanimotoDist  | &lt;0.53 | &lt;0.50 | &lt;0.46 | &lt;0.42 | &lt;0.34     | DISTANCE |
| shape_align_noc_TanimotoDist  | &lt;0.52 | &lt;0.49 | &lt;0.45 | &lt;0.42 | &lt;0.34     | DISTANCE |
| o3a_align_TanimotoDist    | &lt;0.58 | &lt;0.55 | &lt;0.51 | &lt;0.46 | &lt;0.36     | DISTANCE |
| crippeno3a_align_TanimotoDist | &lt;0.59 | &lt;0.56 | &lt;0.51 | &lt;0.47 | &lt;0.37     | DISTANCE |
****   ChEMBL Set ****
| baseline_ShapeTanimoto    | &gt;0.44 | &gt;0.48 | &gt;0.54 | &gt;0.59 | &gt;0.69     | SIMILARITY |
| shape_align_ShapeTanimoto | &gt;0.58 | &gt;0.61 | &gt;0.65 | &gt;0.69 | &gt;0.76     | SIMILARITY |
| baseline_TanimotoDist | &lt;0.64 | &lt;0.61 | &lt;0.57 | &lt;0.54 | &lt;0.47     | DISTANCE |
| shape_align_TanimotoDist  | &lt;0.55 | &lt;0.53 | &lt;0.50 | &lt;0.47 | &lt;0.42     | DISTANCE |
| shape_align_noc_TanimotoDist  | &lt;0.55 | &lt;0.53 | &lt;0.50 | &lt;0.47 | &lt;0.41     | DISTANCE |
| o3a_align_TanimotoDist    | &lt;0.61 | &lt;0.59 | &lt;0.55 | &lt;0.52 | &lt;0.46     | DISTANCE |
| crippeno3a_align_TanimotoDist | &lt;0.61 | &lt;0.59 | &lt;0.55 | &lt;0.52 | &lt;0.46     | DISTANCE |



 ------------------   No alignment  --------------------
****   LOBSTER Set ****
| USR_score | &gt;0.66 | &gt;0.71 | &gt;0.76 | &gt;0.80 | &gt;0.87     | SIMILARITY |
| noh_USR_score | &gt;0.66 | &gt;0.71 | &gt;0.76 | &gt;0.80 | &gt;0.86     | SIMILARITY |
| AP3D_DiceSimilarity   | &gt;0.49 | &gt;0.54 | &gt;0.60 | &gt;0.63 | &gt;0.70     | SIMILARITY |
| noh_AP3D_DiceSimilarity   | &gt;0.29 | &gt;0.33 | &gt;0.38 | &gt;0.43 | &gt;0.51     | SIMILARITY |
| E3FP_DiceSimilarity   | &gt;0.28 | &gt;0.31 | &gt;0.34 | &gt;0.37 | &gt;0.43     | SIMILARITY |
| noh_E3FP_DiceSimilarity   | &gt;0.23 | &gt;0.26 | &gt;0.29 | &gt;0.32 | &gt;0.38     | SIMILARITY |
****   ChEMBL Set ****
| USR_score | &gt;0.65 | &gt;0.69 | &gt;0.74 | &gt;0.78 | &gt;0.84     | SIMILARITY |
| noh_USR_score | &gt;0.65 | &gt;0.69 | &gt;0.74 | &gt;0.78 | &gt;0.84     | SIMILARITY |
| AP3D_DiceSimilarity   | &gt;0.57 | &gt;0.60 | &gt;0.65 | &gt;0.68 | &gt;0.73     | SIMILARITY |
| noh_AP3D_DiceSimilarity   | &gt;0.34 | &gt;0.37 | &gt;0.42 | &gt;0.45 | &gt;0.51     | SIMILARITY |
| E3FP_DiceSimilarity   | &gt;0.30 | &gt;0.33 | &gt;0.35 | &gt;0.38 | &gt;0.42     | SIMILARITY |
| noh_E3FP_DiceSimilarity   | &gt;0.26 | &gt;0.28 | &gt;0.30 | &gt;0.32 | &gt;0.36     | SIMILARITY |</code></pre>
</div>
</div>


</section>

 ]]></description>
  <category>datasets</category>
  <category>3d</category>
  <category>reference</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-11-30-thresholds-for-random-3d.html</guid>
  <pubDate>Sat, 29 Nov 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/thresholds-for-random-3D-1.png" medium="image" type="image/png" height="102" width="144"/>
</item>
<item>
  <title>Working with the LOBSTER Data set III</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3.html</link>
  <description><![CDATA[ 




<p>This is my third post looking at the LOBSTER data set <a href="https://doi.org/10.1007/s10822-024-00581-1">published last year</a> by the Rarey and BioSolveIT group. Here are links for <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1.html">Part 1</a> and <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2.html">Part 2</a>.</p>
<p>This time I look at applying the different 3D alignment algorithms the RDKit includes to the data.</p>
<div id="209d81ae" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4">IPythonConsole.ipython_3d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> matplotlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-7">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tableau-colorblind10'</span>)</span>
<span id="cb1-8">plt.rcParams[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'font.size'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'16'</span></span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb1-12"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
</div>
<div id="f3d421d6" class="cell" data-scrolled="true" data-execution_count="10">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb2-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.2</code></pre>
</div>
</div>
<section id="getting-started" class="level1">
<h1>Getting started</h1>
<div id="dc28d683" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lwreg</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lwreg <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> utils</span></code></pre></div>
</div>
<p>Load our lwreg configuration from the database we created before:</p>
<div id="e184d2ce" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">config <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.configure_from_database(dbname<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lobster_112024'</span>,dbtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'postgresql'</span>)</span>
<span id="cb5-2">lwreg.set_default_config(config)</span>
<span id="cb5-3"></span>
<span id="cb5-4">config</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>{'dbname': 'lobster_112024',
 'dbtype': 'postgresql',
 'cacheConnection': True,
 'standardization': 'none',
 'removeHs': 1,
 'useTautomerHashv2': 0,
 'registerConformers': 1,
 'numConformerDigits': 3,
 'lwregSchema': ''}</code></pre>
</div>
</div>
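<p>Get a map from (ligname,pdb) tuples to (molregno,conf_id,molblock,mol,mol_noh) tuples, where <code>mol</code> keeps the hydrogens and <code>mol_noh</code> has them removed:</p>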
<div id="8ff64590" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-2">    select ligname,pdb,molregno,conf_id,molblock <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb7-3">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands join conformers using (molregno,conf_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb7-4">ligs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb7-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> nm,pdb,mrn,cid,mb <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d:</span>
<span id="cb7-6">    mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb7-7">    mol_noh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb)</span>
<span id="cb7-8">    ligs[(nm,pdb.lower())] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (mrn,cid,mb,mol,mol_noh)</span></code></pre></div>
</div>
<div id="be748ca6" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">pairstats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb8-2">    select <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.pair_stats</span>
<span id="cb8-3">pairstats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(pairstats.dicts())</span></code></pre></div>
</div>
<div id="b00099e2" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> stats</span>
<span id="cb9-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> comparison_plot(pairstats,metric1,metric2,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb9-3">    x1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x[metric1] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats]</span>
<span id="cb9-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> invert1:</span>
<span id="cb9-5">        x1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x1]</span>
<span id="cb9-6">        metric1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'1-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>metric1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb9-7">    x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x[metric2] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats]</span>
<span id="cb9-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> invert2:</span>
<span id="cb9-9">        x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x2]</span>
<span id="cb9-10">        metric2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'1-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>metric2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb9-11">    r,_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.spearmanr(x1,x2)</span>
<span id="cb9-12">    tau,_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.kendalltau(x1,x2)</span>
<span id="cb9-13"></span>
<span id="cb9-14">    plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb9-15">    plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb9-16">    plt.scatter(x1,x2,alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb9-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> includeLine:</span>
<span id="cb9-18">        plt.plot((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k-'</span>)</span>
<span id="cb9-19">    plt.xlabel(metric1)</span>
<span id="cb9-20">    plt.ylabel(metric2)</span>
<span id="cb9-21">    plt.title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'rho=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>r<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">, tau=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tau<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb9-22">    plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb9-23">    plt.hexbin(x1,x2,cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Blues'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb9-24">    plt.tight_layout()</span></code></pre></div>
</div>
</section>
<section id="comparing-shape-similarity-metrics-in-the-lobster-data-set" class="level1">
<h1>Comparing shape-similarity metrics in the LOBSTER data set</h1>
<p>First let’s look at the metrics in the LOBSTER data set:</p>
<div id="40fd7306" class="cell" data-execution_count="34">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tanimoto_distance'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-9-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>In the LOBSTER paper they use an asymmetric definition of the Tversky metric to detect how much of the template is covered by the probe. This yields small Tversky distances (large Tversky index values) when a small template is completely covered by a large probe. The Tanimoto distance, on the other hand, is symmetric, so it’s always at least as large as 1-Tversky here.</p>
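<p>To make the difference concrete, here’s a minimal sketch that works directly with volumes instead of molecules. The function names and the volumes are made up for illustration, and the weights (alpha=1, beta=0) are an assumption chosen to match the "fraction of the template covered by the probe" description, not something taken from the LOBSTER code:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def tversky_index(v_template, v_probe, v_overlap, alpha=1.0, beta=0.0):
    """Tversky index from volumes.

    alpha weights the template-only volume, beta the probe-only volume.
    With alpha=1 and beta=0 this is the fraction of the template covered.
    """
    only_template = v_template - v_overlap
    only_probe = v_probe - v_overlap
    return v_overlap / (v_overlap + alpha * only_template + beta * only_probe)


def tanimoto_distance(v_template, v_probe, v_overlap):
    """1 - (overlap volume / union volume); symmetric in the two shapes."""
    return 1.0 - v_overlap / (v_template + v_probe - v_overlap)


# Hypothetical volumes: a small template sitting completely inside a large probe
v_template, v_probe, v_overlap = 100.0, 400.0, 100.0

tversky_dist = 1.0 - tversky_index(v_template, v_probe, v_overlap)
tanimoto_dist = tanimoto_distance(v_template, v_probe, v_overlap)
print(tversky_dist, tanimoto_dist)  # prints: 0.0 0.75
</code></pre></div>
<p>With the template completely covered, 1-Tversky is 0 while the Tanimoto distance is still 0.75, because the Tanimoto denominator also counts the part of the probe that extends beyond the template.</p>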
<p>And now the protrude distance, which looks at how much of the smaller shape is <em>not</em> overlapped by the larger shape.</p>
<div id="152edde9" class="cell" data-scrolled="false" data-execution_count="35">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_protrude_distance'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-10-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The protrude distance is, by default, symmetric: it is the fraction of the smaller of the two shapes that protrudes outside the larger of the two.</p>
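<p>As a rough sketch of that definition, again using plain volumes rather than molecules (the function name and the numbers are hypothetical, and this is not how the RDKit computes volumes internally):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python">def protrude_distance(v_a, v_b, v_overlap):
    """Fraction of the smaller shape's volume that lies outside the larger shape."""
    v_small = min(v_a, v_b)
    return (v_small - v_overlap) / v_small


# Hypothetical volumes: 80 of the smaller shape's 100 volume units overlap the larger shape
d_ab = protrude_distance(100.0, 400.0, 80.0)
d_ba = protrude_distance(400.0, 100.0, 80.0)
print(d_ab, d_ba)  # prints: 0.2 0.2
</code></pre></div>
<p>Swapping the two shapes gives the same value, which is what makes the measure symmetric.</p>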
</section>
<section id="shape-based-alignment" class="level1">
<h1>Shape-based alignment</h1>
<p>Let’s see what we get when we perform shape-based alignment using the crystal conformers.</p>
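<p>Start by moving each crystal conformer to its canonical frame (centered, with its principal axes along the coordinate axes) and then running the shape alignment.</p>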
<div id="571327f7" class="cell" data-execution_count="45">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeAlign</span>
<span id="cb12-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdShapeHelpers</span>
<span id="cb12-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolTransforms</span>
<span id="cb12-4"></span>
<span id="cb12-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb12-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairstats):</span>
<span id="cb12-7">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb12-8">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb12-9">    rdMolTransforms.CanonicalizeConformer(m1.GetConformer())</span>
<span id="cb12-10">    rdMolTransforms.CanonicalizeConformer(m2.GetConformer())</span>
<span id="cb12-11">    </span>
<span id="cb12-12">    st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1,m2,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb12-13">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTanimoto'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> st</span>
<span id="cb12-14">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ColorTanimoto'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ct</span>
<span id="cb12-15">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1,m2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb12-16">    </span>
<span id="cb12-17"></span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 72592/72592 [01:15&lt;00:00, 964.83it/s]</code></pre>
</div>
</div>
<div id="421db64b" class="cell" data-execution_count="37">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb14-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Look at examples where the ShapeTversky in the shape alignment is significantly lower than that in the crystal alignment:</p>
<div id="32327c0b" class="cell" data-scrolled="false" data-execution_count="39">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">pruned <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [r <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>]</span>
<span id="cb15-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(pruned)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="39">
<pre><code>79</code></pre>
</div>
</div>
<div id="9c6ad641" class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pruned[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb17-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>((d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>]),(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>]))</span>
<span id="cb17-3">m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb17-4">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb17-5">m1c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1)</span>
<span id="cb17-6">m2c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2)</span>
<span id="cb17-7">rdMolTransforms.CanonicalizeConformer(m1c.GetConformer())</span>
<span id="cb17-8">rdMolTransforms.CanonicalizeConformer(m2c.GetConformer())</span>
<span id="cb17-9">st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1c,m2c,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb17-10">tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1c,m2c,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb17-11"></span>
<span id="cb17-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(st,ct)</span>
<span id="cb17-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>],tversky)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>('1D1_A_401', '4i5p') ('11G_A_401', '4i6b')
0.6968245697702556 0.11721759959746417
0.951 0.7622862091361766</code></pre>
</div>
</div>
<div id="740e66da" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">IPythonConsole.drawMols3D([m1,m2])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17638177524668813" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17638177524668813" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17638177524668813 = null;
var warn = document.getElementById("3dmolwarning_17638177524668813");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17638177524668813 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17638177524668813"),{backgroundColor:"white"});
viewer_17638177524668813.zoomTo();
    viewer_17638177524668813.addModel("1D1_A_401\n     RDKit          3D\n\n 47 50  0  0  0  0  0  0  0  0999 V2000\n   13.6199    4.3394   10.4996 O   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7618    4.8606    9.8156 N   0  0  0  0  0  0  0  0  0  0  0  0\n    6.3944    8.1554   10.4865 N   0  0  0  0  0  0  0  0  0  0  0  0\n    8.8228    6.9055   10.3637 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.3439    4.1903   10.1704 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.1937    7.0400   10.7008 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.4418    2.7500    9.8990 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5846    6.7088    8.1521 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.1305    8.6178   10.3300 C   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9115    4.1784    9.8385 C   0  0  0  0  0  0  0  0  0  0  0  0\n   13.2854    6.9331    9.4894 C   0  0  0  0  0  0  0  0  0  0  0  0\n    4.3546    7.6253    9.7593 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.2442   10.8146   10.4451 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0524   10.5627   11.7193 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.1599    6.5268    9.5477 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.9220    9.4513    9.8291 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.1073    9.0549   11.9258 C   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7087    6.1931   10.0763 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5167    4.9218   10.4741 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0267    6.3014   10.4072 C   0  0  0  0  0  0  0  0  0  0  0  0\n    6.4005    6.8977   10.0241 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.0272    8.4858   10.9997 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0969    4.8441   10.1252 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5146    6.3877   10.7066 C   0  0  2  0  0  0  0  0  0  0  0  0\n    7.1705    8.6548   10.8716 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.5316    2.3851    9.7034 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0380    2.5991    9.1105 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   11.8225    2.2853   10.6985 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.1437    7.0883    7.4150 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.4572    5.7282    8.0029 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6937    7.1629    8.1623 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.8102    9.5299   10.5859 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9181    3.1963    9.6508 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.4138    7.9169    9.6149 H   0  0  0  0  0  0  0  0  0  0  0  0\n   14.1773    6.4822    9.4555 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3815    7.6927    9.5391 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.7810   11.3615    9.8026 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.3974   11.2977   10.6676 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6061   10.9996   12.5002 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.9774   10.9285   11.6164 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.9014    5.6518    9.1383 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5827    9.2183    9.1156 H   0  0  0  0  0  0  0  0  0  0  0  0\n    9.9991    9.4442    9.4441 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.9105    8.8236   12.8786 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0076    8.6988   11.6755 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1653    8.5658   11.5006 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0038    6.6046   11.5513 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 19  2  0\n  2 10  2  0\n  2 18  1  0\n  3  9  1  0\n  3 21  1  0\n  4 18  2  0\n  4 20  1  0\n  5  7  1  0\n  5 19  1  0\n  5 23  1  0\n  6 20  1  0\n  6 22  1  0\n  6 24  1  0\n  8 11  1  0\n  9 12  2  0\n 10 23  1  0\n 11 24  1  0\n 12 15  1  0\n 13 14  1  0\n 13 16  1  0\n 14 17  1  0\n 15 21  2  0\n 16 22  1  0\n 17 22  1  0\n 18 21  1  0\n 19 24  1  0\n 20 23  2  0\n  3 25  1  0\n  7 26  1  0\n  7 27  1  0\n  7 28  1  0\n  8 29  1  0\n  8 30  1  0\n  8 31  1  0\n  9 32  1  0\n 10 33  1  0\n 11 34  1  0\n 11 35  1  0\n 12 36  1  
0\n 13 37  1  0\n 13 38  1  0\n 14 39  1  0\n 14 40  1  0\n 15 41  1  0\n 16 42  1  0\n 16 43  1  0\n 17 44  1  0\n 17 45  1  0\n 22 46  1  0\n 24 47  1  1\nM  END\n","sdf");
    viewer_17638177524668813.setStyle({"stick": {}});
    viewer_17638177524668813.addModel("11G_A_401\n     RDKit          3D\n\n 39 41  0  0  0  0  0  0  0  0999 V2000\n   13.7894    4.4428   10.3494 O   0  0  0  0  0  0  0  0  0  0  0  0\n    7.8726    4.6211    9.8263 N   0  0  0  0  0  0  0  0  0  0  0  0\n    8.8273    6.7054   10.3791 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5066    4.1352   10.0896 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.2049    6.9725   10.6578 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6873    2.7043    9.7926 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.4405    6.6667    7.9974 C   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7641    5.9335   10.1031 C   0  0  0  0  0  0  0  0  0  0  0  0\n    9.0581    3.9966    9.8132 C   0  0  0  0  0  0  0  0  0  0  0  0\n   13.1998    6.9634    9.3043 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.1166   10.7637   10.5685 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0115   10.4860   11.7845 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.8537    9.4217    9.8798 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0927    8.9675   11.9237 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.6406    4.9395   10.3645 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0728    6.1742   10.3871 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.0071    8.4100   10.9985 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.2189    4.7262   10.0771 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5553    6.3954   10.5958 C   0  0  2  0  0  0  0  0  0  0  0  0\n   10.7960    2.2839    9.6229 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.2638    2.6024    8.9819 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.1249    2.2549   10.5714 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.9324    7.0719    7.2267 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.3745    5.6773    7.8675 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5218    7.0581    8.0493 H   0  0  0  0  0  0  0  0  0  0  0  0\n    6.8552    6.3505   10.1032 H   0  0  0  0  0  0  0  0  0  0  0  0\n    9.1110    3.0176    9.6163 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   13.2651    7.9563    9.4038 H   0  0  0  0  0  0  0  0  0  0  0  0\n   14.1178    6.5756    9.2220 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5797   11.3868    9.9381 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.2528   11.1709   10.8653 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6104   10.8876   12.6078 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.9235   10.8688   11.6371 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5217    9.2559    9.1544 H   0  0  0  0  0  0  0  0  0  0  0  0\n    9.9307    9.3923    9.4960 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.9191    8.6940   12.8698 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.9941    8.6388   11.6417 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1559    8.4512   11.5217 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0882    6.6568   11.4007 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 15  2  0\n  2  8  2  0\n  2  9  1  0\n  3  8  1  0\n  3 16  2  0\n  4  6  1  0\n  4 15  1  0\n  4 18  1  0\n  5 16  1  0\n  5 17  1  0\n  5 19  1  0\n  7 10  1  0\n  9 18  2  0\n 10 19  1  0\n 11 12  1  0\n 11 13  1  0\n 12 14  1  0\n 13 17  1  0\n 14 17  1  0\n 15 19  1  0\n 16 18  1  0\n  6 20  1  0\n  6 21  1  0\n  6 22  1  0\n  7 23  1  0\n  7 24  1  0\n  7 25  1  0\n  8 26  1  0\n  9 27  1  0\n 10 28  1  0\n 10 29  1  0\n 11 30  1  0\n 11 31  1  0\n 12 32  1  0\n 12 33  1  0\n 13 34  1  0\n 13 35  1  0\n 14 36  1  0\n 14 37  1  0\n 17 38  1  0\n 19 39  1  1\nM  END\n","sdf");
    viewer_17638177524668813.setStyle({"stick": {}});
    viewer_17638177524668813.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17638177524668813.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17638177524668813.setBackgroundColor("0xeeeeee");
    viewer_17638177524668813.zoomTo();
viewer_17638177524668813.render();
});
</script>
</div>
</div>
<div id="4b78264b" class="cell" data-scrolled="false" data-execution_count="24">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1">IPythonConsole.drawMols3D([m1c,m2c])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17638177652198553" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17638177652198553" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17638177652198553 = null;
var warn = document.getElementById("3dmolwarning_17638177652198553");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17638177652198553 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17638177652198553"),{backgroundColor:"white"});
viewer_17638177652198553.zoomTo();
    viewer_17638177652198553.addModel("1D1_A_401\n     RDKit          3D\n\n 47 50  0  0  0  0  0  0  0  0999 V2000\n    4.0527    1.8957   -0.3913 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.8480    2.2030   -0.0118 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.5993   -0.9673   -0.2516 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.0374   -0.0165   -0.1681 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.8029    2.3697   -0.2356 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.3087   -0.4889   -0.3277 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0742    3.8124   -0.1844 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6170    0.0789    2.2230 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.9181   -1.2423   -0.1069 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.6198    2.7287   -0.0662 C   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3412   -0.4413    0.9841 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.5835   -0.0843    0.2526 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8590   -4.1509    0.5110 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.7484   -4.2074   -0.7324 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.6529    0.9285    0.3431 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6891   -2.6772    0.8870 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0070   -2.7705   -1.1655 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.0615    0.8622   -0.0663 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.8834    1.4626   -0.3468 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.2353    0.4228   -0.2272 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.4509    0.3385    0.0102 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.9698   -1.9332   -0.4096 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.4810    1.8872   -0.1699 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.7019   -0.0105   -0.3496 C   0  0  2  0  0  0  0  0  0  0  0  0\n   -2.8782   -1.6123   -0.5044 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2112    4.3113   -0.1063 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6496    4.0155    0.6079 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n    2.5468    4.0926   -1.0201 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.0892   -0.2414    3.0442 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6111    1.0788    2.2104 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6761   -0.2600    2.2260 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.3421   -2.1383   -0.2387 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4945    3.7201   -0.0325 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3466   -1.4407    1.0207 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.2815   -0.1019    1.0050 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -6.5659    0.0027    0.4178 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2916   -4.6477    1.2634 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.0326   -4.5577    0.3120 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2842   -4.7075   -1.4633 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6128   -4.6608   -0.5151 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.8138    1.8828    0.5949 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.3422   -2.4136    1.5967 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.2412   -2.4967    1.2063 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.8840   -2.6740   -2.1532 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.9339   -2.4901   -0.9160 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.1278   -1.9883   -0.9463 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.1957   -0.4183   -1.1175 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 19  2  0\n  2 10  2  0\n  2 18  1  0\n  3  9  1  0\n  3 21  1  0\n  4 18  2  0\n  4 20  1  0\n  5  7  1  0\n  5 19  1  0\n  5 23  1  0\n  6 20  1  0\n  6 22  1  0\n  6 24  1  0\n  8 11  1  0\n  9 12  2  0\n 10 23  1  0\n 11 24  1  0\n 12 15  1  0\n 13 14  1  0\n 13 16  1  0\n 14 17  1  0\n 15 21  2  0\n 16 22  1  0\n 17 22  1  0\n 18 21  1  0\n 19 24  1  0\n 20 23  2  0\n  3 25  1  0\n  7 26  1  0\n  7 27  1  0\n  7 28  1  0\n  8 29  1  0\n  8 30  1  0\n  8 31  1  0\n  9 32  1  0\n 10 33  1  0\n 11 34  1  0\n 11 35  1  0\n 12 36  1  
0\n 13 37  1  0\n 13 38  1  0\n 14 39  1  0\n 14 40  1  0\n 15 41  1  0\n 16 42  1  0\n 16 43  1  0\n 17 44  1  0\n 17 45  1  0\n 22 46  1  0\n 24 47  1  6\nM  END\n","sdf");
    viewer_17638177652198553.setStyle({"stick": {}});
    viewer_17638177652198553.addModel("11G_A_401\n     RDKit          3D\n\n 39 41  0  0  0  0  0  0  0  0999 V2000\n    1.2908    3.3626   -0.0826 O   0  0  0  0  0  0  0  0  0  0  0  0\n    2.1667   -2.5110   -0.2996 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.1004   -1.9105   -0.0530 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0042    1.1592   -0.1788 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7924    0.3938    0.0616 N   0  0  0  0  0  0  0  0  0  0  0  0\n    3.4111    1.5729   -0.3120 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.3606    1.5738   -2.6066 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8662   -2.8340   -0.1762 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.5819   -1.2370   -0.3026 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.9431    2.3142   -1.3879 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.4736   -0.3472   -0.4755 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.5054    0.6223    0.7143 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.0300   -0.3987   -0.9834 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.0517    0.9677    1.0295 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.9942    2.1467   -0.0644 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.2109   -0.5927   -0.0495 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.2084   -0.0364    0.2382 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6419   -0.2106   -0.1916 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4447    1.8204   -0.0045 C   0  0  2  0  0  0  0  0  0  0  0  0\n    3.9919    0.7619   -0.3813 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.5179    2.1309   -1.1349 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.6788    2.1070    0.4899 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7406    1.9627   -3.4459 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6344    1.6744   -2.6134 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.5972    0.6038   -2.5518 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6096   -3.8005   -0.1764 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.5555   -1.0233   -0.3836 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   -1.9375    2.2113   -1.4116 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7060    3.2819   -1.4732 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0797   -0.0194   -1.2002 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.7659   -1.2577   -0.1828 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.9361    0.1859    1.5043 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0126    1.4495    0.4727 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8867    0.2632   -1.7191 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.7990   -1.3152   -1.3100 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8757    0.8754    2.0096 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8443    1.9022    0.7401 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.1737   -0.8640    0.7984 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.8912    2.3272    0.7330 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 15  2  0\n  2  8  2  0\n  2  9  1  0\n  3  8  1  0\n  3 16  2  0\n  4  6  1  0\n  4 15  1  0\n  4 18  1  0\n  5 16  1  0\n  5 17  1  0\n  5 19  1  0\n  7 10  1  0\n  9 18  2  0\n 10 19  1  0\n 11 12  1  0\n 11 13  1  0\n 12 14  1  0\n 13 17  1  0\n 14 17  1  0\n 15 19  1  0\n 16 18  1  0\n  6 20  1  0\n  6 21  1  0\n  6 22  1  0\n  7 23  1  0\n  7 24  1  0\n  7 25  1  0\n  8 26  1  0\n  9 27  1  0\n 10 28  1  0\n 10 29  1  0\n 11 30  1  0\n 11 31  1  0\n 12 32  1  0\n 12 33  1  0\n 13 34  1  0\n 13 35  1  0\n 14 36  1  0\n 14 37  1  0\n 17 38  1  0\n 19 39  1  1\nM  END\n","sdf");
    viewer_17638177652198553.setStyle({"stick": {}});
    viewer_17638177652198553.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17638177652198553.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17638177652198553.setBackgroundColor("0xeeeeee");
    viewer_17638177652198553.zoomTo();
viewer_17638177652198553.render();
});
</script>
</div>
</div>
<p>This is a case where the asymmetry of the Tversky index used in the Lobster paper shows. In the crystal alignment, the second of the two molecules is almost completely contained within the first one; that leads to a very high Tversky index, which drops significantly if we swap the order of the molecules:</p>
<div id="9b8ce793" class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1">tversky12 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1,m2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb21-2">tversky21 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m2,m1,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb21-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tversky12<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> -&gt; </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tversky21<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.95 -&gt; 0.76</code></pre>
</div>
</div>
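<p>To make the asymmetry concrete, here is a minimal sketch that assumes the standard set-based definition of the Tversky index, <code>common / (common + alpha * only_a + beta * only_b)</code>. With <code>alpha=0</code> and <code>beta=1</code>, this reduces to the fraction of the second shape that is covered by the first. The helper <code>tversky</code> and the two toy "shapes" (plain sets of integers standing in for occupied grid points) are hypothetical; RDKit's grid-based implementation may differ in detail. The set sizes were picked by hand so that the ratios match the values printed above; they are not derived from the actual molecules.</p>

```python
def tversky(a, b, alpha, beta):
    """Tversky index for two sets of occupied grid points (illustrative helper)."""
    common = len(a.intersection(b))
    only_a = len(a.difference(b))
    only_b = len(b.difference(a))
    return common / (common + alpha * only_a + beta * only_b)


# Toy "shapes": 100 and 80 grid points, 76 of them shared.
big = set(range(0, 100))
small = set(range(24, 104))

# alpha=0, beta=1: only the part of the second argument outside the first is penalized
t_big_small = tversky(big, small, 0, 1)  # 76 / 80
t_small_big = tversky(small, big, 0, 1)  # 76 / 100

# alpha=1, beta=1 gives the Tanimoto coefficient, which ignores argument order
tanimoto = tversky(big, small, 1, 1)  # 76 / 104

print(f"{t_big_small:.2f} {t_small_big:.2f} {tanimoto:.2f}")
```

<p>The small shape sits almost entirely inside the big one, so the index is high when the small shape is the second argument and lower when the arguments are swapped.</p>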
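<p>The shape alignment, in contrast, is symmetric because it maximizes the overall overlap, so we get the same result when we swap the order of the molecules passed to <code>AlignMol</code>:</p>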
<div id="11c56455" class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1">m1c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1)</span>
<span id="cb23-2">m2c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2)</span>
<span id="cb23-3">rdMolTransforms.CanonicalizeConformer(m1c.GetConformer())</span>
<span id="cb23-4">rdMolTransforms.CanonicalizeConformer(m2c.GetConformer())</span>
<span id="cb23-5">st2,ct2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m2c,m1c,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb23-6">tversky2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1c,m2c,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb23-7"></span>
<span id="cb23-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tversky<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> -&gt; </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tversky2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.76 -&gt; 0.76</code></pre>
</div>
</div>
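<p>A one-dimensional toy model can illustrate why maximizing the overlap does not depend on which shape is held fixed. This is only a sketch under strong simplifications: the "shapes" are sets of integer positions, the only allowed motion is an integer translation, and <code>best_overlap</code> is a hypothetical helper that has nothing to do with how <code>rdShapeAlign</code> works internally.</p>

```python
def best_overlap(fixed, moving, shifts):
    """Largest overlap found by translating `moving` over `fixed` (brute force)."""
    best = 0
    for t in shifts:
        shifted = {x + t for x in moving}
        best = max(best, len(fixed.intersection(shifted)))
    return best


a = set(range(0, 10))    # 10 positions
b = set(range(20, 26))   # 6 positions, initially far away from a
shifts = range(-40, 41)  # symmetric range, so every shift is tried in both directions

overlap_ab = best_overlap(a, b, shifts)  # a fixed, b moves
overlap_ba = best_overlap(b, a, shifts)  # b fixed, a moves

print(overlap_ab, overlap_ba)
```

<p>Moving <code>b</code> by <code>t</code> relative to <code>a</code> produces the same overlap as moving <code>a</code> by <code>-t</code> relative to <code>b</code>, so both searches reach the same maximum. Any score computed from that optimal overlap, with the arguments kept in the same order, is then the same too.</p>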
<p>What about pairs with a very low Tversky index in the crystal alignment (below 0.1) but a high one (above 0.8) from the shape alignment?</p>
<div id="fbd00cf0" class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">pruned <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [r <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>]</span>
<span id="cb25-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(pruned)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="29">
<pre><code>42</code></pre>
</div>
</div>
<div id="c16d6901" class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pruned[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb27-2">m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb27-3">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb27-4">m1c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1)</span>
<span id="cb27-5">m2c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2)</span>
<span id="cb27-6">rdMolTransforms.CanonicalizeConformer(m1c.GetConformer())</span>
<span id="cb27-7">rdMolTransforms.CanonicalizeConformer(m2c.GetConformer())</span>
<span id="cb27-8">st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1c,m2c,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb27-9">tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1c,m2c,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb27-10"></span>
<span id="cb27-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(st,ct)</span>
<span id="cb27-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>],tversky)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.3489983628032094 0.08050058859625212
0.0 0.8264984227129337</code></pre>
</div>
</div>
<div id="b871c795" class="cell" data-execution_count="31">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1">IPythonConsole.drawMols3D([m1,m2])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17638185210383122" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17638185210383122" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17638185210383122 = null;
var warn = document.getElementById("3dmolwarning_17638185210383122");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17638185210383122 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17638185210383122"),{backgroundColor:"white"});
viewer_17638185210383122.zoomTo();
    viewer_17638185210383122.addModel("N4Z_A_401\n     RDKit          3D\n\n 72 77  0  0  0  0  0  0  0  0999 V2000\n   36.2199  -10.7081    9.0552 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n   27.3817  -12.6095    9.7595 S   0  0  0  0  0  0  0  0  0  0  0  0\n   35.4096   -9.2060    5.5168 F   0  0  0  0  0  0  0  0  0  0  0  0\n   33.4858   -9.8570    4.9932 F   0  0  0  0  0  0  0  0  0  0  0  0\n   26.8175  -13.1097   10.9746 O   0  0  0  0  0  0  0  0  0  0  0  0\n   27.7227  -13.5288    8.7055 O   0  0  0  0  0  0  0  0  0  0  0  0\n   28.5512  -13.5624   15.4356 O   0  0  0  0  0  0  0  0  0  0  0  0\n   33.6681   -7.6959    5.6877 O   0  0  0  0  0  0  0  0  0  0  0  0\n   33.8898   -9.5898    7.1939 O   0  0  0  0  0  0  0  0  0  0  0  0\n   32.4974   -9.3694   19.4636 N   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7262   -8.5287   12.0676 N   0  0  0  0  0  0  0  0  0  0  0  0\n   29.5461  -11.7556   14.1966 N   0  0  0  0  0  0  0  0  0  0  0  0\n   29.4388  -11.6124   16.4996 N   0  0  0  0  0  0  0  0  0  0  0  0\n   30.4059   -9.8858   15.2888 N   0  0  0  0  0  0  0  0  0  0  0  0\n   30.5112   -9.9849   12.9555 N   0  0  0  0  0  0  0  0  0  0  0  0\n   32.4252   -9.8165   11.0866 N   0  0  0  0  0  0  0  0  0  0  0  0\n   30.3391   -9.7008   17.6449 N   0  0  0  0  0  0  0  0  0  0  0  0\n   26.3454  -11.2645    9.1219 C   0  0  0  0  0  0  0  0  0  0  0  0\n   28.0008  -14.1448   14.2300 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.3985   -8.5488   18.2092 C   0  0  0  0  0  0  0  0  0  0  0  0\n   31.5894  -10.5531   19.6096 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.8361   -8.1651   12.7555 C   0  0  0  0  0  0  0  0  0  0  0  0\n   30.9414   -8.3582   17.6650 C   0  0  0  0  0  0  0  0  0  0  0  0\n   30.1756  -10.3268   18.9836 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.1187   -8.5879   12.3401 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.6687   -5.9855    8.8089 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.2349   -9.4037   11.2013 C   0 
 0  0  0  0  0  0  0  0  0  0  0\n   29.8827  -11.6054    9.1608 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.9026   -6.1187    7.4330 C   0  0  0  0  0  0  0  0  0  0  0  0\n   31.0569  -10.9011    9.4315 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.8124   -7.0968    9.6774 C   0  0  0  0  0  0  0  0  0  0  0  0\n   29.0776  -11.2062   11.4192 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.0740   -9.7303   10.4899 C   0  0  0  0  0  0  0  0  0  0  0  0\n   28.8826  -11.7375   10.1374 C   0  0  0  0  0  0  0  0  0  0  0  0\n   29.1554  -12.2946   15.3639 C   0  0  0  0  0  0  0  0  0  0  0  0\n   30.1514  -10.5453   14.1417 C   0  0  0  0  0  0  0  0  0  0  0  0\n   30.0348  -10.3797   16.4910 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.3050   -7.3953    6.9916 C   0  0  0  0  0  0  0  0  0  0  0  0\n   31.2771  -10.4141   10.7348 C   0  0  0  0  0  0  0  0  0  0  0  0\n   30.2939  -10.5517   11.7267 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.5115   -8.4444    7.8586 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7981   -9.2932   10.9450 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.5115   -9.5735   10.1572 C   0  0  2  0  0  0  0  0  0  0  0  0\n   33.2926   -8.3469    9.2417 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.1336   -9.0653    5.8659 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.3134   -8.7579   20.2331 H   0  0  0  0  0  0  0  0  0  0  0  0\n   30.9670   -9.0953   12.9819 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.5376   -9.5242   12.0363 H   0  0  0  0  0  0  0  0  0  0  0  0\n   26.7395  -10.9142    8.2722 H   0  0  0  0  0  0  0  0  0  0  0  0\n   25.4236  -11.6093    8.9446 H   0  0  0  0  0  0  0  0  0  0  0  0\n   26.3018  -10.5284    9.7974 H   0  0  0  0  0  0  0  0  0  0  0  0\n   27.6093  -15.0404   14.4412 H   0  0  0  0  0  0  0  0  0  0  0  0\n   27.2869  -13.5463   13.8666 H   0  0  0  0  0  0  0  0  0  0  0  0\n   28.7259  -14.2500   13.5495 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.9399   -8.9989   17.4990 H   0  0  0  0  0 
 0  0  0  0  0  0  0\n   32.7816   -7.6445   18.3974 H   0  0  0  0  0  0  0  0  0  0  0  0\n   31.4776  -10.7495   20.5837 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.0148  -11.3374   19.1581 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.7459   -7.5895   13.5683 H   0  0  0  0  0  0  0  0  0  0  0  0\n   30.4215   -7.7528   18.2676 H   0  0  0  0  0  0  0  0  0  0  0  0\n   30.9623   -7.9726   16.7425 H   0  0  0  0  0  0  0  0  0  0  0  0\n   29.7054  -11.2043   18.8891 H   0  0  0  0  0  0  0  0  0  0  0  0\n   29.6395   -9.7231   19.5736 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.9332   -8.3104   12.8496 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.3978   -5.0978    9.1814 H   0  0  0  0  0  0  0  0  0  0  0  0\n   38.1276   -9.7436   10.9056 H   0  0  0  0  0  0  0  0  0  0  0  0\n   29.7521  -12.0208    8.2606 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.7904   -5.3520    6.8010 H   0  0  0  0  0  0  0  0  0  0  0  0\n   31.7322  -10.7446    8.7107 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.5624   -6.9885   10.6395 H   0  0  0  0  0  0  0  0  0  0  0  0\n   28.3617  -11.2891   12.1124 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.6452  -10.3861    9.5899 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 33  1  0\n  2  5  2  0\n  2  6  2  0\n  2 18  1  0\n  2 34  1  0\n  3 45  1  0\n  4 45  1  0\n  7 19  1  0\n  7 35  1  0\n  8 38  1  0\n  8 45  1  0\n  9 41  1  0\n  9 45  1  0\n 10 20  1  0\n 10 21  1  0\n 11 22  2  0\n 11 42  1  0\n 12 35  2  0\n 12 36  1  0\n 13 35  1  0\n 13 37  2  0\n 14 36  2  0\n 14 37  1  0\n 15 36  1  0\n 15 40  1  0\n 16 39  1  0\n 16 43  1  0\n 17 23  1  0\n 17 24  1  0\n 17 37  1  0\n 20 23  1  0\n 21 24  1  0\n 22 25  1  0\n 25 27  2  0\n 26 29  1  0\n 26 31  2  0\n 27 33  1  0\n 28 30  2  0\n 28 34  1  0\n 29 38  2  0\n 30 39  1  0\n 31 44  1  0\n 32 34  2  0\n 32 40  1  0\n 33 42  2  0\n 38 41  1  0\n 39 40  2  0\n 41 44  2  0\n 42 43  1  0\n 43 44  1  0\n 10 46  1  0\n 15 47  1  0\n 16 48  1  0\n 18 49  1  0\n 18 50  1  0\n 
18 51  1  0\n 19 52  1  0\n 19 53  1  0\n 19 54  1  0\n 20 55  1  0\n 20 56  1  0\n 21 57  1  0\n 21 58  1  0\n 22 59  1  0\n 23 60  1  0\n 23 61  1  0\n 24 62  1  0\n 24 63  1  0\n 25 64  1  0\n 26 65  1  0\n 27 66  1  0\n 28 67  1  0\n 29 68  1  0\n 30 69  1  0\n 31 70  1  0\n 32 71  1  0\n 43 72  1  6\nM  END\n","sdf");
    viewer_17638185210383122.setStyle({"stick": {}});
    viewer_17638185210383122.addModel("5JT_A_402\n     RDKit          3D\n\n 31 33  0  0  0  0  0  0  0  0999 V2000\n   36.5041   -2.1215   21.0296 N   0  0  0  0  0  0  0  0  0  0  0  0\n   35.9148    4.4231   18.2923 N   0  0  0  0  0  0  0  0  0  0  0  0\n   36.3566    2.1132   18.3739 N   0  0  0  0  0  0  0  0  0  0  0  0\n   33.6253    5.0470   18.9094 N   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7689    0.4709   18.9206 N   0  0  0  0  0  0  0  0  0  0  0  0\n   36.7684    3.3861   18.1191 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.5188    4.4301   19.2432 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.4106   -0.0916   18.8718 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.0467   -0.4305   20.3537 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.8246   -0.4775   19.3067 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.7306    3.1059   19.2582 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0678   -1.4666   20.9262 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.5583   -1.0136   20.7452 C   0  0  1  0  0  0  0  0  0  0  0  0\n   34.6301    4.1847   18.6654 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.0755    1.7912   18.7220 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.1793    2.8845   18.8800 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.3397   -2.4717   21.9518 H   0  0  0  0  0  0  0  0  0  0  0  0\n   36.3669   -2.8559   20.3649 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.7132    6.0410   18.8436 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.7022    3.5552   17.8038 H   0  0  0  0  0  0  0  0  0  0  0  0\n   31.6499    4.8778   19.4544 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.3947   -0.9187   18.3099 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.7683    0.5776   18.4982 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.1255   -0.8180   20.3890 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.0782    0.4048   20.9026 H   0  0  0  0  0  0  0  0  0  0  0  0\n   36.7105   -0.0140   19.2847 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.8331   -1.2435   18.6640 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   32.0586    2.3987   19.4777 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.9388   -2.3378   20.4525 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.8869   -1.5879   21.9022 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.7404   -0.2737   21.3928 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 13  1  0\n  2  6  2  0\n  2 14  1  0\n  3  6  1  0\n  3 15  2  0\n  4  7  1  0\n  4 14  1  0\n  5  8  1  0\n  5 10  1  0\n  5 15  1  0\n  7 11  2  0\n  8  9  1  0\n  9 12  1  0\n 10 13  1  0\n 11 16  1  0\n 12 13  1  0\n 14 16  2  0\n 15 16  1  0\n  1 17  1  0\n  1 18  1  0\n  4 19  1  0\n  6 20  1  0\n  7 21  1  0\n  8 22  1  0\n  8 23  1  0\n  9 24  1  0\n  9 25  1  0\n 10 26  1  0\n 10 27  1  0\n 11 28  1  0\n 12 29  1  0\n 12 30  1  0\n 13 31  1  1\nM  END\n","sdf");
    viewer_17638185210383122.setStyle({"stick": {}});
    viewer_17638185210383122.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17638185210383122.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17638185210383122.setBackgroundColor("0xeeeeee");
    viewer_17638185210383122.zoomTo();
viewer_17638185210383122.render();
});
</script>
</div>
</div>
<div id="5ff7da55" class="cell" data-scrolled="false" data-execution_count="32">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1">IPythonConsole.drawMols3D([m1c,m2c])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_1763818529126895" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_1763818529126895" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_1763818529126895 = null;
var warn = document.getElementById("3dmolwarning_1763818529126895");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_1763818529126895 = $3Dmol.createViewer(document.getElementById("3dmolviewer_1763818529126895"),{backgroundColor:"white"});
viewer_1763818529126895.zoomTo();
    viewer_1763818529126895.addModel("N4Z_A_401\n     RDKit          3D\n\n 72 77  0  0  0  0  0  0  0  0999 V2000\n    3.9440   -2.0052   -2.7857 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4775    5.4335   -0.0762 S   0  0  0  0  0  0  0  0  0  0  0  0\n    7.1313   -0.4980   -1.0531 F   0  0  0  0  0  0  0  0  0  0  0  0\n    6.7279    1.5232   -0.6644 F   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.8918    5.5663   -0.2413 O   0  0  0  0  0  0  0  0  0  0  0  0\n    0.4079    6.0394   -1.0360 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.3343    2.4658   -1.5248 O   0  0  0  0  0  0  0  0  0  0  0  0\n    6.6275    0.1107    1.1190 O   0  0  0  0  0  0  0  0  0  0  0  0\n    4.9597    0.1264   -0.6482 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -6.5625   -4.2497    0.1401 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.1203   -3.1315   -0.1754 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.4587    1.4230   -0.4366 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.5410    0.4359   -0.2762 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.7183   -0.5617    0.7559 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.6003    0.4205    0.6350 N   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8497   -0.3312   -0.1505 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.8240   -1.6223    0.9321 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.0186    5.9192    1.6099 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.5835    3.6828   -1.7501 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.2998   -3.9912    0.9117 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -7.2908   -3.0756   -0.4413 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.0036   -4.4609   -0.4130 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.3289   -2.7016    1.8015 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -7.2222   -1.8000    0.4588 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.7800   -5.0909   -1.4113 C   0  0  0  0  0  0  0  0  0  0  0  0\n    3.7969   -1.2391    3.0765 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6772   -4.3188   -2.1694 C   0  
0  0  0  0  0  0  0  0  0  0  0\n    1.2308    3.2947   -0.4348 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.0959   -0.7605    2.8554 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.5856    1.9451   -0.4048 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.8372   -1.2338    2.0329 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.0250    2.7570    0.2933 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.8033   -2.9559   -1.8743 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.0587    3.7072   -0.0627 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.7716    1.4587   -0.7208 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.9229    0.4334    0.3168 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0456   -0.5768    0.5009 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.3788   -0.3105    1.5500 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6003    0.9857   -0.1000 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.6976    1.3808    0.2589 C   0  0  0  0  0  0  0  0  0  0  0  0\n    4.4577   -0.3814    0.5295 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.9967   -2.3549   -0.8672 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.1531   -0.8812   -0.4692 C   0  0  2  0  0  0  0  0  0  0  0  0\n    3.1504   -0.8588    0.7121 C   0  0  0  0  0  0  0  0  0  0  0  0\n    6.3578    0.2821   -0.3028 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -7.1965   -4.7160    0.7569 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.2623   -0.3446    1.1830 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.1001   -0.9648    0.0410 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.9703    5.8265    1.7260 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.2850    6.8702    1.7670 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4881    5.3287    2.2663 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.1118    4.2966   -2.3368 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.4045    4.1310   -0.8743 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.7156    3.4616   -2.1949 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.5481   -3.9018    0.2581 H   0  0  0  0  0  
0  0  0  0  0  0  0\n   -5.1320   -4.7769    1.5071 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -8.2510   -3.3263   -0.5641 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -6.8865   -2.8593   -1.3300 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.3590   -5.0033    0.1259 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.9429   -2.8302    2.5802 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.4102   -2.4871    2.1332 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -7.5020   -0.9993   -0.0710 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -7.8314   -1.9097    1.2441 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6924   -6.0733   -1.5763 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.5446   -1.5876    3.9793 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.2139   -4.7340   -2.9038 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.9030    3.9759   -0.7247 H   0  0  0  0  0  0  0  0  0  0  0  0\n    5.7825   -0.7396    3.5820 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.5256    1.6633   -0.5972 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.8989   -1.5078    2.2435 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.9403    3.0496    0.5698 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.5437   -0.3740   -1.2374 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 33  1  0\n  2  5  2  0\n  2  6  2  0\n  2 18  1  0\n  2 34  1  0\n  3 45  1  0\n  4 45  1  0\n  7 19  1  0\n  7 35  1  0\n  8 38  1  0\n  8 45  1  0\n  9 41  1  0\n  9 45  1  0\n 10 20  1  0\n 10 21  1  0\n 11 22  2  0\n 11 42  1  0\n 12 35  2  0\n 12 36  1  0\n 13 35  1  0\n 13 37  2  0\n 14 36  2  0\n 14 37  1  0\n 15 36  1  0\n 15 40  1  0\n 16 39  1  0\n 16 43  1  0\n 17 23  1  0\n 17 24  1  0\n 17 37  1  0\n 20 23  1  0\n 21 24  1  0\n 22 25  1  0\n 25 27  2  0\n 26 29  1  0\n 26 31  2  0\n 27 33  1  0\n 28 30  2  0\n 28 34  1  0\n 29 38  2  0\n 30 39  1  0\n 31 44  1  0\n 32 34  2  0\n 32 40  1  0\n 33 42  2  0\n 38 41  1  0\n 39 40  2  0\n 41 44  2  0\n 42 43  1  0\n 43 44  1  0\n 10 46  1  0\n 15 47  1  0\n 16 48  1  0\n 18 49  1  0\n 18 50  1  0\n 
18 51  1  0\n 19 52  1  0\n 19 53  1  0\n 19 54  1  0\n 20 55  1  0\n 20 56  1  0\n 21 57  1  0\n 21 58  1  0\n 22 59  1  0\n 23 60  1  0\n 23 61  1  0\n 24 62  1  0\n 24 63  1  0\n 25 64  1  0\n 26 65  1  0\n 27 66  1  0\n 28 67  1  0\n 29 68  1  0\n 30 69  1  0\n 31 70  1  0\n 32 71  1  0\n 43 72  1  6\nM  END\n","sdf");
    viewer_1763818529126895.setStyle({"stick": {}});
    viewer_1763818529126895.addModel("5JT_A_402\n     RDKit          3D\n\n 31 33  0  0  0  0  0  0  0  0999 V2000\n   -5.4331    2.2314   -1.0580 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6630    2.2681   -0.4957 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.5778    2.7858    0.0029 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.1975   -0.1239   -0.4281 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.2104    1.2054    0.5976 N   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6837    3.1864   -0.3180 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.5744   -1.2392   -0.1380 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.6467   -0.0132    1.2966 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.4114   -0.8565    0.2255 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.2936    2.1098    0.1831 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.3024   -0.9700    0.1905 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.6341   -0.0451   -0.3135 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.2374    1.3844   -0.8221 C   0  0  1  0  0  0  0  0  0  0  0  0\n    1.3969    0.9505   -0.2964 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.9154    1.4852    0.2493 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.1246    0.5292    0.0824 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -6.0447    1.7726   -1.7027 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.9120    2.3805   -0.1929 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.1547   -0.0762   -0.7137 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8870    4.1599   -0.4227 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.9844   -2.1511   -0.1584 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.2529    0.2183    2.0574 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.8568   -0.5199    1.6420 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.7345   -1.7068    0.6412 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.7943   -1.0694   -0.5320 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.9006    2.9180   -0.2557 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.8180    2.3891    0.9874 H   0  
0  0  0  0  0  0  0  0  0  0  0\n   -0.3964   -1.6340    0.4564 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.3031    0.0522    0.4235 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0448   -0.5516   -1.0717 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.7506    1.2829   -1.6898 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 13  1  0\n  2  6  2  0\n  2 14  1  0\n  3  6  1  0\n  3 15  2  0\n  4  7  1  0\n  4 14  1  0\n  5  8  1  0\n  5 10  1  0\n  5 15  1  0\n  7 11  2  0\n  8  9  1  0\n  9 12  1  0\n 10 13  1  0\n 11 16  1  0\n 12 13  1  0\n 14 16  2  0\n 15 16  1  0\n  1 17  1  0\n  1 18  1  0\n  4 19  1  0\n  6 20  1  0\n  7 21  1  0\n  8 22  1  0\n  8 23  1  0\n  9 24  1  0\n  9 25  1  0\n 10 26  1  0\n 10 27  1  0\n 11 28  1  0\n 12 29  1  0\n 12 30  1  0\n 13 31  1  6\nM  END\n","sdf");
    viewer_1763818529126895.setStyle({"stick": {}});
    viewer_1763818529126895.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_1763818529126895.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_1763818529126895.setBackgroundColor("0xeeeeee");
    viewer_1763818529126895.zoomTo();
viewer_1763818529126895.render();
});
</script>
</div>
</div>
<p>This is a situation described in the LOBSTER paper: the crystal ligands are in very different parts of the pocket that was identified by SIENA.</p>
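<p>One rough way to put a number on "very different parts of the pocket" is the distance between the ligand centroids in the crystal frame. The sketch below is only an illustration: it uses plain Python and the first five atoms of each ligand, copied from the crystal-frame coordinates embedded in the first 3D viewer above, so the value it prints is for that subset and not for the full ligands. The helper name <code>centroid</code> is an assumption made for this example. In the notebook itself, something like <code>rdMolTransforms.ComputeCentroid(m1.GetConformer())</code> would be the more natural route.</p>

```python
import math


def centroid(coords):
    """Return the arithmetic mean of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


# First five atoms of each ligand in the crystal frame, copied from the
# mol blocks shown in the viewer above. This is a subset chosen for brevity
# (an assumption of this sketch), not the complete ligands.
n4z_subset = [
    (36.2199, -10.7081, 9.0552),   # Cl
    (27.3817, -12.6095, 9.7595),   # S
    (35.4096, -9.2060, 5.5168),    # F
    (33.4858, -9.8570, 4.9932),    # F
    (26.8175, -13.1097, 10.9746),  # O
]
jt5_subset = [
    (36.5041, -2.1215, 21.0296),   # N
    (35.9148, 4.4231, 18.2923),    # N
    (36.3566, 2.1132, 18.3739),    # N
    (33.6253, 5.0470, 18.9094),    # N
    (34.7689, 0.4709, 18.9206),    # N
]

# Euclidean distance between the two subset centroids, in Angstrom
separation = math.dist(centroid(n4z_subset), centroid(jt5_subset))
print(f"centroid separation of the subsets: {separation:.1f} Angstrom")
```

<p>A separation that is large compared to the size of the ligands is consistent with a crystal-frame shape Tversky index of zero: the two volumes simply do not overlap until one ligand is moved onto the other.</p>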
<p>We can figure out what’s going on by looking at the PDB entries:</p>
<div id="253da19a" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1">d</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="33">
<pre><code>{'ligname1': 'N4Z_A_401',
 'pdb1': '6tel',
 'ligname2': '5JT_A_402',
 'pdb2': '5mw3',
 'ensemble': '5JU_A_401-5mw4',
 'morgan_fp_tanimoto': 0.115,
 'gobbi_2d_pharmacophore_fp_tanimoto': 0.102,
 'hac_difference': 29,
 'shape_tversky_index': 0.0,
 'shape_tanimoto_distance': 1.0,
 'shape_protrude_distance': 1.0,
 'shape_align_ShapeTanimoto': 0.3489983628032094,
 'shape_align_ColorTanimoto': 0.08050058859625212,
 'shape_align_ShapeTversky': 0.8264984227129337,
 'O3A_RMSD': 0.18706387903535077,
 'O3A_score': 99.22402124958128,
 'O3A_ShapeTversky': 0.6229205175600739,
 'CrippenO3A_RMSD': 0.3775024004623698,
 'CrippenO3A_score': 64.29333441352107,
 'CrippenO3A_ShapeTversky': 0.7638085218306154}</code></pre>
</div>
</div>
<p>Judging from the PDBe pages for <a href="https://www.ebi.ac.uk/pdbe/entry/pdb/6tel?activeTab=ligands&amp;id=N4Z">ligand N4Z in 6tel</a> and <a href="https://www.ebi.ac.uk/pdbe/entry/pdb/5mw3?activeTab=ligands&amp;id=5JT">ligand 5JT in 5mw3</a>, it looks like the automatic curation procedure for LOBSTER may have picked the wrong ligand for 5mw3, which has two bound ligands: <a href="https://www.ebi.ac.uk/pdbe/entry/pdb/5mw3?activeTab=ligands&amp;id=5JJ">ligand 5JJ</a> looks like it may sit in a pocket (or part of the pocket) more similar to that of ligand N4Z.</p>
</section>
<section id="open3dalign" class="level1">
<h1>Open3DAlign</h1>
<p>What about an alternative 3D alignment algorithm? Let’s try aligning with Paolo Tosco’s <a href="https://link.springer.com/article/10.1007/s10822-011-9462-9">Open3DAlign</a>:</p>
<div id="e6ecccef" class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolAlign</span>
<span id="cb33-2"></span>
<span id="cb33-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb33-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairstats):</span>
<span id="cb33-5">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb33-6">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb33-7">    </span>
<span id="cb33-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb33-9">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetO3A(m2,m1)</span>
<span id="cb33-10">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb33-11">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb33-12">        tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1,m2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb33-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb33-14">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># we get failures for molecules that don't have MMFF94 parameters.</span></span>
<span id="cb33-15">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># set the results for those to zero</span></span>
<span id="cb33-16">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb33-17">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb33-18">        tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb33-19">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_RMSD'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rmsd</span>
<span id="cb33-20">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> score</span>
<span id="cb33-21">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tversky</span>
<span id="cb33-22">    </span>
<span id="cb33-23"></span></code></pre></div>
</div>
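<p>Setting failed alignments to zero makes them indistinguishable from a genuinely zero overlap. As a minimal sketch of an alternative (not what this post does), the helper below records <code>nan</code> for failures so that they can be filtered out later. The name <code>safe_metrics</code> and the two stand-in callables are hypothetical; in the real loop the callable would wrap the <code>GetO3A</code> call and the metric calculations.</p>

```python
import math

def safe_metrics(align_fn):
    """Run align_fn() and return its (rmsd, score, tversky) tuple.

    align_fn is a hypothetical zero-argument callable wrapping the alignment.
    On ValueError (e.g. missing MMFF94 parameters) NaNs are returned
    instead of zeros.
    """
    try:
        return align_fn()
    except ValueError:
        return (math.nan, math.nan, math.nan)

# stand-ins for a successful and a failing alignment
def ok():
    return (0.8, 120.5, 0.93)

def fails():
    raise ValueError("missing parameters")

results = [safe_metrics(ok), safe_metrics(fails)]
# keep only the rows where the alignment actually ran
valid = [r for r in results if not math.isnan(r[2])]
print(len(results), len(valid))
```

<p>The cost of this design is that every later plot or statistic has to skip the <code>nan</code> entries explicitly.</p>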
<div id="c69a2fbc" class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb34-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-25-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Ignoring the cases where Open3DAlign didn’t have parameters (where the O3A Tversky is 0), this looks pretty similar to what we saw with the shape alignment, though there are some pairs with a high Tversky index in the crystal structures and a low Open3DAlign result.</p>
<div id="56c89664" class="cell" data-execution_count="42">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">ligname1,pdb1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'1D1_A_401'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'4i5p'</span>) </span>
<span id="cb35-2">ligname2,pdb2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'11G_A_401'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'4i6b'</span>)</span>
<span id="cb35-3">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pruned[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb35-4">m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(ligname1,pdb1)][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb35-5">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(ligname2,pdb2)][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb35-6">m1c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1)</span>
<span id="cb35-7">m2c <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2)</span>
<span id="cb35-8">rdMolTransforms.CanonicalizeConformer(m1c.GetConformer())</span>
<span id="cb35-9">rdMolTransforms.CanonicalizeConformer(m2c.GetConformer())</span>
<span id="cb35-10">st,ct <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeAlign.AlignMol(m1c,m2c,opt_param<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb35-11">stversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1c,m2c,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb35-12"></span>
<span id="cb35-13">m1c2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m1)</span>
<span id="cb35-14">m2c2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(m2)</span>
<span id="cb35-15">o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetO3A(m2c2,m1c2)</span>
<span id="cb35-16">rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb35-17">o3atversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1c2,m2c2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb35-18"></span>
<span id="cb35-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(stversky,o3atversky)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>0.7622862091361766 0.9846320346320346</code></pre>
</div>
</div>
<p>The shape-based alignment (we saw this above):</p>
<div id="3bbc7999" class="cell" data-execution_count="43">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">IPythonConsole.drawMols3D([m1c,m2c])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17638734672885962" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17638734672885962" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17638734672885962 = null;
var warn = document.getElementById("3dmolwarning_17638734672885962");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17638734672885962 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17638734672885962"),{backgroundColor:"white"});
viewer_17638734672885962.zoomTo();
    viewer_17638734672885962.addModel("1D1_A_401\n     RDKit          3D\n\n 47 50  0  0  0  0  0  0  0  0999 V2000\n    4.0527    1.8957   -0.3913 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.8480    2.2030   -0.0118 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.5993   -0.9673   -0.2516 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.0374   -0.0165   -0.1681 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.8029    2.3697   -0.2356 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.3087   -0.4889   -0.3277 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0742    3.8124   -0.1844 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6170    0.0789    2.2230 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.9181   -1.2423   -0.1069 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.6198    2.7287   -0.0662 C   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3412   -0.4413    0.9841 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.5835   -0.0843    0.2526 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8590   -4.1509    0.5110 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.7484   -4.2074   -0.7324 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.6529    0.9285    0.3431 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6891   -2.6772    0.8870 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0070   -2.7705   -1.1655 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.0615    0.8622   -0.0663 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.8834    1.4626   -0.3468 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.2353    0.4228   -0.2272 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.4509    0.3385    0.0102 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.9698   -1.9332   -0.4096 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.4810    1.8872   -0.1699 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.7019   -0.0105   -0.3496 C   0  0  2  0  0  0  0  0  0  0  0  0\n   -2.8782   -1.6123   -0.5044 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2112    4.3113   -0.1063 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6496    4.0155    0.6079 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n    2.5468    4.0926   -1.0201 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.0892   -0.2414    3.0442 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6111    1.0788    2.2104 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6761   -0.2600    2.2260 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.3421   -2.1383   -0.2387 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4945    3.7201   -0.0325 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3466   -1.4407    1.0207 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.2815   -0.1019    1.0050 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -6.5659    0.0027    0.4178 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2916   -4.6477    1.2634 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.0326   -4.5577    0.3120 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2842   -4.7075   -1.4633 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6128   -4.6608   -0.5151 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.8138    1.8828    0.5949 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.3422   -2.4136    1.5967 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.2412   -2.4967    1.2063 H   0  0  0  0  0  0  0  0  0  0  0  0\n    1.8840   -2.6740   -2.1532 H   0  0  0  0  0  0  0  0  0  0  0  0\n    2.9339   -2.4901   -0.9160 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.1278   -1.9883   -0.9463 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.1957   -0.4183   -1.1175 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 19  2  0\n  2 10  2  0\n  2 18  1  0\n  3  9  1  0\n  3 21  1  0\n  4 18  2  0\n  4 20  1  0\n  5  7  1  0\n  5 19  1  0\n  5 23  1  0\n  6 20  1  0\n  6 22  1  0\n  6 24  1  0\n  8 11  1  0\n  9 12  2  0\n 10 23  1  0\n 11 24  1  0\n 12 15  1  0\n 13 14  1  0\n 13 16  1  0\n 14 17  1  0\n 15 21  2  0\n 16 22  1  0\n 17 22  1  0\n 18 21  1  0\n 19 24  1  0\n 20 23  2  0\n  3 25  1  0\n  7 26  1  0\n  7 27  1  0\n  7 28  1  0\n  8 29  1  0\n  8 30  1  0\n  8 31  1  0\n  9 32  1  0\n 10 33  1  0\n 11 34  1  0\n 11 35  1  0\n 12 36  1  
0\n 13 37  1  0\n 13 38  1  0\n 14 39  1  0\n 14 40  1  0\n 15 41  1  0\n 16 42  1  0\n 16 43  1  0\n 17 44  1  0\n 17 45  1  0\n 22 46  1  0\n 24 47  1  6\nM  END\n","sdf");
    viewer_17638734672885962.setStyle({"stick": {}});
    viewer_17638734672885962.addModel("11G_A_401\n     RDKit          3D\n\n 39 41  0  0  0  0  0  0  0  0999 V2000\n    1.2908    3.3626   -0.0826 O   0  0  0  0  0  0  0  0  0  0  0  0\n    2.1667   -2.5110   -0.2996 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.1004   -1.9105   -0.0530 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.0042    1.1592   -0.1788 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7924    0.3938    0.0616 N   0  0  0  0  0  0  0  0  0  0  0  0\n    3.4111    1.5729   -0.3120 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.3606    1.5738   -2.6066 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.8662   -2.8340   -0.1762 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.5819   -1.2370   -0.3026 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.9431    2.3142   -1.3879 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.4736   -0.3472   -0.4755 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.5054    0.6223    0.7143 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.0300   -0.3987   -0.9834 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -3.0517    0.9677    1.0295 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.9942    2.1467   -0.0644 C   0  0  0  0  0  0  0  0  0  0  0  0\n    0.2109   -0.5927   -0.0495 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.2084   -0.0364    0.2382 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.6419   -0.2106   -0.1916 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.4447    1.8204   -0.0045 C   0  0  2  0  0  0  0  0  0  0  0  0\n    3.9919    0.7619   -0.3813 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.5179    2.1309   -1.1349 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.6788    2.1070    0.4899 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7406    1.9627   -3.4459 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6344    1.6744   -2.6134 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.5972    0.6038   -2.5518 H   0  0  0  0  0  0  0  0  0  0  0  0\n    0.6096   -3.8005   -0.1764 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.5555   -1.0233   -0.3836 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   -1.9375    2.2113   -1.4116 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7060    3.2819   -1.4732 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0797   -0.0194   -1.2002 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.7659   -1.2577   -0.1828 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -4.9361    0.1859    1.5043 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -5.0126    1.4495    0.4727 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8867    0.2632   -1.7191 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.7990   -1.3152   -1.3100 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8757    0.8754    2.0096 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.8443    1.9022    0.7401 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.1737   -0.8640    0.7984 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.8912    2.3272    0.7330 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 15  2  0\n  2  8  2  0\n  2  9  1  0\n  3  8  1  0\n  3 16  2  0\n  4  6  1  0\n  4 15  1  0\n  4 18  1  0\n  5 16  1  0\n  5 17  1  0\n  5 19  1  0\n  7 10  1  0\n  9 18  2  0\n 10 19  1  0\n 11 12  1  0\n 11 13  1  0\n 12 14  1  0\n 13 17  1  0\n 14 17  1  0\n 15 19  1  0\n 16 18  1  0\n  6 20  1  0\n  6 21  1  0\n  6 22  1  0\n  7 23  1  0\n  7 24  1  0\n  7 25  1  0\n  8 26  1  0\n  9 27  1  0\n 10 28  1  0\n 10 29  1  0\n 11 30  1  0\n 11 31  1  0\n 12 32  1  0\n 12 33  1  0\n 13 34  1  0\n 13 35  1  0\n 14 36  1  0\n 14 37  1  0\n 17 38  1  0\n 19 39  1  1\nM  END\n","sdf");
    viewer_17638734672885962.setStyle({"stick": {}});
    viewer_17638734672885962.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17638734672885962.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17638734672885962.setBackgroundColor("0xeeeeee");
    viewer_17638734672885962.zoomTo();
viewer_17638734672885962.render();
});
</script>
</div>
</div>
<p>And the Open3D alignment:</p>
<div id="9db2e901" class="cell" data-execution_count="44">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb38-1">IPythonConsole.drawMols3D([m1c2,m2c2])</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17638734790354753" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17638734790354753" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17638734790354753 = null;
var warn = document.getElementById("3dmolwarning_17638734790354753");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17638734790354753 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17638734790354753"),{backgroundColor:"white"});
viewer_17638734790354753.zoomTo();
    viewer_17638734790354753.addModel("1D1_A_401\n     RDKit          3D\n\n 47 50  0  0  0  0  0  0  0  0999 V2000\n   13.6199    4.3394   10.4996 O   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7618    4.8606    9.8156 N   0  0  0  0  0  0  0  0  0  0  0  0\n    6.3944    8.1554   10.4865 N   0  0  0  0  0  0  0  0  0  0  0  0\n    8.8228    6.9055   10.3637 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.3439    4.1903   10.1704 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.1937    7.0400   10.7008 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.4418    2.7500    9.8990 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5846    6.7088    8.1521 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.1305    8.6178   10.3300 C   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9115    4.1784    9.8385 C   0  0  0  0  0  0  0  0  0  0  0  0\n   13.2854    6.9331    9.4894 C   0  0  0  0  0  0  0  0  0  0  0  0\n    4.3546    7.6253    9.7593 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.2442   10.8146   10.4451 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0524   10.5627   11.7193 C   0  0  0  0  0  0  0  0  0  0  0  0\n    5.1599    6.5268    9.5477 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.9220    9.4513    9.8291 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.1073    9.0549   11.9258 C   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7087    6.1931   10.0763 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5167    4.9218   10.4741 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0267    6.3014   10.4072 C   0  0  0  0  0  0  0  0  0  0  0  0\n    6.4005    6.8977   10.0241 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.0272    8.4858   10.9997 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0969    4.8441   10.1252 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5146    6.3877   10.7066 C   0  0  2  0  0  0  0  0  0  0  0  0\n    7.1705    8.6548   10.8716 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.5316    2.3851    9.7034 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0380    2.5991    9.1105 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   11.8225    2.2853   10.6985 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.1437    7.0883    7.4150 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.4572    5.7282    8.0029 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6937    7.1629    8.1623 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.8102    9.5299   10.5859 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9181    3.1963    9.6508 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.4138    7.9169    9.6149 H   0  0  0  0  0  0  0  0  0  0  0  0\n   14.1773    6.4822    9.4555 H   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3815    7.6927    9.5391 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.7810   11.3615    9.8026 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.3974   11.2977   10.6676 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6061   10.9996   12.5002 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.9774   10.9285   11.6164 H   0  0  0  0  0  0  0  0  0  0  0  0\n    4.9014    5.6518    9.1383 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5827    9.2183    9.1156 H   0  0  0  0  0  0  0  0  0  0  0  0\n    9.9991    9.4442    9.4441 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.9105    8.8236   12.8786 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0076    8.6988   11.6755 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1653    8.5658   11.5006 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0038    6.6046   11.5513 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 19  2  0\n  2 10  2  0\n  2 18  1  0\n  3  9  1  0\n  3 21  1  0\n  4 18  2  0\n  4 20  1  0\n  5  7  1  0\n  5 19  1  0\n  5 23  1  0\n  6 20  1  0\n  6 22  1  0\n  6 24  1  0\n  8 11  1  0\n  9 12  2  0\n 10 23  1  0\n 11 24  1  0\n 12 15  1  0\n 13 14  1  0\n 13 16  1  0\n 14 17  1  0\n 15 21  2  0\n 16 22  1  0\n 17 22  1  0\n 18 21  1  0\n 19 24  1  0\n 20 23  2  0\n  3 25  1  0\n  7 26  1  0\n  7 27  1  0\n  7 28  1  0\n  8 29  1  0\n  8 30  1  0\n  8 31  1  0\n  9 32  1  0\n 10 33  1  0\n 11 34  1  0\n 11 35  1  0\n 12 36  1  
0\n 13 37  1  0\n 13 38  1  0\n 14 39  1  0\n 14 40  1  0\n 15 41  1  0\n 16 42  1  0\n 16 43  1  0\n 17 44  1  0\n 17 45  1  0\n 22 46  1  0\n 24 47  1  1\nM  END\n","sdf");
    viewer_17638734790354753.setStyle({"stick": {}});
    viewer_17638734790354753.addModel("11G_A_401\n     RDKit          3D\n\n 39 41  0  0  0  0  0  0  0  0999 V2000\n   13.6387    4.3854   10.5227 O   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7586    4.8504    9.8003 N   0  0  0  0  0  0  0  0  0  0  0  0\n    8.8004    6.8956   10.3415 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.3526    4.1877   10.1938 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.1784    7.0485   10.6933 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.4693    2.7433    9.9324 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.4814    6.6218    8.0821 C   0  0  0  0  0  0  0  0  0  0  0  0\n    7.7085    6.1724   10.0463 C   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9104    4.1667    9.8393 C   0  0  0  0  0  0  0  0  0  0  0  0\n   13.2126    6.9087    9.4071 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.2871   10.8364   10.5232 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.1272   10.5410   11.7736 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.9781    9.4945    9.8539 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.1261    9.0238   11.9466 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.5169    4.9397   10.4895 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0162    6.3026   10.4017 C   0  0  0  0  0  0  0  0  0  0  0  0\n   11.0436    8.5014   10.9976 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0979    4.8425   10.1265 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.4988    6.4028   10.6879 C   0  0  2  0  0  0  0  0  0  0  0  0\n   10.5636    2.3647    9.7419 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0655    2.5944    9.1435 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.8582    2.2898   10.7344 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0179    6.9844    7.3200 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.3691    5.6343    7.9705 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5828    7.0601    8.0955 H   0  0  0  0  0  0  0  0  0  0  0  0\n    6.8226    6.6347   10.0077 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.9194    3.1822    9.6644 H   0 
 0  0  0  0  0  0  0  0  0  0  0\n   13.3254    7.8990    9.4883 H   0  0  0  0  0  0  0  0  0  0  0  0\n   14.1117    6.4734    9.3633 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.8015   11.4212    9.8958 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.4362   11.2932   10.7828 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.7210   10.9805   12.5748 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0619   10.8739   11.6487 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.6597    9.2791    9.1546 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0676    9.5032    9.4404 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.9085    8.7805   12.8919 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0181    8.6439   11.7015 H   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1793    8.5971   11.4914 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.0183    6.6548   11.5045 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 15  2  0\n  2  8  2  0\n  2  9  1  0\n  3  8  1  0\n  3 16  2  0\n  4  6  1  0\n  4 15  1  0\n  4 18  1  0\n  5 16  1  0\n  5 17  1  0\n  5 19  1  0\n  7 10  1  0\n  9 18  2  0\n 10 19  1  0\n 11 12  1  0\n 11 13  1  0\n 12 14  1  0\n 13 17  1  0\n 14 17  1  0\n 15 19  1  0\n 16 18  1  0\n  6 20  1  0\n  6 21  1  0\n  6 22  1  0\n  7 23  1  0\n  7 24  1  0\n  7 25  1  0\n  8 26  1  0\n  9 27  1  0\n 10 28  1  0\n 10 29  1  0\n 11 30  1  0\n 11 31  1  0\n 12 32  1  0\n 12 33  1  0\n 13 34  1  0\n 13 35  1  0\n 14 36  1  0\n 14 37  1  0\n 17 38  1  0\n 19 39  1  1\nM  END\n","sdf");
    viewer_17638734790354753.setStyle({"stick": {}});
    viewer_17638734790354753.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17638734790354753.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17638734790354753.setBackgroundColor("0xeeeeee");
    viewer_17638734790354753.zoomTo();
viewer_17638734790354753.render();
});
</script>
</div>
</div>
<p>Here we can see that Open3DAlign can quite happily align a smaller molecule to a larger one.</p>
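<p>The choice of Tversky parameters is part of why this pose scores so well. The sketch below assumes the usual Tversky definition, overlap / (overlap + alpha*(A only) + beta*(B only)), and uses sets of grid points as a stand-in for molecular volumes; the helper name <code>tversky</code> and the toy grids are illustrative, not RDKit code. Assuming <code>ShapeTverskyIndex</code> follows the same argument order, alpha=0 and beta=1 reduce the index to the fraction of the second shape that is covered by the first, so a small ligand sitting entirely inside a larger one scores 1.0.</p>

```python
def tversky(a, b, alpha, beta):
    """Tversky index of two sets of occupied grid points."""
    inter = len(a.intersection(b))
    only_a = len(a.difference(b))
    only_b = len(b.difference(a))
    return inter / (inter + alpha * only_a + beta * only_b)

# toy "volumes": a large 4x4 shape and a small 2x2 shape inside it
large = {(x, y) for x in range(4) for y in range(4)}
small = {(x, y) for x in range(2) for y in range(2)}

covered = tversky(large, small, 0, 1)    # fraction of the small shape covered
tanimoto = tversky(large, small, 1, 1)   # symmetric (Tanimoto) variant
print(covered, tanimoto)
```

<p>With the symmetric weighting the same pair scores much lower, because the uncovered part of the large shape is penalized.</p>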
<p>Directly compare the O3A results to the shape alignment results:</p>
<div id="45b81c1f" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb39-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-29-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>There’s a reasonable number of pairs with high O3A similarities and low shape-align similarities. These are probably cases like the one we just saw: alignments of a small ligand to a larger one where there’s good feature overlap between the small and large structures.</p>
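<p>One way to pull such pairs out for inspection is a simple filter over <code>pairstats</code>. This is a minimal sketch: the keys match the ones used above, but the toy records and the 0.9 / 0.6 cutoffs are arbitrary assumptions chosen for illustration.</p>

```python
# toy stand-in for pairstats; the real list has one dict per ligand pair
pairstats_demo = [
    {'ligname1': 'A', 'ligname2': 'B',
     'shape_align_ShapeTversky': 0.55, 'O3A_ShapeTversky': 0.97},
    {'ligname1': 'C', 'ligname2': 'D',
     'shape_align_ShapeTversky': 0.92, 'O3A_ShapeTversky': 0.95},
    {'ligname1': 'E', 'ligname2': 'F',
     'shape_align_ShapeTversky': 0.80, 'O3A_ShapeTversky': 0.0},
]

def high_o3a_low_shape(stats, o3a_min=0.9, shape_max=0.6):
    """Return pairs where the O3A overlap is high but the shape-align overlap is low."""
    return [d for d in stats
            if d['O3A_ShapeTversky'] >= o3a_min
            and d['shape_align_ShapeTversky'] <= shape_max]

outliers = high_o3a_low_shape(pairstats_demo)
print([(d['ligname1'], d['ligname2']) for d in outliers])
```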
<p>Aside from the alignments that didn’t work due to missing parameters (where the O3A Tversky is 0), the correlation is quite good.</p>
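<p>To put a number on that, one could drop the failed alignments and compute a correlation coefficient. The sketch below does this with a hand-rolled Pearson r on made-up values; the helper names and data are hypothetical, and no such number was computed for the real data set in this post.</p>

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def o3a_vs_shape_correlation(stats):
    """Pearson r between the two Tversky values, skipping O3A failures (stored as 0)."""
    kept = [d for d in stats if d['O3A_ShapeTversky'] != 0]
    xs = [d['shape_align_ShapeTversky'] for d in kept]
    ys = [d['O3A_ShapeTversky'] for d in kept]
    return pearson(xs, ys)

demo = [
    {'shape_align_ShapeTversky': 0.5, 'O3A_ShapeTversky': 0.6},
    {'shape_align_ShapeTversky': 0.7, 'O3A_ShapeTversky': 0.8},
    {'shape_align_ShapeTversky': 0.9, 'O3A_ShapeTversky': 1.0},
    {'shape_align_ShapeTversky': 0.8, 'O3A_ShapeTversky': 0},  # failed alignment
]
r = o3a_vs_shape_correlation(demo)
print(round(r, 3))
```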
<p>The RDKit also has a variation on Open3DAlign that uses atomic contributions to the MolLogP value instead of MMFF94 atom types:</p>
<div id="6e9ee47e" class="cell" data-execution_count="62">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolAlign</span>
<span id="cb40-2"></span>
<span id="cb40-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb40-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(pairstats):</span>
<span id="cb40-5">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb40-6">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>])</span>
<span id="cb40-7">    </span>
<span id="cb40-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb40-9">        o3a <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetCrippenO3A(m2,m1)</span>
<span id="cb40-10">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Align()</span>
<span id="cb40-11">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> o3a.Score()</span>
<span id="cb40-12">        tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdShapeHelpers.ShapeTverskyIndex(m1,m2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb40-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb40-14">        rmsd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb40-15">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb40-16">        tversky <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb40-17">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_RMSD'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rmsd</span>
<span id="cb40-18">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_score'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> score</span>
<span id="cb40-19">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_ShapeTversky'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tversky</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 72592/72592 [04:25&lt;00:00, 273.70it/s]</code></pre>
</div>
</div>
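<p>As a reminder of what the <code>0,1</code> arguments to <code>ShapeTverskyIndex</code> mean, here is a minimal sketch of the Tversky index on plain Python sets. It assumes the usual definition, overlap / (overlap + alpha*(A only) + beta*(B only)); the <code>tversky</code> helper and the toy sets of "grid points" are made up for illustration and are not part of the RDKit.</p>

```python
def tversky(a, b, alpha, beta):
    """Tversky index of two sets: overlap / (overlap + alpha*|A only| + beta*|B only|)."""
    overlap = len(a.intersection(b))
    denom = overlap + alpha * len(a.difference(b)) + beta * len(b.difference(a))
    return overlap / denom if denom else 0.0

# hypothetical occupied grid points for a large and a small "molecule"
big = set(range(10))      # 10 points
small = {2, 3, 4, 20}     # 4 points, 3 of them inside `big`

contained = tversky(big, small, 0, 1)   # fraction of `small` covered by `big`
reverse = tversky(big, small, 1, 0)     # fraction of `big` covered by `small`
tanimoto = tversky(big, small, 1, 1)    # symmetric case
print(contained, reverse, tanimoto)
```

<p>With alpha=0 and beta=1 only the uncovered part of the second shape is penalized, so in this toy model a small ligand sitting entirely inside a larger one scores 1.0.</p>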
<p>It took a while to get here, so save the results:</p>
<div id="252a2d60" class="cell" data-execution_count="66">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb42-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pickle</span>
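<span id="cb42-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'./results/pairstats.pkl'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'wb'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> outf:  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the context manager closes the file once the dump is done</span></span>
<span id="cb42-3">    pickle.dump(pairstats,outf)</span></code></pre></div>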
</div>
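<p>Reading the results back in a later session is the mirror image of the save. The sketch below round-trips a small made-up stand-in for <code>pairstats</code> through a temporary directory so that it is self-contained; the dictionary values are placeholders, not real results.</p>

```python
import os
import pickle
import tempfile

# made-up stand-in for the real pairstats list of dicts
pairstats_demo = [
    {'CrippenO3A_RMSD': 1.25, 'CrippenO3A_score': 80.0, 'CrippenO3A_ShapeTversky': 0.5},
    {'CrippenO3A_RMSD': 0, 'CrippenO3A_score': 0, 'CrippenO3A_ShapeTversky': 0},
]

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'pairstats.pkl')
    # the context managers make sure each file is flushed and closed
    with open(fname, 'wb') as outf:
        pickle.dump(pairstats_demo, outf)
    with open(fname, 'rb') as inf:
        reloaded = pickle.load(inf)

print(len(reloaded), reloaded == pairstats_demo)
```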
<p>Compare the Crippen O3A alignment to the shape-based one:</p>
<div id="d29a1236" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb43-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-32-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>This looks quite similar to the last results, but this time we don’t have failed alignments due to missing atom types.</p>
<p>Compare the two different Open3DAlign results to each other:</p>
<div id="971ac0b6" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb44-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-33-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>And, finally, compare the Crippen O3A results to the crystal alignments:</p>
<div id="63bc738b" class="cell" data-scrolled="false" data-execution_count="16">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb45-2">                includeLine<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-34-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="comparing-performance-on-easy-cases" class="level1">
<h1>Comparing performance on “easy” cases</h1>
<p>The LOBSTER paper suggests that ligand pairs that have a Tversky overlap of 0.9 or higher are good starting points for studying superposition algorithms.</p>
<p>Let’s see how well the three alignment approaches we applied here do:</p>
<div id="7802e015" class="cell" data-execution_count="45">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1">pruned <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [r <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>]</span>
<span id="cb46-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(pruned)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="45">
<pre><code>2653</code></pre>
</div>
</div>
<div id="ccad849e" class="cell" data-execution_count="47">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1">comparison_plot(pruned,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_align_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-36-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="7297e2a6" class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb49-1">comparison_plot(pruned,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CrippenO3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-37-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Make the same plot with the default O3A alignments that did not fail due to missing parameters:</p>
<div id="a28f98b7" class="cell" data-scrolled="false" data-execution_count="51">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1">comparison_plot([x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pruned <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-38-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here’s what that looks like with the failed alignments included:</p>
<div id="aa9a03bb" class="cell" data-execution_count="49">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb51-1">comparison_plot(pruned,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'shape_tversky_index'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O3A_ShapeTversky'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3_files/figure-html/cell-39-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Based on this simple analysis, it looks like the Open3DAlign methods are doing better than the shape-based alignment. Given the differences in size between some of the ligands, this isn’t terribly surprising.</p>
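<p>One way to put numbers on that impression is to summarize each method’s Tversky values over the pruned pairs. This is only a sketch: the <code>summarize</code> helper and the three rows of demo data are invented for illustration, and it assumes, as in the code above, that failed alignments are stored as 0.</p>

```python
from statistics import median

def summarize(rows, key):
    """Return (number of successful alignments, median Tversky) for one method.

    Values of 0 are treated as failed alignments and skipped.
    """
    vals = [r[key] for r in rows if r[key] > 0]
    return len(vals), (median(vals) if vals else None)

# invented demo rows; the real input would be the `pruned` list
demo = [
    {'shape_align_ShapeTversky': 0.70, 'O3A_ShapeTversky': 0.90, 'CrippenO3A_ShapeTversky': 0.85},
    {'shape_align_ShapeTversky': 0.60, 'O3A_ShapeTversky': 0, 'CrippenO3A_ShapeTversky': 0.95},
    {'shape_align_ShapeTversky': 0.80, 'O3A_ShapeTversky': 0.80, 'CrippenO3A_ShapeTversky': 0.75},
]

summary = {}
for key in ('shape_align_ShapeTversky', 'O3A_ShapeTversky', 'CrippenO3A_ShapeTversky'):
    summary[key] = summarize(demo, key)
    print(key, summary[key])
```

<p>Reporting the count alongside the median matters here because a method that fails on the hard pairs would otherwise look better than it is.</p>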
<p>To really dig into alignment quality, we’d probably also want to look beyond shape overlap and consider things like how far the aligned poses are from what is observed in the crystal, using some kind of RMSD measure, but that’s for a possible future post.</p>


</section>

 ]]></description>
  <category>datasets</category>
  <category>3d</category>
  <category>superposition</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-11-23-working-with-lobster-3.html</guid>
  <pubDate>Sat, 22 Nov 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/working-with-lobster-3.png" medium="image" type="image/png" height="57" width="144"/>
</item>
<item>
  <title>Working with the LOBSTER Data set II</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2.html</link>
  <description><![CDATA[ 




<p>This post builds on the <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1.html">last post</a> and does a little bit of work with the LOBSTER data set.</p>
<p>I had planned something a bit more ambitious for this post, but while writing it I discovered a bug in the conformer generator that took me a while to track down (it’s still not fixed). That ate up all the time I had set aside for working on the blog this week. I’ll do another post soon.</p>
<div id="209d81ae" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4">IPythonConsole.ipython_3d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> matplotlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-7">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tableau-colorblind10'</span>)</span>
<span id="cb1-8">plt.rcParams[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'font.size'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'16'</span></span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb1-12"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
</div>
<section id="generating-conformers-for-the-lobster-compounds" class="level1">
<h1>Generating conformers for the LOBSTER compounds</h1>
<div id="dc28d683" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lwreg</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lwreg <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> utils</span></code></pre></div>
</div>
<p>Load our lwreg configuration from the database we created before:</p>
<div id="e184d2ce" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">config <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.configure_from_database(dbname<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lobster_112024'</span>,dbtype<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'postgresql'</span>)</span>
<span id="cb3-2">lwreg.set_default_config(config)</span>
<span id="cb3-3"></span>
<span id="cb3-4">config</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>{'dbname': 'lobster_112024',
 'dbtype': 'postgresql',
 'cacheConnection': True,
 'standardization': 'none',
 'removeHs': 1,
 'useTautomerHashv2': 0,
 'registerConformers': 1,
 'numConformerDigits': 3,
 'lwregSchema': ''}</code></pre>
</div>
</div>
<div id="62550701" class="cell" data-execution_count="68">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb5-2">    select molregno,ntabs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdk.descriptors where ntabs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-3">mrns,ntabs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>d)</span>
<span id="cb5-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(mrns)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="68">
<pre><code>2226</code></pre>
</div>
</div>
<div id="02756259" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&lt; lwreg provides a convenience function to get a database connection</span></span>
<span id="cb7-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb7-3"></span>
<span id="cb7-4">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'create schema generated_data'</span>)</span>
<span id="cb7-5">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''create table generated_data.confgen</span></span>
<span id="cb7-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">   (molregno integer references hashes, </span></span>
<span id="cb7-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    conf_id integer references conformers,</span></span>
<span id="cb7-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    method text)'''</span>)</span>
<span id="cb7-9">cn.commit()</span></code></pre></div>
</div>
<div id="acf187f4" class="cell" data-scrolled="true" data-execution_count="19">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">mbs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.retrieve(ids<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>mrns)</span></code></pre></div>
</div>
<div id="848420d2" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb9-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDistGeom</span>
<span id="cb9-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdBase</span></code></pre></div>
</div>
<p>Generate and store conformers:</p>
<div id="74aaaffd" class="cell" data-execution_count="69">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&lt; lwreg provides a convenience function to get a database connection</span></span>
<span id="cb10-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb10-3"></span>
<span id="cb10-4">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdDistGeom.ETKDGv3()</span>
<span id="cb10-5">ps.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb10-6">ps.pruneRmsThresh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb10-7">ps.numThreads <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span></span>
<span id="cb10-8"></span>
<span id="cb10-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> mrn,nt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(mrns,ntabs)):</span>
<span id="cb10-10">    mb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mbs[mrn][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb10-11">    mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb10-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> mol:</span>
<span id="cb10-13">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(mrn)</span>
<span id="cb10-14">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb10-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> rdBase.BlockLogs():</span>
<span id="cb10-16">        cids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdDistGeom.EmbedMultipleConfs(mol,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>nt,ps)</span>
<span id="cb10-17">    reg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.register_multiple_conformers(mol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>mol,fail_on_duplicate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb10-18">    </span>
<span id="cb10-19">    rows <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(x,y,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ETKDGv3'</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x,y <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> reg]</span>
<span id="cb10-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> rows:</span>
<span id="cb10-21">        curs.executemany(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'insert into generated_data.confgen values (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)'</span>,rows)</span>
<span id="cb10-22">        cn.commit()</span>
<span id="cb10-23">    </span>
<span id="cb10-24">    </span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>2226it [21:11,  1.75it/s]</code></pre>
</div>
</div>
<div id="0ca393d2" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-2">    select molregno,nconfs,ntabs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-3">    (select molregno, count(conf_id) nconfs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> generated_data.confgen group by (molregno)) t1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb12-4">      join rdk.descriptors using (molregno) where ntabs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
</div>
<p>Number of conformers generated per compound:</p>
<div id="99620489" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb13-2">plt.hist([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d],bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&lt; it's -1 because we also have the crystal conformer</span></span>
<span id="cb13-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#plt.plot((0,1000),(0,1000),'k-');</span></span>
<span id="cb13-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nConfs generated'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-11-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>nTABS is designed to be an upper limit on the number of conformers; let’s check to confirm that this works:</p>
<div id="b1b792fa" class="cell" data-scrolled="false" data-execution_count="7">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb14-2">plt.scatter([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d],[x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d])<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb14-3">plt.plot((<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k-'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb14-4">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nTABS'</span>)</span>
<span id="cb14-5">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'nConfs generated'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>That works quite well in general, but let’s look at the two outliers:</p>
<div id="bd8b3c8f" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-2">    select molregno,canonical_smiles,nconfs,ntabs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-3">    (select molregno, count(conf_id) nconfs <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> generated_data.confgen group by (molregno)) t1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-4">      join rdk.descriptors using (molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-5">      join hashes using (molregno) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-6">    where ntabs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> ntabs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">600</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> nconfs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>ntabs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
</div>
<div id="594d9ad2" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(d)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>2</code></pre>
</div>
</div>
<div id="b4dceef7" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">Draw.MolsToGridImage([Chem.MolFromSmiles(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d],subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>),</span>
<span id="cb18-2">                    legends<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'mrn:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">/</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-15-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The underestimate of the number of conformers for compound 2980 is due to the triple bond. This is a known limitation of the TABS algorithm that was discussed in the paper and that we’re planning to work on.</p>
<p>The “extra” conformers observed for compound 298 are due to a bug in the conformer generation code that leads to non-physical conformers. I hope to have this fixed in a future RDKit release.</p>
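<p>To put a number on the scatter plot above, one could count how often the generated ensemble actually exceeds nTABS. The following is a minimal sketch, not code from the original analysis: the rows are made-up stand-ins for the <code>(molregno, nconfs, ntabs)</code> tuples returned by the query, and <code>exceeds_ntabs</code> is a hypothetical helper.</p>

```python
def exceeds_ntabs(rows, tolerance=0):
    """Return the molregnos whose generated conformer count exceeds nTABS.

    rows: iterable of (molregno, nconfs, ntabs) tuples, where nconfs
    includes the crystal conformer, so nconfs - 1 conformers were generated.
    """
    outliers = []
    for molregno, nconfs, ntabs in rows:
        generated = nconfs - 1
        if generated - ntabs > tolerance:
            outliers.append(molregno)
    return outliers


# made-up example rows, NOT values from the LOBSTER database
rows = [(1, 11, 12), (2, 301, 300), (3, 450, 200)]
violations = exceeds_ntabs(rows)
fraction_ok = 1 - len(violations) / len(rows)
print(violations, round(fraction_ok, 2))
```

<p>The <code>tolerance</code> argument plays the same role as the <code>&gt;100</code> margin in the SQL query: raising it restricts the list to the most pronounced outliers.</p>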
<p>Now retrieve all of the conformers to see how well we did at finding the crystal conformer in our conformer ensembles:</p>
<div id="cfc50fae" class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb19-2">    select molregno,conf_id,molblock <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> conformers order by (molregno,conf_id) asc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
</div>
<p>Find the conformer for each molecule that’s closest to the crystal structure using RMSD as the metric:</p>
<div id="02367079" class="cell" data-execution_count="51">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolAlign</span>
<span id="cb20-2">accums <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb20-3">r_mrn,r_cid,r_mb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> d.pop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb20-4">r_mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(r_mb)</span>
<span id="cb20-5">best <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e8</span></span>
<span id="cb20-6">best_cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb20-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (mrn,cid,mb) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(d):</span>
<span id="cb20-8">    mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb)</span>
<span id="cb20-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mrn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span>r_mrn:</span>
<span id="cb20-10">        rms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolAlign.GetBestRMS(r_mol,mol)</span>
<span id="cb20-11">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> rms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>best:</span>
<span id="cb20-12">            best <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rms</span>
<span id="cb20-13">            best_cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cid</span>
<span id="cb20-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb20-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> best_cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb20-16">            accums[r_mrn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (best,best_cid)</span>
<span id="cb20-17">        r_mrn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mrn</span>
<span id="cb20-18">        r_cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cid</span>
<span id="cb20-19">        r_mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> mol</span>
<span id="cb20-20">        best <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1e8</span></span>
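<span id="cb20-21">        best_cid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb20-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the loop only stores a result when the molregno changes, so store the last molecule here</span></span>
<span id="cb20-23"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> best_cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb20-24">    accums[r_mrn] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (best,best_cid)</span></code></pre></div>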
<div class="cell-output cell-output-stderr">
<pre><code>100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 81315/81315 [00:14&lt;00:00, 5507.62it/s]</code></pre>
</div>
</div>
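<p>The loop above relies on the rows being sorted by <code>(molregno, conf_id)</code>, with the crystal conformer first in each group. The same bookkeeping can be sketched with <code>itertools.groupby</code>, which handles the last molecule without any special casing. This is an illustrative alternative, not the code that produced the results here: <code>closest_per_group</code> and the <code>distance</code> callable are hypothetical names, and plain numbers stand in for mol blocks so that the example runs without the RDKit.</p>

```python
from itertools import groupby
from operator import itemgetter


def closest_per_group(rows, distance):
    """Map each group id to (best distance, best member id).

    rows must be sorted by (group id, member id); the first row of each
    group is used as the reference (the crystal conformer in the post).
    """
    results = {}
    for gid, members in groupby(rows, key=itemgetter(0)):
        members = list(members)
        ref_payload = members[0][2]
        candidates = [(distance(ref_payload, payload), mid)
                      for _, mid, payload in members[1:]]
        if candidates:  # skip groups that only contain the reference
            results[gid] = min(candidates)
    return results


# toy data: numbers stand in for conformers, abs difference for RMSD
rows = [(1, 0, 5.0), (1, 1, 5.5), (1, 2, 4.8),
        (2, 0, 1.0), (2, 1, 3.0),
        (3, 0, 7.0)]
best = closest_per_group(rows, lambda a, b: abs(a - b))
print(best)
```

<p>In the real workflow the payloads would be molecules parsed with <code>Chem.MolFromMolBlock</code> and <code>distance</code> would wrap <code>rdMolAlign.GetBestRMS</code>, as in the cell above.</p>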
<div id="2c917233" class="cell" data-execution_count="52">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb22-2">ax <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb22-3">plt.hist([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> accums.values()],bins<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb22-4">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">600</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb22-5">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'RMSD ($</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">AA$)'</span>)</span>
<span id="cb22-6">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>)</span>
<span id="cb22-7">ax2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ax.twinx()</span>
<span id="cb22-8">ax2.ecdf([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> accums.values()],c<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb22-9">ax2.grid()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-18-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="3c671ff6" class="cell" data-execution_count="53">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb23-2">np.quantile([x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> accums.values()],[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="53">
<pre><code>array([0.50160024, 0.69941346, 0.88421735])</code></pre>
</div>
</div>
<p>For 60% of the molecules the closest generated conformer is within 0.50 Å of the crystal structure; for 80% it is within 0.70 Å, and for 90% within 0.88 Å. That’s pretty good.</p>
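<p>The quantiles answer the question “what RMSD covers a given fraction of molecules?”. The complementary question, “what fraction of molecules is recovered within a given RMSD cutoff?”, is what the ECDF curve in the plot shows. A minimal sketch of that calculation, using made-up RMSD values rather than the real ones, with <code>fraction_within</code> as a hypothetical helper:</p>

```python
from bisect import bisect_right


def fraction_within(values, cutoff):
    """Fraction of values that are at most cutoff (the ECDF at cutoff)."""
    ordered = sorted(values)
    return bisect_right(ordered, cutoff) / len(ordered)


# made-up RMSD values in Angstrom, NOT the results from the post
rmsds = [0.21, 0.35, 0.48, 0.52, 0.66, 0.71, 0.90, 1.30]
for cutoff in (0.5, 1.0, 2.0):
    print(cutoff, fraction_within(rmsds, cutoff))
```

<p>Applied to <code>[x[0] for x in accums.values()]</code>, the same helper would give the recovery rate at whatever cutoff is of interest, for example the 0.5 Å line drawn in the histogram.</p>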
</section>
<section id="comparing-shape-similarity-approaches" class="level1">
<h1>Comparing shape-similarity approaches</h1>
<p>The RDKit has a couple of built-in methods for doing shape similarity. Let’s compare those to each other on this data set:</p>
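<p>Build a map from (ligname,pdb) tuples to (molregno,conf_id,molblock,mol,mol_noh) tuples, where <code>mol</code> keeps the hydrogens and <code>mol_noh</code> has them removed:</p>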
<div id="8ff64590" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb25-2">    select ligname,pdb,molregno,conf_id,molblock <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb25-3">    <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands join conformers using (molregno,conf_id)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb25-4">ligs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb25-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> nm,pdb,mrn,cid,mb <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d:</span>
<span id="cb25-6">    mol <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb25-7">    mol_noh <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb)</span>
<span id="cb25-8">    ligs[(nm,pdb.lower())] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (mrn,cid,mb,mol,mol_noh)</span></code></pre></div>
</div>
<div id="be748ca6" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">pairstats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb26-2">    select <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.pair_stats</span>
<span id="cb26-3">pairstats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(pairstats.dicts())</span></code></pre></div>
</div>
<p>Note that the scores for the ligand pairs in the LOBSTER data set are each present twice:</p>
<ol type="1">
<li>Ligand1 as reference, Ligand2 as probe</li>
<li>Ligand2 as reference, Ligand1 as probe</li>
</ol>
<p>We can see this here:</p>
<div id="d5aea725" class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb27-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(pairstats)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="40">
<pre><code>72592</code></pre>
</div>
</div>
<div id="aa61ea56" class="cell" data-execution_count="42">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb29-2">    select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.pair_stats where (ligname1,pdb1)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>(ligname2,pdb2)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="42">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>36296</td>
</tr>
</tbody>
</table>
</div>
</div>
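<p>The SQL row comparison <code>(ligname1,pdb1)&gt;(ligname2,pdb2)</code> keeps one ordering of each pair, which is why the count is half of <code>len(pairstats)</code>. Python tuples are also compared lexicographically, so the same filter can be sketched on the list of dicts. The rows below are made up for illustration, and <code>one_direction</code> is a hypothetical helper; only the column names <code>ligname1</code>, <code>pdb1</code>, <code>ligname2</code> and <code>pdb2</code> come from the query above.</p>

```python
def one_direction(pairstats):
    """Keep only rows where the first ligand key sorts after the second."""
    kept = []
    for row in pairstats:
        key1 = (row["ligname1"], row["pdb1"])
        key2 = (row["ligname2"], row["pdb2"])
        if key1 > key2:
            kept.append(row)
    return kept


# made-up rows: each pair appears once in each direction
pairstats = [
    {"ligname1": "ATP", "pdb1": "1abc", "ligname2": "ADP", "pdb2": "2xyz"},
    {"ligname1": "ADP", "pdb1": "2xyz", "ligname2": "ATP", "pdb2": "1abc"},
    {"ligname1": "HEM", "pdb1": "3def", "ligname2": "ATP", "pdb2": "1abc"},
    {"ligname1": "ATP", "pdb1": "1abc", "ligname2": "HEM", "pdb2": "3def"},
]
unique_pairs = one_direction(pairstats)
print(len(pairstats), len(unique_pairs))
```

<p>String ordering in PostgreSQL depends on the database collation, so the database and Python may keep different members of a pair, but as long as the two keys differ each pair is still counted once.</p>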
<div id="b3a94c7d" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb30-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> stats</span></code></pre></div>
</div>
<div id="b00099e2" class="cell" data-execution_count="43">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> comparison_plot(pairstats,metric1,metric2,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>):</span>
<span id="cb31-2">    x1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x[metric1] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats]</span>
<span id="cb31-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> invert1:</span>
<span id="cb31-4">        x1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x1]</span>
<span id="cb31-5">        metric1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'1-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>metric1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb31-6">    x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x[metric2] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats]</span>
<span id="cb31-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> invert2:</span>
<span id="cb31-8">        x2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x2]</span>
<span id="cb31-9">        metric2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'1-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>metric2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span></span>
<span id="cb31-10">    r,_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.spearmanr(x1,x2)</span>
<span id="cb31-11">    tau,_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.kendalltau(x1,x2)</span>
<span id="cb31-12"></span>
<span id="cb31-13">    plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb31-14">    plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb31-15">    plt.scatter(x1,x2,alpha<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb31-16">    plt.xlabel(metric1)</span>
<span id="cb31-17">    plt.ylabel(metric2)</span>
<span id="cb31-18">    plt.title(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'rho=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>r<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">, tau=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tau<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.2f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb31-19">    plt.subplot(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb31-20">    plt.hexbin(x1,x2,cmap<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Blues'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb31-21">    plt.tight_layout()</span></code></pre></div>
</div>
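<p>The plot titles report two rank correlations: Spearman's rho and Kendall's tau. As a rough illustration of what those numbers measure, here is a minimal pure-Python sketch. It is not SciPy's implementation: the helper names are made up for this example, and the Kendall version is the simple tau-a without tie correction (as far as I know <code>stats.kendalltau</code> defaults to the tie-corrected tau-b, which gives the same value when there are no ties).</p>

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # extend j to cover the run of values tied with the one at position i
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a, b):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_a(a, b):
    """(concordant - discordant) / number of pairs, with no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif 0 > s:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# made-up values: the two metrics disagree on the order of the middle two items
demo_x = [1, 2, 3, 4]
demo_y = [1, 3, 2, 4]
print(spearman_rho(demo_x, demo_y), kendall_tau_a(demo_x, demo_y))
```

<p>Both values only depend on how the two metrics order the pairs, not on the actual numbers, which makes them a reasonable choice for comparing scores that live on different scales.</p>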
<p>Calculate USR scores:</p>
<div id="57f86f7a" class="cell" data-execution_count="46">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdMolDescriptors</span>
<span id="cb32-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats:</span>
<span id="cb32-3">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb32-4">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb32-5">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb32-6">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb32-7">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSRScore(usr1,usr2)</span>
<span id="cb32-8">    </span>
<span id="cb32-9">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb32-10">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb32-11">    usr1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m1)</span>
<span id="cb32-12">    usr2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSR(m2)</span>
<span id="cb32-13">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_noh'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdMolDescriptors.GetUSRScore(usr1,usr2)    </span></code></pre></div>
</div>
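<p>For orientation, here is a sketch of how a USR score is obtained from two descriptor vectors, following the formula from the original USR publication as I understand it: one over one plus the mean absolute difference of the descriptors. This is an illustration only, not RDKit's code. The function name and the descriptor values are made up, and I have not checked the sketch against <code>GetUSRScore</code>.</p>

```python
def usr_score(desc1, desc2):
    """USR similarity: 1 / (1 + mean absolute difference of the descriptors)."""
    if len(desc1) != len(desc2):
        raise ValueError("descriptors must have the same length")
    mean_abs_diff = sum(abs(a - b) for a, b in zip(desc1, desc2)) / len(desc1)
    return 1.0 / (1.0 + mean_abs_diff)


# hypothetical 12-element descriptors (made-up numbers, not from LOBSTER)
shape_a = [4.1, 2.0, 0.3, 4.5, 2.2, 0.1, 7.9, 3.8, -0.4, 6.2, 3.1, 0.2]
shape_b = [4.3, 1.9, 0.2, 4.4, 2.5, 0.3, 7.5, 3.6, -0.1, 6.0, 3.3, 0.4]

print(usr_score(shape_a, shape_a))  # identical descriptors give the maximum score
print(usr_score(shape_a, shape_b))
```

<p>The important point for the plots in this post is that the result is a similarity: identical shapes score 1 and the value drops towards 0 as the descriptors diverge. That is why the USR values are inverted before being compared with the distance-based metrics.</p>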
<p>Calculate atom pair fingerprints using 3D distances instead of topological distances:</p>
<div id="569d51cd" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdFingerprintGenerator</span>
<span id="cb33-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> DataStructs</span>
<span id="cb33-3">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetAtomPairGenerator(use2D<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb33-4"></span>
<span id="cb33-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats:</span>
<span id="cb33-6">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb33-7">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb33-8">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb33-9">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb33-10">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.DiceSimilarity(fp1,fp2,returnDistance<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb33-11">    </span>
<span id="cb33-12">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb33-13">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb33-14">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m1)</span>
<span id="cb33-15">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fpg.GetCountFingerprint(m2)</span>
<span id="cb33-16">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_noh'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.DiceSimilarity(fp1,fp2,returnDistance<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb33-17"> </span></code></pre></div>
</div>
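<p>Because the code above passes <code>returnDistance=True</code>, the stored <code>AP3D</code> values are Dice distances, not similarities. The following pure-Python sketch shows the usual definition of Dice similarity for count vectors, using plain dicts as stand-ins for the sparse count fingerprints. The function names and feature ids are invented for the example, and returning 0 for two empty fingerprints is an assumption of this sketch.</p>

```python
def dice_similarity(counts1, counts2):
    """Dice similarity for sparse count vectors stored as {feature_id: count}."""
    total = sum(counts1.values()) + sum(counts2.values())
    if total == 0:
        # assumption of this sketch: two empty fingerprints have similarity 0
        return 0.0
    # for count vectors the "overlap" is the sum of the smaller count per feature
    shared = sum(min(c, counts2[k]) for k, c in counts1.items() if k in counts2)
    return 2.0 * shared / total


def dice_distance(counts1, counts2):
    """The distance form: 0 for identical fingerprints, 1 for no overlap."""
    return 1.0 - dice_similarity(counts1, counts2)


# made-up fingerprints: feature id -> number of occurrences
fp_a = {1: 2, 5: 1, 9: 3}
fp_b = {1: 1, 9: 4, 12: 2}

print(dice_similarity(fp_a, fp_b))  # 2 * (1 + 3) / (6 + 7)
print(dice_distance(fp_a, fp_b))
```

<p>With the distance form, smaller values mean more similar molecules, so these values can be compared directly with 1-USR.</p>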
<div id="da819c48" class="cell" data-execution_count="91">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-28-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="5ecb15b9" class="cell" data-execution_count="92">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_noh'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_noh'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-29-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Finally, try the E3FP fingerprint (a 3D analog of the Morgan fingerprint) from <a href="http://dx.doi.org/10.1021/acs.jmedchem.7b00696">this paper</a>. The code is easy to <code>pip install</code> directly from <a href="https://github.com/keiserlab/e3fp">the GitHub repo</a>. Thanks to the Keiser lab team for making this so easy!</p>
<p>It takes a while to run this one:</p>
<div id="28aa5fd3" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> logging</span>
<span id="cb36-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> e3fp.pipeline <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> fprints_from_mol</span>
<span id="cb36-3">logging.disable(logging.INFO)</span>
<span id="cb36-4"></span>
<span id="cb36-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> get_fp(m):</span>
<span id="cb36-6">    fp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> fprints_from_mol(m,fprint_params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'counts'</span>:<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>})[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb36-7">    rdkfp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.ULongSparseIntVect(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>)</span>
<span id="cb36-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k,v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> fp.counts.items():</span>
<span id="cb36-9">        rdkfp[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(k)] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> v </span>
<span id="cb36-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> rdkfp</span>
<span id="cb36-11"></span>
<span id="cb36-12"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> d <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats:</span>
<span id="cb36-13">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb36-14">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb36-15">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb36-16">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb36-17">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.DiceSimilarity(fp1,fp2,returnDistance<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb36-18">    </span>
<span id="cb36-19">    m1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb36-20">    m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ligs[(d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>])][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]</span>
<span id="cb36-21">    fp1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m1)</span>
<span id="cb36-22">    fp2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> get_fp(m2)</span>
<span id="cb36-23">    d[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_noh'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataStructs.DiceSimilarity(fp1,fp2,returnDistance<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb36-24"> </span></code></pre></div>
</div>
<div id="e05e84c5" class="cell" data-execution_count="93">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-31-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="9a6071d9" class="cell" data-scrolled="false" data-execution_count="97">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb38-1">comparison_plot(pairstats,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_noh'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_noh'</span>,invert1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,invert2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2_files/figure-html/cell-32-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Save the results in the database so that we can work with them later:</p>
<div id="e2a924fc" class="cell" data-execution_count="98">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb39-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&lt; lwreg provides a convenience function to get a database connection</span></span>
<span id="cb39-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb39-3"></span>
<span id="cb39-4">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'create schema if not exists results'</span>)</span>
<span id="cb39-5">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'create table if not exists results.shape_scores </span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb39-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  (ligname1 text, pdb1 text, ligname2 text, pdb2 text,</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb39-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">   USR float, USR_noh float, AP3D float, AP3D_noh float, E3P float, E3P_noh float)'</span>)</span>
<span id="cb39-8">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'delete from results.shape_scores'</span>)</span>
<span id="cb39-9">rows <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname1'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb1'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ligname2'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb2'</span>],<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb39-10">         r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'USR_noh'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'AP3D_noh'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP'</span>],r[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'E3FP_noh'</span>]] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pairstats]</span>
<span id="cb39-11">curs.executemany(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'insert into results.shape_scores values (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)'</span>,rows)</span>
<span id="cb39-12">cn.commit()</span></code></pre></div>
</div>
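<p>If you want to try the same save-and-query pattern without a PostgreSQL server, here is a self-contained sketch using an in-memory SQLite database from the Python standard library. The rows are made-up numbers, not LOBSTER results, and the table has no schema prefix. Note that <code>sqlite3</code> uses <code>?</code> placeholders instead of <code>%s</code>. Listing the column names explicitly in the insert statement keeps the values from silently ending up in the wrong columns if the table definition changes.</p>

```python
import sqlite3

# made-up example rows with the same keys as the pairstats dicts
pairstats_demo = [
    {'ligname1': 'LIG_A', 'pdb1': '1abc', 'ligname2': 'LIG_B', 'pdb2': '2xyz',
     'USR': 0.71, 'USR_noh': 0.68, 'AP3D': 0.42, 'AP3D_noh': 0.39,
     'E3FP': 0.55, 'E3FP_noh': 0.51},
    {'ligname1': 'LIG_A', 'pdb1': '1abc', 'ligname2': 'LIG_C', 'pdb2': '3def',
     'USR': 0.93, 'USR_noh': 0.91, 'AP3D': 0.12, 'AP3D_noh': 0.10,
     'E3FP': 0.20, 'E3FP_noh': 0.18},
]
columns = ['ligname1', 'pdb1', 'ligname2', 'pdb2',
           'USR', 'USR_noh', 'AP3D', 'AP3D_noh', 'E3FP', 'E3FP_noh']

cn = sqlite3.connect(':memory:')
curs = cn.cursor()
curs.execute('create table shape_scores '
             '(ligname1 text, pdb1 text, ligname2 text, pdb2 text, '
             'USR float, USR_noh float, AP3D float, AP3D_noh float, '
             'E3FP float, E3FP_noh float)')

# build the row lists in the same order as the explicit column list
rows = [[r[c] for c in columns] for r in pairstats_demo]
column_list = ','.join(columns)
placeholders = ','.join('?' for _ in columns)
curs.executemany(
    f'insert into shape_scores ({column_list}) values ({placeholders})', rows)
cn.commit()

# read the data back: row count and the pair with the smallest AP3D distance
n_rows = curs.execute('select count(*) from shape_scores').fetchone()[0]
closest = curs.execute(
    'select ligname1, ligname2, AP3D from shape_scores order by AP3D limit 1'
).fetchone()
print(n_rows, closest)
```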
<p>That’s it for this week.</p>


</section>

 ]]></description>
  <category>datasets</category>
  <category>3d</category>
  <category>similarity</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-11-16-working-with-lobster-2.html</guid>
  <pubDate>Sat, 15 Nov 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/working-with-lobster-2.png" medium="image" type="image/png" height="70" width="144"/>
</item>
<item>
  <title>Working with the LOBSTER Data set I</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1.html</link>
  <description><![CDATA[ 




<p>Last year the Rarey and BioSolveIT groups published <a href="https://doi.org/10.1007/s10822-024-00581-1">a paper</a> describing LOBSTER, a data set of small molecule overlays drawn from the PDB. The carefully curated and constructed data set is intended to be a new benchmarking set for testing molecular alignment (superposition) tools. I’m really happy to have an up-to-date replacement for the older <a href="https://doi.org/10.1021/ci400020a">“AZ set”</a>, which is what I’ve previously always used when looking at alignment. The well-written paper (as one expects from the Rarey group!) is definitely worth reading in order to understand the decisions made in the curation workflow. The code used for the curation is available in GitHub and the data set itself can be downloaded from Zenodo. There are links in the (open access) paper to both places.</p>
<p>I have a number of ideas for experiments to do using the LOBSTER set, so I wanted to get it loaded up into a database I could use for those experiments. This blog post demonstrates how I did that.</p>
<p>I’m using our <a href="https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c01133">lwreg</a> tool to handle registering the compound structures and to provide the basic schema for the database. You can <code>pip install</code> it from our <a href="https://github.com/rinikerlab/lightweight-registration">GitHub repo</a>. I’ve blogged about <a href="https://greglandrum.github.io/rdkit-blog/posts/2024-10-31-lwreg-and-the-cartridge.html">using lwreg together with the RDKit PostgreSQL cartridge</a> before.</p>
<div id="209d81ae" class="cell" data-execution_count="33">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4">IPythonConsole.ipython_3d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb1-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span>
<span id="cb1-6"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>config SqlMagic.feedback<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>The sql extension is already loaded. To reload it, use:
  %reload_ext sql</code></pre>
</div>
</div>
<section id="registering-the-structures-and-populating-the-data-base" class="level1">
<h1>Registering the structures and populating the database</h1>
<div id="dc28d683" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> lwreg</span>
<span id="cb3-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lwreg <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> utils</span></code></pre></div>
</div>
<p>At the command line I created the postgres database:</p>
<pre><code>% createdb lobster_112024</code></pre>
<div id="e184d2ce" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">config <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.defaultConfig()</span>
<span id="cb5-2">config</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>{'dbname': './testdb.sqlt',
 'dbtype': 'sqlite3',
 'standardization': 'fragment',
 'removeHs': 1,
 'useTautomerHashv2': 0,
 'registerConformers': 0,
 'numConformerDigits': 3,
 'lwregSchema': '',
 'cacheConnection': True}</code></pre>
</div>
</div>
<p>Configure lwreg to work with the database I created. We’ll turn off standardization and switch into “registerConformers” mode.</p>
<div id="ef4d901f" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">config[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dbtype'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'postgresql'</span></span>
<span id="cb7-2">config[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'dbname'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lobster_112024'</span></span>
<span id="cb7-3">config[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'standardization'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'none'</span></span>
<span id="cb7-4">config[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'registerConformers'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb7-5">lwreg.set_default_config(config)</span></code></pre></div>
</div>
<div id="6257a1dd" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">lwreg.initdb()</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>This will destroy any existing information in the registration database.
  are you sure? [yes/no]: yes</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="20">
<pre><code>True</code></pre>
</div>
</div>
<p>I downloaded the LOBSTER data from Zenodo and extracted the zip file locally.</p>
<p>Read in the SDFs containing the ligand structures and register each 3D structure in lwreg:</p>
<div id="c6220795" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> glob</span>
<span id="cb11-2">sdfs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> glob.glob(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/scratch/Data/LOBSTER_112024/all_ligands/*.sdf'</span>)</span>
<span id="cb11-3">registered <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb11-4"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> sdf <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> sdfs:</span>
<span id="cb11-5">    ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> Chem.SDMolSupplier(sdf,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb11-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(ms)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb11-7">    m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ms[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb11-8">    ligname <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m.GetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'_Name'</span>)</span>
<span id="cb11-9">    ligpdb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m.GetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'pdb_id'</span>)</span>
<span id="cb11-10">    naomi_smis <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m.GetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'usmiles'</span>)</span>
<span id="cb11-11">    mrn,confid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lwreg.register(mol<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>m)</span>
<span id="cb11-12">    registered.append(((mrn,confid),ligname,ligpdb,naomi_smis,sdf))</span></code></pre></div>
</div>
<p>Let’s store the additional information from the LOBSTER data set in a separate table in the database:</p>
<div id="97af31fd" class="cell" data-execution_count="60">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#&lt; lwreg provides a convenience function to get a database connection</span></span>
<span id="cb12-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb12-3"></span>
<span id="cb12-4">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'create schema lobster_data'</span>)</span>
<span id="cb12-5">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''create table lobster_data.all_ligands</span></span>
<span id="cb12-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">   (molregno integer references hashes, </span></span>
<span id="cb12-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    conf_id integer references conformers,</span></span>
<span id="cb12-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    ligname text, pdb text, naomi_smiles text, filename text)'''</span>)</span>
<span id="cb12-9">rows <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> registered]</span>
<span id="cb12-10">curs.executemany(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'insert into lobster_data.all_ligands values (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)'</span>,rows)</span>
<span id="cb12-11">cn.commit()</span></code></pre></div>
</div>
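<p>For anyone who wants to try this side-table pattern without a PostgreSQL server, here is a minimal sketch using Python’s built-in <code>sqlite3</code> module. It is an illustration under stated assumptions, not what was run for this post: the rows are made up, the foreign-key references to the lwreg tables are left out, and <code>sqlite3</code> uses <code>?</code> placeholders where the PostgreSQL driver uses <code>%s</code>.</p>

```python
import sqlite3

# Hypothetical stand-in for the `registered` list built above:
# ((molregno, conf_id), ligname, pdb, naomi_smiles, filename)
registered = [
    ((1, 1), "LIG_A", "1abc", "CCO", "a.sdf"),
    ((1, 2), "LIG_A", "2xyz", "CCO", "b.sdf"),
    ((2, 3), "LIG_B", "3def", "c1ccccc1", "c.sdf"),
]

cn = sqlite3.connect(":memory:")  # throwaway in-memory database
curs = cn.cursor()
curs.execute(
    """create table all_ligands
       (molregno integer, conf_id integer,
        ligname text, pdb text, naomi_smiles text, filename text)"""
)

# Flatten the nested (molregno, conf_id) tuple so each row has six columns
rows = [(ids[0], ids[1], *rest) for ids, *rest in registered]
curs.executemany("insert into all_ligands values (?,?,?,?,?,?)", rows)
cn.commit()

n_rows = curs.execute("select count(*) from all_ligands").fetchone()[0]
n_mols = curs.execute(
    "select count(distinct molregno) from all_ligands"
).fetchone()[0]
print(n_rows, n_mols)
```

<p>In this toy data two conformers share one molregno, so the table has more rows than distinct molecules.</p>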
<div id="c005775f" class="cell" data-scrolled="true" data-execution_count="61">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb13-2">    select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1 rows affected.</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="61">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>3583</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Install the RDKit cartridge, load the molecules, and create an index on the molecule column so that we can do efficient substructure searching. There’s more information about that in <a href="https://greglandrum.github.io/rdkit-blog/posts/2024-10-31-lwreg-and-the-cartridge.html">this blog post</a>.</p>
<div id="e12e9ba8" class="cell" data-execution_count="110">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-2">  create extension <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> exists rdkit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-3">    create schema <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> exists rdk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb15-4">    drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists rdk.mols cascade<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;\</span></span>
<span id="cb15-5">    select molregno,mol_from_ctab(molblock::cstring,false) m into rdk.mols <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> molblocks<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;\</span></span>
<span id="cb15-6">    create index molidx on rdk.mols using gist(m)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="110">
<pre><code>[]</code></pre>
</div>
</div>
<p>Let’s add some fingerprints so that we can do similarity searches later. We’ll add both bit- and count-based Morgan fingerprints with radius 3:</p>
<div id="e815c231" class="cell" data-execution_count="112">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb17-2">    drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists rdk.fps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;\</span></span>
<span id="cb17-3">  select molregno,morganbv_fp(m,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> mfp3, morgan_fp(m,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) cfp3 into rdk.fps <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdk.mols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;\</span></span>
<span id="cb17-4">    create index fps_mfp3_idx on rdk.fps using gist(mfp3)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb17-5">    create index fps_cfp3_idx on rdk.fps using gist(cfp3)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="112">
<pre><code>[]</code></pre>
</div>
</div>
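<p>To make the difference between bit-based and count-based fingerprints concrete, here is a small pure-Python sketch of the Tanimoto similarity for both representations. The feature IDs are invented for illustration, and this is not the cartridge’s implementation; it only shows the arithmetic.</p>

```python
from collections import Counter


def tanimoto_bits(a, b):
    """Tanimoto similarity for sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a.union(b))
    return len(a.intersection(b)) / union if union else 0.0


def tanimoto_counts(a, b):
    """Tanimoto similarity for feature counts: sum of mins over sum of maxes."""
    a, b = Counter(a), Counter(b)
    keys = set(a).union(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


# Hypothetical feature IDs; a repeated ID means the feature occurs more than once
fp1 = [11, 11, 11, 42, 97]
fp2 = [11, 42, 97]

bit_sim = tanimoto_bits(fp1, fp2)      # repeats are ignored
count_sim = tanimoto_counts(fp1, fp2)  # repeats lower the similarity
print(bit_sim, count_sim)
```

<p>The two toy fingerprints set the same bits, so the bit-based similarity is 1.0, while the count-based version penalizes the feature that occurs three times in one molecule and once in the other.</p>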
<p>And now add some descriptors we might want to use:</p>
<div id="0f7166fa" class="cell" data-execution_count="114">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb19-2">    drop table <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> exists rdk.descriptors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;\</span></span>
<span id="cb19-3">  select molregno,mol_numheavyatoms(m) nhvy, mol_amw(m) amw, mol_numrotatablebonds(m) nrot into rdk.descriptors <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdk.mols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="114">
<pre><code>[]</code></pre>
</div>
</div>
<p>Add a column with nTABS, a measure of molecular flexibility developed by Jessica Braun in our lab (Jessica is also the other author/developer of lwreg). There’s <a href="https://pubs.acs.org/doi/10.1021/acs.jcim.4c01513">a paper</a> and a <a href="https://github.com/rinikerlab/TorsionAngularBinStrings">GitHub repo</a> from which you can install the code.</p>
<div id="ee0ec1c9" class="cell" data-execution_count="120">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tabs</span>
<span id="cb21-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb21-3">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb21-4">  select molregno,molblock <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> molblocks</span>
<span id="cb21-5"></span>
<span id="cb21-6">ntabs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb21-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> mrn,mb <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm.tqdm(d):</span>
<span id="cb21-8">    m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(mb,removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb21-9">    nt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tabs.GetnTABS(m)</span>
<span id="cb21-10">    ntabs.append((mrn,nt))  </span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3218/3218 [00:13&lt;00:00, 245.40it/s]</code></pre>
</div>
</div>
<p>Add the nTABS values to the database:</p>
<div id="69a2b6a2" class="cell" data-execution_count="123">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config)</span>
<span id="cb23-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb23-3"></span>
<span id="cb23-4">intabs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(y,x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x,y <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ntabs]</span>
<span id="cb23-5">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'alter table rdk.descriptors add column ntabs int'</span>)</span>
<span id="cb23-6">curs.executemany(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'update rdk.descriptors set ntabs=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> where molregno=</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>,intabs)</span>
<span id="cb23-7">cn.commit()</span></code></pre></div>
</div>
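<p>One detail in the update above is easy to get wrong: the parameters have to come in the same order as the placeholders, which is why the (molregno, nTABS) pairs are swapped before being passed to <code>executemany</code>. Here is a minimal sketch of that pattern with the built-in <code>sqlite3</code> module, using made-up values and <code>?</code> placeholders instead of <code>%s</code>:</p>

```python
import sqlite3

cn = sqlite3.connect(":memory:")  # throwaway in-memory database
curs = cn.cursor()
curs.execute("create table descriptors (molregno integer, nhvy integer)")
curs.executemany("insert into descriptors values (?,?)", [(1, 20), (2, 31)])

# Hypothetical (molregno, nTABS) pairs, in the order they were collected
ntabs = [(1, 6), (2, 144)]

curs.execute("alter table descriptors add column ntabs int")
# Placeholder order is (ntabs, molregno), so swap each pair
params = [(nt, mrn) for mrn, nt in ntabs]
curs.executemany("update descriptors set ntabs=? where molregno=?", params)
cn.commit()

result = curs.execute(
    "select molregno, ntabs from descriptors order by molregno"
).fetchall()
print(result)
```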
<p>Load data about pairs of aligned ligands from the LOBSTER data:</p>
<div id="fd406172" class="cell" data-execution_count="63">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1">ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x.strip().split(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">';'</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/scratch/Data/LOBSTER_112024/stats/pair_stats.csv'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'r'</span>)]</span>
<span id="cb24-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(ind)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="63">
<pre><code>72593</code></pre>
</div>
</div>
<div id="41003d68" class="cell" data-execution_count="64">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1">ind.pop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="64">
<pre><code>['query_file',
 'query',
 'pdb',
 'template_file',
 'template',
 'template_pdb',
 'ensemble',
 'morgan_fp_tanimoto',
 'gobbi_2D_pharmacophore_fp_tanimoto',
 'hac_difference',
 'shape_tversky_index',
 'shape_tanimoto_distance',
 'shape_protrude_distance',
 '']</code></pre>
</div>
</div>
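<p>The empty string at the end of the header comes from the trailing <code>;</code> on each line of the file. As an alternative to splitting by hand, the standard library’s <code>csv</code> module can do the same parse. This sketch uses a two-line inline sample with invented values and a shortened set of columns rather than the real file:</p>

```python
import csv
import io

# Invented sample mimicking the layout of pair_stats.csv (shortened columns)
sample = io.StringIO(
    "query;pdb;template;template_pdb;morgan_fp_tanimoto;hac_difference;\n"
    "LIG_A;1abc;LIG_B;2xyz;0.25;3.0;\n"
)

reader = csv.reader(sample, delimiter=";")
header = next(reader)  # first line holds the column names
# Drop the empty field created by the trailing ';'
names = [h for h in header if h]
records = [dict(zip(names, row)) for row in reader]

# Convert the numeric columns explicitly; everything is read as text
for rec in records:
    rec["morgan_fp_tanimoto"] = float(rec["morgan_fp_tanimoto"])
    rec["hac_difference"] = int(float(rec["hac_difference"]))

print(records[0])
```

<p>Keying each record by column name avoids having to remember which positional index holds which statistic.</p>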
<p>And add that to the database too:</p>
<div id="f5ba4bd9" class="cell" data-execution_count="80">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1">cn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> utils.<span class="ex" style="color: null;
background-color: null;
font-style: inherit;">connect</span>(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>config)</span>
<span id="cb28-2">curs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> cn.cursor()</span>
<span id="cb28-3"></span>
<span id="cb28-4">curs.execute(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''create table lobster_data.pair_stats</span></span>
<span id="cb28-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">   (ligname1 text, pdb1 text, ligname2 text, pdb2 text, ensemble text,</span></span>
<span id="cb28-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    morgan_fp_tanimoto float, gobbi_2D_pharmacophore_fp_tanimoto float,</span></span>
<span id="cb28-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    hac_difference int, shape_tversky_index float, shape_tanimoto_distance float,</span></span>
<span id="cb28-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    shape_protrude_distance float)'''</span>)</span>
<span id="cb28-9">rows <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>],x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>],<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>]),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>]),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>])),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>]),<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb28-10">         <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>]),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>]),) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ind]</span>
<span id="cb28-11">curs.executemany(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'insert into lobster_data.pair_stats values (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">,</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%s</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)'</span>,rows)</span>
<span id="cb28-12">cn.commit()</span></code></pre></div>
</div>
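<p>The row-building list comprehension above packs several type conversions into one expression. A minimal sketch of the same conversion as a named helper may be easier to check. The function name <code>pair_stats_row</code> and the sample record are hypothetical; the positions are simply the indices used above, matched to the table columns in order.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def pair_stats_row(x):
    """Convert one raw record (a sequence of strings) to a pair_stats row.

    The indices mirror the list comprehension above; positions 0 and 3
    are not used there, so they are skipped here as well.
    """
    text_cols = tuple(x[i] for i in (1, 2, 4, 5, 6))
    return text_cols + (
        float(x[7]),       # morgan_fp_tanimoto
        float(x[8]),       # gobbi_2D_pharmacophore_fp_tanimoto
        int(float(x[9])),  # hac_difference; also accepts strings such as '3.0'
        float(x[10]),      # shape_tversky_index
        float(x[11]),      # shape_tanimoto_distance
        float(x[12]),      # shape_protrude_distance
    )


# Hypothetical record, only to show the shape of the result
sample = ['0', 'LIG1', '1abc', 'x', 'LIG2', '2xyz', 'ens1',
          '0.5', '0.25', '3.0', '0.75', '0.125', '0.0625']
row = pair_stats_row(sample)
print(len(row))
</code></pre></div>
<p>Each row has 11 values, one per <code>%s</code> placeholder in the insert statement.</p>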
</section>
<section id="composition-of-the-database" class="level1">
<h1>Composition of the database</h1>
<p>Now let’s look at what we’ve got:</p>
<div id="026bb964" class="cell" data-scrolled="true" data-execution_count="22">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb29-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(registered)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">
<pre><code>3583</code></pre>
</div>
</div>
<p>Number of unique molregnos:</p>
<div id="28697517" class="cell" data-execution_count="128">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb31-2">  select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> hashes</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="128">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>3218</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Number of conformers (this is the number we registered):</p>
<div id="12006a66" class="cell" data-execution_count="129">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb32-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb32-2">  select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> conformers</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="129">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>3583</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Number of pairs of aligned molecules:</p>
<div id="789394cf" class="cell" data-scrolled="true" data-execution_count="125">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb33-2">  select count(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.pair_stats</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="125">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>72592</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>Number of unique naomi_smiles:</p>
<div id="577bd08f" class="cell" data-execution_count="126">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb34-2">  select count(distinct naomi_smiles) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="126">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>3212</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>We’ll look at the mismatch between the number of naomi_smiles (3212) and the number of molecule hashes (3218) below.</p>
</section>
<section id="retrieving-info-from-the-database" class="level1">
<h1>Retrieving info from the database</h1>
<p>Get all of the molecule hashes for one particular molregno using lwreg functions:</p>
<div id="1795724e" class="cell" data-execution_count="32">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">lwreg.retrieve(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">id</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>,as_hashes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="32">
<pre><code>{12: {'fullhash': '6d22ca45dd6fb8f29d263532db4f913e7da47d6a',
  'formula': 'C18H16N6',
  'canonical_smiles': 'CN(c1cnc2nc(N)nc(N)c2c1)c1cccc2ccccc12',
  'no_stereo_smiles': 'CN(c1cnc2nc(N)nc(N)c2c1)c1cccc2ccccc12',
  'tautomer_hash': 'CN([C]1[CH][N][C]2[N][C]([N])[N][C]([N])[C]2[CH]1)[C]1[CH][CH][CH][C]2[CH][CH][CH][CH][C]21_4_0',
  'no_stereo_tautomer_hash': 'CN([C]1[CH][N][C]2[N][C]([N])[N][C]([N])[C]2[CH]1)[C]1[CH][CH][CH][C]2[CH][CH][CH][CH][C]21_4_0',
  'escape': '',
  'sgroup_data': '[]'}}</code></pre>
</div>
</div>
<p>We can see here that the Hs have been removed by lwreg when calculating the hashes used for comparing 2D molecules to each other.</p>
<p>The molecules stored in the database still have their Hs. We can see this by retrieving the registered mol block for the molecule:</p>
<div id="c8a2c152" class="cell" data-execution_count="48">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">l <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromMolBlock(lwreg.retrieve(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">id</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb37-2">l</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17625757959049916" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17625757959049916" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17625757959049916 = null;
var warn = document.getElementById("3dmolwarning_17625757959049916");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17625757959049916 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17625757959049916"),{backgroundColor:"white"});
viewer_17625757959049916.zoomTo();
    viewer_17625757959049916.removeAllModels();
    viewer_17625757959049916.addModel("33M_D_301\n     RDKit          3D\n\n 40 43  0  0  0  0  0  0  0  0999 V2000\n    9.7259   -3.2696   11.6873 N   0  0  0  0  0  0  0  0  0  0  0  0\n   10.4981    0.3535   14.5814 N   0  0  0  0  0  0  0  0  0  0  0  0\n   13.4437   -0.6379   10.9108 N   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0654   -1.4201   13.1743 N   0  0  0  0  0  0  0  0  0  0  0  0\n   11.5849   -1.9173   11.4370 N   0  0  0  0  0  0  0  0  0  0  0  0\n   14.8172    2.3874   12.4647 N   0  0  0  0  0  0  0  0  0  0  0  0\n   14.5309    3.3039   13.6379 C   0  0  0  0  0  0  0  0  0  0  0  0\n   14.2607    0.4413   11.2177 C   0  0  0  0  0  0  0  0  0  0  0  0\n   15.2482    4.2831    7.7614 C   0  0  0  0  0  0  0  0  0  0  0  0\n   14.1349    4.0651    8.6112 C   0  0  0  0  0  0  0  0  0  0  0  0\n   18.3005    2.5330   11.1357 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.2061    2.3421   12.0385 C   0  0  0  0  0  0  0  0  0  0  0  0\n   16.5568    3.9787    8.1964 C   0  0  0  0  0  0  0  0  0  0  0  0\n   18.0993    3.1132    9.8675 C   0  0  0  0  0  0  0  0  0  0  0  0\n   14.3683    3.5412    9.8793 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.8406    1.0733   13.0796 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.4450   -2.2057   12.1256 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.8450   -0.3774   13.5337 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.3603   -0.8452   11.7042 C   0  0  0  0  0  0  0  0  0  0  0  0\n   13.9843    1.3211   12.2560 C   0  0  0  0  0  0  0  0  0  0  0  0\n   15.8990    2.6320   11.6222 C   0  0  0  0  0  0  0  0  0  0  0  0\n   16.7856    3.4355    9.4819 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0162   -0.0199   12.7773 C   0  0  0  0  0  0  0  0  0  0  0  0\n   15.6745    3.1891   10.3410 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0542   -3.8100   10.9126 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.8677   -3.5133   12.1390 H   0  0  0  0  0  0  0  0  0  0  0  0\n   11.0651    1.1287   14.8600 H   0  0  0  0  0  0  0  0  0  0  0  0\n    9.6692    0.1305   15.0943 H   0  0  0  0  0  0  0  0  0  0  0  0\n   15.2152    4.0326   13.6663 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.6209    3.7051   13.5331 H   0  0  0  0  0  0  0  0  0  0  0  0\n   14.5678    2.7786   14.4880 H   0  0  0  0  0  0  0  0  0  0  0  0\n   15.0826    0.5916   10.6682 H   0  0  0  0  0  0  0  0  0  0  0  0\n   15.1056    4.6563    6.8447 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.2070    4.2817    8.3078 H   0  0  0  0  0  0  0  0  0  0  0  0\n   19.2201    2.2507   11.4088 H   0  0  0  0  0  0  0  0  0  0  0  0\n   17.3735    2.0028   12.9641 H   0  0  0  0  0  0  0  0  0  0  0  0\n   17.3315    4.1495    7.5876 H   0  0  0  0  0  0  0  0  0  0  0  0\n   18.8710    3.2914    9.2570 H   0  0  0  0  0  0  0  0  0  0  0  0\n   13.5903    3.4055   10.4927 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.6333    1.6684   13.8560 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 17  1  0\n  2 18  1  0\n  3  8  2  0\n  3 19  1  0\n  4 17  2  0\n  4 18  1  0\n  5 17  1  0\n  5 19  2  0\n  6  7  1  0\n  6 20  1  0\n  6 21  1  0\n  8 20  1  0\n  9 10  2  0\n  9 13  1  0\n 10 15  1  0\n 11 12  1  0\n 11 14  2  0\n 12 21  2  0\n 13 22  2  0\n 14 22  1  0\n 15 24  2  0\n 16 20  2  0\n 16 23  1  0\n 18 23  2  0\n 19 23  1  0\n 21 24  1  0\n 22 24  1  0\n  1 25  1  0\n  1 26  1  0\n  2 27  1  0\n  2 28  1  0\n  7 29  1  0\n  7 30  1  0\n  7 31  1  0\n  8 32  1  0\n  9 33  1  0\n 10 34  1  0\n 11 35  1  0\n 12 36  1  0\n 13 37  1  0\n 14 38  1  0\n 15 39  1  0\n 16 40  1  0\nM  END\n","sdf");
    viewer_17625757959049916.setStyle({"stick": {}});
    viewer_17625757959049916.setBackgroundColor("0xeeeeee");
    viewer_17625757959049916.zoomTo();
viewer_17625757959049916.render();
});
</script>
</div>
<div class="cell-output cell-output-display" data-execution_count="48">

</div>
</div>
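<p>A quick cross-check that the stored mol block really carries its hydrogens: the formula in the hash record above is C18H16N6, so a fully hydrogenated record should have 40 atoms. The sketch below compares that to the counts line of the V2000 mol block passed to the viewer. The helper names are hypothetical, and the formula parser is deliberately naive (no charges, isotopes, or brackets).</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import re


def formula_counts(formula):
    """Naive parser: 'C18H16N6' becomes {'C': 18, 'H': 16, 'N': 6}."""
    counts = {}
    for symbol, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return counts


def v2000_counts(molblock):
    """Return (n_atoms, n_bonds) from the fourth line of a V2000 mol block."""
    counts_line = molblock.split('\n')[3]
    return int(counts_line[0:3]), int(counts_line[3:6])


# Header of the mol block retrieved above (atom and bond lines omitted)
header = ('33M_D_301\n     RDKit          3D\n\n'
          ' 40 43  0  0  0  0  0  0  0  0999 V2000\n')

counts = formula_counts('C18H16N6')
n_atoms, n_bonds = v2000_counts(header)
print(sum(counts.values()), n_atoms)
</code></pre></div>
<p>Both numbers are 40, and the last 16 atom lines of the mol block are the H atoms.</p>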
<p>Ligands that appear multiple times in the data set will have the same molregno but different conf_ids. Let’s find those:</p>
<div id="b5873f43" class="cell" data-execution_count="40">
<div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb38-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> collections <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> defaultdict</span>
<span id="cb38-2"></span>
<span id="cb38-3">repeated_ligands <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> defaultdict(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>)</span>
<span id="cb38-4"></span>
<span id="cb38-5">seen <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb38-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tpl <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> registered:</span>
<span id="cb38-7">    mrn,cid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tpl[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb38-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mrn <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> seen:</span>
<span id="cb38-9">        repeated_ligands[mrn].append(cid)</span>
<span id="cb38-10">    seen.add(mrn)</span>
<span id="cb38-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(repeated_ligands)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="40">
<pre><code>219</code></pre>
</div>
</div>
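<p>Because the loop only appends a conf_id when its molregno has already been seen, the lists in <code>repeated_ligands</code> hold repeat appearances only. A sketch on toy data shows the resulting invariant: the total number of recorded conf_ids equals the number of registrations minus the number of unique molregnos. With the counts reported above that would be 3583 - 3218 = 365, assuming every registration succeeded. The <code>registered</code> list here is made up and only mimics the shape implied by the <code>tpl[0]</code> unpacking.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from collections import defaultdict

# Toy stand-in for the registration results: each entry starts with a
# (molregno, conf_id) pair, as assumed from the tpl[0] unpacking above.
registered = [((1, 1),), ((2, 2),), ((1, 3),), ((3, 4),), ((1, 5),), ((2, 6),)]

repeated_ligands = defaultdict(list)
seen = set()
for tpl in registered:
    mrn, cid = tpl[0]
    if mrn in seen:
        # only repeat appearances are recorded; the first conf_id is skipped
        repeated_ligands[mrn].append(cid)
    seen.add(mrn)

n_repeats = sum(len(v) for v in repeated_ligands.values())
print(dict(repeated_ligands))
print(n_repeats, len(registered) - len(seen))
</code></pre></div>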
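<p>Get the molregnos for the molecules that have more than two repeat conformers (that is, more than three conformers in total, since the first appearance of each molregno is not recorded in <code>repeated_ligands</code>):</p>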
<div id="4956cc46" class="cell" data-execution_count="41">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1">[k <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k,v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> repeated_ligands.items() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(v)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="41">
<pre><code>[84,
 180,
 110,
 77,
 68,
 92,
 78,
 503,
 583,
 53,
 384,
 568,
 296,
 565,
 281,
 228,
 715,
 1016,
 1335,
 1123,
 795,
 454,
 936,
 1029,
 1317,
 267,
 1601]</code></pre>
</div>
</div>
<p>Each entry in the <code>repeated_ligands</code> dictionary includes the conf_ids of the conformers:</p>
<div id="932a5ea3" class="cell" data-execution_count="42">
<div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb42-1">repeated_ligands[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">84</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="42">
<pre><code>[207, 215, 802, 968, 2150]</code></pre>
</div>
</div>
<p>We can get the conformers themselves by calling <code>lwreg.retrieve()</code> with a list of (molregno,conf_id) tuples:</p>
<div id="997c6b8e" class="cell" data-execution_count="43">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">84</span></span>
<span id="cb44-2">confs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lwreg.retrieve(ids<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[(k,x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> repeated_ligands[k]])</span></code></pre></div>
</div>
<div id="98de086c" class="cell" data-execution_count="47">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb45-1">confs[(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">84</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">207</span>)]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="47">
<pre><code>('AZM_E_303\n     RDKit          3D\n\n  0  0  0  0  0  0  0  0  0  0999 V3000\nM  V30 BEGIN CTAB\nM  V30 COUNTS 19 19 0 0 0\nM  V30 BEGIN ATOM\nM  V30 1 S 9.311200 -40.854300 -76.602200 0\nM  V30 2 S 10.494600 -38.554800 -74.957200 0\nM  V30 3 O 9.730100 -37.268900 -74.826900 0\nM  V30 4 O 10.160500 -39.565300 -73.911800 0\nM  V30 5 O 8.149900 -43.309900 -77.530300 0\nM  V30 6 N 12.092600 -38.206500 -74.914100 0\nM  V30 7 N 10.151000 -38.844000 -77.734300 0\nM  V30 8 N 9.709100 -39.647400 -78.701300 0\nM  V30 9 N 8.662000 -41.847200 -79.186200 0\nM  V30 10 C 7.602600 -44.129200 -79.665200 0\nM  V30 11 C 10.029000 -39.323500 -76.470700 0\nM  V30 12 C 9.187000 -40.841600 -78.320400 0\nM  V30 13 C 8.151000 -43.081400 -78.698500 0\nM  V30 14 H 12.620400 -39.051600 -74.999600 0\nM  V30 15 H 12.320600 -37.593200 -75.670300 0\nM  V30 16 H 8.653100 -41.677800 -80.171700 0\nM  V30 17 H 7.673700 -43.787600 -80.602400 0\nM  V30 18 H 8.131400 -44.973300 -79.577000 0\nM  V30 19 H 6.643600 -44.312500 -79.449200 0\nM  V30 END ATOM\nM  V30 BEGIN BOND\nM  V30 1 1 1 11\nM  V30 2 1 1 12\nM  V30 3 2 2 3\nM  V30 4 2 2 4\nM  V30 5 1 2 6\nM  V30 6 1 2 11\nM  V30 7 2 5 13\nM  V30 8 1 7 8\nM  V30 9 2 7 11\nM  V30 10 2 8 12\nM  V30 11 1 9 12\nM  V30 12 1 9 13\nM  V30 13 1 10 13\nM  V30 14 1 6 14\nM  V30 15 1 6 15\nM  V30 16 1 9 16\nM  V30 17 1 10 17\nM  V30 18 1 10 18\nM  V30 19 1 10 19\nM  V30 END BOND\nM  V30 END CTAB\nM  END\n',
 'mol')</code></pre>
</div>
</div>
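<p>Each value returned here is a (mol block, format) pair, and the first line of the mol block is its title. In the examples in this post the title looks like a ligand name, a chain, and a residue number joined by underscores. That reading is an assumption based on the titles visible here (such as AZM_E_303 and AZM_A_262), and <code>split_title</code> is a hypothetical helper, not part of lwreg.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def split_title(molblock):
    """Split the title line of a mol block on underscores.

    Assumes a name_chain_residue-number title such as 'AZM_E_303'.
    """
    title = molblock.split('\n', 1)[0]
    name, chain, resnum = title.rsplit('_', 2)
    return name, chain, int(resnum)


# Shortened version of the tuple shown above (CTAB body omitted)
entry = ('AZM_E_303\n     RDKit          3D\n\n'
         '  0  0  0  0  0  0  0  0  0  0999 V3000\n', 'mol')
molblock, fmt = entry
print(split_title(molblock), fmt)
</code></pre></div>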
<p>Convert all of those mol blocks into RDKit molecules:</p>
<div id="5a7d19e1" class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb47-1">conf_mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromMolBlock(confs[x][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> confs]</span></code></pre></div>
</div>
<p>And look at the first couple:</p>
<div id="9dcf04f8" class="cell" data-scrolled="true" data-execution_count="51">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1">conf_mols[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17625759327529206" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17625759327529206" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17625759327529206 = null;
var warn = document.getElementById("3dmolwarning_17625759327529206");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17625759327529206 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17625759327529206"),{backgroundColor:"white"});
viewer_17625759327529206.zoomTo();
    viewer_17625759327529206.removeAllModels();
    viewer_17625759327529206.addModel("AZM_E_303\n     RDKit          3D\n\n 19 19  0  0  0  0  0  0  0  0999 V2000\n    9.3112  -40.8543  -76.6022 S   0  0  0  0  0  0  0  0  0  0  0  0\n   10.4946  -38.5548  -74.9572 S   0  0  0  0  0  0  0  0  0  0  0  0\n    9.7301  -37.2689  -74.8269 O   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1605  -39.5653  -73.9118 O   0  0  0  0  0  0  0  0  0  0  0  0\n    8.1499  -43.3099  -77.5303 O   0  0  0  0  0  0  0  0  0  0  0  0\n   12.0926  -38.2065  -74.9141 N   0  0  0  0  0  0  0  0  0  0  0  0\n   10.1510  -38.8440  -77.7343 N   0  0  0  0  0  0  0  0  0  0  0  0\n    9.7091  -39.6474  -78.7013 N   0  0  0  0  0  0  0  0  0  0  0  0\n    8.6620  -41.8472  -79.1862 N   0  0  0  0  0  0  0  0  0  0  0  0\n    7.6026  -44.1292  -79.6652 C   0  0  0  0  0  0  0  0  0  0  0  0\n   10.0290  -39.3235  -76.4707 C   0  0  0  0  0  0  0  0  0  0  0  0\n    9.1870  -40.8416  -78.3204 C   0  0  0  0  0  0  0  0  0  0  0  0\n    8.1510  -43.0814  -78.6985 C   0  0  0  0  0  0  0  0  0  0  0  0\n   12.6204  -39.0516  -74.9996 H   0  0  0  0  0  0  0  0  0  0  0  0\n   12.3206  -37.5932  -75.6703 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.6531  -41.6778  -80.1717 H   0  0  0  0  0  0  0  0  0  0  0  0\n    7.6737  -43.7876  -80.6024 H   0  0  0  0  0  0  0  0  0  0  0  0\n    8.1314  -44.9733  -79.5770 H   0  0  0  0  0  0  0  0  0  0  0  0\n    6.6436  -44.3125  -79.4492 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 11  1  0\n  1 12  1  0\n  2  3  2  0\n  2  4  2  0\n  2  6  1  0\n  2 11  1  0\n  5 13  2  0\n  7  8  1  0\n  7 11  2  0\n  8 12  2  0\n  9 12  1  0\n  9 13  1  0\n 10 13  1  0\n  6 14  1  0\n  6 15  1  0\n  9 16  1  0\n 10 17  1  0\n 10 18  1  0\n 10 19  1  0\nM  END\n","sdf");
    viewer_17625759327529206.setStyle({"stick": {}});
    viewer_17625759327529206.setBackgroundColor("0xeeeeee");
    viewer_17625759327529206.zoomTo();
viewer_17625759327529206.render();
});
</script>
</div>
<div class="cell-output cell-output-display" data-execution_count="51">

</div>
</div>
<div id="bd90af6d" class="cell" data-execution_count="52">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb49-1">conf_mols[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17625759407695315" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17625759407695315" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17625759407695315 = null;
var warn = document.getElementById("3dmolwarning_17625759407695315");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17625759407695315 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17625759407695315"),{backgroundColor:"white"});
viewer_17625759407695315.zoomTo();
    viewer_17625759407695315.removeAllModels();
    viewer_17625759407695315.addModel("AZM_A_262\n     RDKit          3D\n\n 19 19  0  0  0  0  0  0  0  0999 V2000\n    1.6042   70.8027   56.8891 S   0  0  0  0  0  0  0  0  0  0  0  0\n    3.2937   72.4472   58.7703 S   0  0  0  0  0  0  0  0  0  0  0  0\n    3.3750   71.0598   59.2463 O   0  0  0  0  0  0  0  0  0  0  0  0\n    2.6993   73.3963   59.6063 O   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.1118   69.2710   56.3052 O   0  0  0  0  0  0  0  0  0  0  0  0\n    4.7494   72.8722   58.3430 N   0  0  0  0  0  0  0  0  0  0  0  0\n    2.3388   73.1756   56.3776 N   0  0  0  0  0  0  0  0  0  0  0  0\n    1.7295   72.7252   55.2524 N   0  0  0  0  0  0  0  0  0  0  0  0\n    0.2324   70.8034   54.6557 N   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.1568   68.8399   54.2799 C   0  0  0  0  0  0  0  0  0  0  0  0\n    2.3965   72.2531   57.2632 C   0  0  0  0  0  0  0  0  0  0  0  0\n    1.2369   71.4841   55.3669 C   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.3062   69.6543   55.1481 C   0  0  0  0  0  0  0  0  0  0  0  0\n    4.7374   73.8154   58.0111 H   0  0  0  0  0  0  0  0  0  0  0  0\n    5.3614   72.8037   59.1309 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.0888   71.1666   53.7811 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -1.2236   69.2703   53.3798 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -0.7590   67.9274   54.1844 H   0  0  0  0  0  0  0  0  0  0  0  0\n   -2.0693   68.7653   54.6822 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 11  1  0\n  1 12  1  0\n  2  3  2  0\n  2  4  2  0\n  2  6  1  0\n  2 11  1  0\n  5 13  2  0\n  7  8  1  0\n  7 11  2  0\n  8 12  2  0\n  9 12  1  0\n  9 13  1  0\n 10 13  1  0\n  6 14  1  0\n  6 15  1  0\n  9 16  1  0\n 10 17  1  0\n 10 18  1  0\n 10 19  1  0\nM  END\n","sdf");
    viewer_17625759407695315.setStyle({"stick": {}});
    viewer_17625759407695315.setBackgroundColor("0xeeeeee");
    viewer_17625759407695315.zoomTo();
viewer_17625759407695315.render();
});
</script>
</div>
</div>
<p>Let’s get all the pairs of conformers with a shape Tversky score between 0.8 and 0.9 (greater than 0.8 and at most 0.9):</p>
<div id="3004fe51" class="cell" data-execution_count="131">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql select ligname1,pdb1,ligname2,pdb2 <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.pair_stats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb50-2">  where shape_tversky_index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> shape_tversky_index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb50-3">d[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> * postgresql://localhost/lobster_112024</code></pre>
</div>
<div class="cell-output cell-output-display" data-execution_count="131">
<pre><code>[('03V_A_2002', '3u3k', 'NPO_A_300', '3u3r'),
 ('NPO_A_300', '3u3r', '03V_A_2002', '3u3k'),
 ('052_A_809', '4zeg', 'O23_A_901', '3wzk'),
 ('HS4_A_0', '3f17', 'KLG_A_0', '3rts'),
 ('NGH_A_306', '5lab', 'HS3_A_0', '3f16')]</code></pre>
</div>
</div>
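<p>The first two rows above are the same pair of ligands listed in both orders. A minimal sketch of collapsing such mirrored rows before inspecting them, assuming the order within a pair does not matter for visual comparison (the helper name <code>unique_pairs</code> is hypothetical, and the Tversky score itself may differ between the two directions):</p>

```python
def unique_pairs(rows):
    """Keep the first occurrence of each ligand pair, ignoring order.

    Each row is (ligname1, pdb1, ligname2, pdb2), as returned by the query.
    """
    seen = set()
    result = []
    for lig1, pdb1, lig2, pdb2 in rows:
        # A frozenset makes (A, B) and (B, A) compare equal
        key = frozenset([(lig1, pdb1), (lig2, pdb2)])
        if key not in seen:
            seen.add(key)
            result.append((lig1, pdb1, lig2, pdb2))
    return result


# The first three rows of the query output shown above
rows = [
    ("03V_A_2002", "3u3k", "NPO_A_300", "3u3r"),
    ("NPO_A_300", "3u3r", "03V_A_2002", "3u3k"),
    ("052_A_809", "4zeg", "O23_A_901", "3wzk"),
]
deduped = unique_pairs(rows)
print(len(deduped))
```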
<div id="08d3f61f" class="cell" data-execution_count="132">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb53-1">accum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb53-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> ln1,pdb1,ln2,pdb2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>]:</span>
<span id="cb53-3">    d1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb53-4">        select molregno,conf_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands where <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb53-5">      ligname<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:ln1 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> upper(pdb)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>upper(:pdb1)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb53-6">    d2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb53-7">        select molregno,conf_id <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lobster_data.all_ligands where <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb53-8">      ligname<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>:ln2 <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> upper(pdb)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>upper(:pdb2)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb53-9">    accum.append((<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>(d1[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]),<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">tuple</span>(d2[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])))</span></code></pre></div>
</div>
<p>Look at a couple of pairs (entries 0 and 2 of <code>accum</code>; entry 1 is the same pair as entry 0 in the opposite order):</p>
<div id="6d744b8b" class="cell" data-execution_count="133">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">confs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lwreg.retrieve(ids<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>accum[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb54-2">mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromMolBlock(v[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> confs.values()]</span>
<span id="cb54-3">IPythonConsole.drawMols3D(mols)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_1762590943943505" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_1762590943943505" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_1762590943943505 = null;
var warn = document.getElementById("3dmolwarning_1762590943943505");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_1762590943943505 = $3Dmol.createViewer(document.getElementById("3dmolviewer_1762590943943505"),{backgroundColor:"white"});
viewer_1762590943943505.zoomTo();
    viewer_1762590943943505.addModel("NPO_A_300\n     RDKit          3D\n\n 15 15  0  0  0  0  0  0  0  0999 V2000\n   21.0209   -9.4413   -8.6788 O   0  0  0  0  0  0  0  0  0  0  0  0\n   19.1617  -10.3654   -8.1724 O   0  0  0  0  0  0  0  0  0  0  0  0\n   17.3203   -4.2462   -8.3844 O   0  0  0  0  0  0  0  0  0  0  0  0\n   19.7773   -9.3329   -8.4209 N   0  0  0  0  0  0  0  0  0  0  0  0\n   17.2130   -6.5832   -7.9112 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.2987   -5.6143   -8.7848 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.8074   -7.8731   -8.0839 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.9162   -6.8961   -8.8336 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.9756   -5.4596   -8.2752 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.1712   -8.0164   -8.4316 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.8291   -3.4245   -8.6412 H   0  0  0  0  0  0  0  0  0  0  0  0\n   16.2878   -6.4843   -7.5447 H   0  0  0  0  0  0  0  0  0  0  0  0\n   19.7988   -4.8125   -9.1120 H   0  0  0  0  0  0  0  0  0  0  0  0\n   17.2467   -8.6912   -7.9560 H   0  0  0  0  0  0  0  0  0  0  0  0\n   20.8603   -6.9965   -9.1474 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1  4  1  0\n  2  4  2  0\n  3  9  1  0\n  4 10  1  0\n  5  7  2  0\n  5  9  1  0\n  6  8  1  0\n  6  9  2  0\n  7 10  1  0\n  8 10  2  0\n  3 11  1  0\n  5 12  1  0\n  6 13  1  0\n  7 14  1  0\n  8 15  1  0\nM  CHG  2   1  -1   4   1\nM  END\n","sdf");
    viewer_1762590943943505.setStyle({"stick": {}});
    viewer_1762590943943505.addModel("03V_A_2002\n     RDKit          3D\n\n 19 20  0  0  0  0  0  0  0  0999 V2000\n   17.5350   -4.1210   -8.5300 O   0  0  0  0  0  0  0  0  0  0  0  0\n   19.7620  -10.0480   -8.3760 C   0  0  0  0  0  0  0  0  0  0  0  0\n   20.7300   -9.2250   -8.9320 C   0  0  0  0  0  0  0  0  0  0  0  0\n   18.5460   -9.5000   -7.9700 C   0  0  0  0  0  0  0  0  0  0  0  0\n   20.5370   -7.8440   -9.0980 C   0  0  0  0  0  0  0  0  0  0  0  0\n   16.9720   -6.2430   -7.8640 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.1840   -7.6000   -7.6970 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.1320   -5.8930   -8.8520 C   0  0  0  0  0  0  0  0  0  0  0  0\n   17.9000   -5.4150   -8.4340 C   0  0  0  0  0  0  0  0  0  0  0  0\n   18.3440   -8.1490   -8.1160 C   0  0  0  0  0  0  0  0  0  0  0  0\n   19.3660   -7.2630   -8.7090 C   0  0  0  0  0  0  0  0  0  0  0  0\n   18.1600   -3.4540   -8.9355 H   0  0  0  0  0  0  0  0  0  0  0  0\n   19.9349  -11.0269   -8.2672 H   0  0  0  0  0  0  0  0  0  0  0  0\n   21.5963   -9.6295   -9.2251 H   0  0  0  0  0  0  0  0  0  0  0  0\n   17.8323  -10.0810   -7.5788 H   0  0  0  0  0  0  0  0  0  0  0  0\n   21.2605   -7.2848   -9.5027 H   0  0  0  0  0  0  0  0  0  0  0  0\n   16.1051   -5.8515   -7.5555 H   0  0  0  0  0  0  0  0  0  0  0  0\n   16.4828   -8.1702   -7.2689 H   0  0  0  0  0  0  0  0  0  0  0  0\n   19.8237   -5.2835   -9.2393 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1  9  1  0\n  2  3  2  0\n  2  4  1  0\n  3  5  1  0\n  4 10  2  0\n  5 11  2  0\n  6  7  2  0\n  6  9  1  0\n  7 10  1  0\n  8  9  2  0\n  8 11  1  0\n 10 11  1  0\n  1 12  1  0\n  2 13  1  0\n  3 14  1  0\n  4 15  1  0\n  5 16  1  0\n  6 17  1  0\n  7 18  1  0\n  8 19  1  0\nM  END\n","sdf");
    viewer_1762590943943505.setStyle({"stick": {}});
    viewer_1762590943943505.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_1762590943943505.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_1762590943943505.setBackgroundColor("0xeeeeee");
    viewer_1762590943943505.zoomTo();
viewer_1762590943943505.render();
});
</script>
</div>
</div>
<div id="dd8cbae9" class="cell" data-execution_count="134">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb55-1">confs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lwreg.retrieve(ids<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>accum[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb55-2">mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromMolBlock(v[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> v <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> confs.values()]</span>
<span id="cb55-3">IPythonConsole.drawMols3D(mols)</span></code></pre></div>
<div class="cell-output cell-output-display">
<div id="3dmolviewer_17625909441996996" style="position: relative; width: 400px; height: 400px;">
        <p id="3dmolwarning_17625909441996996" style="background-color:#ffcccc;color:black">3Dmol.js failed to load for some reason.  Please check your browser console for error messages.<br></p>
        </div>
<script>

var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    //this is to ignore the existence of requirejs amd
    var savedexports, savedmodule;
    if (typeof exports !== 'undefined') savedexports = exports;
    else exports = {}
    if (typeof module !== 'undefined') savedmodule = module;
    else module = {}

    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = () => {
        exports = savedexports;
        module = savedmodule;
        resolve();
    };
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
});
};

if(typeof $3Dmolpromise === 'undefined') {
$3Dmolpromise = null;
  $3Dmolpromise = loadScriptAsync('https://cdnjs.cloudflare.com/ajax/libs/3Dmol/2.4.0/3Dmol-min.js');
}

var viewer_17625909441996996 = null;
var warn = document.getElementById("3dmolwarning_17625909441996996");
if(warn) {
    warn.parentNode.removeChild(warn);
}
$3Dmolpromise.then(function() {
viewer_17625909441996996 = $3Dmol.createViewer(document.getElementById("3dmolviewer_17625909441996996"),{backgroundColor:"white"});
viewer_17625909441996996.zoomTo();
    viewer_17625909441996996.addModel("O23_A_901\n     RDKit          3D\n\n 47 51  0  0  0  0  0  0  0  0999 V2000\n   34.4176   35.0742   60.9436 S   0  0  0  0  0  0  0  0  0  0  0  0\n   40.0834   33.9291   72.5931 O   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0852   36.4313   65.5407 N   0  0  0  0  0  0  0  0  0  0  0  0\n   33.4391   34.7497   64.0005 N   0  0  0  0  0  0  0  0  0  0  0  0\n   34.9217   32.7720   65.7440 N   0  0  0  0  0  0  0  0  0  0  0  0\n   38.5749   32.6195   73.5874 N   0  0  0  0  0  0  0  0  0  0  0  0\n   35.4384   34.4712   67.0663 N   0  0  0  0  0  0  0  0  0  0  0  0\n   35.3949   36.2455   60.1127 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7420   36.7530   66.6826 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.7964   35.6924   63.1068 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.6691   32.2609   66.7518 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.4170   35.7755   67.4430 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.1946   37.4750   60.7514 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.3100   37.4823   61.8375 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.0840   31.2609   75.5992 C   0  0  0  0  0  0  0  0  0  0  0  0\n   40.2784   31.2576   74.6463 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.4502   32.2377   70.8839 C   0  0  0  0  0  0  0  0  0  0  0  0\n   38.3107   34.4396   70.3971 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.7258   32.2624   69.6846 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.5904   34.4523   69.1964 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.0261   33.3078   72.5314 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.7778   36.2210   62.0839 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0802   35.1430   65.1216 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7930   34.1082   65.9362 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.4129   32.5029   74.7733 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.9869   33.3386   67.5629 C   0  0  0  0  0  0  0  0  0  0  0  0\n   38.2421   33.3297   71.2433 C   0 
 0  0  0  0  0  0  0  0  0  0  0\n   36.7804   33.3721   68.8395 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.4065   33.7751   63.7793 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.6722   32.1899   73.5623 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.9885   36.0739   59.3264 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.7478   37.7039   66.9923 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.4366   36.4572   63.6412 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.0446   35.2339   62.6330 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.9357   31.3061   66.8828 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.8913   36.0497   68.2796 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.6597   38.3064   60.4474 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0876   38.2975   62.3723 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.2326   31.2635   76.5881 H   0  0  0  0  0  0  0  0  0  0  0  0\n   38.2716   30.7183   75.3856 H   0  0  0  0  0  0  0  0  0  0  0  0\n   40.2483   30.7114   73.8092 H   0  0  0  0  0  0  0  0  0  0  0  0\n   41.2092   31.2566   75.0117 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.3998   31.4380   71.4822 H   0  0  0  0  0  0  0  0  0  0  0  0\n   38.8749   35.2257   70.6494 H   0  0  0  0  0  0  0  0  0  0  0  0\n   36.1630   31.4756   69.4311 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.6567   35.2432   68.5881 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.7726   33.3229   75.2184 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1  8  1  0\n  1 22  1  0\n  2 21  2  0\n  3  9  1  0\n  3 23  2  0\n  4 10  1  0\n  4 23  1  0\n  5 11  1  0\n  5 24  2  0\n  6 21  1  0\n  6 25  1  0\n  7 12  1  0\n  7 24  1  0\n  7 26  1  0\n  8 13  2  0\n  9 12  2  0\n 10 22  1  0\n 11 26  2  0\n 13 14  1  0\n 14 22  2  0\n 15 16  1  0\n 15 25  1  0\n 16 25  1  0\n 17 19  2  0\n 17 27  1  0\n 18 20  1  0\n 18 27  2  0\n 19 28  1  0\n 20 28  2  0\n 21 27  1  0\n 23 24  1  0\n 26 28  1  0\n  4 29  1  0\n  6 30  1  0\n  8 31  1  0\n  9 32  1  0\n 10 33  1  0\n 10 34  1  0\n 11 35  1  
0\n 12 36  1  0\n 13 37  1  0\n 14 38  1  0\n 15 39  1  0\n 15 40  1  0\n 16 41  1  0\n 16 42  1  0\n 17 43  1  0\n 18 44  1  0\n 19 45  1  0\n 20 46  1  0\n 25 47  1  0\nM  END\n","sdf");
    viewer_17625909441996996.setStyle({"stick": {}});
    viewer_17625909441996996.addModel("052_A_809\n     RDKit          3D\n\n 70 75  0  0  0  0  0  0  0  0999 V2000\n   40.1840   33.4120   72.3030 O   0  0  0  0  0  0  0  0  0  0  0  0\n   32.8570   37.7140   58.4980 O   0  0  0  0  0  0  0  0  0  0  0  0\n   34.8520   37.7650   66.6390 O   0  0  0  0  0  0  0  0  0  0  0  0\n   34.6790   32.5550   65.6200 N   0  0  0  0  0  0  0  0  0  0  0  0\n   33.2290   34.3320   63.8290 N   0  0  0  0  0  0  0  0  0  0  0  0\n   35.4510   35.5900   67.1950 N   0  0  0  0  0  0  0  0  0  0  0  0\n   38.4820   32.4860   73.5300 N   0  0  0  0  0  0  0  0  0  0  0  0\n   34.6320   33.8980   65.7760 N   0  0  0  0  0  0  0  0  0  0  0  0\n   33.3330   36.4720   60.7720 N   0  0  0  0  0  0  0  0  0  0  0  0\n   38.6720   35.6260   70.8310 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.1170   38.0630   58.9880 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.5810   36.3280   58.3750 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.4770   34.7370   62.4590 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.3960   32.0420   66.6880 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.4600   37.2180   60.2260 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.6950   35.6310   59.7390 C   0  0  0  0  0  0  0  0  0  0  0  0\n   33.2040   36.1970   62.2350 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.5590   38.9290   70.2690 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.2080   38.6410   70.2070 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.3470   38.7880   69.1180 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.6460   38.2270   68.9970 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.7890   38.3780   67.9190 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.9940   31.0170   74.7160 C   0  0  0  0  0  0  0  0  0  0  0  0\n   38.8100   31.2260   75.5940 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.8180   31.9250   69.4390 C   0  0  0  0  0  0  0  0  0  0  0  0\n   37.5880   31.9260   70.6180 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0810   36.1100   65.2730 C   0 
 0  0  0  0  0  0  0  0  0  0  0\n   37.3120   34.2390   69.2080 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.0210   33.0420   72.3450 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.8110   36.4590   66.3770 C   0  0  0  0  0  0  0  0  0  0  0  0\n   34.0050   34.7950   64.9490 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.4290   38.1090   67.8650 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.3490   34.2610   66.8460 C   0  0  0  0  0  0  0  0  0  0  0  0\n   39.2870   32.3600   74.6980 C   0  0  0  0  0  0  0  0  0  0  0  0\n   38.0660   34.2950   70.3760 C   0  0  0  0  0  0  0  0  0  0  0  0\n   36.7020   33.0700   68.7100 C   0  0  0  0  0  0  0  0  0  0  0  0\n   38.1930   33.1070   71.0920 C   0  0  0  0  0  0  0  0  0  0  0  0\n   35.8600   33.1160   67.4310 C   0  0  0  0  0  0  0  0  0  0  0  0\n   32.4776   33.6961   64.0052 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.5302   32.1798   73.5471 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.1787   35.4880   71.6820 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.2925   35.9683   70.1254 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.9405   36.2908   70.9827 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.1170   39.0310   59.2388 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.8031   37.9025   58.2784 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.2359   35.9175   57.7404 H   0  0  0  0  0  0  0  0  0  0  0  0\n   31.6535   36.2077   58.0210 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.8858   34.2033   61.8543 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.4344   34.5517   62.2377 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.5506   31.0742   66.8866 H   0  0  0  0  0  0  0  0  0  0  0  0\n   35.1743   36.5658   59.9723 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.8050   37.8301   60.9376 H   0  0  0  0  0  0  0  0  0  0  0  0\n   31.7759   35.3888   60.0497 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.2387   34.7993   59.6262 H   0  0  0  0  0  0  0  0  0  0  0  0\n   32.2788   36.4202   62.5419 H   0  0  0  0  0 
 0  0  0  0  0  0  0\n   33.8658   36.7479   62.7435 H   0  0  0  0  0  0  0  0  0  0  0  0\n   36.9705   39.2352   71.1274 H   0  0  0  0  0  0  0  0  0  0  0  0\n   34.6369   38.7285   71.0232 H   0  0  0  0  0  0  0  0  0  0  0  0\n   38.3261   38.9864   69.1637 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.6700   38.0145   68.9499 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.3584   38.2772   67.1032 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.9265   30.4081   73.9256 H   0  0  0  0  0  0  0  0  0  0  0  0\n   40.9057   30.9396   75.1195 H   0  0  0  0  0  0  0  0  0  0  0  0\n   38.9290   31.2896   76.5849 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.9498   30.7581   75.3909 H   0  0  0  0  0  0  0  0  0  0  0  0\n   36.3591   31.0888   69.1387 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.7075   31.0739   71.1276 H   0  0  0  0  0  0  0  0  0  0  0  0\n   33.6175   36.8038   64.7217 H   0  0  0  0  0  0  0  0  0  0  0  0\n   37.1951   35.0846   68.6871 H   0  0  0  0  0  0  0  0  0  0  0  0\n   39.7261   33.1654   75.0961 H   0  0  0  0  0  0  0  0  0  0  0  0\n  1 29  2  0\n  2 11  1  0\n  2 12  1  0\n  3 30  1  0\n  3 32  1  0\n  4  8  1  0\n  4 14  2  0\n  5 13  1  0\n  5 31  1  0\n  6 30  2  0\n  6 33  1  0\n  7 29  1  0\n  7 34  1  0\n  8 31  1  0\n  8 33  1  0\n  9 15  1  0\n  9 16  1  0\n  9 17  1  0\n 10 35  1  0\n 11 15  1  0\n 12 16  1  0\n 13 17  1  0\n 14 38  1  0\n 18 19  2  0\n 18 20  1  0\n 19 21  1  0\n 20 22  2  0\n 21 32  2  0\n 22 32  1  0\n 23 24  1  0\n 23 34  1  0\n 24 34  1  0\n 25 26  2  0\n 25 36  1  0\n 26 37  1  0\n 27 30  1  0\n 27 31  2  0\n 28 35  1  0\n 28 36  2  0\n 29 37  1  0\n 33 38  2  0\n 35 37  2  0\n 36 38  1  0\n  5 39  1  0\n  7 40  1  0\n 10 41  1  0\n 10 42  1  0\n 10 43  1  0\n 11 44  1  0\n 11 45  1  0\n 12 46  1  0\n 12 47  1  0\n 13 48  1  0\n 13 49  1  0\n 14 50  1  0\n 15 51  1  0\n 15 52  1  0\n 16 53  1  0\n 16 54  1  0\n 17 55  1  0\n 17 56  1  0\n 18 57  1  0\n 19 58  1  0\n 20 59  1  0\n 21 60  1  0\n 
22 61  1  0\n 23 62  1  0\n 23 63  1  0\n 24 64  1  0\n 24 65  1  0\n 25 66  1  0\n 26 67  1  0\n 27 68  1  0\n 28 69  1  0\n 34 70  1  0\nM  END\n","sdf");
    viewer_17625909441996996.setStyle({"stick": {}});
    viewer_17625909441996996.setStyle({"model": 0},{"stick": {"colorscheme": "cyanCarbon"}});
    viewer_17625909441996996.setStyle({"model": 1},{"stick": {"colorscheme": "redCarbon"}});
    viewer_17625909441996996.setBackgroundColor("0xeeeeee");
    viewer_17625909441996996.zoomTo();
viewer_17625909441996996.render();
});
</script>
</div>
</div>
</section>
<section id="look-at-duplicate-mismatches" class="level1">
<h1>Look at duplicate mismatches</h1>
<p>According to lwreg we have 3218 unique structures, while the canonical SMILES provided with the Lobster data set (generated using the Hamburg group’s Naomi package) give 3212. Let’s look at the mismatches:</p>
<div id="6bbc59d4" class="cell" data-execution_count="138">
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb56-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb56-2">  select count(distinct molregno) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> hashes join lobster_data.all_ligands using (molregno)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="138">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">count</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>3218</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="422f46a1" class="cell" data-execution_count="140">
<div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb57-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb57-2">  select naomi_smiles,count(distinct molregno) cnt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb57-3">     <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> hashes join lobster_data.all_ligands using (molregno)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb57-4">    group by (naomi_smiles) order by cnt desc limit <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="140">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">naomi_smiles</th>
<th data-quarto-table-cell-role="th">cnt</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
<td>3</td>
</tr>
<tr class="even">
<td>P(=O)(OP(=O)(O)O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O</td>
<td>2</td>
</tr>
<tr class="odd">
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4N=C(NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
<td>2</td>
</tr>
<tr class="even">
<td>P(=O)(OP(=O)(O)O)(OCC(=O)N1[C@H](C(=O)O)CCC1)O</td>
<td>2</td>
</tr>
<tr class="odd">
<td>P(=O)(O)([C@@H](N)C)C[C@H](C(=O)N[C@H](C(=O)O)C)Cc1ccc(c2ccccc2)cc1</td>
<td>2</td>
</tr>
<tr class="even">
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4NC(=NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Brc1c2OC(=O)C(Br)=C(c2ccc1O)C</td>
<td>1</td>
</tr>
<tr class="even">
<td>BrC=1C2=NC(=O)C(C=3Nc4c(cc(cc4)C(=O)O)C3NO)=C2C=CC1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Brc1c2c(OC(=O)C=C2CP(=O)(O)O)cc(c1)C</td>
<td>1</td>
</tr>
<tr class="even">
<td>Brc1c(Br)c(Br)c2N(C(=Nc2c1Br)N(C)C)CC(=O)O</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>
</div>
<div id="4f165de0" class="cell" data-execution_count="145">
<div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb58-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>sql postgresql:<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//</span>localhost<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>lobster_112024 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-2">  select molregno,canonical_smiles,naomi_smiles <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-3">      (select naomi_smiles,count(distinct molregno) cnt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-4">         <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> hashes join lobster_data.all_ligands using (molregno)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-5">         group by (naomi_smiles)) downsel join lobster_data.all_ligands using (naomi_smiles) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-6">          join hashes using (molregno)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb58-7">    where cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> order by naomi_smiles asc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb58-8">d</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="145">
<table class="caption-top table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th">molregno</th>
<th data-quarto-table-cell-role="th">canonical_smiles</th>
<th data-quarto-table-cell-role="th">naomi_smiles</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>927</td>
<td>Nc1nc2c(ncn2[C@@H]2O[C@@H]3CO[P@@](=O)(O)O[C@H]3[C@H]2O)c(=O)[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4N=C(NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="even">
<td>1363</td>
<td>Nc1nc2c(ncn2[C@@H]2O[C@@H]3CO[P@](=O)(O)O[C@H]3[C@H]2O)c(=O)[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4N=C(NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="odd">
<td>1363</td>
<td>Nc1nc2c(ncn2[C@@H]2O[C@@H]3CO[P@](=O)(O)O[C@H]3[C@H]2O)c(=O)[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4N=C(NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="even">
<td>927</td>
<td>Nc1nc2c(ncn2[C@@H]2O[C@@H]3CO[P@@](=O)(O)O[C@H]3[C@H]2O)c(=O)[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4N=C(NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="odd">
<td>2226</td>
<td>Nc1nc(=O)c2ncn([C@@H]3O[C@@H]4CO[P@](=O)(O)O[C@H]4[C@H]3O)c2[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4NC(=NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="even">
<td>91</td>
<td>Nc1nc(=O)c2ncn([C@@H]3O[C@@H]4CO[P@@](=O)(O)O[C@H]4[C@H]3O)c2[nH]1</td>
<td>P1(=O)(O[C@@H]2[C@H](O[C@@H](N3C=4NC(=NC(=O)C4N=C3)N)[C@@H]2O)CO1)O</td>
</tr>
<tr class="odd">
<td>2558</td>
<td>C[C@H](NC(=O)[C@H](Cc1ccc(-c2ccccc2)cc1)C[P@](=O)(O)[C@H](C)N)C(=O)O</td>
<td>P(=O)(O)([C@@H](N)C)C[C@H](C(=O)N[C@H](C(=O)O)C)Cc1ccc(c2ccccc2)cc1</td>
</tr>
<tr class="even">
<td>82</td>
<td>C[C@H](NC(=O)[C@H](Cc1ccc(-c2ccccc2)cc1)C[P@@](=O)(O)[C@H](C)N)C(=O)O</td>
<td>P(=O)(O)([C@@H](N)C)C[C@H](C(=O)N[C@H](C(=O)O)C)Cc1ccc(c2ccccc2)cc1</td>
</tr>
<tr class="odd">
<td>627</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@](=O)(O)OP(=O)(O)O)[C@@H](O)[C@H]1O</td>
<td>P(=O)(OP(=O)(O)O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O</td>
</tr>
<tr class="even">
<td>1957</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@@](=O)(O)OP(=O)(O)O)[C@@H](O)[C@H]1O</td>
<td>P(=O)(OP(=O)(O)O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O</td>
</tr>
<tr class="odd">
<td>2256</td>
<td>O=C(O)[C@@H]1CCCN1C(=O)CO[P@@](=O)(O)OP(=O)(O)O</td>
<td>P(=O)(OP(=O)(O)O)(OCC(=O)N1[C@H](C(=O)O)CCC1)O</td>
</tr>
<tr class="even">
<td>892</td>
<td>O=C(O)[C@@H]1CCCN1C(=O)CO[P@](=O)(O)OP(=O)(O)O</td>
<td>P(=O)(OP(=O)(O)O)(OCC(=O)N1[C@H](C(=O)O)CCC1)O</td>
</tr>
<tr class="odd">
<td>2877</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@@](=O)(O)O[P@](=O)(O)OP(O)(O)=S)[C@@H](O)[C@H]1O</td>
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
</tr>
<tr class="even">
<td>1559</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@@](=O)(O)O[P@@](=O)(O)OP(O)(O)=S)[C@@H](O)[C@H]1O</td>
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
</tr>
<tr class="odd">
<td>1559</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@@](=O)(O)O[P@@](=O)(O)OP(O)(O)=S)[C@@H](O)[C@H]1O</td>
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
</tr>
<tr class="even">
<td>2139</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@](=O)(O)O[P@](=O)(O)OP(O)(O)=S)[C@@H](O)[C@H]1O</td>
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
</tr>
<tr class="odd">
<td>2877</td>
<td>Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO[P@@](=O)(O)O[P@](=O)(O)OP(O)(O)=S)[C@@H](O)[C@H]1O</td>
<td>S=P(OP(=O)(OP(=O)(OC[C@H]1O[C@@H](N2c3ncnc(N)c3N=C2)[C@H](O)[C@@H]1O)O)O)(O)O</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>These all arise because the RDKit recognizes chiral phosphates, whereas the Naomi SMILES do not seem to.</p>
<p>Here are the first few molecules demonstrating this:</p>
<div id="3da091e3" class="cell" data-execution_count="148">
<div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb59-1">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb59-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> row <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> d:</span>
<span id="cb59-3">    rm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(row[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb59-4">    nm <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(row[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>])</span>
<span id="cb59-5">    ms.extend((rm,nm))</span>
<span id="cb59-6">Draw.MolsToGridImage(ms[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,subImgSize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="148">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1_files/figure-html/cell-40-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
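<p>The same point can be checked in code rather than by eye. The following is a minimal sketch, not part of the original notebook: it takes two of the RDKit canonical SMILES from the table above (molregnos 2256 and 892) and compares them with and without the stereo information on phosphorus. The helper name <code>strip_p_stereo</code> is hypothetical and only introduced for illustration.</p>

```python
from rdkit import Chem

# Two RDKit canonical SMILES from the table above (molregnos 2256 and 892);
# they differ only in the chiral tag on one phosphorus atom.
smi_a = "O=C(O)[C@@H]1CCCN1C(=O)CO[P@@](=O)(O)OP(=O)(O)O"
smi_b = "O=C(O)[C@@H]1CCCN1C(=O)CO[P@](=O)(O)OP(=O)(O)O"


def strip_p_stereo(mol):
    """Hypothetical helper: copy mol and clear chiral tags on phosphorus only."""
    res = Chem.Mol(mol)
    for atom in res.GetAtoms():
        if atom.GetAtomicNum() == 15:
            atom.SetChiralTag(Chem.ChiralType.CHI_UNSPECIFIED)
    # refresh the cached stereo information after editing the tags
    Chem.AssignStereochemistry(res, cleanIt=True, force=True)
    return res


mol_a = Chem.MolFromSmiles(smi_a)
mol_b = Chem.MolFromSmiles(smi_b)

# With P stereo: the RDKit sees two different molecules
differ_with_p_stereo = Chem.MolToSmiles(mol_a) != Chem.MolToSmiles(mol_b)

# Without P stereo: both collapse to the same canonical SMILES
same_without_p_stereo = (
    Chem.MolToSmiles(strip_p_stereo(mol_a)) == Chem.MolToSmiles(strip_p_stereo(mol_b))
)
print(differ_with_p_stereo, same_without_p_stereo)
```

<p>The carbon stereocenter is left untouched, so the only information removed is the part that the Naomi SMILES seem to be missing.</p>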
<p>I’m going to wrap this up here. I’ll come back to the LOBSTER data set and this database again in future posts.</p>


</section>

 ]]></description>
  <category>datasets</category>
  <category>3d</category>
  <category>lwreg</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-11-08-working-with-lobster-1.html</guid>
  <pubDate>Fri, 07 Nov 2025 23:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/working-with-lobster-1.png" medium="image" type="image/png" height="110" width="144"/>
</item>
<item>
  <title>How long does it take to do common tasks in the RDKit?</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-10-31-how-long-does-it-take.html</link>
  <description><![CDATA[ 




<p>Years ago (back in the SourceForge days), I used to maintain a page with info about how long it took to do common operations in the RDKit. This was useful both as a reference and for tracking the evolution of RDKit performance over time. At some point I stopped doing this, but a <a href="https://github.com/rdkit/rdkit/pull/8865">recently merged PR</a> got me thinking about this again (<span class="citation" data-cites="Andrew">@Andrew</span>: thanks for that contribution!).</p>
<p>I’m going to use this notebook to explain and run some new benchmarks (these are different from the ones in the PR mentioned above, which are meant to run as part of the RDKit build process). The results, including historical results, are <a href="https://github.com/rdkit/rdkit/wiki/How-Long-Things-Take">tabulated in the wiki</a>. I will update this post as I add more benchmarks.</p>
<p>Let me know if you have ideas for interesting and useful benchmarks I should add!</p>
<div id="e7a50bf7-c33d-418c-90e2-176747025afd" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>load_ext sql</span></code></pre></div>
</div>
<section id="get-the-smiles-we-will-be-working-with" class="level1">
<h1>Get the SMILES we will be working with</h1>
<p>Get 10000 random compounds from ChEMBL that have a pchembl_value &gt;= 9 and don’t have multiple components.</p>
<p>This stuff doesn’t need to be run every time, so I’m not saving the cells as code.</p>
<pre><code>d = %sql postgresql://localhost/chembl_36 \
  select distinct(canonical_smiles) canonical_smiles,chembl_id from compound_structures tablesample bernoulli(20) repeatable (123892) \
    join chembl_id_lookup on (molregno=entity_id and entity_type='COMPOUND') \
    join activities using (molregno) \
    where activities.pchembl_value&gt;=9 and \
    position('.' in canonical_smiles)=0 \
    limit 10000;</code></pre>
<p>The <code>distinct(canonical_smiles)</code> query I ran sorts the results, so the first rows make it look as if all of the compounds have isotopes specified. This is not actually the case.</p>
<p>Make sure all of those convert cleanly into molecules:</p>
<pre><code>sum(1 for x,y in d if Chem.MolFromSmiles(x) is not None)</code></pre>
<pre><code>with open('../data/chembl36_very_active.txt','w+') as outf:
    outf.write('chembl_id canonical_smiles\n')
    for smi,cid in d:
        outf.write(f'{cid} {smi}\n')</code></pre>
</section>
<section id="run-the-benchmarks" class="level1">
<h1>Run the benchmarks</h1>
<div id="54d12953-db70-4ad7-8636-3fc85302ba46" class="cell" data-execution_count="48">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb5-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.1</code></pre>
</div>
</div>
<div id="0a210106-9868-42de-9654-b4dcc8d8b72f" class="cell" data-execution_count="46">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../data/chembl36_very_active.txt'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'r'</span>) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> inf:</span>
<span id="cb7-2">    ls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x.strip().split() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> inf]</span>
<span id="cb7-3">    ls.pop(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb7-4">    data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [(smi,cid) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> cid,smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ls]</span>
<span id="cb7-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(data)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="46">
<pre><code>10000</code></pre>
</div>
</div>
<section id="construct-molecule-from-smiles" class="level2">
<h2 class="anchored" data-anchor-id="construct-molecule-from-smiles">Construct molecule from SMILES</h2>
<div id="ee572a1f-1a9d-432b-9018-78088adbea60" class="cell" data-execution_count="47">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi,cid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1.2 s ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
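<p>All of the timings in this post come from the IPython <code>%timeit</code> magic. For anyone who wants to run similar measurements outside a notebook, here is a rough sketch using the standard-library <code>timeit</code> module. The helper <code>time_batch</code> and the <code>stand_in_items</code> list are hypothetical names introduced here, and the workload is a stand-in so that the block runs without the RDKit or the data file.</p>

```python
import statistics
import timeit


def time_batch(func, items, repeat=7):
    """Time applying func to every item; returns (mean_s, stdev_s, per_item_s)."""
    # each entry is the wall-clock time of one full pass over items
    totals = timeit.repeat(lambda: [func(x) for x in items], repeat=repeat, number=1)
    mean_s = statistics.mean(totals)
    stdev_s = statistics.pstdev(totals)
    return mean_s, stdev_s, mean_s / len(items)


# Stand-in workload (an assumption, not the real benchmark).
# In the notebook this would be something like:
#   time_batch(Chem.MolFromSmiles, [smi for smi, cid in data])
stand_in_items = ["CCO", "c1ccccc1", "CC(=O)O"] * 1000
mean_s, stdev_s, per_item_s = time_batch(len, stand_in_items)
print(f"{mean_s * 1e3:.3f} ms +/- {stdev_s * 1e3:.3f} ms per loop, "
      f"{per_item_s * 1e9:.1f} ns per item")
```

<p>Using <code>repeat=7</code> and <code>number=1</code> mirrors the “7 runs, 1 loop each” shown in the output above, and dividing by the number of items gives a per-molecule figure that is easier to compare across data sets of different sizes.</p>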
<div id="5877125f-d8d0-479d-b9b3-a90b59b5b916" class="cell" data-execution_count="50">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi,cid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> data]</span></code></pre></div>
</div>
</section>
<section id="generate-canonical-smiles" class="level2">
<h2 class="anchored" data-anchor-id="generate-canonical-smiles">Generate canonical SMILES</h2>
<div id="62f06ed5-8cb8-4b28-9460-4560e2382716" class="cell" data-execution_count="51">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.MolToSmiles(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>658 ms ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="6d2249c2-1885-4c59-a9ab-43e69049b376" class="cell" data-execution_count="52">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDepictor</span></code></pre></div>
</div>
</section>
<section id="generating-2d-coordinates" class="level2">
<h2 class="anchored" data-anchor-id="generating-2d-coordinates">Generating 2D coordinates</h2>
<div id="db7ade8d-87be-46b0-8af7-8b545028e796" class="cell" data-execution_count="53">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">rdDepictor.SetPreferCoordGen(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb15-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [rdDepictor.Compute2DCoords(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>47.4 s ± 482 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="8afc354c-9179-4c5b-a768-3b051cb718c2" class="cell" data-execution_count="54">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1">rdDepictor.SetPreferCoordGen(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb17-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [rdDepictor.Compute2DCoords(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1min 55s ± 334 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
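<p>To put the two depiction timings on a per-molecule footing, here is the arithmetic as a short sketch. It uses only the totals reported above and assumes that all 10000 SMILES were converted to molecules, so that <code>ms</code> has 10000 entries.</p>

```python
n_mols = 10000                  # len(data) reported above
rdkit_total_s = 47.4            # SetPreferCoordGen(False)
coordgen_total_s = 1 * 60 + 55  # SetPreferCoordGen(True): 1min 55s

rdkit_ms_per_mol = rdkit_total_s / n_mols * 1000
coordgen_ms_per_mol = coordgen_total_s / n_mols * 1000
slowdown = coordgen_total_s / rdkit_total_s

print(f"RDKit depictor: {rdkit_ms_per_mol:.2f} ms per molecule")
print(f"CoordGen:       {coordgen_ms_per_mol:.2f} ms per molecule")
print(f"CoordGen / RDKit: {slowdown:.1f}x")
```

<p>That works out to about 4.7 ms per molecule for the RDKit depictor and 11.5 ms per molecule for CoordGen, so CoordGen is roughly 2.4 times slower on this data set.</p>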
<div id="2b762085-5531-486e-96ce-c1a724b8c6f6" class="cell" data-execution_count="55">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">ms2d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.Mol(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span>
<span id="cb19-2">_ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [rdDepictor.Compute2DCoords(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms2d]</span></code></pre></div>
</div>
</section>
<section id="writing-mol-blocks" class="level2">
<h2 class="anchored" data-anchor-id="writing-mol-blocks">Writing mol blocks</h2>
<div id="99e7d198-7447-4dea-a98f-58ca666938f2" class="cell" data-execution_count="71">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.MolToMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms2d]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>877 ms ± 9.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="a74584f8-01d8-4a47-845e-87d84d947de7" class="cell" data-execution_count="70">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.MolToV3KMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms2d]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1.08 s ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="455618ca-4665-4834-87ac-5c0a7b532fdc" class="cell" data-execution_count="72">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1">mbs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolToMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms2d]</span>
<span id="cb24-2">mbs3k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolToV3KMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms2d]</span></code></pre></div>
</div>
</section>
<section id="parsing-mol-blocks" class="level2">
<h2 class="anchored" data-anchor-id="parsing-mol-blocks">Parsing mol blocks</h2>
<div id="2293ae67-5313-4186-bc2d-a5b03caddf8e" class="cell" data-execution_count="73">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb25-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.MolFromMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mbs]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[08:31:09] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:11] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:12] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:14] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:16] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:17] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:19] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:21] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>1.71 s ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="b64ac742-343d-49ed-a3f9-6a92102140c4" class="cell" data-scrolled="true" data-execution_count="74">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.MolFromMolBlock(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mbs3k]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[08:31:23] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:24] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:26] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:28] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:30] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:32] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:34] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored
[08:31:36] Warning: ambiguous stereochemistry - zero final chiral volume - at atom 54 ignored</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>1.87 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
</section>
<section id="addingremoving-hs" class="level2">
<h2 class="anchored" data-anchor-id="addingremoving-hs">Adding/removing Hs</h2>
<div id="d1a1dc6d-a785-4ca5-aa04-c2493d0b6835" class="cell" data-execution_count="61">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb31-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.AddHs(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>456 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="ef813de2-345e-4438-8561-f17587e6435a" class="cell" data-execution_count="62">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb33-1">mhs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.AddHs(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
</div>
<div id="91e41d4d-beb0-49d5-a1f7-77e173b4bb74" class="cell" data-execution_count="64">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb34-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.RemoveHs(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mhs]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1.52 s ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
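<p>Both <code>Chem.AddHs</code> and <code>Chem.RemoveHs</code> return a new molecule rather than modifying their argument, which is why the cells above store the result in <code>mhs</code> before timing the removal. A minimal illustration with ethanol (my choice of example, not a molecule from the benchmark set):</p>

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")   # ethanol: 3 heavy atoms, Hs are implicit
mol_h = Chem.AddHs(mol)           # new molecule with 6 explicit H atoms added
mol_noh = Chem.RemoveHs(mol_h)    # new molecule, Hs are implicit again

# the input molecules are unchanged by either call
print(mol.GetNumAtoms(), mol_h.GetNumAtoms(), mol_noh.GetNumAtoms())
```

<p>The three atom counts are 3, 9, and 3, so the round trip recovers the heavy-atom graph of the starting molecule.</p>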
</section>
<section id="conformer-generation" class="level2">
<h2 class="anchored" data-anchor-id="conformer-generation">Conformer generation</h2>
<div id="65138c71-ac33-4714-968c-ecd17360f6a1" class="cell" data-execution_count="65">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDistGeom</span></code></pre></div>
</div>
<div id="a8de82b9-eab3-426c-96fb-0d19e965bd5a" class="cell" data-execution_count="69">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb37-1">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdDistGeom.EmbedParameters()</span>
<span id="cb37-2">ps.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb37-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [rdDistGeom.EmbedMolecule(m,ps) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mhs[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>]]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[08:14:36] UFFTYPER: Unrecognized charge state for atom: 38
[08:15:19] UFFTYPER: Unrecognized charge state for atom: 38
[08:16:03] UFFTYPER: Unrecognized charge state for atom: 38
[08:16:48] UFFTYPER: Unrecognized charge state for atom: 38
[08:17:32] UFFTYPER: Unrecognized charge state for atom: 38
[08:18:16] UFFTYPER: Unrecognized charge state for atom: 38
[08:19:00] UFFTYPER: Unrecognized charge state for atom: 38
[08:19:45] UFFTYPER: Unrecognized charge state for atom: 38</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>44.1 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="87b892da-758a-4280-8878-84a05f65846b" class="cell" data-execution_count="75">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb40-1">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdDistGeom.ETKDGv3()</span>
<span id="cb40-2">ps.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xf00d</span></span>
<span id="cb40-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [rdDistGeom.EmbedMolecule(m,ps) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mhs[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>]]</span></code></pre></div>
<div class="cell-output cell-output-stderr">
<pre><code>[08:36:46] UFFTYPER: Unrecognized charge state for atom: 38
[08:37:52] UFFTYPER: Unrecognized charge state for atom: 38
[08:38:58] UFFTYPER: Unrecognized charge state for atom: 38
[08:40:03] UFFTYPER: Unrecognized charge state for atom: 38
[08:41:09] UFFTYPER: Unrecognized charge state for atom: 38
[08:42:14] UFFTYPER: Unrecognized charge state for atom: 38
[08:43:20] UFFTYPER: Unrecognized charge state for atom: 38
[08:44:25] UFFTYPER: Unrecognized charge state for atom: 38</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>1min 5s ± 245 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
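<p>Since both embedding runs used the first 1000 molecules of <code>mhs</code>, the totals above convert directly into a per-molecule cost. The following is a small arithmetic sketch using only the numbers reported by <code>%timeit</code>; the helper name <code>per_molecule_ms</code> is made up for illustration.</p>

```python
# Timings copied from the %timeit output above; both runs embedded mhs[:1000].
N_MOLS = 1000
timings_s = {
    "EmbedParameters()": 44.1,
    "ETKDGv3()": 65.0,  # reported above as "1min 5s"
}


def per_molecule_ms(total_seconds, n_mols):
    """Convert a total wall-clock time into milliseconds per molecule."""
    return 1000.0 * total_seconds / n_mols


per_mol = {name: per_molecule_ms(t, N_MOLS) for name, t in timings_s.items()}
slowdown = timings_s["ETKDGv3()"] / timings_s["EmbedParameters()"]

for name, ms in per_mol.items():
    print(f"{name}: {ms:.1f} ms per molecule")
print(f"ETKDGv3 / default ratio: {slowdown:.2f}")
```

<p>These are averages over this particular set of molecules, so individual molecules may take considerably more or less time.</p>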
</section>
<section id="generate-fingerprints" class="level2">
<h2 class="anchored" data-anchor-id="generate-fingerprints">Generate fingerprints</h2>
<div id="1924d4aa-3a57-42a1-b8bf-fca2e13cac48" class="cell" data-execution_count="76">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb43-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdFingerprintGenerator</span></code></pre></div>
</div>
<div id="e1999206-08cb-44b9-958a-02d4f897cd97" class="cell" data-execution_count="77">
<div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb44-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetMorganGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb44-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>497 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="68245d3d-a8d5-4447-bd35-2030be0d0a11" class="cell" data-execution_count="78">
<div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb46-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetMorganGenerator(radius<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb46-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>379 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="d572cb28-a781-4a87-babf-7605f47a9b66" class="cell" data-execution_count="79">
<div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb48-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetRDKitFPGenerator()</span>
<span id="cb48-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>13.1 s ± 71.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="ebb317df-5c4b-41bb-90a0-ef595f4e7a5a" class="cell" data-execution_count="83">
<div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb50-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetRDKitFPGenerator(maxPath<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb50-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>4.48 s ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="3ca77027-550e-4253-85c5-1bd925ad4c8b" class="cell" data-execution_count="81">
<div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb52-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetAtomPairGenerator()</span>
<span id="cb52-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1.65 s ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="c362713c-1747-4819-90e9-48b75465549e" class="cell" data-execution_count="82">
<div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb54-1">fpg <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdFingerprintGenerator.GetTopologicalTorsionGenerator()</span>
<span id="cb54-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit fpg.GetFingerprints(ms)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>1.36 s ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
<div id="10e61299-230a-4f1b-87dc-0db07cea1d19" class="cell" data-execution_count="98">
<div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb56-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>timeit [Chem.PatternFingerprint(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2.49 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</code></pre>
</div>
</div>
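<p>To compare the fingerprint types at a glance, the mean timings above can be expressed relative to the fastest one. This sketch only rearranges the numbers already shown; the function name <code>relative_cost</code> and the labels are illustrative.</p>

```python
# Mean %timeit results from the cells above, in seconds, for the same list `ms`.
fp_timings_s = {
    "Morgan (radius=3)": 0.497,
    "Morgan (radius=2)": 0.379,
    "RDKit FP (default)": 13.1,
    "RDKit FP (maxPath=5)": 4.48,
    "Atom pair": 1.65,
    "Topological torsion": 1.36,
    "Pattern": 2.49,
}


def relative_cost(timings):
    """Return (name, time / fastest time) pairs, fastest first."""
    fastest = min(timings.values())
    ranked = sorted(timings.items(), key=lambda kv: kv[1])
    return [(name, t / fastest) for name, t in ranked]


for name, ratio in relative_cost(fp_timings_s):
    print(f"{name:22s} {ratio:5.1f}x")
```

<p>Note that the pattern fingerprint was timed with a Python list comprehension rather than a generator's <code>GetFingerprints()</code> call, so its number also includes the per-molecule Python call overhead.</p>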


</section>
</section>

 ]]></description>
  <category>optimization</category>
  <category>reference</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-10-31-how-long-does-it-take.html</guid>
  <pubDate>Thu, 30 Oct 2025 23:00:00 GMT</pubDate>
</item>
<item>
  <title>Displaying atom maps and highlighting with reactions</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions.html</link>
  <description><![CDATA[ 




<p>This one was inspired by an example I did this week for the class I’m teaching: I wanted to apply an RDKit reaction to a set of reactants and then draw the reactants and the products of the reaction as a normal reaction with the mapped atoms indicated. Doing that made me realize that, with a bit more code, I could produce some other useful views of a reaction.</p>
<p>In addition to the visualizations themselves, this post has some potentially useful details about what kind of extra information is available in the products of reactions.</p>
<div id="60f925e4-4d20-4752-ad92-119e08cb19f4" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdChemReactions</span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-7"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.09.1</code></pre>
</div>
</div>
<p>The first example reaction is adapted from the SI for http://pubs.acs.org/doi/abs/10.1021/ci200379p. I frequently use this paper as a source of reaction SMARTS definitions for real reactions; left to my own devices I would just use amide bond formation all the time, and that gets pretty boring pretty quickly.</p>
<div id="b2035753" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adapted example from the SI for: http://pubs.acs.org/doi/abs/10.1021/ci200379p</span></span>
<span id="cb3-2">sma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[Br,I;$(*c1ccccc1)]-[c:1]:[c:2]-[OH1:3].[CH1:5]#[C;$(C-[#6]):4]&gt;&gt;[c:1]1:[c:2]-[O:3]-[C:4]=[C:5]-1'</span></span>
<span id="cb3-3">rxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(sma)</span>
<span id="cb3-4">rxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-3-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
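<p>The atom map numbers in the reaction SMARTS are what the rest of this post builds on, so it can be worth checking that the same atoms are mapped on both sides. The sketch below does that with a plain regular expression. It is a rough text-level check under the assumption that map numbers always appear as <code>:N]</code> at the end of an atom expression; it is not a SMARTS parser, and <code>map_numbers</code> is a hypothetical helper rather than an RDKit function.</p>

```python
import re

# Reaction SMARTS copied from the cell above.
sma = ('[Br,I;$(*c1ccccc1)]-[c:1]:[c:2]-[OH1:3].[CH1:5]#[C;$(C-[#6]):4]'
       '>>[c:1]1:[c:2]-[O:3]-[C:4]=[C:5]-1')


def map_numbers(smarts_side):
    """Collect atom map numbers: digits between ':' and a closing ']'."""
    return sorted(int(n) for n in re.findall(r':(\d+)\]', smarts_side))


reactant_side, product_side = sma.split('>>')
reactant_maps = map_numbers(reactant_side)
product_maps = map_numbers(product_side)

print('reactant map numbers:', reactant_maps)
print('product map numbers: ', product_maps)
print('same atoms mapped on both sides:', reactant_maps == product_maps)
```

<p>The halogen in the first reactant template has no map number, which is consistent with it not appearing in the product template.</p>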
<p>A set of reactants for the reaction:</p>
<div id="820ada2b" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">r1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cc(I)c(O)cc1'</span>)</span>
<span id="cb4-2">r2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1CCC1C#C'</span>)</span>
<span id="cb4-3">Draw.MolsToGridImage([r1,r2],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-4-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Run the reaction and show the product:</p>
<div id="8edfad59" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">reactants <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (r1,r2)</span>
<span id="cb5-2"></span>
<span id="cb5-3">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rxn.RunReactants(reactants)</span>
<span id="cb5-4">ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Look at the properties set on two of the atoms in the product: one that came from a mapped atom and one that came from an unmapped atom:</p>
<div id="f0543c0c" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">prod <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb6-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'mapped atom:'</span>,prod.GetAtomWithIdx(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>).GetPropsAsDict(includePrivate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb6-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'unmapped atom:'</span>,prod.GetAtomWithIdx(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>).GetPropsAsDict(includePrivate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>mapped atom: {'old_mapno': 1, 'react_atom_idx': 2, 'react_idx': 0}
unmapped atom: {'react_atom_idx': 6, 'react_idx': 0}</code></pre>
</div>
</div>
<p>Here’s what those mean:</p>
<ol type="1">
<li><code>old_mapno</code>: the atom map number for the atom (obviously only present on mapped atoms)</li>
<li><code>react_idx</code>: which reactant the atom came from</li>
<li><code>react_atom_idx</code>: the index of the atom in its reactant</li>
</ol>
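<p>As a quick illustration of how these three properties fit together, here is a sketch that turns a property dictionary into a readable description. It works on plain dictionaries shaped like the output of <code>GetPropsAsDict()</code> above, so it runs without RDKit; <code>describe_origin</code> is a made-up helper, and the handling of atoms without <code>react_idx</code> is an assumption.</p>

```python
# Property dictionaries as printed above by GetPropsAsDict(includePrivate=False).
mapped = {'old_mapno': 1, 'react_atom_idx': 2, 'react_idx': 0}
unmapped = {'react_atom_idx': 6, 'react_idx': 0}


def describe_origin(props):
    """Summarize where a product atom came from, given its property dict."""
    if 'react_idx' not in props:
        # Assumption: an atom without these properties was not copied from a reactant.
        return 'no reactant information'
    text = f"atom {props['react_atom_idx']} of reactant {props['react_idx']}"
    if 'old_mapno' in props:
        text += f" (map number {props['old_mapno']})"
    return text


print(describe_origin(mapped))
print(describe_origin(unmapped))
```

<p>With a real product molecule the same function could be applied to <code>at.GetPropsAsDict()</code> for each atom returned by <code>prod.GetAtoms()</code>.</p>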
<p>What I did for the course was set the atom map numbers on the reactants and products, combine them into a new reaction, and then display that reaction.</p>
<p>Start by setting the atom map numbers:</p>
<div id="3731b082" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># since we're going to modify things, copy them first:</span></span>
<span id="cb8-2">prod <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Mol(ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb8-3">reactants <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (Chem.Mol(r1),Chem.Mol(r2))</span>
<span id="cb8-4"></span>
<span id="cb8-5"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> at <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prod.GetAtoms():</span>
<span id="cb8-6">    pd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> at.GetPropsAsDict()</span>
<span id="cb8-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pd:</span>
<span id="cb8-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb8-9">    r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> reactants[pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'react_idx'</span>]]</span>
<span id="cb8-10">    rat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r.GetAtomWithIdx(pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'react_atom_idx'</span>])</span>
<span id="cb8-11">    rat.SetAtomMapNum(pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span>])</span>
<span id="cb8-12">    at.SetAtomMapNum(pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span>])</span></code></pre></div>
</div>
<p>Now create the reaction and display it:</p>
<div id="ac6d10c7" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">nrxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ChemicalReaction()</span>
<span id="cb9-2">nrxn.AddReactantTemplate(reactants[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span>
<span id="cb9-3">nrxn.AddReactantTemplate(reactants[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb9-4">nrxn.AddProductTemplate(prod)</span>
<span id="cb9-5"></span>
<span id="cb9-6">IPythonConsole.molSize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">600</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">150</span></span>
<span id="cb9-7">nrxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-8-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Once the atom-mapping information is there, the reaction drawing code can also highlight the atoms based upon which reactant they came from. Unfortunately, this means that the atom-mapping information is no longer displayed:</p>
<div id="0b9bb8d0" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">IPythonConsole.highlightByReactant <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb10-2">nrxn</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-9-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="902579e8" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">IPythonConsole.highlightByReactant <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span></code></pre></div>
</div>
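<p>The idea behind highlighting by reactant can be sketched without any drawing code: each product atom gets a color chosen by its <code>react_idx</code>. The example below works on plain property dictionaries so that it runs without RDKit. The palette, the function name <code>highlight_colors</code>, and the third input dictionary are all made up for illustration.</p>

```python
# Hypothetical palette: one RGB 3-tuple (values from 0 to 1) per reactant.
palette = [(0.7, 0.9, 0.7), (0.9, 0.7, 0.9)]


def highlight_colors(atom_props, colors, mapped_only=False):
    """Map product atom index -> color, chosen by the atom's 'react_idx'.

    atom_props: list of property dicts, one per product atom, in atom order
    mapped_only: if True, skip atoms that have no 'old_mapno'
    """
    result = {}
    for idx, props in enumerate(atom_props):
        if 'react_idx' not in props:
            continue
        if mapped_only and 'old_mapno' not in props:
            continue
        result[idx] = colors[props['react_idx']]
    return result


# Toy input: the first two dicts match the property output shown earlier in the
# post; the third is made up to represent an atom from the second reactant.
props = [
    {'old_mapno': 1, 'react_atom_idx': 2, 'react_idx': 0},
    {'react_atom_idx': 6, 'react_idx': 0},
    {'old_mapno': 4, 'react_atom_idx': 4, 'react_idx': 1},
]
print(highlight_colors(props, palette))
print(highlight_colors(props, palette, mapped_only=True))
```

<p>A dictionary from atom index to color is the shape that RDKit's molecule drawing functions accept for <code>highlightAtomColors</code>, which is why the sketch returns one.</p>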
<p>At this point it makes sense to take what we know and write a function to draw the highlighted reaction. By working directly with a MolDraw2DCairo object instead of using the notebook integration, we can more easily control what’s going on.</p>
<div id="8ce8c947" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Image</span>
<span id="cb12-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> drawHighlightedReaction(rxn, reacts, prods, </span>
<span id="cb12-3">                            includeAtomMaps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, highlightAllAtoms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb12-4">                            mapAllAtoms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb12-5">                            highlightColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb12-6">                            size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">900</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>), annotationFontScale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.74</span>,</span>
<span id="cb12-7">                            drawOptions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>):</span>
<span id="cb12-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">''' draws a specific reaction with the reactants and products highlighted</span></span>
<span id="cb12-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Returns an Image object with the drawing.    </span></span>
<span id="cb12-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb12-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    Arguments</span></span>
<span id="cb12-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    rxn: the reaction object (not currently used)</span></span>
<span id="cb12-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    reacts: a sequence of molecules. The reactants used in the reaction</span></span>
<span id="cb12-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    prods: a sequence of molecules. The products from the reaction</span></span>
<span id="cb12-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    includeAtomMaps: bool. Whether or not atom map numbers should be included in the output</span></span>
<span id="cb12-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    highlightAllAtoms: bool. Whether or not to highlight all reactant/product atoms in the output. </span></span>
<span id="cb12-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           If True, non-mapped atoms will be highlighted. </span></span>
<span id="cb12-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           If False, only the mapped atoms will be highlighted</span></span>
<span id="cb12-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    mapAllAtoms: bool. Whether or not to include atom mapping numbers on all atoms.</span></span>
<span id="cb12-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           If True, non-mapped atoms will have negative atom map numbers displayed</span></span>
<span id="cb12-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    highlightColors: sequence of 3-tuples. Controls the colors used for highlighting the reactants.</span></span>
<span id="cb12-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           The values should go from 0-1. The sequence should have (at least) len(reacts) </span></span>
<span id="cb12-23"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           elements.</span></span>
<span id="cb12-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    size: tuple. Controls the size of the output image.</span></span>
<span id="cb12-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    annotationFontScale: float. Controls the size of the atom map notes (if being drawn)</span></span>
<span id="cb12-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    drawOptions: a MolDraw2DOptions object. Used as the draw options for the rendering. </span></span>
<span id="cb12-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">                           Overrides annotationFontScale if provided.</span></span>
<span id="cb12-28"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    </span></span>
<span id="cb12-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">    '''</span></span>
<span id="cb12-30">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># make copies of all the reactants and the products since we will modify them</span></span>
<span id="cb12-31">    reacts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.Mol(r) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> reacts]</span>
<span id="cb12-32">    prods <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.Mol(p) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prods]</span>
<span id="cb12-33">    </span>
<span id="cb12-34">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># find the largest atom map number, used to initialize the negative atom map numbers</span></span>
<span id="cb12-35">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#  when we are doing highlightAllAtoms</span></span>
<span id="cb12-36">    negVal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb12-37">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mapAllAtoms:</span>
<span id="cb12-38">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> prod <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prods:</span>
<span id="cb12-39">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> at <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prod.GetAtoms():</span>
<span id="cb12-40">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> at.HasProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span>):</span>
<span id="cb12-41">                    negVal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(negVal,<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> at.GetIntProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span>))</span>
<span id="cb12-42">    negVal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb12-43"></span>
<span id="cb12-44">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># loop over each of the products and set the atom map and note information</span></span>
<span id="cb12-45">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#  in both the product atoms and corresponding reactant atoms.</span></span>
<span id="cb12-46">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> prod <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prods:</span>
<span id="cb12-47">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> at <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prod.GetAtoms():</span>
<span id="cb12-48">            pd <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> at.GetPropsAsDict()</span>
<span id="cb12-49">            mno <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'old_mapno'</span>,negVal)</span>
<span id="cb12-50">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> mno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb12-51">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> highlightAllAtoms:</span>
<span id="cb12-52">                    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb12-53">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb12-54">                    negVal <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb12-55">            </span>
<span id="cb12-56">            r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> reacts[pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'react_idx'</span>]]</span>
<span id="cb12-57">            rat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> r.GetAtomWithIdx(pd[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'react_atom_idx'</span>])</span>
<span id="cb12-58">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tat <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> at,rat:</span>
<span id="cb12-59">                tat.SetAtomMapNum(mno)</span>
<span id="cb12-60">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> includeAtomMaps <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> (mno<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">or</span> mapAllAtoms):</span>
<span id="cb12-61">                    tat.SetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'atomNote'</span>,<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(mno))    </span>
<span id="cb12-62">    </span>
<span id="cb12-63">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create the reaction we'll actually render:</span></span>
<span id="cb12-64">    nrxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ChemicalReaction()</span>
<span id="cb12-65">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> react <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> reacts:</span>
<span id="cb12-66">        nrxn.AddReactantTemplate(react)</span>
<span id="cb12-67">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> prod <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> prods:</span>
<span id="cb12-68">        nrxn.AddProductTemplate(prod)</span>
<span id="cb12-69"></span>
<span id="cb12-70">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># and draw it</span></span>
<span id="cb12-71">    d2d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Draw.MolDraw2DCairo(size[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],size[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb12-72">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> drawOptions <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>:</span>
<span id="cb12-73">        d2d.SetDrawOptions(drawOptions)</span>
<span id="cb12-74">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb12-75">        d2d.drawOptions().annotationFontScale<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>annotationFontScale</span>
<span id="cb12-76">    d2d.DrawReaction(nrxn, highlightByReactant<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>, highlightColorsReactants<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>highlightColors)</span>
<span id="cb12-77">    d2d.FinishDrawing()</span>
<span id="cb12-78">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> Image(d2d.GetDrawingText())</span></code></pre></div>
</div>
<div id="38efc9b1" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">r1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cc(I)c(O)cc1'</span>)</span>
<span id="cb13-2">r2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C1CCC1C#C'</span>)</span>
<span id="cb13-3">reactants <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (r1,r2)</span>
<span id="cb13-4">ps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rxn.RunReactants(reactants)</span>
<span id="cb13-5"></span>
<span id="cb13-6">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="a06fe58b" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],highlightAllAtoms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-13-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Include negative atom map numbers for atoms that were not in the reaction definition:</p>
<div id="cb4fad50" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],mapAllAtoms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="13">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-14-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="8e789b8a" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],includeAtomMaps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-15-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Change the highlight colors:</p>
<div id="74431e91" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],highlightColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>), (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>)])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-16-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Provide our own draw options. Here we play with dark mode:</p>
<div id="dc6cd568" class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDepictor</span>
<span id="cb18-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> r <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> reactants:</span>
<span id="cb18-3">    rdDepictor.Compute2DCoords(r)</span>
<span id="cb18-4">dopts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Draw.MolDrawOptions()</span>
<span id="cb18-5">Draw.SetDarkMode(dopts)</span>
<span id="cb18-6">dopts.annotationFontScale <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.8</span></span>
<span id="cb18-7">drawHighlightedReaction(rxn,reactants,ps[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],drawOptions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>dopts,</span>
<span id="cb18-8">                       highlightColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>), (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="16">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-17-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Do another reaction from the same paper:</p>
<div id="d634a819" class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">rxn <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdChemReactions.ReactionFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'[c:1](-[C;$(C-c1ccccc1):2](=[OD1:3])-[OH1]):[c:4](-[NH2:5]).[N;!H0;!$(N-N);!$(N-C=N);!$(N(-C=O)-C=O):6]-[C;H1,$(C-[#6]):7]=[OD1]&gt;&gt;[c:4]2:[c:1]-[C:2](=[O:3])-[N:6]-[C:7]=[N:5]-2'</span>)</span>
<span id="cb19-2">reactants <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1c(C(=O)O)c(N)ccc1'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CCC(=O)Nc1ccccc1'</span>)]</span>
<span id="cb19-3">prods <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rxn.RunReactants(reactants)</span>
<span id="cb19-4">drawHighlightedReaction(rxn,reactants,prods[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions_files/figure-html/cell-18-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>



 ]]></description>
  <category>drawing</category>
  <category>tutorial</category>
  <category>reactions</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-10-17-displaying-atom-maps-with-reactions.html</guid>
  <pubDate>Thu, 16 Oct 2025 22:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/displaying-atom-maps-1.png" medium="image" type="image/png" height="32" width="144"/>
</item>
<item>
  <title>Similarity search time and thresholds</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-10-10-similarity-search-time-and-thresholds-1.html</link>
  <description><![CDATA[ 




<p>This is an updated and modified version of a <a href="https://rdkit.blogspot.com/2015/08/impact-of-threshold-on-similarity.html">post I wrote back in 2015</a>.</p>
<h1>Impact of the threshold on similarity search times</h1>
<p>A <a href="http://chembl.blogspot.ch/2015/08/lsh-based-similarity-search-in-mongodb.html">ChEMBL blog post</a> included the (to me) somewhat surprising result that similarity search times while using the RDKit cartridge did not show much of a dependency on the similarity threshold being used. I wanted to investigate this a bit more closely.</p>
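<p>As a reminder of what is being thresholded, here is a minimal pure-Python sketch of Tanimoto similarity on sets of on-bit indices. The function names and the toy bit sets are made up for illustration, and treating "at or above the threshold" as a hit is an assumption rather than a statement about the cartridge internals.</p>

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a.union(b))
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a.intersection(b)) / union


def passes(bits_a, bits_b, threshold):
    # Assumption: a pair counts as a hit when the similarity is at or
    # above the threshold.
    return tanimoto(bits_a, bits_b) >= threshold


# Toy fingerprints (hypothetical bit indices): 3 shared bits, 6 distinct bits.
query = {1, 5, 9, 12}
candidate = {1, 5, 9, 40, 77}
print(tanimoto(query, candidate))
for t in (0.4, 0.5, 0.6):
    print(t, passes(query, candidate, t))
```

<p>Raising the threshold can only shrink the set of candidates that pass, which is why one might expect the search time to depend on it.</p>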
<p>I’m using a local install of ChEMBL35 for these tests. Since I don’t have the patience to wait for searches to complete with 1000 molecules, I will randomly select just 10:</p>
<pre><code>chembl_35=# select * into temporary table foo from rdk.fps order by random() limit 10;
SELECT 10
chembl_35=# select molregno,m from rdk.mols join foo using (molregno);
molregno |                                                      m                                                      
----------+-------------------------------------------------------------------------------------------------------------
    4111 | C=C1CC2(CCCCCCCCC2)OC1=O
    4131 | CC1Cc2cc3c(cc2C(c2ccc(N)cc2)=NN1C=O)OCO3
    4185 | CCOC(=O)/C=C(C)/C(F)=C/C=C(C)/C=C/c1c(C)cc(OC)c(C)c1C
    4256 | CC(C)(C)c1cc(CCc2cccnc2)cc(C(C)(C)C)c1O
    4292 | O=C(CCCCCCCCCBr)CC(=O)N[C@H]1CCOC1=O
    4334 | CCCC[C@@H](C[C@@H](CCc1ccc(-c2ccc(F)cc2)cc1)C(=O)N[C@H](C(=O)Nc1ccccc1)C(C)(C)C)C(=O)O
    4335 | COc1ccc(/C=C/c2cc(C(C)(C)C)c(O)c(C(C)(C)C)c2)cc1
    4389 | Cc1ccc2c(c1)C(=O)N(CC(C)(C)C[N+](C)(C)CCCCCC[N+](C)(C)CC(C)(C)CN1C(=O)c3cccc4cccc(c34)C1=O)C2=O.[Br-].[Br-]
    4591 | O=C(NCCCn1ccnc1)c1cc2ccccc2[nH]1
    4599 | O=Cc1ccn(-c2cc3c(cc2Cl)nc(O)n2nc(C(=O)O)cc32)c1
(10 rows)</code></pre>
<p>And now look at timing results for a number of threshold values:</p>
<pre><code>chembl_35=# set rdkit.tanimoto_threshold=0.4;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.202 ms
count 
-------
3431
(1 row)

Time: 2446.118 ms (00:02.446)
chembl_35=# set rdkit.tanimoto_threshold=0.5;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.245 ms
count 
-------
814
(1 row)

Time: 2258.420 ms (00:02.258)
chembl_35=# set rdkit.tanimoto_threshold=0.6;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.146 ms
count 
-------
261
(1 row)

Time: 2005.068 ms (00:02.005)
chembl_35=# set rdkit.tanimoto_threshold=0.7;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.304 ms
count 
-------
122
(1 row)

Time: 1702.164 ms (00:01.702)
chembl_35=# set rdkit.tanimoto_threshold=0.8;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.147 ms
count 
-------
    35
(1 row)

Time: 1272.436 ms (00:01.272)
chembl_35=# set rdkit.tanimoto_threshold=0.9;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.207 ms
count 
-------
    21
(1 row)

Time: 645.774 ms
chembl_35=# set rdkit.tanimoto_threshold=0.95;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.101 ms
count 
-------
    17
(1 row)

Time: 268.827 ms
chembl_35=# set rdkit.tanimoto_threshold=0.99;select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.242 ms
count 
-------
    14
(1 row)

Time: 54.065 ms</code></pre>
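<p>Typing the same two statements eight times gets tedious, so here is a rough sketch of how the sweep could be scripted from Python. The helper names <code>sweep_statements</code> and <code>run_sweep</code> are hypothetical; the query text and threshold values are the ones from the session above. The cursor is assumed to be a standard DB-API cursor (for example from psycopg2) opened in the same session that created the temporary table <code>foo</code>, and the timings are measured on the client, so they will not match the server-side numbers exactly.</p>

```python
import time

# Query and thresholds copied from the psql session above.
QUERY = "select count(*) from rdk.fps fps1 cross join foo where fps1.mfp2%foo.mfp2"
THRESHOLDS = (0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)


def sweep_statements(thresholds=THRESHOLDS, query=QUERY):
    """Return one (set statement, count query) pair per threshold."""
    return [(f"set rdkit.tanimoto_threshold={t}", query) for t in thresholds]


def run_sweep(cursor, thresholds=THRESHOLDS, query=QUERY):
    """Run the sweep and return (threshold, hit count, seconds) tuples."""
    results = []
    pairs = sweep_statements(thresholds, query)
    for t, (set_stmt, count_stmt) in zip(thresholds, pairs):
        cursor.execute(set_stmt)
        start = time.perf_counter()
        # No parameters are passed, so the % operator in the query is not
        # meant to be read as a placeholder (check this for your driver).
        cursor.execute(count_stmt)
        (count,) = cursor.fetchone()
        results.append((t, count, time.perf_counter() - start))
    return results


for set_stmt, count_stmt in sweep_statements():
    print(set_stmt + ";", count_stmt + ";")
```

<p>Calling <code>run_sweep(cursor)</code> with a live cursor should then give one row per threshold, ready for tabulating or plotting.</p>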
<p>In the previous version of the post, there were some interesting differences here. That’s no longer the case. This is probably at least partially because both Postgres and the cartridge have seen some improvements in the last ten years (for example, index-only scans are now possible). I suspect the bigger factor is that I’m using a more capable machine with a lot more RAM.</p>
<section id="a-larger-test-zinc" class="level2">
<h2 class="anchored" data-anchor-id="a-larger-test-zinc">A larger test: ZINC</h2>
<p>To see if there are any trends for a larger dataset, I grabbed a copy of the <a href="http://zinc.docking.org/subsets/all-clean">ZINC All Clean set</a>. After removing the molecules that the RDKit doesn’t like (there are actually only 16 molecules in the full set that the RDKit rejected), this leaves 16.4 million molecules.</p>
<p>For the record, here’s how I built the database:</p>
<pre><code>(rdkit_build) glandrum@stoat:/scratch/RDKit_git/LocalData$ createdb zinc
(rdkit_build) glandrum@stoat:/scratch/RDKit_git/LocalData$ psql -c 'create table raw_data (id SERIAL, smiles text, zinc_id char(12))' zinc
CREATE TABLE
(rdkit_build) glandrum@stoat:/scratch/RDKit_git/LocalData$ cd Zinc
(rdkit_build) glandrum@stoat:/scratch/RDKit_git/LocalData/Zinc$ zcat zinc_all_clean.smi.gz | sed '1d; s/\\/\\\\/g' |psql -c "copy raw_data (smiles,zinc_id) from stdin with delimiter ' '" zinc
COPY 16403864
(rdkit_build) glandrum@stoat:/scratch/RDKit_git/LocalData/Zinc$ psql zinc
psql (14.19 (Ubuntu 14.19-0ubuntu0.22.04.1))
Type "help" for help.

zinc=# \timing
Timing is on.
zinc=# create extension rdkit;
CREATE EXTENSION
Time: 21.754 ms
zinc=# select * into mols from (select id,mol_from_smiles(smiles::cstring) m from raw_data) tmp where m is not null;
  ... snip ...
SELECT 16403848
Time: 945195.267 ms (15:45.195)
zinc=# select id,morganbv_fp(m) as mfp2 into fps from mols;
SELECT 16403848
Time: 155185.639 ms (02:35.186)
zinc=# create index fps_mfp2_idx on fps using gist(mfp2);
CREATE INDEX
Time: 140542.861 ms (02:20.543)
zinc=# select * into temporary table foo from fps tablesample bernoulli (1) repeatable (123456) limit 10;
SELECT 10
Time: 11.977 ms</code></pre>
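<p>A note on the <code>sed</code> step in the load pipeline above: it deletes the first line of the file (presumably a header row) and doubles every backslash. The doubling matters because the text format of <code>COPY</code> treats backslash as an escape character, while SMILES uses it for double-bond stereochemistry. Here is a small Python sketch of the same preprocessing; <code>prepare_for_copy</code> is a hypothetical helper name and the sample lines are made up.</p>

```python
def prepare_for_copy(lines):
    """Drop the first line and double backslashes in the remaining ones."""
    # Mirrors the two sed commands used in the load pipeline above.
    for i, line in enumerate(lines):
        if i == 0:
            continue  # first line is skipped, as with sed's 1d
        yield line.replace("\\", "\\\\")


# Made-up sample input: a header line plus one SMILES with a backslash.
sample = ["smiles zinc_id", "C/C=C\\C ZINC00000001"]
for row in prepare_for_copy(sample):
    print(row)
```

<p>Without the doubling, <code>COPY</code> would consume the backslashes and the stereochemistry in the stored SMILES would be silently changed.</p>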
<p>Here are the 10 random molecules:</p>
<pre><code>zinc=# select id,m from mols join foo using (id);
id    |                             m                             
------+-----------------------------------------------------------
12317 | CCc1nnc(NC(=O)[C@H](CC)Oc2ccccc2)s1
385   | CCOC(=O)c1ccccc1C(=O)OCC
24317 | CC(C)(C)c1ccc([C@@]23OC(=O)C(C)(C)[C@@H]2OC(=O)C3(C)C)cc1
48    | CC(C)(C)[NH2+]C[C@H](O)COc1cc(Cl)ccc1Cl
225   | CC[C@@]1(CO)CCC[NH+]2CCc3c([nH]c4ccccc34)[C@@H]21
90    | C[NH2+]C[C@@H](OC)c1cccc(C(F)(F)F)c1
24406 | CCOC1(OCC)[NH+]=C(N)[C@@]2(C#N)C3(CCCCC3)[C@@]12C#N
24738 | Cc1ccc(S(=O)(=O)Nc2ccccc2C#N)cc1
356   | CCc1c(O)c(=O)ccn1CC
8     | O=C([O-])[C@@H](O)c1ccccc1
(10 rows)

Time: 5167.539 ms (00:05.168)</code></pre>
<p>Now the searches:</p>
<pre><code>zinc=# set rdkit.tanimoto_threshold=0.4; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.133 ms
count 
-------
55762
(1 row)

Time: 15931.816 ms (00:15.932)
zinc=# set rdkit.tanimoto_threshold=0.5; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.183 ms
count 
-------
4486
(1 row)

Time: 13616.273 ms (00:13.616)
zinc=# set rdkit.tanimoto_threshold=0.6; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.205 ms
count 
-------
623
(1 row)

Time: 11555.706 ms (00:11.556)
zinc=# set rdkit.tanimoto_threshold=0.7; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.213 ms
count 
-------
165
(1 row)

Time: 8509.171 ms (00:08.509)
zinc=# set rdkit.tanimoto_threshold=0.8; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.097 ms
count 
-------
    50
(1 row)

Time: 4717.467 ms (00:04.717)
zinc=# set rdkit.tanimoto_threshold=0.9; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.097 ms
count 
-------
    20
(1 row)

Time: 1026.567 ms (00:01.027)
zinc=# set rdkit.tanimoto_threshold=0.95; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.103 ms
count 
-------
    20
(1 row)

Time: 195.971 ms
zinc=# set rdkit.tanimoto_threshold=0.99; select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
SET
Time: 0.096 ms
count 
-------
    20
(1 row)

Time: 34.866 ms</code></pre>
<p>I’m going to look at two interesting steps in this data: from a threshold of 0.4 to 0.6, and from 0.6 to 0.99. In the first step, the number of hits goes from 55762 to 623 (a drop of almost two orders of magnitude), while the time goes from 15.9s to 11.6s (a drop of only about 25%). In the second step, the number of hits goes from 623 to 20 (a factor of about 30), while the time goes from 11.6s to 34.9ms (a factor of more than 300).</p>
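<p>The ratios for those two steps are easy to check directly. This is just arithmetic on the hit counts and times copied from the ZINC searches above; the dictionary layout and the <code>step</code> helper are mine.</p>

```python
# threshold: (hit count, query time in ms), copied from the ZINC runs above.
results = {
    0.4: (55762, 15931.816),
    0.6: (623, 11555.706),
    0.99: (20, 34.866),
}


def step(lo, hi):
    """Return (hit ratio, time ratio) going from threshold lo to hi."""
    hits_lo, ms_lo = results[lo]
    hits_hi, ms_hi = results[hi]
    return hits_lo / hits_hi, ms_lo / ms_hi


for lo, hi in ((0.4, 0.6), (0.6, 0.99)):
    hit_ratio, time_ratio = step(lo, hi)
    print(f"{lo} to {hi}: hits drop {hit_ratio:.1f}x, time drops {time_ratio:.1f}x")
```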
<p>Let’s look at the output from <code>EXPLAIN ANALYZE</code> for these queries to see if we can figure out what’s going on:</p>
<pre><code>zinc=# set rdkit.tanimoto_threshold=0.4;
SET
Time: 0.152 ms
zinc=# explain (analyze on, buffers on) select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
                                                                    QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1376580.94..1376580.95 rows=1 width=8) (actual time=16239.641..16239.642 rows=1 loops=1)
Buffers: shared hit=9573 read=2374633, local hit=1
-&gt;  Nested Loop  (cost=0.42..1324498.62 rows=20832924 width=0) (actual time=8.908..16236.122 rows=55762 loops=1)
        Buffers: shared hit=9573 read=2374633, local hit=1
        -&gt;  Seq Scan on foo  (cost=0.00..22.70 rows=1270 width=32) (actual time=0.004..0.016 rows=10 loops=1)
            Buffers: local hit=1
        -&gt;  Index Only Scan using fps_mfp2_idx on fps fps1  (cost=0.42..878.85 rows=16404 width=65) (actual time=6.231..1622.310 rows=5576 loops=10)
            Index Cond: (mfp2 % foo.mfp2)
            Heap Fetches: 0
            Buffers: shared hit=9573 read=2374633
Planning Time: 0.052 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.126 ms, Inlining 1.992 ms, Optimization 3.165 ms, Emission 3.565 ms, Total 8.848 ms
Execution Time: 16239.872 ms
(16 rows)

Time: 16240.250 ms (00:16.240)
zinc=# set rdkit.tanimoto_threshold=0.6;
SET
Time: 0.114 ms
zinc=# explain (analyze on, buffers on) select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
                                                                    QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1376580.94..1376580.95 rows=1 width=8) (actual time=13299.969..13299.970 rows=1 loops=1)
Buffers: shared hit=4685 read=1995775, local hit=1
-&gt;  Nested Loop  (cost=0.42..1324498.62 rows=20832924 width=0) (actual time=112.152..13299.772 rows=758 loops=1)
        Buffers: shared hit=4685 read=1995775, local hit=1
        -&gt;  Seq Scan on foo  (cost=0.00..22.70 rows=1270 width=32) (actual time=0.004..0.013 rows=10 loops=1)
            Buffers: local hit=1
        -&gt;  Index Only Scan using fps_mfp2_idx on fps fps1  (cost=0.42..878.85 rows=16404 width=65) (actual time=112.027..1329.058 rows=76 loops=10)
            Index Cond: (mfp2 % foo.mfp2)
            Heap Fetches: 0
            Buffers: shared hit=4685 read=1995775
Planning Time: 0.052 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.126 ms, Inlining 2.130 ms, Optimization 3.170 ms, Emission 3.601 ms, Total 9.027 ms
Execution Time: 13300.146 ms
(16 rows)

Time: 13300.681 ms (00:13.301)
zinc=# set rdkit.tanimoto_threshold=0.99;
SET
Time: 0.253 ms
zinc=# explain (analyze on, buffers on) select count(*) from fps fps1 cross join foo where fps1.mfp2%foo.mfp2;
                                                                QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1376580.94..1376580.95 rows=1 width=8) (actual time=50.800..50.801 rows=1 loops=1)
Buffers: shared hit=14558 read=5872, local hit=1
-&gt;  Nested Loop  (cost=0.42..1324498.62 rows=20832924 width=0) (actual time=11.722..50.787 rows=12 loops=1)
        Buffers: shared hit=14558 read=5872, local hit=1
        -&gt;  Seq Scan on foo  (cost=0.00..22.70 rows=1270 width=32) (actual time=0.003..0.010 rows=10 loops=1)
            Buffers: local hit=1
        -&gt;  Index Only Scan using fps_mfp2_idx on fps fps1  (cost=0.42..878.85 rows=16404 width=65) (actual time=2.392..4.188 rows=1 loops=10)
            Index Cond: (mfp2 % foo.mfp2)
            Heap Fetches: 0
            Buffers: shared hit=14558 read=5872
Planning Time: 0.052 ms
JIT:
Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.123 ms, Inlining 2.134 ms, Optimization 3.203 ms, Emission 3.554 ms, Total 9.014 ms
Execution Time: 50.972 ms
(16 rows)

Time: 51.323 ms</code></pre>
<p>The big difference I see here is the number of buffers that actually need to be read when executing the queries. At the 0.4 threshold, we need to read 2.37 million buffers, at 0.6 it’s just under 2 million, and at 0.99 we only need to read 5872. This is presumably because the index is pruning so many rows at the higher thresholds. I’m definitely not an expert at interpreting this output, so if anyone has other ideas or explanations, please let me know!</p>
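<p>To get a feel for what those buffer counts mean, here is a rough conversion to data volume. This is only a sketch: it assumes the PostgreSQL default block size of 8 kB (I did not check the setting on this server), and the <code>read</code> counter only says that a block was not found in shared buffers, so some of these reads may have been served from the operating system’s cache rather than from disk. The helper name <code>gib_read</code> is just for illustration.</p>

```python
# Shared-buffer "read" counts and execution times copied from the
# EXPLAIN (ANALYZE, BUFFERS) output, keyed by rdkit.tanimoto_threshold.
buffers_read = {0.4: 2374633, 0.6: 1995775, 0.99: 5872}
exec_time_ms = {0.4: 16239.872, 0.6: 13300.146, 0.99: 50.972}

# Assumption: PostgreSQL's default block size of 8 kB (8192 bytes).
BLOCK_SIZE_BYTES = 8192


def gib_read(n_buffers, block_size=BLOCK_SIZE_BYTES):
    """Convert a buffer count into GiB."""
    return n_buffers * block_size / 2**30


for threshold, n_buffers in buffers_read.items():
    gib = gib_read(n_buffers)
    seconds = exec_time_ms[threshold] / 1000
    print(f"threshold {threshold}: {gib:.2f} GiB read in {seconds:.2f} s")
```

<p>Under that assumption the two lower thresholds each pull in well over 10 GiB of index pages, while the 0.99 query touches only a few tens of MiB, which fits the difference in run times.</p>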
</section>
<section id="an-aside-chemfp-performance" class="level2">
<h2 class="anchored" data-anchor-id="an-aside-chemfp-performance">An aside: chemfp performance</h2>
<p>The previous version of this post included a performance comparison with <a href="http://chemfp.com/">chemfp</a>. Since I no longer have access to the licensed version of chemfp, I’m going to skip that here. I assume it’s dramatically quicker than the cartridge. :-)</p>


</section>

 ]]></description>
  <category>cartridge</category>
  <category>questions</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-10-10-similarity-search-time-and-thresholds-1.html</guid>
  <pubDate>Thu, 09 Oct 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Rendering intramolecular H bonds in 2D</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds.html</link>
  <description><![CDATA[ 




<p>I had the idea for this short post while working on last week’s post about <a href="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1.html">drawing simple protein–ligand interaction diagrams</a>. It shows a quick way to force the RDKit’s 2D coordinate generator to put non-bonded atoms close to each other in a drawing. I use the examples of rendering intramolecular H bonds and chelators.</p>
<div id="c54c234f-5563-49a7-afca-db59ed7ade39" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDepictor</span></code></pre></div>
</div>
<div id="070c2de1-feaf-4ac6-a63e-4d54108555be" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1">IPythonConsole.drawOptions.addAtomIndices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb2-2">IPythonConsole.molSize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">350</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span></code></pre></div>
</div>
<p>Let’s start with a simple molecule that forms intramolecular hydrogen bonds, malonaldehyde (I also used this as the example in the blog post on <a href="https://greglandrum.github.io/rdkit-blog/posts/2024-07-28-confgen-and-intramolecular-hbonds.html">generating 3D conformers which include intramolecular H bonds</a>).</p>
<p>The 2D coordinate generation code generates an extended 2D conformer for malonaldehyde:</p>
<div id="5b8b871e-2201-492e-9e90-08c08f7bb7d8" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'O=CCCO'</span>)</span>
<span id="cb3-2">m</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-4-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>This type of extended conformer looks completely reasonable and is, for most molecules, almost certainly the right way to draw chains, but in this case I want to actually show the H bond.</p>
<p>To do so, I start by adding the H which will be involved in the H bond to the molecule:</p>
<div id="8a40158a-4ba9-442f-b477-3145bddb9926" class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(m)</span>
<span id="cb4-2">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddAtom(Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb4-3">m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,aid,Chem.BondType.SINGLE)</span>
<span id="cb4-4">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>We still have an extended conformer. I can force this to change by doing something non-physical and adding a single bond between the H and O0:</p>
<div id="2bfbe7c9" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">bid1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid,Chem.BondType.SINGLE)</span>
<span id="cb5-2">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-6-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>That has the coordinates I wanted, but having the H bond drawn as a single bond is definitely not desirable.</p>
<p>To solve this I’m going to explicitly generate the coordinates for the molecule with the single bond present (instead of relying on the notebook code to generate a temporary set of coordinates), then remove the single bond and render the molecule:</p>
<div id="9fdad754" class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">rdDepictor.Compute2DCoords(m2)</span>
<span id="cb6-2">m2.RemoveBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid)</span>
<span id="cb6-3">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="20">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-7-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>That’s what I was looking for.</p>
<p>If I want to actually include the H bond in the drawing in a way that won’t make chemists cringe, I can add a hydrogen bond:</p>
<div id="d83d7b9a-b770-45b4-b365-8c1efd4cc325" class="cell" data-scrolled="true" data-execution_count="21">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">bid1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid,Chem.BondType.HYDROGEN)</span>
<span id="cb7-2">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-8-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Note that simply adding the hydrogen bond to start with would not have worked since zero-order bonds are not included in the RDKit’s ring-finding code by default:</p>
<div id="91026155-f773-48ab-ad0e-2445247e0659" class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">m3 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(m)</span>
<span id="cb8-2">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m3.AddAtom(Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb8-3">m3.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,aid,Chem.BondType.SINGLE)</span>
<span id="cb8-4">m3.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid,Chem.BondType.HYDROGEN)</span>
<span id="cb8-5">m3</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-9-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Here’s a more complex molecule where I want to show two intramolecular H bonds:</p>
<div id="91fe8bef-5c53-47de-809b-d5287d498519" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c1cccc(C(=O)OC)c1NC(=O)C1CCCCC1(=O)'</span>)</span>
<span id="cb9-2">m</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-10-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>In this case I would like to indicate H bonds between the H on N10 and O6 as well as the H on C0 and O12.</p>
<div id="4e1d8519-acef-4610-ab26-b882036cffc1" class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add the Hs:</span></span>
<span id="cb10-2">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(m)</span>
<span id="cb10-3">aid1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddAtom(Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb10-4">m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid1,Chem.BondType.SINGLE)</span>
<span id="cb10-5">aid2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddAtom(Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb10-6">m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,aid2,Chem.BondType.SINGLE)</span>
<span id="cb10-7"></span>
<span id="cb10-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add single bonds for the intramolecular H bonds:</span></span>
<span id="cb10-9">bid1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddBond(aid1,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>,Chem.BondType.SINGLE)</span>
<span id="cb10-10">bid2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddBond(aid2,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,Chem.BondType.SINGLE)</span>
<span id="cb10-11"></span>
<span id="cb10-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># generate coordinates:</span></span>
<span id="cb10-13">rdDepictor.Compute2DCoords(m2)</span>
<span id="cb10-14"></span>
<span id="cb10-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># convert to H bonds:</span></span>
<span id="cb10-16">m2.GetBondWithIdx(bid1<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>).SetBondType(Chem.BondType.HYDROGEN)</span>
<span id="cb10-17">m2.GetBondWithIdx(bid2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>).SetBondType(Chem.BondType.HYDROGEN)</span>
<span id="cb10-18"></span>
<span id="cb10-19">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="26">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-11-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
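<p>The steps used in the examples above can be collected into a small helper. This is a minimal sketch rather than anything that ships with the RDKit: the function name <code>depict_with_hbonds</code> and the <code>(donor_idx, acceptor_idx)</code> pair convention are my own choices for illustration. It looks the temporary bonds up with <code>GetBondBetweenAtoms()</code> instead of tracking the values returned by <code>AddBond()</code>.</p>

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor


def depict_with_hbonds(mol, hbonds):
    """Return a copy of mol with 2D coordinates that bring each
    (donor_idx, acceptor_idx) pair in hbonds close together.

    Hypothetical helper: an explicit H is added to each donor atom.
    """
    rwmol = Chem.RWMol(mol)
    pairs = []
    for donor_idx, acceptor_idx in hbonds:
        # add the H that takes part in the H bond
        h_idx = rwmol.AddAtom(Chem.Atom(1))
        rwmol.AddBond(donor_idx, h_idx, Chem.BondType.SINGLE)
        # non-physical single bond so that the depictor sees a ring
        rwmol.AddBond(h_idx, acceptor_idx, Chem.BondType.SINGLE)
        pairs.append((h_idx, acceptor_idx))

    # generate coordinates while the temporary single bonds are present
    rdDepictor.Compute2DCoords(rwmol)

    # convert the temporary single bonds to H bonds
    for h_idx, acceptor_idx in pairs:
        bond = rwmol.GetBondBetweenAtoms(h_idx, acceptor_idx)
        bond.SetBondType(Chem.BondType.HYDROGEN)
    return rwmol


m = Chem.MolFromSmiles('c1cccc(C(=O)OC)c1NC(=O)C1CCCCC1(=O)')
m2 = depict_with_hbonds(m, [(0, 12), (10, 6)])
```

<p>Displaying <code>m2</code> in the notebook should then give the same kind of drawing as the cell above.</p>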
<p>We can use a similar trick to get conformers of chelators that have the chelating atoms in the expected geometry.</p>
<div id="5ebae17a" class="cell" data-execution_count="35">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">m <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'NCCN'</span>)</span>
<span id="cb11-2">m</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="35">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>This time I’m going to add an extra atom (to stand in for the atom which will eventually be chelated), form bonds to that, and then remove that atom after generating 2D coordinates:</p>
<div id="506c1ba0" class="cell" data-execution_count="36">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1">m2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(m)</span>
<span id="cb12-2">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> m2.AddAtom(Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>))</span>
<span id="cb12-3">m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>,aid,Chem.BondType.SINGLE)</span>
<span id="cb12-4">m2.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,aid,Chem.BondType.SINGLE)</span>
<span id="cb12-5">rdDepictor.Compute2DCoords(m2)</span>
<span id="cb12-6">m2.RemoveAtom(aid)</span>
<span id="cb12-7">m2</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="36">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds_files/figure-html/cell-13-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
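<p>The chelator trick can be wrapped up in the same way, and comparing the N–N distance with and without the placeholder atom gives a quick numerical check that the geometry really changed. Again, this is just a sketch: <code>depict_chelator</code> and <code>distance_2d</code> are hypothetical helper names, not RDKit functions.</p>

```python
import math

from rdkit import Chem
from rdkit.Chem import rdDepictor


def depict_chelator(mol, donor_idxs):
    """Return a copy of mol with 2D coordinates in which the atoms in
    donor_idxs are arranged around a temporary placeholder atom."""
    rwmol = Chem.RWMol(mol)
    # dummy atom standing in for the atom that will be chelated
    center_idx = rwmol.AddAtom(Chem.Atom(0))
    for idx in donor_idxs:
        rwmol.AddBond(idx, center_idx, Chem.BondType.SINGLE)
    rdDepictor.Compute2DCoords(rwmol)
    # the placeholder was added last, so removing it leaves the
    # indices of the original atoms unchanged
    rwmol.RemoveAtom(center_idx)
    return rwmol


def distance_2d(mol, i, j):
    """Distance between atoms i and j in the molecule's first conformer."""
    conf = mol.GetConformer()
    p, q = conf.GetAtomPosition(i), conf.GetAtomPosition(j)
    return math.dist((p.x, p.y), (q.x, q.y))


mol = Chem.MolFromSmiles('NCCN')
chelating = depict_chelator(mol, (0, 3))

# default (extended) depiction for comparison
rdDepictor.Compute2DCoords(mol)
print(distance_2d(mol, 0, 3), distance_2d(chelating, 0, 3))
```

<p>I would expect the second number to be noticeably smaller than the first, since the two N atoms were laid out as part of a five-membered ring instead of at the ends of a zigzag chain.</p>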



 ]]></description>
  <category>drawing</category>
  <category>exploration</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-10-04-rendering-intramolecular-h-bonds.html</guid>
  <pubDate>Fri, 03 Oct 2025 22:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/rendering-intramolecular-h-bonds.png" medium="image" type="image/png" height="123" width="144"/>
</item>
<item>
  <title>Drawing simple protein–ligand interaction diagrams with the RDKit</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1.html</link>
  <description><![CDATA[ 




<p>I’m a big fan of 2D protein–ligand interaction diagrams; when well done these plots can provide an information-dense view of the structure that is still easy to understand.</p>
<p>A really nice example of this is the work on <a href="https://doi.org/10.1021/ml100164p">PoseView</a> from Matthias Rarey’s group in Hamburg. Here’s an example from that publication:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://pubs.acs.org/cms/10.1021/ml100164p/asset/images/medium/ml-2010-00164p_0005.gif" class="img-fluid figure-img"></p>
<figcaption>Example PoseView interaction diagram from the publication</figcaption>
</figure>
</div>
<p>Though I would love to have an RDKit implementation of PoseView, actually implementing something like that is deeply nontrivial, so I’ve never really done anything in that direction. This week I had a random idea for a way to provide basic protein–ligand interaction diagrams using existing RDKit functionality. This post is an exploration of that.</p>
<p>Here’s an example of what you get with this:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/121dfaf4-1-image-2.png" class="img-fluid figure-img"></p>
<figcaption>Example interaction diagram produced with the approach described in this post</figcaption>
</figure>
</div>
<p>To be clear: I am fully aware that this is not nearly as good as what PoseView and similar tools can do, but I think it’s still quite useful. I have a few ideas for straightforward changes to the backend code to improve the plots that I’m also going to take a look at.</p>
<div id="9bb879a5-1a40-40b3-b679-56f84618d0b8" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDepictor</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> IPython.display <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> SVG</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-6">IPythonConsole.molSize <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span></span>
<span id="cb1-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.03.6</code></pre>
</div>
</div>
<section id="initial-exploration" class="level1">
<h1>Initial exploration</h1>
<p>Start with the ligand for a recent PDB structure, 8yqe:</p>
<div id="12f226e6-d0bc-4b2f-a807-30922590055d" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># from https://www.ebi.ac.uk/pdbe/entry/pdb/8yqe?activeTab=ligands</span></span>
<span id="cb3-2">lig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CCS(=O)(=O)N1CCC(CC1)NC(=O)c2c(cn[nH]2)NC(=O)c3c(ccc(c3F)C4CCOCC4)F'</span>)</span>
<span id="cb3-3">IPythonConsole.drawOptions.addAtomIndices <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb3-4">lig</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/cell-3-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The strategy to draw the interaction diagrams is to add a new dummy atom to the ligand molecule for each residue it’s interacting with and then connect that to the interacting ligand atom with a zero-order bond. The standard RDKit 2D coordinate generation code will then do something sensible with this.</p>
<p>I include <code>hbond</code> interactions that are around 3.0<img src="https://latex.codecogs.com/png.latex?%5CAA"> or less from the list of interactions on the <a href="https://www.ebi.ac.uk/pdbe/entry/pdb/8yqe?activeTab=ligands">ligand page</a>:</p>
<div id="a620928d-d684-4115-b786-6f934b2c4840" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">lig_with_interactions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(lig)</span>
<span id="cb4-2">leu83 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb4-3">leu83.SetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'atomLabel'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Leu 83'</span>)</span>
<span id="cb4-4">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lig_with_interactions.AddAtom(leu83)</span>
<span id="cb4-5">lig_with_interactions.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span>,aid,Chem.BondType.ZERO)</span>
<span id="cb4-6">lig_with_interactions.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>,aid,Chem.BondType.ZERO)</span>
<span id="cb4-7"></span>
<span id="cb4-8">hoh407 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb4-9">hoh407.SetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'atomLabel'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'H2O 407'</span>)</span>
<span id="cb4-10">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lig_with_interactions.AddAtom(hoh407)</span>
<span id="cb4-11">lig_with_interactions.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,aid,Chem.BondType.ZERO)</span>
<span id="cb4-12"></span>
<span id="cb4-13">hoh441 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb4-14">hoh441.SetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'atomLabel'</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'H2O 441'</span>)</span>
<span id="cb4-15">aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lig_with_interactions.AddAtom(hoh441)</span>
<span id="cb4-16">lig_with_interactions.AddBond(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>,aid,Chem.BondType.ZERO)</span>
<span id="cb4-17"></span>
<span id="cb4-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># this doesn't work because the ring centroid doesn't get put in the middle of the ring</span></span>
<span id="cb4-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ring_center = Chem.Atom(0)</span></span>
<span id="cb4-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># aid = lig_with_interactions.AddAtom(ring_center)</span></span>
<span id="cb4-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for rid in (22,23,24,25,26,27):</span></span>
<span id="cb4-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     lig_with_interactions.AddBond(rid,aid,Chem.BondType.ZERO)</span></span>
<span id="cb4-23"></span>
<span id="cb4-24">lig_with_interactions</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/cell-4-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="making-it-easier-to-use" class="level1">
<h1>Making it easier to use</h1>
<p>That doesn’t look terrible. The next step is to write a function that automates the process and adds highlights around the residue pseudo-atoms:</p>
<div id="4b76011f-6f90-41c9-8bb9-dbaeda91e53a" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> draw_ligand_with_interactions(lig,lig_name,interactions,size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>)):   </span>
<span id="cb5-2">    lig_with_interactions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.RWMol(lig)</span>
<span id="cb5-3">    </span>
<span id="cb5-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add pseudo-atoms (and bonds to them) for the interacting residues:</span></span>
<span id="cb5-5">    pts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb5-6">    clrs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb5-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (aname,oaids) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> interactions:</span>
<span id="cb5-8">        res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.Atom(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb5-9">        res.SetProp(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'atomLabel'</span>,aname)</span>
<span id="cb5-10">        aid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lig_with_interactions.AddAtom(res)</span>
<span id="cb5-11">        pts.append(aid)</span>
<span id="cb5-12">        clrs[aid] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>)</span>
<span id="cb5-13">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> oaid <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> oaids:</span>
<span id="cb5-14">            lig_with_interactions.AddBond(aid,oaid,Chem.BondType.ZERO)</span>
<span id="cb5-15">   </span>
<span id="cb5-16">    d2d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Draw.MolDraw2DSVG(size[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>],size[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb5-17">    </span>
<span id="cb5-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># set the draw options so that we end up with circles under the pseudo-atoms:</span></span>
<span id="cb5-19">    d2d.drawOptions().circleAtoms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb5-20">    d2d.drawOptions().fillHighlights <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb5-21">    d2d.drawOptions().continuousHighlight <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb5-22">    d2d.drawOptions().highlightRadius <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span></span>
<span id="cb5-23">    </span>
<span id="cb5-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># now draw and return the result</span></span>
<span id="cb5-25">    d2d.DrawMolecule(lig_with_interactions,legend<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>lig_name,</span>
<span id="cb5-26">                     highlightAtoms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>pts,highlightAtomColors<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>clrs)</span>
<span id="cb5-27">    d2d.FinishDrawing()</span>
<span id="cb5-28">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> SVG(d2d.GetDrawingText())</span>
<span id="cb5-29"></span>
<span id="cb5-30"></span></code></pre></div>
</div>
<div id="3ef3c749-22a9-4e5d-a9a8-a7175a3513cd" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># from https://www.ebi.ac.uk/pdbe/entry/pdb/8yqe?activeTab=ligands</span></span>
<span id="cb6-2">lig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CCS(=O)(=O)N1CCC(CC1)NC(=O)c2c(cn[nH]2)NC(=O)c3c(ccc(c3F)C4CCOCC4)F'</span>)</span>
<span id="cb6-3">interactions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb6-4">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'LYS 89'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,)),</span>
<span id="cb6-5">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ASP 86'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,)),</span>
<span id="cb6-6">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'HOH 444'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,)),</span>
<span id="cb6-7">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'HOH 441'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>,)),</span>
<span id="cb6-8">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'HOH 407'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>,)),</span>
<span id="cb6-9">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'LEU 83'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">18</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>)),</span>
<span id="cb6-10">)</span>
<span id="cb6-11">draw_ligand_with_interactions(lig,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'A1D60 from 8yqe'</span>,interactions)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/cell-6-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
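<p>Since the <code>interactions</code> argument is just a sequence of (label, atom indices) pairs, it is easy to mistype an index or forget the trailing comma in a one-element tuple. Here is a minimal sketch of a sanity check that could be run before drawing. The helper <code>check_interactions</code> and the <code>num_atoms</code> value used below are hypothetical and not part of the original notebook; with the RDKit, <code>num_atoms</code> would typically come from <code>lig.GetNumAtoms()</code>.</p>

```python
def check_interactions(interactions, num_atoms):
    """Return a list of problems found in an interactions sequence.

    interactions: iterable of (label, atom_indices) pairs
    num_atoms: number of atoms in the ligand (hypothetical input here;
               with the RDKit this would be lig.GetNumAtoms())
    """
    problems = []
    for label, atom_ids in interactions:
        # ('ASP 86', (4)) is a bare int, not a tuple: the trailing comma is missing
        if isinstance(atom_ids, int):
            problems.append(f"{label}: expected a tuple of atom indices, got the bare int {atom_ids}")
            continue
        if len(atom_ids) == 0:
            problems.append(f"{label}: no ligand atoms listed")
        for aid in atom_ids:
            # every index has to be an int that refers to an existing atom
            if not isinstance(aid, int) or aid not in range(num_atoms):
                problems.append(f"{label}: atom index {aid!r} is outside 0..{num_atoms - 1}")
    return problems


interactions = (
    ('LYS 89', (3,)),
    ('LEU 83', (18, 11)),
)
# num_atoms=30 is a placeholder value used only for this illustration
print(check_interactions(interactions, num_atoms=30))
```

<p>An empty list means nothing suspicious was found; anything else is worth fixing before calling <code>draw_ligand_with_interactions()</code>.</p>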
<p>Do a couple more examples using other recent PDB structures:</p>
<div id="88709c52" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># from: https://www.ebi.ac.uk/pdbe/entry/pdb/9uhc?activeTab=ligands</span></span>
<span id="cb7-2">lig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CCC(=O)Nc1ccc2c(c1)n(cc2c3c[nH]c4c3nc(cn4)c5cnn(c5)CCN6CCOCC6)C'</span>)</span>
<span id="cb7-3">interactions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb7-4">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ALA 564'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">22</span>,)),</span>
<span id="cb7-5">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'GLU 562'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>,)),</span>
<span id="cb7-6">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ASP 641'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,)),</span>
<span id="cb7-7">    (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CYS 488'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,)),</span>
<span id="cb7-8">)</span>
<span id="cb7-9">draw_ligand_with_interactions(lig,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'A1EPE from 9uhc'</span>,interactions)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/cell-7-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="8bc287f8" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># from https://www.ebi.ac.uk/pdbe/entry/pdb/9vci?activeTab=ligands</span></span>
<span id="cb8-2">lig <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'CC(=O)NCc1cc(c(cc1F)O)Oc2ccc(cc2)N3c4c(ncnc4N(C3=O)C5CCNCC5)N'</span>)</span>
<span id="cb8-3">interactions <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb8-4">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'GLY 9'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,)),</span>
<span id="cb8-5">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ASP 135'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>,)),</span>
<span id="cb8-6">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'THR 48'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>,)),</span>
<span id="cb8-7">        (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'HOH 637'</span>,(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">29</span>,)),</span>
<span id="cb8-8">)</span>
<span id="cb8-9">draw_ligand_with_interactions(lig,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'A1ERR from 9vci'</span>,interactions)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1_files/figure-html/cell-8-output-1.svg" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>There’s clearly room for improvement here (and I have a couple of ideas already), but I think this is already useful as-is.</p>


</section>

 ]]></description>
  <category>drawing</category>
  <category>exploration</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-09-26-drawing-interactions-1.html</guid>
  <pubDate>Thu, 25 Sep 2025 22:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/drawing-interactions-1.png" medium="image" type="image/png" height="123" width="144"/>
</item>
<item>
  <title>2025 RDKit UGM Recap</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-09-14-UGM-recap.html</link>
  <description><![CDATA[ 




<p>The 2025 RDKit UGM took place last week (September 10-12, 2025) in Prague. This year’s installment of the meeting was organized by Martin Šícho at the University of Chemistry and Technology, Prague. From my perspective the meeting went very well: the logistics were smooth, the technology mostly worked - we had some problems with sound on the first day, but got past those (why is it always sound?), and the program had a good balance of talks and breaks. Many thanks to Martin and the Prague group for putting together a great meeting and to the companies that sponsored the meeting (listed in the <a href="https://github.com/rdkit/UGM_2025">github repo</a>) for enabling us to continue to run the meeting without having to charge registration fees.</p>
<p>We had 120 registered attendees; as usual we were at capacity (the limiting factor was the size of the lecture hall). This year we only had a couple of no-shows (a common problem with free meetings). We had the usual good mix of academic and industrial attendees and of people in different stages of their careers. The meeting was live streamed over zoom and there were typically 40-60 people watching the talks remotely and participating via the discord server.</p>
<p>There were 19 standard talks (we did a mix of 20- and 30-minute slots this year and I think that worked pretty well), 10 lightning talks, and 20 posters. On Friday we had the hackathon and three workshops. I haven’t heard much feedback about the workshops yet, but there were a number of pull requests submitted to the RDKit repo during the hackathon and in the next couple of days, so it looks like that was productive.</p>
<p>The github repo is here: <a href="https://github.com/rdkit/UGM_2025">https://github.com/rdkit/UGM_2025</a>. The slides from the presentations will show up in the repo as speakers send them to me or submit PRs themselves. I will be uploading the videos of the talks to <a href="https://www.youtube.com/playlist?list=PLugOo5eIVY3GDYcuR5kKIXwHIKKP8Vj0B">the YouTube playlist</a> over the next few weeks as I have time to process them.</p>
<p>I, unsurprisingly, really enjoyed the meeting. I was exhausted at the end of each day, and totally wiped out on Friday, but that’s 100 percent expected. Some personal thoughts/impressions:</p>
<ul>
<li>I had never been to Prague before and very much enjoyed the bits of it I saw. The UGM itself kept me busy during the days, but the area around the campus was quite lively and I did get to see some other parts while running (pro tip: don’t try and run along the river in the afternoon… there are way too many people out and about! Early mornings were a lot more runnable).</li>
<li>As always, it was great to see old friends and meet new people. My tendency, particularly during meetings like this, is to spend most of my time talking to people I already know, but I think I did a decent job this year of tempering that.</li>
<li>It was really nice to have all four RDKit maintainers (Brian, Paolo, Ricardo, and myself) together in one place. I’m not sure when the last time that happened was.</li>
<li>The quality of the talks was good. We had a couple that were more marketing-heavy than I would have liked, but they were definitely the exception. As usual, there was a nice mix of academic and industrial, method-development and application, deeply technical and high-level.</li>
<li>One recurring theme this year was the handling of sequence-based entities and/or polymers. There were several talks that touched on this topic from different angles and I’m really curious to see how this will evolve in the RDKit. This is an area where I don’t have a lot of personal experience, so it’s going to be fun for me to see how the RDKit community tackles this.</li>
</ul>
<p>I came back from the meeting with a long list of things I want to work on in the RDKit, and that’s a really good sign. I hope that other attendees came away similarly inspired!</p>
<p>I will announce the dates and location of the 2026 European UGM as soon as they are finalized. If you are interested in hosting a future UGM, please get in touch with me.</p>



 ]]></description>
  <category>general</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-09-14-UGM-recap.html</guid>
  <pubDate>Sat, 13 Sep 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>The 2025 RDKit UGM is next week</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-09-07-UGM-upcoming.html</link>
  <description><![CDATA[ 




<p>The 2025 RDKit UGM takes place next week (September 10-12, 2025) in Prague. The github repo is here: <a href="https://github.com/rdkit/UGM_2025">https://github.com/rdkit/UGM_2025</a>.</p>
<p>The UGM is “sold out”, so we can’t accept any last-minute registrations, but we will be live streaming it over zoom. I will be posting the links to the zoom sessions in the repo as well as in the <a href="https://discord.gg/5cAr6x66Ug">UGM discord server</a>.</p>
<p>The upcoming meeting is my official excuse for why there is no real blog post this week (I also took a few days off to run an ultra-marathon, but I’m not allowed to use that as an excuse for skipping a blog post. ;-)). Hopefully I’ll manage to put something real together next week after the UGM.</p>



 ]]></description>
  <category>general</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-09-07-UGM-upcoming.html</guid>
  <pubDate>Sat, 06 Sep 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>Scaling conformer generation</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-08-30-confgen-scaling.html</link>
  <description><![CDATA[ 




<p>This week the question came up in the lab of how well the conformer generation code scales with the number of threads used. Since generating conformers is embarrassingly parallel - the conformers don’t depend on each other (this isn’t quite true if you are doing RMS pruning) - in a perfect world you’d expect more or less linear scaling. So, in theory, using 4 times as many threads should take 1/4 as much time, as long as the number of threads is less than the number of conformers being generated. In reality, things don’t quite work out this way since the individual conformers can take different amounts of time to generate due to the stochastic nature of the RDKit’s distance-geometry-based algorithm.</p>
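<p>To make the last point a bit more concrete, here is a toy model of why uneven task times eat into linear scaling. It is only a sketch under stated assumptions: the per-conformer times are drawn from an exponential distribution (an arbitrary choice for illustration) and each free thread simply picks up the next task in order. This is not a description of how the RDKit actually schedules its work, and <code>simulated_wall_time</code> is a hypothetical helper.</p>

```python
import heapq
import random


def simulated_wall_time(task_times, n_threads):
    """Wall-clock time when each thread that becomes free takes the next task."""
    # min-heap holding the time at which each thread becomes free
    free_at = [0.0] * n_threads
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)       # earliest available thread
        heapq.heappush(free_at, start + t)   # it is busy until start + t
    return max(free_at)


rng = random.Random(0xa100f)
# assumption: 100 "conformers" with exponentially distributed embedding times
task_times = [rng.expovariate(1.0) for _ in range(100)]

serial = simulated_wall_time(task_times, 1)
for n in (1, 2, 4, 6, 8, 10):
    wall = simulated_wall_time(task_times, n)
    print(f"{n:2d} threads: speedup {serial / wall:.2f} (ideal {n})")
```

<p>In this model the speedup can never exceed the thread count, and it falls short whenever some threads sit idle waiting for the last few slow tasks to finish.</p>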
<p>I decided to do a quick test to see how well things scale on my machine.</p>
<blockquote class="blockquote">
<p>Aside: this post is looking at runtimes of conformer generation for individual molecules. We can further increase the speed of conformer generation for large sets of molecules by splitting the work across multiple machines. That’s a topic for a different post.</p>
</blockquote>
<div id="2f30cd5b" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> gzip</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDistGeom</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> time</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> tqdm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> tqdm</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> matplotlib <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%</span>matplotlib inline</span>
<span id="cb1-10">plt.style.use(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'tableau-colorblind10'</span>)</span>
<span id="cb1-11"></span>
<span id="cb1-12"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.03.5</code></pre>
</div>
</div>
<p>Start by loading some molecules. This is a set of 370 molecules that are present in both the <a href="https://www.crystallography.net/cod/">COD</a> and <a href="https://www.ebi.ac.uk/chembl/">ChEMBL</a>. I use these (as well as some other COD subsets) a lot for confgen testing:</p>
<div id="44b269e5" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [x <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> Chem.ForwardSDMolSupplier(gzip.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../data/COD_2025Jan13.organic.chembl_selected.sdf.gz'</span>),</span>
<span id="cb3-2">                                          removeHs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>]</span>
<span id="cb3-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(ms)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>370</code></pre>
</div>
</div>
<p>For the purposes of this post we’ll just use 100 of the COD molecules. Add Hs to the mols and create the subset:</p>
<div id="7e2cac99" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.AddHs(m) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> ms]</span>
<span id="cb5-2">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> ms[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>]</span></code></pre></div>
</div>
<p>My linux box has 8 performance cores (each of which can run two hyperthreads) and 8 “efficient” cores, so python thinks I have 24 CPUs available:</p>
<div id="6aec1e72" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> multiprocessing</span>
<span id="cb6-2">multiprocessing.cpu_count()</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>24</code></pre>
</div>
</div>
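<p><code>multiprocessing.cpu_count()</code> reports logical CPUs, which is why hyperthreads and efficiency cores all get counted. The number of CPUs a process is actually allowed to use can be smaller, for example inside a container or under a batch scheduler. A small sketch of how one might check both numbers follows; <code>os.sched_getaffinity</code> is not available on every platform, so the code falls back to the logical count when it is missing.</p>

```python
import multiprocessing
import os

# number of logical CPUs the OS reports (hyperthreads count separately)
logical = multiprocessing.cpu_count()

# where supported (e.g. Linux), the set of CPUs this process may run on;
# fall back to the logical count on platforms without sched_getaffinity
if hasattr(os, "sched_getaffinity"):
    usable = len(os.sched_getaffinity(0))
else:
    usable = logical

print(f"logical CPUs: {logical}, usable by this process: {usable}")
```

<p>Neither number says how many physical performance cores there are, so picking the thread counts to test still requires knowing the hardware.</p>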
<p>I will try thread counts up to 10 to check scaling, which goes a bit beyond the number of physical performance cores on my machine.</p>
<p>Now try generating both 100 conformer and 400 conformer sets for each molecule using different thread counts and track how long it takes:</p>
<div id="3ab047dd" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1">threadCounts <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>]</span>
<span id="cb8-2"></span>
<span id="cb8-3">params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rdDistGeom.ETKDGv3()</span>
<span id="cb8-4">params.randomSeed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bn" style="color: #AD0000;
background-color: null;
font-style: inherit;">0xa100f</span></span>
<span id="cb8-5"></span>
<span id="cb8-6">tgtConfs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>)</span>
<span id="cb8-7">accum <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb8-8"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tgt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tgtConfs:</span>
<span id="cb8-9">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Doing </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tgt<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> conformers"</span>)</span>
<span id="cb8-10">    accum[tgt] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb8-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> threadCounts:</span>
<span id="cb8-12">        params.numThreads <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tc</span>
<span id="cb8-13">        accum[tgt][tc] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb8-14">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Doing </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>tc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> threads"</span>)</span>
<span id="cb8-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> tqdm(ms):</span>
<span id="cb8-16">            t1 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb8-17">            rdDistGeom.EmbedMultipleConfs(m,tgt,params)</span>
<span id="cb8-18">            t2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> time.time()</span>
<span id="cb8-19">            accum[tgt][tc].append(t2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>t1)</span>
<span id="cb8-20">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\t</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(accum[tgt][tc])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Doing 100 conformers
Doing 1 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [02:25&lt;00:28,  1.65s/it][16:37:31] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [02:37&lt;00:00,  2.19it/s][16:37:43] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:39&lt;00:00,  1.59s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    159.0
Doing 2 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [01:17&lt;00:14,  1.15it/s][16:39:02] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [01:24&lt;00:00,  3.86it/s][16:39:08] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:25&lt;00:00,  1.18it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    85.0
Doing 4 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [00:42&lt;00:08,  2.12it/s][16:39:51] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [00:45&lt;00:00,  6.30it/s][16:39:55] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:46&lt;00:00,  2.17it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    46.1
Doing 6 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [00:29&lt;00:05,  2.93it/s][16:40:25] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [00:32&lt;00:00,  9.02it/s][16:40:27] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:32&lt;00:00,  3.09it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    32.3
Doing 8 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [00:24&lt;00:04,  3.57it/s][16:40:52] UFFTYPER: Unrecognized charge state for atom: 0
 96%|██████████████████████████████████████████████████████████████████████████████████████████████    | 96/100 [00:26&lt;00:00,  8.34it/s][16:40:54] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:26&lt;00:00,  3.74it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    26.7
Doing 10 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [00:25&lt;00:04,  3.44it/s][16:41:19] UFFTYPER: Unrecognized charge state for atom: 0
 96%|██████████████████████████████████████████████████████████████████████████████████████████████    | 96/100 [00:26&lt;00:00,  7.85it/s][16:41:21] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:27&lt;00:00,  3.67it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    27.2
Doing 400 conformers
Doing 1 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [09:37&lt;01:48,  6.40s/it][16:50:59] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [10:25&lt;00:03,  1.82s/it][16:51:47] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [10:29&lt;00:00,  6.30s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    629.9
Doing 2 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [05:04&lt;00:57,  3.37s/it][16:56:56] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [05:29&lt;00:01,  1.04it/s][16:57:21] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [05:32&lt;00:00,  3.32s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    332.1
Doing 4 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [02:38&lt;00:29,  1.76s/it][17:00:02] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [02:51&lt;00:00,  2.01it/s][17:00:15] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:52&lt;00:00,  1.72s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    172.4
Doing 6 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [01:48&lt;00:20,  1.22s/it][17:02:04] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [01:57&lt;00:00,  2.87it/s][17:02:14] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:58&lt;00:00,  1.18s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    118.4
Doing 8 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [01:30&lt;00:17,  1.01s/it][17:03:44] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [01:37&lt;00:00,  3.39it/s][17:03:52] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:38&lt;00:00,  1.02it/s]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    98.1
Doing 10 threads</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code> 83%|█████████████████████████████████████████████████████████████████████████████████▎                | 83/100 [01:36&lt;00:18,  1.10s/it][17:05:29] UFFTYPER: Unrecognized charge state for atom: 0
 98%|████████████████████████████████████████████████████████████████████████████████████████████████  | 98/100 [01:44&lt;00:00,  3.17it/s][17:05:37] UFFTYPER: Unrecognized charge state for atom: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:45&lt;00:00,  1.05s/it]</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>    105.4</code></pre>
</div>
</div>
<p>Next, calculate the relative runtime for each molecule at each thread count, using the single-threaded runtime as the reference. If everything is scaling perfectly, we’d expect doubling the number of threads to cut the runtime in half.</p>
<div id="89efa499" class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb35-1">factors <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb35-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> nc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> accum:</span>
<span id="cb35-3">    factors[nc] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {}</span>
<span id="cb35-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> accum[nc]:</span>
<span id="cb35-5">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> tc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb35-6">            factors[nc][tc] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(ms)</span>
<span id="cb35-7">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb35-8">            factors[nc][tc] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb35-9">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(ms):</span>
<span id="cb35-10">                factors[nc][tc].append(accum[nc][tc][i]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>accum[nc][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>][i])</span>
<span id="cb35-11">            </span></code></pre></div>
</div>
<p>Now plot the results.</p>
<div id="f66fba4d" class="cell" data-execution_count="49">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb36-1">plt.figure(figsize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>))</span>
<span id="cb36-2">xp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(threadCounts))]</span>
<span id="cb36-3">widths <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(threadCounts))]</span>
<span id="cb36-4">plt.boxplot([factors[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>][tc] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> threadCounts],positions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>xp,widths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>widths,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'100 conformers'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-5">xp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(threadCounts))]</span>
<span id="cb36-6">plt.boxplot([factors[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>][tc] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> tc <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> threadCounts],positions<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>xp,widths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>widths,label<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'400 conformers'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-7">plt.xticks(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(threadCounts) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),[<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> threadCounts])<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-8">plt.xlabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'number of threads'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-9">plt.ylabel(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'scaling factor'</span>)</span>
<span id="cb36-10">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.3</span>),(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-11">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.3</span>),(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-12">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.3</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-13">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.3</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb36-14">plt.plot((<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">6.3</span>),(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'k--'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div>
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-30-confgen-scaling_files/figure-html/cell-9-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The left-hand boxplot in each column shows the results for 100 conformers; the right-hand boxplot shows the results for 400 conformers. The dashed lines show the scaling factor you would expect for perfect scaling.</p>
<p>We can see that on my machine with these molecules the scaling is pretty good up to about 6 threads, but by the time we get to 10 threads we’re deviating pretty strongly from perfect scaling. In fact, the total runtime for 10 threads is no better than it is for 8 (the number of physical performance cores in my machine): 27.2 s vs. 26.7 s for 100 conformers and 105.4 s vs. 98.1 s for 400 conformers. Unsurprisingly, the scaling is better for 400 conformers than it is for 100, but even with 400 conformers it’s not worth increasing from 8 to 10 threads on my machine.</p>
<p>Based on this analysis, I’d probably go with using either 6 or 8 threads for future confgen work on this machine.</p>
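<p>As a rough cross-check, the total runtimes printed above can be turned into speedups and parallel efficiencies. This is only a sketch: it uses the summed runtimes rather than the per-molecule ratios shown in the boxplots, and the helper name <code>scaling_summary</code> is made up for this example.</p>

```python
# Total runtimes in seconds, copied from the cell output above.
threads = [1, 2, 4, 6, 8, 10]
totals = {
    100: [159.0, 85.0, 46.1, 32.3, 26.7, 27.2],
    400: [629.9, 332.1, 172.4, 118.4, 98.1, 105.4],
}


def scaling_summary(threads, runtimes):
    """Return (threads, speedup, efficiency) tuples relative to the first entry."""
    base = runtimes[0]
    rows = []
    for n, t in zip(threads, runtimes):
        speedup = base / t  # how many times faster than the single-threaded run
        rows.append((n, speedup, speedup / n))  # efficiency is 1.0 for perfect scaling
    return rows


for nconfs, runtimes in totals.items():
    print(f"{nconfs} conformers")
    for n, speedup, eff in scaling_summary(threads, runtimes):
        print(f"  {n:2d} threads: speedup {speedup:4.2f}x, efficiency {eff:4.0%}")
```

<p>With these numbers the efficiency stays above 80% through 6 threads and falls to roughly 60% at 10 threads for both conformer counts, which matches the picture from the boxplots.</p>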



 ]]></description>
  <category>conformers</category>
  <category>questions</category>
  <category>technical</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-08-30-confgen-scaling.html</guid>
  <pubDate>Fri, 29 Aug 2025 22:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/confgen-scaling-1.png" medium="image" type="image/png" height="76" width="144"/>
</item>
<item>
  <title>How the 2D/3D flag in Mol blocks is used</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-08-22-interpreting-the-2d3d-flag.html</link>
  <description><![CDATA[ 




<p>[This is an expanded version of a section which will be added to the RDKit documentation to clarify an important detail about the way Mol and SD files are parsed.]</p>
<section id="background" class="level1">
<h1>Background</h1>
<p>Mol blocks, Mol files, and SD files can describe 2D or 3D molecules and include a flag (called “dimensional code” in the spec) to indicate whether the coordinates are 2D or 3D. The flag shows up in the second line of the Mol block, and is present in both V2000:</p>
<pre><code>nitrogen
     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -1.5000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  3  0
M  END</code></pre>
<p>and V3000 mol blocks:</p>
<pre><code>nitrogen
     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 2 1 0 0 0
M  V30 BEGIN ATOM
M  V30 1 N 0.000000 0.000000 0.000000 0
M  V30 2 N -0.000000 -1.500000 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 3 1 2
M  V30 END BOND
M  V30 END CTAB
M  END</code></pre>
<p>The CTFile specification says the following about this flag:</p>
<blockquote class="blockquote">
<p>The “dimensional code” is maintained explicitly. Thus “3D” really means 3D, although “2D” will be interpreted as 3D if any non-zero Z-coordinates are found</p>
</blockquote>
<p>That’s simple (and logical) enough, but of course in the real world things are more complicated. Files found “in the wild” include every possible combination of 2D/3D flag and 2D/3D coordinates, so we need to decide how to interpret these combinations. Things are made more complicated by the possible presence of wedged bonds in the Mol block. Wedged bonds are typically a signal that something is known about the stereochemistry of the molecule; how do we combine this information with the coordinates?</p>
<blockquote class="blockquote">
<p><em>Aside on 2D vs 3D coordinates</em>: in Mol blocks X, Y, and Z coordinates are always provided (if any of them are missing, they are set to zero). It’s also possible to have 3D coordinates for planar molecules. For the purposes of this discussion, “2D” coordinates means that all Z coordinates are zero, and a “2D” RDKit molecule is one where the (default) conformer is marked as being 2D (i.e.&nbsp;the conformer’s <code>Is3D()</code> method returns <code>False</code>).</p>
</blockquote>
</section>
<section id="what-the-rdkit-does" class="level1">
<h1>What the RDKit does</h1>
<p>The following table describes how the RDKit interprets all possible combinations of dimensionality flag, coordinate dimensionality, and the presence or absence of wedged bonds:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>flag</th>
<th>coords</th>
<th>wedging</th>
<th>result</th>
<th>notes</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>2D</td>
<td>2D</td>
<td>no</td>
<td>2D</td>
<td>no chirality</td>
</tr>
<tr class="even">
<td>3D</td>
<td>2D</td>
<td>no</td>
<td>3D</td>
<td>no chirality</td>
</tr>
<tr class="odd">
<td>3D</td>
<td>3D</td>
<td>no</td>
<td>3D</td>
<td>chirality from coords</td>
</tr>
<tr class="even">
<td>2D</td>
<td>3D</td>
<td>no</td>
<td>3D</td>
<td>chirality from coords</td>
</tr>
<tr class="odd">
<td>2D</td>
<td>2D</td>
<td>yes</td>
<td>2D</td>
<td>chirality from wedging</td>
</tr>
<tr class="even">
<td>3D</td>
<td>2D</td>
<td>yes</td>
<td>2D</td>
<td>chirality from wedging</td>
</tr>
<tr class="odd">
<td>3D</td>
<td>3D</td>
<td>yes</td>
<td>3D</td>
<td>chirality from coords</td>
</tr>
<tr class="even">
<td>2D</td>
<td>3D</td>
<td>yes</td>
<td>3D</td>
<td>chirality from coords</td>
</tr>
</tbody>
</table>
<p>This is consistent with what the specification says except for the case where the 3D flag is set for 2D coordinates and a wedge is present, in which case we ignore the 3D flag, mark the conformer as 2D, and set the stereochemistry based on the wedging.</p>
<p>In cases where no 2D/3D flag is provided, the default value of the flag is 2D.</p>
<p>In a 2D structure, wedging signals both that stereochemistry is present and what that stereochemistry is (i.e.&nbsp;the stereochemistry is determined from the 2D coordinates and the direction of the wedge).</p>
<p>When 3D coordinates are provided, stereochemistry is perceived from the coordinates themselves. If wedging is also present it will be ignored; the 3D signal is “stronger” than the wedging signal. The exception to this rule is <a href="https://www.rdkit.org/docs/RDKit_Book.html#atropisomeric-bonds">atropisomeric bonds</a>, where the wedged bond indicates that atropisomerism should be perceived, but the direction of the wedge is ignored.</p>
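<p>The rules in the table can be summarized as a small decision function. This is a plain-Python sketch of the table only, not the RDKit's parser code; the function name <code>interpret_mol_block</code> and its arguments are hypothetical, and the atropisomer exception mentioned above is not modelled.</p>

```python
def interpret_mol_block(flag, has_nonzero_z, has_wedging):
    """Mirror the table above: return (conformer dimensionality, chirality source).

    flag: "2D" or "3D" as read from the Mol block header; None or "" if missing.
    has_nonzero_z: True if any atom has a non-zero Z coordinate.
    has_wedging: True if any bond in the Mol block is wedged.
    """
    flag = flag or "2D"  # a missing flag defaults to 2D
    if has_nonzero_z:
        # 3D coordinates always win, whatever the flag or the wedging says
        return "3D", "coords"
    if has_wedging:
        # 2D coordinates plus wedging: a 3D flag is ignored
        return "2D", "wedging"
    # 2D coordinates and no wedging: the flag decides and there is no chirality
    return flag, None


# Print all eight rows of the table.
for flag in ("2D", "3D"):
    for has_nonzero_z in (False, True):
        for has_wedging in (False, True):
            print(flag, has_nonzero_z, has_wedging,
                  interpret_mol_block(flag, has_nonzero_z, has_wedging))
```

<p>Written this way, the order of precedence is easy to see: non-zero Z coordinates come first, then wedging, and the flag itself only matters when neither is present.</p>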


</section>

 ]]></description>
  <category>documentation</category>
  <category>technical</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-08-22-interpreting-the-2d3d-flag.html</guid>
  <pubDate>Thu, 21 Aug 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>A BRICS tutorial</title>
  <link>https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial.html</link>
  <description><![CDATA[ 




<p>This post is a short tutorial on using the RDKit’s BRICS implementation, expanding a bit on <a href="https://www.rdkit.org/docs/GettingStartedInPython.html#brics-implementation">what’s in the documentation</a>.</p>
<p>BRICS is a method for fragmenting molecules into smaller pieces along bonds which are likely to be synthetically accessible. The original paper describing the method is:</p>
<blockquote class="blockquote">
<p>Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A.; Rarey, M. On the Art of Compiling and Using “Drug-Like” Chemical Fragment Spaces. ChemMedChem 2008, 3 (10), 1503–1507. https://chemistry-europe.onlinelibrary.wiley.com/doi/full/10.1002/cmdc.200800178</p>
</blockquote>
<div id="e84077fe" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Chem</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdDepictor</span>
<span id="cb1-3">rdDepictor.SetPreferCoordGen(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem.Draw <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> IPythonConsole</span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Draw</span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> rdkit.Chem <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> BRICS</span>
<span id="cb1-7"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rdkit</span>
<span id="cb1-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(rdkit.__version__)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>2025.03.5</code></pre>
</div>
</div>
<section id="brics-basics" class="level1">
<h1>BRICS basics</h1>
<div id="3c89a385" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1">esomeprazole <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmiles(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1'</span>)</span>
<span id="cb3-2">rdDepictor.Compute2DCoords(esomeprazole)</span>
<span id="cb3-3">esomeprazole</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-3-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<div id="0aa07efd" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1">pieces <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSDecompose(esomeprazole)</span>
<span id="cb4-2">pieces</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>{'[14*]c1ncc(C)c([16*])c1C',
 '[3*]OC',
 '[8*]C[S+]([O-])c1nc2cc([16*])ccc2[nH]1'}</code></pre>
</div>
</div>
<div id="2a2071ea" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1">Draw.MolsToGridImage([Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pieces])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-5-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>The atom environments and connection rules can be found in Scheme 2 of the BRICS paper:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://chemistry-europe.onlinelibrary.wiley.com/cms/asset/797b1262-e8d9-42ad-b6b1-ba85878a35de/msch002.gif" class="img-fluid figure-img"></p>
<figcaption>BRICS connection rules</figcaption>
</figure>
</div>
<p>Note that the RDKit BRICS implementation does not contain a definition of L2, a three-connected N with a lone pair. We merged that into L5 as a more general definition of amine. In the RDKit implementation L5 can connect with L1, L4, L12, L13, L14, L15, and L16. We also added a few additional connection definitions. The actual SMARTS used in the RDKit can be found in <a href="https://github.com/rdkit/rdkit-orig/blob/57058c886a49cc597b0c40641a28697ee3a57aee/Code/GraphMol/ChemTransforms/MolFragmenter.cpp#L104">MolFragmenter.cpp</a>, along with the <a href="https://github.com/rdkit/rdkit-orig/blob/57058c886a49cc597b0c40641a28697ee3a57aee/Code/GraphMol/ChemTransforms/MolFragmenter.cpp#L205">connection rules</a>. There is also a Python version of the definitions in <a href="https://github.com/rdkit/rdkit-orig/blob/master/rdkit/Chem/BRICS.py">BRICS.py</a>, but this is no longer used and may not exactly match the C++ implementation.</p>
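<p>To make the L5 rule above concrete, here is a minimal pure-Python sketch that reads the dummy-atom labels out of fragment SMILES and checks whether two fragments could be joined through an L5 site. The names <code>L5_PARTNERS</code>, <code>labels</code>, and <code>can_link_via_l5</code> are made up for this illustration, and only the L5 partners quoted in the text are encoded. The real rule set is the one in the C++ code linked above.</p>

```python
import re

# Partners for environment L5, as listed in the text above.
# Only this single rule is encoded; the full BRICS table has many more.
L5_PARTNERS = {1, 4, 12, 13, 14, 15, 16}


def labels(frag_smiles):
    """Return the set of dummy-atom labels in a fragment SMILES."""
    return {int(n) for n in re.findall(r"\[(\d+)\*\]", frag_smiles)}


def can_link_via_l5(frag_a, frag_b):
    """True if one fragment has an L5 site and the other an allowed partner."""
    a, b = labels(frag_a), labels(frag_b)
    a_to_b = 5 in a and not b.isdisjoint(L5_PARTNERS)
    b_to_a = 5 in b and not a.isdisjoint(L5_PARTNERS)
    return a_to_b or b_to_a


# Fragment SMILES of the kind BRICSDecompose returns
amine = "[5*]N(C)C"
pyridine = "[14*]c1ncc(C)c([16*])c1Cl"
methoxy = "[3*]OC"

print(can_link_via_l5(amine, pyridine))  # L5 with L14/L16 is in the list
print(can_link_via_l5(amine, methoxy))   # L5 with L3 is not in the list
```

<p>This only looks at labels, not at the chemistry of the resulting bond, so treat it as a way to read the rule table rather than a replacement for <code>BRICS.BRICSBuild</code>.</p>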
<p>The RDKit’s BRICS implementation includes the ability to create new molecules by stitching fragments together according to the connection rules above. Here’s an illustration of that showing all the molecules that can be formed by combining the three fragments from esomeprazole using the BRICS connection rules with a maximum enumeration depth of 2:</p>
<div id="d872b9b6" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1">piecems <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> pieces]</span>
<span id="cb7-2">builder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSBuild(piecems, maxDepth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, scrambleReagents<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb7-3">ms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">list</span>(builder)</span>
<span id="cb7-4">Draw.MolsToGridImage(ms)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-6-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="enumerating-a-larger-set-of-molecules" class="level1">
<h1>Enumerating a larger set of molecules</h1>
<p>Here I grab the ChEMBL molecules that were most similar to esomeprazole, fragment them together with esomeprazole, and then explore some of the resulting molecules.</p>
<section id="aside-grabbing-similar-molecules-from-chembl" class="level2">
<h2 class="anchored" data-anchor-id="aside-grabbing-similar-molecules-from-chembl">Aside: grabbing similar molecules from ChEMBL</h2>
<p>Here’s the query I executed and the results returned:</p>
<pre><code>chembl_35=# select chembl_id,m,similarity from get_mfp2_neighbors('COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1') join chembl_id_lookup on (molregno=entity_id and entity_type='COMPOUND') join molecule_hierarchy using (molregno) where molregno=parent_molregno and similarity&lt;1 order by similarity desc limit 20;
   chembl_id   |                              m                              |     similarity     
---------------+-------------------------------------------------------------+--------------------
 CHEMBL9890    | COc1c(C)cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C         | 0.8333333333333334
 CHEMBL5089043 | COc1c(C)cnc(C[S@@+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C       | 0.8333333333333334
 CHEMBL5076667 | COc1c(C)cnc(C[S@+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C        | 0.8333333333333334
 CHEMBL3527071 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(CO)c(OC)c3C)nc2c1            | 0.8148148148148148
 CHEMBL4525760 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C(=O)O)c(OC)c3C)nc2c1        |                0.8
 CHEMBL10184   | COc1c(C)cnc(C[S+]([O-])c2nc3ccccc3[nH]2)c1C                 | 0.7692307692307693
 CHEMBL9430    | COc1cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c(C)c1OC        | 0.7288135593220338
 CHEMBL59068   | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N(C)C)c3Cl)nc2c1         | 0.7192982456140351
 CHEMBL138250  | COc1c(C)cnc(C[S+]([O-])c2nc3cscc3[nH]2)c1C                  | 0.7169811320754716
 CHEMBL5070031 | COc1ccc2[nH]c([S@@+]([O-])Cc3nccc(OC)c3C)nc2c1              | 0.7090909090909091
 CHEMBL10061   | COc1cnc(C[S+]([O-])c2nc3cc(OC(F)(F)C(F)F)ccc3[nH]2)c(C)c1OC | 0.6935483870967742
 CHEMBL1475252 | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2n1             | 0.6909090909090909
 CHEMBL9992    | COc1cnc(C[S+]([O-])c2nc3cc(OCC(F)(F)F)ccc3[nH]2)c(C)c1OC    | 0.6885245901639344
 CHEMBL440676  | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCC4)c3Cl)nc2c1       | 0.6721311475409836
 CHEMBL20315   | COc1ccc2[nH]c([S+]([O-])Cc3c(N)cc(C)c(OC)c3C)nc2c1          | 0.6666666666666666
 CHEMBL59784   | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCCC4)c3Cl)nc2c1      | 0.6612903225806451
 CHEMBL144285  | COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC(C)C)c3C)nc2c1            |               0.65
 CHEMBL341550  | COc1ccc(-c2scc3[nH]c([S+]([O-])Cc4ncc(C)c(OC)c4C)nc23)cc1   |               0.65
 CHEMBL1796802 | COc1cc2nc([S+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2cc1NC(C)=O      | 0.6451612903225806
 CHEMBL58833   | COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCOCC4)c3Cl)nc2c1      |           0.640625
(20 rows)
</code></pre>
<p>The very useful <code>get_mfp2_neighbors()</code> function is from <a href="https://www.rdkit.org/docs/Cartridge.html#loading-chembl">the cartridge documentation</a>.</p>
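<p>For readers who have not used the cartridge: assuming the function is defined as in the cartridge documentation, the <code>similarity</code> column is a Tanimoto similarity between Morgan bit-vector fingerprints. Here is a minimal sketch of that measure on plain Python sets. The bit sets are invented for the example and are not real fingerprints, and <code>tanimoto</code> is just an illustrative helper name.</p>

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a.union(b)
    if not union:
        # two empty fingerprints: define the similarity as 0
        return 0.0
    return len(a.intersection(b)) / len(union)


# Invented on-bit indices, purely for illustration
query = {1, 2, 3, 4, 5}
neighbor = {1, 2, 3, 4, 5, 6}

# 5 shared bits out of 6 bits set in either fingerprint
print(tanimoto(query, neighbor))
```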
<p>Here’s the set of molecules we’ll use:</p>
<div id="cb134904" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1">smis <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'''COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1</span></span>
<span id="cb9-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1c(C)cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c1C</span></span>
<span id="cb9-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(CO)c(OC)c3C)nc2c1</span></span>
<span id="cb9-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C(=O)O)c(OC)c3C)nc2c1</span></span>
<span id="cb9-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1c(C)cnc(C[S+]([O-])c2nc3ccccc3[nH]2)c1C</span></span>
<span id="cb9-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1cnc(C[S+]([O-])c2nc3cc(OC(F)F)ccc3[nH]2)c(C)c1OC</span></span>
<span id="cb9-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N(C)C)c3Cl)nc2c1</span></span>
<span id="cb9-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1c(C)cnc(C[S+]([O-])c2nc3cscc3[nH]2)c1C</span></span>
<span id="cb9-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC)c3C)nc2c1</span></span>
<span id="cb9-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1cnc(C[S+]([O-])c2nc3cc(OC(F)(F)C(F)F)ccc3[nH]2)c(C)c1OC</span></span>
<span id="cb9-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(OC)c3C)nc2n1</span></span>
<span id="cb9-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1cnc(C[S+]([O-])c2nc3cc(OCC(F)(F)F)ccc3[nH]2)c(C)c1OC</span></span>
<span id="cb9-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCC4)c3Cl)nc2c1</span></span>
<span id="cb9-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3c(N)cc(C)c(OC)c3C)nc2c1</span></span>
<span id="cb9-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCCCC4)c3Cl)nc2c1</span></span>
<span id="cb9-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3nccc(OC(C)C)c3C)nc2c1</span></span>
<span id="cb9-17"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc(-c2scc3[nH]c([S+]([O-])Cc4ncc(C)c(OC)c4C)nc23)cc1</span></span>
<span id="cb9-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1cc2nc([S+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2cc1NC(C)=O</span></span>
<span id="cb9-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">COc1ccc2[nH]c([S+]([O-])Cc3ncc(C)c(N4CCOCC4)c3Cl)nc2c1'''</span>.split(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span>)</span>
<span id="cb9-20">mols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(smi) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> smi <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> smis]</span>
<span id="cb9-21">Draw.MolsToGridImage(mols,molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-7-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Fragment all the molecules and keep the unique fragments:</p>
<div id="dd009376" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1">allfrags<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>()</span>
<span id="cb10-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> mols:</span>
<span id="cb10-3">    pieces <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSDecompose(m)</span>
<span id="cb10-4">    allfrags.update(pieces)</span>
<span id="cb10-5">allfrags</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>{'[1*]C(C)=O',
 '[14*]c1ncc(C)c([16*])c1C',
 '[14*]c1ncc(C)c([16*])c1Cl',
 '[14*]c1ncc([16*])c([16*])c1C',
 '[14*]c1nccc([16*])c1C',
 '[16*]c1c(C)cc(N)c([16*])c1C',
 '[16*]c1ccc([16*])cc1',
 '[3*]OC',
 '[3*]OC(F)F',
 '[3*]O[3*]',
 '[4*]C(C)C',
 '[4*]C(F)(F)C(F)F',
 '[4*]CC(F)(F)F',
 '[5*]N(C)C',
 '[5*]N1CCCC1',
 '[5*]N1CCCCC1',
 '[5*]N1CCOCC1',
 '[5*]N[5*]',
 '[6*]C(=O)O',
 '[8*]CO',
 '[8*]C[S+]([O-])c1nc2c([14*])scc2[nH]1',
 '[8*]C[S+]([O-])c1nc2cc([16*])c([16*])cc2[nH]1',
 '[8*]C[S+]([O-])c1nc2cc([16*])ccc2[nH]1',
 '[8*]C[S+]([O-])c1nc2ccccc2[nH]1',
 '[8*]C[S+]([O-])c1nc2cscc2[nH]1',
 '[8*]C[S+]([O-])c1nc2nc([14*])ccc2[nH]1'}</code></pre>
</div>
</div>
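<p>One quick way to get a feel for a fragment set like this is to group the fragments by how many attachment points they carry. Here is a minimal sketch using a handful of the SMILES from the output above. It relies on the assumption that every <code>*</code> in these strings is a dummy atom, which holds for the fragments shown here.</p>

```python
from collections import defaultdict

# A few of the fragment SMILES from the output above
frags = [
    "[3*]OC",
    "[3*]O[3*]",
    "[5*]N[5*]",
    "[5*]N1CCOCC1",
    "[14*]c1ncc(C)c([16*])c1C",
    "[8*]C[S+]([O-])c1nc2ccccc2[nH]1",
]

by_npoints = defaultdict(list)
for smi in frags:
    # each '*' in these SMILES is one dummy atom, i.e. one attachment point
    by_npoints[smi.count("*")].append(smi)

for n in sorted(by_npoints):
    print(n, by_npoints[n])
```

<p>Fragments with a single attachment point can only cap a molecule, while those with two or more, such as <code>[3*]O[3*]</code> and <code>[5*]N[5*]</code>, can sit in the middle of an enumerated product.</p>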
<p>We can then recombine fragments to produce new molecules using <code>BRICS.BRICSBuild</code>:</p>
<div id="477d859f" class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> random</span>
<span id="cb12-2">random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span>)</span>
<span id="cb12-3"></span>
<span id="cb12-4">fragms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> allfrags]</span>
<span id="cb12-5">builder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSBuild(fragms)</span></code></pre></div>
</div>
<p>The result is a generator:</p>
<div id="0d3da57a" class="cell" data-scrolled="true" data-execution_count="20">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1">builder</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="20">
<pre><code>&lt;generator object BRICSBuild at 0x785401fb8040&gt;</code></pre>
</div>
</div>
<p>Here’s what the first 16 results look like:</p>
<div id="0edec122" class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">newMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(builder) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)]</span>
<span id="cb15-2">Draw.MolsToGridImage(newMols,molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-11-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Let’s filter out everything that doesn’t have exactly one sulfoxide:</p>
<div id="73db0750" class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1">qry <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Chem.MolFromSmarts(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'S-O'</span>)</span>
<span id="cb16-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(gen):</span>
<span id="cb16-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> res <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> gen:</span>
<span id="cb16-4">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(res.GetSubstructMatches(qry)) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>:</span>
<span id="cb16-5">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">yield</span> res</span>
<span id="cb16-6"></span>
<span id="cb16-7">random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span>)</span>
<span id="cb16-8">builder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSBuild(fragms)</span>
<span id="cb16-9">newMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(builder)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)]</span>
<span id="cb16-10">Draw.MolsToGridImage(newMols,molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)    </span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-12-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
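<p>The cell above wraps <code>builder</code> in a fresh filtering generator on every call to <code>next</code>. That works because each wrapper pulls from the same underlying generator, but it raises <code>StopIteration</code> if the builder runs out early. An alternative is to wrap once and take the first few results with <code>itertools.islice</code>. Here is a minimal sketch of that pattern. The helpers <code>keep_if</code> and <code>take</code> are names invented for this example, and the SMILES strings are a stand-in for the builder output so the snippet runs without the RDKit.</p>

```python
from itertools import islice


def keep_if(gen, predicate):
    """Yield only the items from gen for which predicate(item) is true."""
    for item in gen:
        if predicate(item):
            yield item


def take(gen, n):
    """Return up to n items from gen as a list (fewer if gen runs out)."""
    return list(islice(gen, n))


# Stand-in for the BRICSBuild generator: plain strings, not real output
fake_builder = iter(
    ["CC[S+]([O-])C", "CCO", "C[S+]([O-])CC", "CCN", "CC[S+]([O-])CC"]
)


def has_one_sulfoxide(smi):
    # crude string check used only for this stand-in data
    return smi.count("[S+]([O-])") == 1


picked = take(keep_if(fake_builder, has_one_sulfoxide), 2)
print(picked)
# the underlying generator is only consumed as far as needed
print(next(fake_builder))
```

<p>With real molecules the predicate would be the substructure count used in the post, for example a function that returns <code>len(res.GetSubstructMatches(qry)) == 1</code>.</p>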
<div id="ef9652c3" class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1">random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>)</span>
<span id="cb17-2">builder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSBuild(fragms)</span>
<span id="cb17-3">newMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(builder)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)]</span>
<span id="cb17-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># order by number of atoms</span></span>
<span id="cb17-5">newMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [newMols[y] <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (x,y) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sorted</span>([(m.GetNumAtoms(),i) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i,m <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">enumerate</span>(newMols)])]</span>
<span id="cb17-6">Draw.MolsToGridImage(newMols[:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>],molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)    </span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-13-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
</section>
<section id="providing-a-seed-to-the-enumeration" class="level2">
<h2 class="anchored" data-anchor-id="providing-a-seed-to-the-enumeration">Providing a seed to the enumeration</h2>
<p>Another option in the BRICS builder is to provide one or more seeds that must be present in every output molecule. Here’s an example of how to do that using the scaffold of esomeprazole along with a version of the scaffold where the N in the second ring is replaced by a C:</p>
<div id="cf14da7f" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">seeds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [Chem.MolFromSmiles(x) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> x <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> (<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c19ncc(C)c([16*])c1C.C9[S+]([O-])c1nc2cc([16*])ccc2[nH]1'</span>,</span>
<span id="cb18-2">                                         <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c19ccc(C)c([16*])c1C.C9[S+]([O-])c1nc2cc([16*])ccc2[nH]1'</span>,)]</span>
<span id="cb18-3"></span>
<span id="cb18-4">Draw.MolsToGridImage(seeds)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-14-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Now do the enumeration using those seeds:</p>
<div id="0f9a5128" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1">random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">127</span>)</span>
<span id="cb19-2">builder <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> BRICS.BRICSBuild(fragms,seeds<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>seeds)</span>
<span id="cb19-3">newMols <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">next</span>(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(builder)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)]</span>
<span id="cb19-4">Draw.MolsToGridImage(newMols,molsPerRow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)    </span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<div>
<figure class="figure">
<p><img src="https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial_files/figure-html/cell-15-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>Enumerating with seeds that are this specific is similar in some ways to enumerating with the <a href="https://greglandrum.github.io/rdkit-blog/posts/2022-03-14-rgd-and-molzip.html">results of R-group decomposition</a>.</p>


</section>
</section>

 ]]></description>
  <category>documentation</category>
  <category>tutorial</category>
  <guid>https://greglandrum.github.io/rdkit-blog/posts/2025-08-15-BRICS-tutorial.html</guid>
  <pubDate>Thu, 14 Aug 2025 22:00:00 GMT</pubDate>
  <media:content url="https://greglandrum.github.io/rdkit-blog/posts/images/blog/brics-tutorial-1.png" medium="image" type="image/png" height="115" width="144"/>
</item>
</channel>
</rss>
