diff --git a/docs/examples/electricity.ipynb b/docs/examples/electricity.ipynb
index ce89d19..61795f7 100644
--- a/docs/examples/electricity.ipynb
+++ b/docs/examples/electricity.ipynb
@@ -1576,10 +1576,20 @@
    "id": "5ae5e987",
    "metadata": {},
    "source": [
Above, we never actually registered `season_trainer.module` as an attribute of our `KalmanFilter` (i.e. we didn't do `kf_nn.season_nn = season_trainer.module`). This means that we won't continue training the embeddings as we train our `KalmanFilter`. Why not? For that matter, why did we pre-train in the first place? Couldn't we have just registered an untrained embeddings network and trained the whole thing end to end?
In practice, neural networks have many more parameters and need many more training epochs than our state-space models (and conversely, our state-space models are much slower to train _per_ epoch). So it's much more efficient to pre-train the network first. Then it's up to us whether we want to continue training the network, or freeze its weights (i.e. exclude it from the optimizer) and train only the state-space models' parameters. Here we're freezing them by not assigning the network as an attribute, so that its parameters aren't included when we run `torch.optim.Adam(kf_nn.parameters(), lr=.05)`.
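To make the freezing mechanism concrete, here is a minimal sketch using plain `torch.nn.Module`s rather than the actual torchcast classes (the names `Owner`, `embedding_nn`, and the layer sizes are illustrative, not from this notebook). A sub-module's parameters only show up in `parameters()` — and hence reach the optimizer — once the module is assigned as an attribute:

```python
import torch
from torch import nn

# Stand-ins for the real objects: `embedding_nn` plays the role of
# season_trainer.module, `owner` the role of kf_nn.
embedding_nn = nn.Sequential(nn.Linear(370, 10), nn.Tanh(), nn.Linear(10, 5))

class Owner(nn.Module):
    def __init__(self):
        super().__init__()
        # analogous to the state-space model's own (few) parameters
        self.some_param = nn.Parameter(torch.zeros(3))

owner = Owner()

# Without registration, the embedding weights are invisible to the optimizer:
print(sum(p.numel() for p in owner.parameters()))  # only `some_param` -> 3

# Assigning the network as an attribute registers it as a sub-module,
# so its weights would now be updated by torch.optim.Adam(owner.parameters(), lr=.05):
owner.season_nn = embedding_nn
print(sum(p.numel() for p in owner.parameters()))  # now includes the embedding weights
```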