more electricity example tweaks; bump version

strongio · Jan 8, 2025 · 6a99bf5
1 parent 9d1da45
commit 6a99bf5
Showing 2 changed files with 56 additions and 77 deletions.
diff --git a/docs/examples/electricity.ipynb b/docs/examples/electricity.ipynb
@@ -76,7 +76,7 @@
     {
      "data": {
       "text/plain": [
-       "<torch._C.Generator at 0x11d1b06f0>"
+       "<torch._C.Generator at 0x1146446f0>"
       ]
      },
      "execution_count": 3,
@@ -1183,18 +1183,10 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 25,
    "id": "cc9c3c39",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/Users/jacobdink/miniconda3/envs/bark-phone/lib/python3.10/site-packages/torch/nn/modules/lazy.py:181: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment.\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "from torchcast.utils.training import SeasonalEmbeddingsTrainer\n",
     "\n",
@@ -1259,7 +1251,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 26,
    "id": "33119450",
    "metadata": {},
    "outputs": [
@@ -1276,10 +1268,10 @@
     {
      "data": {
       "text/plain": [
-       "<torchcast.utils.training.SeasonalEmbeddingsTrainer at 0x3f62271c0>"
+       "<torchcast.utils.training.SeasonalEmbeddingsTrainer at 0x16213db10>"
       ]
      },
-     "execution_count": 25,
+     "execution_count": 26,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1327,7 +1319,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 27,
    "id": "8fa1e51b",
    "metadata": {},
    "outputs": [
@@ -1381,7 +1373,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 28,
    "id": "3b49fb78",
    "metadata": {},
    "outputs": [
@@ -1424,22 +1416,16 @@
   },
   {
    "cell_type": "markdown",
-   "id": "95187033",
+   "id": "d1ab34ad-2c71-4083-9566-410413f51230",
    "metadata": {},
    "source": [
-    "How should we incorporate our `season_embedder` neural-network into a state-space model? There are at least two options:\n",
-    "\n",
-    "#### Option 1\n",
-    "\n",
-    "The first option is to create our fourier-features on the dataframe, and pass these as features into a dataloader.\n",
-    "\n",
-    "1. First, we create our time-series model:"
+    "How should we incorporate our `season_embedder` neural-network into a state-space model? First, we create our time-series model:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
-   "id": "fd7ca2f5",
+   "execution_count": 29,
+   "id": "b68d8b86-04fe-4613-8d11-a80ae1d43f3c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1457,15 +1443,17 @@
   },
   {
    "cell_type": "markdown",
-   "id": "caf84e63",
+   "id": "141cd31c-fb28-4a35-9761-e40b5a7dd262",
    "metadata": {},
    "source": [
-    "2. Next, we add our season features to the dataframe, and create a dataloader, passing these feature-names to the `X_colnames` argument:"
+    "Then, we have two options:\n",
+    "\n",
+    "1. The first option is to create our fourier-features on the dataframe, and pass these as features into a dataloader."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 30,
    "id": "f3244f64",
    "metadata": {},
    "outputs": [],
@@ -1491,23 +1479,12 @@
    "id": "fda8243f",
    "metadata": {},
    "source": [
-    "Finally, we train the model, either rolling our own training loop...\n",
-    "\n",
-    "```python\n",
-    "for i in range(num_epochs):\n",
-    "    for batch in dataloader_kf_nn:\n",
-    "        batch = batch.to(DEVICE)\n",
-    "        y, X = batch.tensors\n",
-    "        predictions = kf_nn(y, X=X, start_offsets=batch.start_offsets)\n",
-    "        # use predictions.log_prob on optimizer, etc.\n",
-    "```\n",
-    "\n",
-    "...or, even better, using a tool like Pytorch Lightning. Torchcast also includes a simple tool for this, the `StateSpaceTrainer`:"
+    "...then we'd train our model with a tool like PyTorch Lightning. Torchcast also includes a simple tool for this, the `StateSpaceTrainer`:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 31,
    "id": "39eff5f3",
    "metadata": {},
    "outputs": [],
@@ -1530,21 +1507,19 @@
    "id": "4ab296fc",
    "metadata": {},
    "source": [
-    "#### Option 2\n",
-    "\n",
-    "An even simpler (though less general) option is just to leverage the util methods in the `SeasonalEmbeddingsTrainer`, which handles converting a `TimeSeriesDataset` into a tensor of fourier terms:"
+    "2. An even simpler (though less general) option is just to leverage the util methods in the `SeasonalEmbeddingsTrainer`, which handles converting a `TimeSeriesDataset` into a tensor of fourier terms:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 32,
    "id": "a7d0abfa",
    "metadata": {},
    "outputs": [],
    "source": [
     "def dataset_to_kwargs(batch: TimeSeriesDataset) -> dict:\n",
     "    seasonX = season_trainer.times_to_model_mat(batch.times()).to(dtype=torch.float, device=DEVICE)\n",
-    "    return {'X' : season_trainer.module.season_nn(seasonX)}\n",
+    "    return {'X' : season_trainer.module(seasonX)}\n",
     "\n",
     "ss_trainer = StateSpaceTrainer(\n",
     "    module=kf_nn,\n",
@@ -1558,12 +1533,12 @@
    "id": "535db134",
    "metadata": {},
    "source": [
-    "Then we don't need to use `add_season_features` when creating our data-loader, since `times_to_model_mat` will create them per-batch as needed (which will be much easier on our GPU's memory):"
+    "Then we don't need to use `add_season_features` when creating our data-loader, since `season_trainer.times_to_model_mat` will create them per-batch as needed (which will be much easier on our GPU's memory):"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 33,
    "id": "160ecea4",
    "metadata": {},
    "outputs": [],
@@ -1595,14 +1570,14 @@
     "state-space models are much slower to train _per_ epoch). So it's much more efficient to pre-train the network first. Then it's up to\n",
     "us whether we want to continue training the network, or just freeze its weights (i.e. exclude it from the optimizer) and just train the\n",
     "state-space models' parameters. Here we're freezing them by not assigning the network as an attribute (so that the parameters don't get\n",
-    "passed to when we run ``torch.optim.Adam(kf_nn.parameters(), lr=.05)``.\n",
+    "passed to the optimizer when we run ``torch.optim.Adam(kf_nn.parameters(), lr=.05)``).\n",
     "\n",
     "</div>"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 34,
    "id": "51fdb529",
    "metadata": {},
    "outputs": [
@@ -1639,30 +1614,22 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0cec0bcd",
+   "id": "0cc5d6fe-fe0c-4c54-9a40-7c02faacf24a",
    "metadata": {},
    "source": [
-    "### Evaluation"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a36e1ec8",
-   "metadata": {},
-   "source": [
-    "#### Generating Torchcast Forecasts for all groups"
+    "Now we'll create forecasts for all the groups, and back-transform them, for plotting and evaluation."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 35,
    "id": "7e38a631",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "598fdfa4ec4c479b9aeb5760bd5d61cf",
+       "model_id": "19c334095df54e6d8157b2c31a4b998e",
        "version_major": 2,
        "version_minor": 0
       },
@@ -1861,7 +1828,7 @@
        "[9548098 rows x 8 columns]"
       ]
      },
-     "execution_count": 43,
+     "execution_count": 35,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1881,8 +1848,6 @@
     "\n",
     "    df_all_preds = []\n",
     "    for batch in tqdm(dataloader_all):\n",
-    "        # if example_group not in batch.group_names:\n",
-    "        #     continue\n",
     "        batch = batch.to(DEVICE)\n",
     "        seasonX = season_trainer.times_to_model_mat(batch.times()).to(dtype=torch.float, device=DEVICE)\n",
     "        pred = kf_nn(batch.tensors[0], X=season_trainer.module(seasonX), start_offsets=batch.start_offsets)\n",
@@ -1897,7 +1862,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 36,
    "id": "6907f9bb",
    "metadata": {},
    "outputs": [
@@ -1918,14 +1883,30 @@
     "plot_2x2(df_all_preds.query(\"group==@example_group\"), actual_colname='kW', split_dt=SPLIT_DT)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2c3bc7dd-f0c8-453a-8310-2c7c182087b5",
+   "metadata": {},
+   "source": [
+    "Success! If our example group is representative, our forecasting model was able to use the embeddings to capture complex seasonal structure."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0cec0bcd",
+   "metadata": {},
+   "source": [
+    "### Evaluation"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "2ab8a44f",
    "metadata": {},
    "source": [
     "#### A Simple Baseline\n",
     "\n",
-    "We've see that, for this dataset, generating forecasts that are *sane* is an already an achievement.\n",
+    "We've seen that, for this dataset, generating forecasts that are *sane* is already an achievement.\n",
     "\n",
     "But of course, ideally we'd actually have some kind of a quantitative measure of how good our forecasts are.\n",
     "\n",
@@ -1934,7 +1915,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 38,
    "id": "67852453",
    "metadata": {},
    "outputs": [
@@ -1989,7 +1970,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 46,
+   "execution_count": 39,
    "id": "34f7097c-b5ef-4384-bfc5-b0b7b61c1396",
    "metadata": {},
    "outputs": [
@@ -2169,7 +2150,7 @@
        "[19096196 rows x 7 columns]"
       ]
      },
-     "execution_count": 46,
+     "execution_count": 39,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2178,8 +2159,6 @@
     "df_compare = (df_all_preds[['group', 'mean', 'time', 'kW', 'dataset']]\n",
     "             .rename(columns={'mean' : 'torchcast'})\n",
     "             .merge(df_baseline365, how='left'))\n",
-    "assert (df_compare['baseline'].notnull() | (df_compare['dataset'] == 'train')).all()\n",
-    "assert df_compare['torchcast'].notnull().all()\n",
     "\n",
     "df_compare_long = df_compare.melt(\n",
     "    id_vars=['group', 'time', 'kW', 'dataset'], \n",
@@ -2203,7 +2182,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 40,
    "id": "724fcadb-b243-4ea1-9666-3ac99f6e7b9a",
    "metadata": {},
    "outputs": [
@@ -2213,7 +2192,7 @@
        "<Axes: title={'center': 'Torchcast vs. Baseline: Error over Time'}, xlabel='date', ylabel='Abs(Error)'>"
       ]
      },
-     "execution_count": 48,
+     "execution_count": 40,
      "metadata": {},
      "output_type": "execute_result"
     },
@@ -2260,7 +2239,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 54,
+   "execution_count": 41,
    "id": "8cdac807",
    "metadata": {},
    "outputs": [

diff --git a/torchcast/__init__.py b/torchcast/__init__.py
@@ -1 +1 @@
-__version__ = '0.4.2'
+__version__ = '0.4.3'
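
Note on the notebook text changed above: the "fourier-features" it mentions are, in general, sine/cosine pairs evaluated at each timestamp for a given seasonal period. The following is a minimal standalone sketch of that idea in plain Python. It is not torchcast's `add_season_features` or `times_to_model_mat`, whose signatures are not shown in this diff. The function name `fourier_terms`, the parameter `K`, and the choice of hourly observations with a 24-hour period are all assumptions made for illustration.

```python
import math
from typing import List


def fourier_terms(t: float, period: float, K: int) -> List[float]:
    """Hypothetical helper: K sine/cosine pairs for time `t` and one seasonal period.

    Output order is [sin_1, cos_1, sin_2, cos_2, ..., sin_K, cos_K].
    """
    feats: List[float] = []
    for k in range(1, K + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


# Assumption: timestamps are hours since an arbitrary start, so a daily
# pattern has period 24. Each row is the feature vector for one timestamp.
hours = [0, 6, 12, 24]
daily = [fourier_terms(h, period=24.0, K=2) for h in hours]
```

Because the terms are periodic, hour 0 and hour 24 get (numerically) the same feature vector, which is what lets a downstream network learn one shape per position in the cycle. Computing such rows per batch rather than storing them for every row of the dataframe is the memory trade-off the notebook text describes.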