generative.html

<!DOCTYPE html>
<html lang="" xml:lang="">
  <head>
    <title>Generative models and multilevel models</title>
    <meta charset="utf-8" />
    <meta name="author" content="Dominique Gravel &amp; Andrew MacDonald" />
    <script src="assets/header-attrs-2.8/header-attrs.js"></script>
    <link href="assets/remark-css-0.0.1/default.css" rel="stylesheet" />
    <link href="assets/remark-css-0.0.1/hygge.css" rel="stylesheet" />
    <link rel="stylesheet" href="assets/ecl707.css" type="text/css" />
  </head>
  <body>
    <textarea id="source">


class: title-slide, middle

&lt;style type="text/css"&gt;
  .title-slide {
    background-image: url('assets/img/bg.jpg');
    background-color: #23373B;
    background-size: contain;
    border: 0px;
    background-position: 600px 0;
    line-height: 1;
  }
&lt;/style&gt;

# Model evaluation with ecological data

&lt;hr width="65%" align="left" size="0.3" color="orange"&gt;&lt;/hr&gt;

## Generative models and multilevel models

&lt;hr width="65%" align="left" size="0.3" color="orange" style="margin-bottom:40px;" alt="@Martin Sanchez"&gt;&lt;/hr&gt;

.instructors[
  Andrew MacDonald, Dominique Gravel
]

&lt;img src="assets/img/logo.png" width="25%" style="margin-top:20px;"&gt;&lt;/img&gt;

---
# Generative models

Bayesian models can generate data. This is part of what makes them useful in ecology:

* Easy to make predictions
* Straightforward to test the model

---
# Simple Generative model


We've already seen a simple model which is generative: our Binomial model with a Beta prior:

We start with this model
$$
`\begin{align}
y \sim \text{Binomial}(6, \theta) \\
\theta \sim Beta(2,2)
\end{align}`
$$

and we have data `4, 5, 4, 4, 3`

* So total successes is `\(4 + 5 + 4 + 4 + 3 = 20\)`
* And total failures is `\(6 * 5 - 20 = 10\)`

The posterior becomes:

$$
\theta_{post} \sim \text{Beta}(22, 12)
$$

---
# Generating data with this simple model

Quick pseudocode (_Note_ this is also called the Beta-Binomial distribution) :

* generate values of `\(\theta\)` from `\(\text{Beta}(22, 12)\)`
* for each value, generate an observation of surviving eggs using:

$$
y \sim \text{Binomial}(6, \theta)
$$

.pull-left[

```r
theta_post &lt;- rbeta(200, 
                    22, 12)
obs &lt;- rbinom(n = 200, 
              size = 6, 
              theta_post)
hist(obs, breaks = 20)
```
]

.pull-right[
&lt;img src="generative_files/figure-html/unnamed-chunk-1-1.png" width="504" /&gt;
]

---
# nonlinear tree growth -- write the model

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\end{align}`
$$

To make it Bayesian, we specify distributions for every parameter:
--
$$
`\begin{align}
\sigma_{height} &amp; \sim \text{Exponential}(0.5)\\
a               &amp; \sim \text{Normal}(200, 15)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$

--
* _not_ the only choices for these parameters! Not even particularly good ones
--
* use prior predictive checks to see if these priors make sense

---
# Nonlinear light data -- simulation

Draw parameters from your priors. 

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\sigma_{height} &amp; \sim \text{Exponential}(2)\\
a               &amp; \sim \text{Normal}(200, 15)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$

---
# Nonlinear light data -- simulation

Draw parameters from your priors. 

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\sigma_{height} &amp; = 0.36\\
a               &amp; = 196.4\\
s               &amp; = 16.1\\
\end{align}`
$$
---
# Nonlinear light data -- simulation

Draw parameters from your priors. 

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\sigma_{height} &amp; = 0.36\\
a               &amp; = 196.4\\
s               &amp; = 16.1\\
\end{align}`
$$
--

* keep these fixed for the rest of the simulation! You can go back and try different ones later!

--

* **pop quiz** is `\(\mu\)` a parameter??

---
# Nonlinear light data -- simulation

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}\left(\text{log}\left(\frac{aL}{a/s + L} \right), \sigma_{height}\right) \\
\sigma_{height} &amp; = 0.36\\
a               &amp; = 196.4\\
s               &amp; = 16.1\\
\end{align}`
$$
&lt;br&gt;&lt;br&gt;
**NO**. `\(\mu\)` is not a parameter, it is simply a handy way to rewrite this equation so it is easier to read.  

`\(\mu\)` might be a very interesting thing to calculate after the model is run!
---
class: center
# Simple simulation

&lt;img src="generative_files/figure-html/unnamed-chunk-2-1.png" width="504" /&gt;
---
# Other ways of talking about the same model

let's go back to this representation of the model:

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\sigma_{height} &amp; \sim \text{Exponential}(2)\\
a               &amp; \sim \text{Normal}(200, 15)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$

* the deterministic part of the model is indicated with `\(=\)`
* the stochastic parts are all indicated with `\(\sim\)`

When one distribution has a parameter which is also a distribution, we say that the first is *conditional* on the second.
---
class: center

# Graphing conditional distributions


&lt;img src="generative_files/figure-html/dag-fig-1.png" width="504" /&gt;

---
class: center

# Converting the graph to probability notation

&lt;img src="generative_files/figure-html/unnamed-chunk-3-1.png" width="504" /&gt;

&lt;p&gt;
$$
[a, s, \sigma_{\text{height}}|y ] \propto 
[y |g(L, a, s), \sigma_{\text{height}}] 
[a] 
[s] [\sigma_{\text{height}}] 
$$
&lt;/p&gt;

---
# Converting the graph to probability notation

&lt;p&gt;
$$
[a, s, \sigma_{\text{height}}|y ] \propto 
[y |g(L, a, s), \sigma_{\text{height}}] 
[a] 
[s] [\sigma_{\text{height}}] 
$$
&lt;/p&gt;

--

* the vertical bar `\(|\)` reads "conditioned on" or "dependent on"

--

* `\(L\)` is observed and therefore it appears *only* of the right-hand side of `\(|\)`

--

* `\(a\)`, `\(s\)` and `\(L\)` are parameters and therefore need distributions

--

* `\([y |g(L, a, s), \sigma_y]\)` is a likelihood! This means all the observations multiplied together:

&lt;p&gt;
$$
[y |g(L, a, s), \sigma_y] = \prod_{i=1}^{300} \text{LogNormal}\left(y_i, \text{log}\left(\frac{aL}{a/x + L} \right), \sigma_{height}\right)
$$
&lt;/p&gt;

---
# Hierarchical models

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu &amp;= \frac{aL}{a/s + L} \\
\sigma_{height} &amp; \sim \text{Exponential}(2)\\
a               &amp; \sim \text{Normal}(200, 15)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$
We replace a parameter with a linear model, complete with a distribution:
---
# Hierarchical models

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu_n), \sigma_{height}) \\
\mu_n &amp;= \frac{a_nL}{a_n/s + L} \\
a_n &amp;= \bar{a} + a_{j[n]}\\
a_j &amp;\sim \text{Normal}(0, \sigma_a)\\
\bar{a}               &amp; \sim \text{Normal}(200, 15)\\
\sigma_{a} &amp; \sim \text{Exponential}(2)\\
\sigma_{height} &amp; \sim \text{Exponential}(4)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$

* For observation number `\(n\)`, between `\(1\)` and `\(N\)` total observations.
* For species number `\(j\)` , which is between `\(1\)` and `\(S\)` total species (in our example `\(S\)` is 5)
* There are **5 values** of `\(a_j\)` : `\(a_1\)`, `\(a_2\)`, `\(a_3\)`, `\(a_4\)`, `\(a_5\)`

---
# Quick word about nested indexing

`\(a_{j[n]}\)`  This means the value of `\(a\)` for observation `\(n\)` -- whichever of the `\(j\)` species it is in.


```r
sp_ids &lt;- c(fir = 1, maple = 2)
fake_data &lt;- expand.grid(light = c(5,10, 60),
                         spp = c("fir", "maple"))
*fake_data$id &lt;- sp_ids[fake_data$spp]
knitr::kable(fake_data)
```


| light|spp   | id|
|-----:|:-----|--:|
|     5|fir   |  1|
|    10|fir   |  1|
|    60|fir   |  1|
|     5|maple |  2|
|    10|maple |  2|
|    60|maple |  2|


---
# Hierarchical models -- two ways to write

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu_n), \sigma_{height}) \\
\mu_n &amp;= \frac{a_nL}{a_n/s + L} \\
a_n &amp;= \bar{a} + a_{j[n]}\\
a_j &amp;\sim \text{Normal}(0, \sigma_a)\\
\bar{a}               &amp; \sim \text{Normal}(200, 15)\\
\sigma_{a} &amp; \sim \text{Exponential}(2)\\
\sigma_{height} &amp; \sim \text{Exponential}(4)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$
---
# Hierarchical models -- two ways to write

$$
`\begin{align}
\text{height} &amp;\sim \text{LogNormal}(\text{log}(\mu), \sigma_{height}) \\
\mu_n &amp;= \frac{a_{j[n]}L}{a_{j[n]}/s + L} \\
a_j &amp;\sim \text{Normal}(\bar{a}, \sigma_a)\\
\bar{a}               &amp; \sim \text{Normal}(200, 15)\\
\sigma_{a} &amp; \sim \text{Exponential}(2)\\
\sigma_{height} &amp; \sim \text{Exponential}(4)\\
s               &amp; \sim \text{Normal}(15, 2)\\
\end{align}`
$$

--

BOTH will have the same 5 values for `\(a_j\)`

---
class: center
# Different species `\(a\)` simulation

&lt;img src="generative_files/figure-html/unnamed-chunk-5-1.png" width="504" /&gt;


---
# DAG for Multilevel models


&lt;img src="generative_files/figure-html/unnamed-chunk-6-1.png" width="504" /&gt;

Note: we have **arrows pointing in** AND **arrows pointing out**

---
# Probability notation

&lt;p&gt;
$$
`\begin{align}
[a_j,\bar{a}, \sigma_a,  s, \sigma_{\text{height}}|y ] &amp;\propto \\
&amp;[y |g(L, a_j, s), \sigma_{\text{height}}] \times \\
&amp;[a_j |\bar{a}, \sigma_a] \times \\
&amp;[\bar{a}] \times \\
&amp;[\sigma_a] \times \\
&amp;[s] \times \\ 
&amp;[\sigma_{\text{height}}]  
\end{align}`
$$
&lt;/p&gt;
    </textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script src="macros.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "solarized-light",
"highlightLines": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
  window.dispatchEvent(new Event('resize'));
});
(function(d) {
  var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
  if (!r) return;
  s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
  d.head.appendChild(s);
})(document);

(function(d) {
  var el = d.getElementsByClassName("remark-slides-area");
  if (!el) return;
  var slide, slides = slideshow.getSlides(), els = el[0].children;
  for (var i = 1; i < slides.length; i++) {
    slide = slides[i];
    if (slide.properties.continued === "true" || slide.properties.count === "false") {
      els[i - 1].className += ' has-continuation';
    }
  }
  var s = d.createElement("style");
  s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
  d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
  var deleted = false;
  slideshow.on('beforeShowSlide', function(slide) {
    if (deleted) return;
    var sheets = document.styleSheets, node;
    for (var i = 0; i < sheets.length; i++) {
      node = sheets[i].ownerNode;
      if (node.dataset["target"] !== "print-only") continue;
      node.parentNode.removeChild(node);
    }
    deleted = true;
  });
})();
(function() {
  "use strict"
  // Replace <script> tags in slides area to make them executable
  var scripts = document.querySelectorAll(
    '.remark-slides-area .remark-slide-container script'
  );
  if (!scripts.length) return;
  for (var i = 0; i < scripts.length; i++) {
    var s = document.createElement('script');
    var code = document.createTextNode(scripts[i].textContent);
    s.appendChild(code);
    var scriptAttrs = scripts[i].attributes;
    for (var j = 0; j < scriptAttrs.length; j++) {
      s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
    }
    scripts[i].parentElement.replaceChild(s, scripts[i]);
  }
})();
(function() {
  var links = document.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
      links[i].target = '_blank';
    }
  }
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
  const hlines = d.querySelectorAll('.remark-code-line-highlighted');
  const preParents = [];
  const findPreParent = function(line, p = 0) {
    if (p > 1) return null; // traverse up no further than grandparent
    const el = line.parentElement;
    return el.tagName === "PRE" ? el : findPreParent(el, ++p);
  };

  for (let line of hlines) {
    let pre = findPreParent(line);
    if (pre && !preParents.includes(pre)) preParents.push(pre);
  }
  preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>

<script>
slideshow._releaseMath = function(el) {
  var i, text, code, codes = el.getElementsByTagName('code');
  for (i = 0; i < codes.length;) {
    code = codes[i];
    if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
      text = code.textContent;
      if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
          /^\$\$(.|\s)+\$\$$/.test(text) ||
          /^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
        code.outerHTML = code.innerHTML;  // remove <code></code>
        continue;
      }
    }
    i++;
  }
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src  = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
  if (location.protocol !== 'file:' && /^https?:/.test(script.src))
    script.src  = script.src.replace(/^https?:/, '');
  document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
  </body>
</html>