-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmcmc_dom.html
592 lines (451 loc) · 16 KB
/
mcmc_dom.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>MCMC</title>
<meta charset="utf-8" />
<meta name="author" content="Dominique Gravel" />
<script src="assets/header-attrs-2.8/header-attrs.js"></script>
<link href="assets/remark-css-0.0.1/default.css" rel="stylesheet" />
<link href="assets/remark-css-0.0.1/hygge.css" rel="stylesheet" />
<link rel="stylesheet" href="assets/ecl707.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: title-slide, middle
<style type="text/css">
.title-slide {
background-image: url('assets/img/bg.jpg');
background-color: #23373B;
background-size: contain;
border: 0px;
background-position: 600px 0;
line-height: 1;
}
</style>
# Sampling the posterior with MCMC
<hr width="65%" align="left" size="0.3" color="orange"></hr>
## Rejection sampling, Metropolis Hastings etc.
<hr width="65%" align="left" size="0.3" color="orange" style="margin-bottom:40px;" alt="@Martin Sanchez"></hr>
.instructors[
**ECL707/807** - Dominique Gravel
]
<img src="assets/img/logo.png" width="25%" style="margin-top:20px;"></img>
---
# Objective
The objective of MCMC algorithm is to simulate parameters from a complex posterior distribution
---
# Outline
- Understand the principle of rejection sampling
- Get an intuition of a multivariate posterior distribution
- Introduction to Monte Carlo Markov Chain
- Run a simple example of Gibbs sampling
---
# Bayes theorem
$$ P(\theta|D,H) = \frac{P(D|\theta,H)P(\theta,H)}{P(D|H)} $$
$$ posterior = \frac{likelihood \times prior}{evidence} $$
The problem : how can we compute the posterior distribution without knowledge of the denominator ?
---
# Dealing with the normalization constant
1. Use the proportional form :
$$ P[\theta \mid X] \propto P[X \mid \theta]P[\theta] $$
2. Use conjugacy
3. Sample from the posterior
---
# Rejection sampling
1. Use an easily sampled distribution (the candidate distribution `\(c(x)\)` )
2. Evaluate the probability of the candidate according to a target distribution `\(t(x)\)`
3. Reject candidates with a probability proportional to the difference between the two distributions
---
# Rejection sampling
**Pseudo-code**
```
DEFINE candidate distribution c(x)
DEFINE target distribution t(x)
DEFINE constant M such that M*c(x) >= t(x) for all x
```
---
# Rejection sampling
Acceptance probability :
$$p = \frac{t(x)}{Mc(x)} $$
---
# Rejection sampling
**Pseudo-code**
```
REPEAT
DRAW sample X from c(x)
COMPUTE acceptance probability p = t(X)/Mc(X)
DRAW random value rand from uniform on [0,1]
IF rand < p
accept x
ELSE
reject X
UNTIL X is accepted
```
---
# Rejection sampling
**Exercise 1**
Write a function that uses rejection sampling to return `\(n\)` random normal deviates using only a uniform random number generator as a source of randomness.
Note : you can still use *dnorm* to evaluate your target distribution.
---
# Rejection sampling
```r
rej_norm <- function(n, mu, sig) {
# Define the target
target <- function(x, mu, sig) dnorm(x, mu, sig)
# Define the candidate
cand <- function(x, sig) dunif(x,-2*sig,2*sig)
# Constant
M <- 5
# Object to store the n results
result <- numeric(n)
# Main loop
for(i in 1:n) {
accept <- FALSE
while(!accept) {
X <- runif(1,-6,6)
p <- target(X,mu,sig) / (M * cand(X))
rand <- runif(1)
accept <- (rand < p)
} #endwhile
result[i] <- X
} #endfor
return(result) } #endfunction
```
---
# Metropolis-Hastings algorithm
- A very general technique to explore the parameter space
- Simulated annealing is a special case of MH
- Unlike rejection sampling, each iteration generates a sample from the target distribution
- Samples are dependent (autocorrelated), so effective sample size is much smaller than chain length
- For a Markov Chain `\(X\)`, draw `\(X_t\)` from candidate distribution `\(j(x|X_{t-1})\)` and compare to target distribution `\(f(x)\)`
- Better propositions are not deterministically accepted
- The posterior probability is stationary after a *burnin* period
---
# Acceptance probability
Full definition is :
`$$p = \frac{t(X_{t})c(X_{t-1}|X{t})}{t(X_{t-1})c(X_{t}|X_{t-1})}$$`
Where `\(X_t\)` is the current candidate value and `\(X_{t-1}\)` is the previous value.
Note that if `\(t(x)\)` is a posterior distribution, the normalization constant cancels.
---
# Metropolis-Hastings algorithm
```r
# metropolis <- function(N = 5000, X0 = 0, A) {
# N: number of MCMC iterations
# X0: starting value of the chain
# A: step parameter
# Object to store results
chain <- numeric(N + 1)
chain[1] <- X0
#}
```
---
# Metropolis-Hastings algorithm
```r
# DEFINE candidate distribution
cand_fn <- function(y, A) runif(1, y-A, y+A)
# DEFINE target distribution
target_fn <- dnorm(x, 0, 1)
# DEFINE acceptance probability function
accept_fn <- function(current.value, previous.value)
target_fn(current.value)/target_fn(previous.value)
```
---
# Acceptance probability
- What to use for `\(c(x|y)\)`?
- Very flexible, but must allow for positive recurrence of the chain
- An example is the uniform distribution centered on the previous value
- A nice property of the uniform is that `\(c(x \mid y) = c(y \mid x)\)`, so:
`$$p = \frac{f(X_{t})}{f(X_{t-1})}$$`
---
# Metropolis-Hastings algorithm
```r
for(step in 2:(N+1)) {
# PROPOSE candidate
current.value <- cand_fn(chain[step -1], A)
# COMPUTE acceptance probability
p <- accept_fn(current.value, chain[step-1])
# DO rejection sampling
rand <- runif(1)
if(rand <- p)
chain[t] = current.value
else
chain[t] = chain[t-1]
}
```
---
# Chick survival
The Fake Petrel ( _Hydrobates inventus_ ) lays precisely 6 eggs. You have data on survivorship from 10 different nests:
```r
set.seed(1859)
x_obs <- rbinom(10, 6, 0.6)
x_obs
```
```
## [1] 4 4 5 4 4 4 5 4 4 6
```
Our model is:
$$
`\begin{align}
\text{hatched}_{i} &\sim \text{Binomial}(\theta, 6) \\
\theta &\sim \text{Uniform}(0,1)
\end{align}`
$$
---
# Exercise 2.
- Copy the MH code and adapt it to evaluate the parameter of the petrel example (binomial model)
- Plot the chain to observe convergence
- Change initial values to compare
---
# Gibbs sampling
- Gibbs Sampling is generalization of MCMC for _joint distributions_
- Objective: Sample from `\([\theta|X]\)` where `\(X\)` is data and :
`$$\theta = [\theta_1, \theta_2, \theta_3, \ldots, \theta_k]$$`
- Very flexible: define samplers for each `\(\theta_i\)`
- Formally, we use Metropolis-within-Gibbs scheme, where Metropolis-Hastings is the sampler when no easier sampler can be identified
---
# Simple bivariate example
- We will consider data `\(X = \{21.4, 17.64, 18.31, 15.12, 14.40, 15, 19.59, 15.06, 15.71, 14.65\}\)`
- Objective : estimate posterior mean `\(\mu\)` and standard deviation `\(\sigma\)` given prior estimates :
+ `\([\mu] = \mathrm{N}(14.5, 0.6)\)` with parameters `\((\mu_0, \sigma_0)\)`
+ `\([\tau] = \left [ \frac{1}{\sigma^2} \right ] = \Gamma \left ( 1.08367, 3.068 \right )\)` with parameters `\((\alpha_{\sigma}, \beta_{\sigma})\)`
---
# Proportional form
`$$[{\theta} \mid {X} ] \propto [ {X} \mid {\theta} ] [ {\theta}]$$`
Where :
`$$[ {X} \mid {\theta} ] = \prod_{i = 1}^{n} \frac{1}{\hat{\sigma}\sqrt{2\pi}} e^{-\frac{\left (X_i - \hat{\mu}\right )^{2}}{2\hat{\sigma}^{2}}} = \prod_{i = 1}^{n} \mathrm{dnorm}(X_i, \hat{\mu}, \hat{\sigma})$$`
And :
.small[
`$$[{\theta}] = [\hat{\mu} \mid \mu_0, \sigma_0][\hat{\sigma} \mid \alpha, \beta] = \mathrm{dnorm}(\hat{\mu}, \mu_0, \sigma_0) \times \\
\mathrm{dgamma} \left (\hat{\tau} = \frac{1}{\hat{\sigma}^2}, \alpha, \beta \right )$$`
]
---
# Sampler
```r
mh.sampler < – function(previous, A, target, prior, other.params,
data, cand_fn = function(x, A) runif(1, x–A, x+A)) {
# propose and select candidate values using metropolis–hastings algorith
# cand_fn: the function used to draw samples
# previous: the previous state in the chain
# A: the tuning parameter
# target : function(x) returning the target density at X
candidate.value < – cand_fn(previous, A)
p < – mh.acceptance(candidate.value, previous, target, prior,
other.params, data)
U < – runif(1)
if (U < p) {
result < – candidate.value
} else {s
result < – previous
}
return(result)
}
```
---
# Conditional mu
```r
mu.conditional < – function(mu, mu.prior, sigma, data) {
# log-likelihood
lik < – exp(sum(dnorm(data, mu, sigma, log=TRUE)))
# log prior
prior <- exp(– (mu – mu.prior[1])ˆ2 / (2 ∗ mu.prior[2]ˆ2))
# Return the conditional mu
return(lik*prior)
}
```
---
# Conditional sigma
```r
sigma.conditional < – function(sigma, tau.prior, mu, data) {
# Compute the posterior
# Make sure sigma is positive
if (sigma < = 0) {
return(–Inf)
} else {
# Likelihood
lik < – exp(sum(dnorm(data, mu, sigma, log = TRUE)))
# Prior
prior <- exp((tau.prior[1]ˆ2 – 1) ∗ log(tau(sigma)) –
tau(sigma) ∗ tau.prior[2])
}
# Exponentiate back
return(lik*prior)
}
```
---
# Acceptance
```r
mh.acceptance < – function(current, previous, target, prior, other.params, data) {
# target : function returning target density at parameter x
num < – target(current, prior, other.params, data)
denom < – target(previous, prior, other.params, data)
if (denom == 0) return(-1)
else return(-num/denom)
}
```
---
# Run the whole thing
```r
N < – 5000 # number of iterations of the mcmc
starting < – c(10,4) # starting values for mu and sigma
A < – c(1.4, 0.8) # tuning parameters for the candidate function
X < – c(21.4, 17.64, 18.31, 15.12, 14.40, 15, 19.59, 15.06, 15.71, 14.65) # data set
mu.prior < – c(14.5, 0.6) # prior parameters for the mean (normal distribution)
tau.prior < – c(1.28367, 3.068) # prior parameters for sigma (gamma distribution)
## Small function to get a parameter of the gamma
tau < – function(sigma) 1/(sigmaˆ2)
# Object to store the results
chain < – matrix(nrow=N+1, ncol=2)
chain [1,] < – starting
# Run the chain
for(t in 2:( N+1)) {
# Sample for mu, fixing sigma
chain[t ,1] < – mh.sampler(previous = chain[t–1,1], A = A[1],
target = mu.conditional,
prior = mu.prior,
other.params = chain[t–1,2], data = X)
# Sample for sigma, fixing mu
chain[t ,2] < – mh.sampler(previous = chain[t–1,2], A = A[2],
target = sigma.conditional,
prior = tau.prior,
other.params = chain[t –1,1], data = X)
}
```
---
# Exercise
That was a big chunk of code and we are at day 9 ....
Let's try to implement this. Run the code and check the following :
- run multiple runs and overlay the series
- change the initial values and see how it behaves. Start far from the good values
- Adjust the parameter `\(A\)` to see how it affects convergence time
- Plot the histogram of the values
- Check the relationship between mu and sigma
---
# Exercise 2
We will try to adapt the code to the hemlock example.
- Use your simulated data so that you know precisely what are the true parameters.
- Specify a an informative prior that is not too far from the truth, with a reasonnable dispersion around the expected value.
- Revisit the **conditional** functions for each parameter theta
- Adapt the chain to account for the extra parameter
- Run the chain and plot the histogram of the different parameters
- Compare to the parameters used to simulate data
</textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script src="macros.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "solarized-light",
"highlightLines": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
window.dispatchEvent(new Event('resize'));
});
(function(d) {
var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
if (!r) return;
s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
d.head.appendChild(s);
})(document);
(function(d) {
var el = d.getElementsByClassName("remark-slides-area");
if (!el) return;
var slide, slides = slideshow.getSlides(), els = el[0].children;
for (var i = 1; i < slides.length; i++) {
slide = slides[i];
if (slide.properties.continued === "true" || slide.properties.count === "false") {
els[i - 1].className += ' has-continuation';
}
}
var s = d.createElement("style");
s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
var deleted = false;
slideshow.on('beforeShowSlide', function(slide) {
if (deleted) return;
var sheets = document.styleSheets, node;
for (var i = 0; i < sheets.length; i++) {
node = sheets[i].ownerNode;
if (node.dataset["target"] !== "print-only") continue;
node.parentNode.removeChild(node);
}
deleted = true;
});
})();
(function() {
"use strict"
// Replace <script> tags in slides area to make them executable
var scripts = document.querySelectorAll(
'.remark-slides-area .remark-slide-container script'
);
if (!scripts.length) return;
for (var i = 0; i < scripts.length; i++) {
var s = document.createElement('script');
var code = document.createTextNode(scripts[i].textContent);
s.appendChild(code);
var scriptAttrs = scripts[i].attributes;
for (var j = 0; j < scriptAttrs.length; j++) {
s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
}
scripts[i].parentElement.replaceChild(s, scripts[i]);
}
})();
(function() {
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
links[i].target = '_blank';
}
}
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
const hlines = d.querySelectorAll('.remark-code-line-highlighted');
const preParents = [];
const findPreParent = function(line, p = 0) {
if (p > 1) return null; // traverse up no further than grandparent
const el = line.parentElement;
return el.tagName === "PRE" ? el : findPreParent(el, ++p);
};
for (let line of hlines) {
let pre = findPreParent(line);
if (pre && !preParents.includes(pre)) preParents.push(pre);
}
preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>
<script>
slideshow._releaseMath = function(el) {
var i, text, code, codes = el.getElementsByTagName('code');
for (i = 0; i < codes.length;) {
code = codes[i];
if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
text = code.textContent;
if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
/^\$\$(.|\s)+\$\$$/.test(text) ||
/^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
code.outerHTML = code.innerHTML; // remove <code></code>
continue;
}
}
i++;
}
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
if (location.protocol !== 'file:' && /^https?:/.test(script.src))
script.src = script.src.replace(/^https?:/, '');
document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
</body>
</html>