<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>Probabilities. Basic probability theory</title>
<meta charset="utf-8" />
<meta name="author" content="Dominique Gravel" />
<script src="assets/header-attrs-2.8/header-attrs.js"></script>
<link href="assets/remark-css-0.0.1/default.css" rel="stylesheet" />
<link href="assets/remark-css-0.0.1/hygge.css" rel="stylesheet" />
<link rel="stylesheet" href="assets/ecl707.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: title-slide, middle
<style type="text/css">
.title-slide {
background-image: url('assets/img/bg.jpg');
background-color: #23373B;
background-size: contain;
border: 0px;
background-position: 600px 0;
line-height: 1;
}
</style>
# Day 1. Probabilities.
<hr width="65%" align="left" size="0.3" color="orange"></hr>
## Basic probability theory
<hr width="65%" align="left" size="0.3" color="orange" style="margin-bottom:40px;"></hr>
.instructors[
**ECL707/807** - Dominique Gravel
]
<img src="assets/img/logo.png" width="25%" style="margin-top:20px;"></img>
---
# Resources
MacKay, David J.C. 2003. Information Theory, Inference, and Learning Algorithms. Chapter 2. Cambridge University Press. Cambridge.
Bolker, Benjamin M. 2009. Ecological Models and Data in R. Chapter 4. Princeton University Press. Princeton.
---
# Rule 1.
*If two events are mutually exclusive (e.g. survival or death during a time interval), then the probability that either occurs (the probability of `\(A\)` or `\(B\)`, or `\(P(A \bigcup B)\)`) is the sum of their individual probabilities : `\(P(A \bigcup B) = P(A) + P(B)\)`.*
**Example** : the probability of rolling a value lower than or equal to three with a die is `\(P(x \leq 3) = P(x=1) + P(x=2) + P(x = 3)\)`
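A minimal sketch of this example in Python (our choice of language, not the course's), assuming a fair six-sided die. It checks the sum by enumeration:

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
p_face = {face: Fraction(1, 6) for face in range(1, 7)}

# The outcomes x=1, x=2 and x=3 are mutually exclusive, so Rule 1 applies:
# P(x <= 3) is the sum of the individual probabilities.
p_at_most_three = sum(p for face, p in p_face.items() if face <= 3)

print(p_at_most_three)  # 1/2
```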
---
# Rule 2.
*If two events `\(A\)` and `\(B\)` are not mutually exclusive -- the joint probability that they occur together, `\(P(A \bigcap B)\)`, is greater than zero -- then we have to correct the rule for combining probabilities to account for double counting:*
`\(P(A \bigcup B) = P(A) + P(B) - P(A \bigcap B)\)`
**Example** : If we are tabulating the color and sex of animals, `\(P(blue \bigcup male) = P(blue) + P(male) - P(blue \bigcap male)\)`.
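As a sketch of the double-counting correction, the tally below uses invented counts (they are not data from the course) and compares Rule 2 with a direct count of animals that are blue, male, or both:

```python
from fractions import Fraction

# Hypothetical tally of 100 animals by colour and sex (invented numbers).
counts = {
    ("blue", "male"): 20,
    ("blue", "female"): 10,
    ("other", "male"): 30,
    ("other", "female"): 40,
}
total = sum(counts.values())

p_blue = Fraction(sum(n for (colour, _), n in counts.items() if colour == "blue"), total)
p_male = Fraction(sum(n for (_, sex), n in counts.items() if sex == "male"), total)
p_blue_and_male = Fraction(counts[("blue", "male")], total)

# Rule 2: subtract the joint probability so blue males are not counted twice.
p_blue_or_male = p_blue + p_male - p_blue_and_male

# Direct count of animals that are blue, male, or both, for comparison.
p_direct = Fraction(
    sum(n for (colour, sex), n in counts.items() if colour == "blue" or sex == "male"),
    total,
)

print(p_blue_or_male, p_direct)  # 3/5 3/5
```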
---
# Rule 3.
*The probabilities of all possible outcomes of an observation or experiment add to 1.*
**Example** : Leslie matrices are used to represent the probability of changing from one ecological state (e.g. size class) to another over a time interval. The row sums usually do not add up to one; the missing piece is mortality.
---
# Rule 4.
*The conditional probability of `\(A\)` given `\(B\)`, `\(P(A|B)\)`, is the probability that `\(A\)` happens if we know that `\(B\)` happens. The conditional probability equals :*
`\(P(A|B) = P(A \bigcap B)/P(B)\)`
**Example** : The mortality probability of an individual in a structured population often depends on size or age. The estimated mortality probability of a randomly picked individual will be more precise if we know its age (conditional probability) than if we do not (marginal probability).
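To make the distinction between marginal and conditional probability concrete, here is a small sketch with invented counts for a hypothetical two-age-class population:

```python
from fractions import Fraction

# Hypothetical counts of individuals by age class and fate (invented numbers).
counts = {
    ("juvenile", "died"): 30,
    ("juvenile", "survived"): 20,
    ("adult", "died"): 10,
    ("adult", "survived"): 40,
}
total = sum(counts.values())

# Marginal probability of death, ignoring age.
p_died = Fraction(sum(n for (_, fate), n in counts.items() if fate == "died"), total)

# Rule 4: P(died | juvenile) = P(died and juvenile) / P(juvenile).
p_died_and_juvenile = Fraction(counts[("juvenile", "died")], total)
p_juvenile = Fraction(sum(n for (age, _), n in counts.items() if age == "juvenile"), total)
p_died_given_juvenile = p_died_and_juvenile / p_juvenile

print(p_died, p_died_given_juvenile)  # 2/5 3/5
```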
---
# Rule 5.
*If the conditional probability of `\(A\)` given `\(B\)`, `\(P(A|B)\)`, equals the unconditional probability of `\(A\)`, `\(P(A)\)`, then `\(A\)` is independent of `\(B\)`. Knowing about `\(B\)` provides no information about the probability of `\(A\)`. Independence implies that:*
`\(P(A \bigcap B) = P(A)P(B)\)`
**Example** : Co-occurrence analysis consists of comparing the observed joint probability of occurrence of a pair of species to the expectation if the species were independently distributed. A spatial association is detected when `\(P(A,B)\)` differs significantly from `\(P(A)P(B)\)`.
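The comparison can be sketched as follows, using invented presence/absence data for two hypothetical species at ten sites. The sketch only computes the two quantities being compared; deciding whether the difference is significant needs a statistical test that is not shown here.

```python
from fractions import Fraction

# Hypothetical presence (1) / absence (0) of two species at ten sites.
species_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
species_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
n_sites = len(species_a)

p_a = Fraction(sum(species_a), n_sites)
p_b = Fraction(sum(species_b), n_sites)

# Observed joint probability: fraction of sites where both species occur.
p_ab_observed = Fraction(sum(a * b for a, b in zip(species_a, species_b)), n_sites)

# Expected joint probability if the species were independent (Rule 5).
p_ab_expected = p_a * p_b

print(p_ab_observed, p_ab_expected)  # 2/5 1/4
```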
---
# Exercise 1.3.
Five trees of a focal species are located at various distances around a seed trap. You have a model telling you that the probability that a seed falling from each of these trees lands in the trap is `\(P = \{0.01, 0.2, 0.17, 0.24, 0.06\}\)`.
What is the probability that Jonathan finds at least one seed in the trap ? What is the probability that there is not a single seed ?
---
# Exercise 1.4.
The Jaccard similarity index is one of the most widely used metrics to compute the similarity between two communities. It is based on the occurrence of different types of joint events :
`\(J(A,B) = \frac{N_{11}}{N_{01}+ N_{10} + N_{11}}\)`
where `\(N_{11}\)` is the number of species present at both locations, `\(N_{01}\)` is the number of species absent at the first location but present at the second, and `\(N_{10}\)` is the reverse.
You are interested in the expected Jaccard similarity because you want to know whether the observed value differs from what a random distribution would give. You therefore need to compute the expected Jaccard index, knowing that the regional prevalences of species `\(A\)`, `\(B\)` and `\(C\)` are `\(\{0.22, 0.15, 0.37\}\)` respectively. What is the expected value ?
---
# Exercise 1.4. (advanced)
Now suppose that you found a study revealing that dispersal limitation generates spatial autocorrelation such that the conditional occurrence at location `\(x\)` depends on the occurrence at location `\(y\)` and the distance `\(D_{xy}\)` between the two locations, following the relationship
`\(P(x|y, D_{xy}) = (1-P(x))\exp(-\alpha D_{xy}) + P(x)\)`.
This function is also reversed for the conditional probability of no occurrence.
Plot the expected relationship between Jaccard similarity index and distance between the locations for different `\(\alpha\)` coefficients.
---
# Exercise 1.5.
## Uncertainty in species distribution models
Guillaume is interested in the distribution of a rare bird species and the potential threats of a changing climate. He used presence-absence data to fit a GLM for binomial data with temperature as predictor. The model returns an occurrence probability `\(p_x\)` for every location `\(x\)`. The variance of occurrence equals the number of trials (observations) times `\(p(1-p)\)`. In other words, the variance is maximal at `\(p = 0.5\)` and it shrinks as the occurrence probability tends to either `\(0\)` or `\(1\)`.
Guillaume thought his model was pretty crappy because the largest occurrence probability across the range was `\(0.2\)`. Paradoxically, this means the uncertainty in the observation of the species is largest at locations where the species is known to be the most likely to occur. As many ecologists tend to do, he hypothesized that he did not have enough information and went out to double his sample size. His results were still the same.
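A short sketch of the variance formula for a single observation (one trial), evaluated at a few illustrative probabilities chosen here for the example:

```python
def bernoulli_variance(p):
    """Variance of one presence/absence observation with occurrence probability p."""
    return p * (1 - p)

# Illustrative probabilities (not model output): the variance peaks at p = 0.5
# and is symmetric around it.
for p in [0.05, 0.2, 0.5, 0.8, 0.95]:
    print(f"p = {p:.2f}  variance = {bernoulli_variance(p):.4f}")
```

Over a range where the occurrence probability never exceeds `\(0.2\)`, the variance increases with `\(p\)`, which is the pattern described above.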
---
# Exercise 1.5.
## Uncertainty in species distribution models
Guillaume is confused because treating the distribution as a stochastic phenomenon got him to think about several questions :
- Why is the species not 100% sure to occur at its optimal location ?
- What are the sources of uncertainty in the model ?
- Can we reduce this uncertainty ?
---
# Exercise 1.5.
## Follow up.
Guillaume consulted an ornithologist friend who knows a lot about that species. The friend was not surprised by the results, since that particular bird species requires a very specific type of habitat to occur. Guillaume therefore worked hard to collect information about the distribution of the habitat and managed to re-run the model with that additional information.
Results were very promising: the occurrence probability went up to `\(0.8\)` when the habitat was favourable and as low as `\(0.05\)` for unfavourable habitats, even at optimal climatic conditions. That's a significant reduction in the uncertainty; now the model is much better.
With these numbers in hand, can you work out the relative proportion of this habitat in the landscape, at optimal climatic conditions ?
---
# Exercise 1.5.
## Follow up.
Guillaume now wants to run climate change scenarios to see where the species could be distributed in the future. The problem he faces is that he does not have scenarios for the potential distribution of this particular habitat feature. He therefore decided to keep this proportion of habitats constant for the future. What are the impacts of that decision on the uncertainty in the forecasts?
---
# Discussion. Is nature fundamentally noisy ?
Clark (2007, TREE) makes the following argument :
*Consider a word model like this:*
`\(response = f(covariates, parameters) + error\)`
*One way to view progress in science is what occurs when variation moves from the second term (unknown) to the first term (known). As information accumulates, we can incorporate more process in the first term.*
In the limit, this reasoning implies that there is no fundamental stochasticity in nature, only deterministic responses to random conditions. Ecologists are constantly fighting stochasticity to find signal in very noisy data. Is it only ignorance, or is part of nature fundamentally stochastic ?
---
# Bayes rule
Bayes' rule in its most basic form can be derived from the above rules of probability:
`\(P(A|B) = \frac{P(B|A)P(A)}{P(B)}\)`
As we'll see, it's a very handy equation for a bunch of problems. It is obviously fundamental to Bayesian statistics, but its application goes beyond that, and one must be able to play with it before jumping to more sophisticated models.
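One way to sketch the derivation is to apply Rule 4 in both directions, which gives two expressions for the same joint probability:

`\(P(A \bigcap B) = P(A|B)P(B) = P(B|A)P(A)\)`

Dividing both sides by `\(P(B)\)` gives the rule. The denominator can be expanded with Rules 1 and 4, writing `\(\bar{A}\)` for "not `\(A\)`" (a notation introduced here for convenience):

`\(P(B) = P(B|A)P(A) + P(B|\bar{A})P(\bar{A})\)`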
---
# A classic (and so contemporary) example
Suppose a very hypothetical situation where a nasty disease hits the population and every individual is required to take a test each time symptoms appear... The disease is still rare, with approximately 1% of the population currently infected. Scientists worked to develop a reliable PCR test, but there are always contamination problems leading to false positives: `\(5\)` out of `\(100\)` people who do not have the disease will test positive anyway.
Guillaume was coughing yesterday and felt pretty bad, so he went out to take a test, which came back positive. What is the probability that he has the disease ?
---
# Solution
## First, define variables properly
`\(a = 1\)` : Guillaume has the disease
`\(a = 0\)` : Guillaume does not have the disease
`\(b = 1\)` : The test is positive
`\(b = 0\)` : The test is negative
---
# Solution
## Second, define probabilities
What is a false positive ?
---
# Solution
## Second, define probabilities
`\(P(b=1|a=1) = 0.95\)` : True positive (assumed here, since the problem statement only gives the false positive rate)
`\(P(b=1|a=0) = 0.05\)` : False positive
`\(P(a=1) = 0.01\)` : Disease prevalence
We are looking for the probability that Guillaume has the disease given that his test is positive, `\(P(a=1|b=1)\)`
---
# Solution
## Use Bayes theorem
`\(P(a=1|b=1) = \frac{P(b=1|a=1)P(a=1)}{P(b=1|a=1)P(a=1)+P(b=1|a=0)P(a=0)}\)`
`\(P(a=1|b=1) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99}\)`
`\(P(a=1|b=1) = \frac{0.0095}{0.059} \approx 0.16\)`
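The same calculation as a short Python sketch (variable names are ours), using the three probabilities defined on the previous slide:

```python
# Values from the slides: true positive rate, false positive rate and prevalence.
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05
p_disease = 0.01

# Marginal probability of a positive test: sum over the two mutually
# exclusive cases (diseased and healthy).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule: P(disease | positive test).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # 0.161
```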
---
# Exercise 1.6.
You want to know whether a certain parasite species `\(P\)` can infect a host species `\(H\)`. You have observed the interactions among several species at different locations and, across the sample, species `\(P\)` and `\(H\)` co-occur at `\(5\)` different locations. However, you never observed them interacting with each other. You looked at all of the data for this parasite and realized it is a fairly generalist species, since it interacts with roughly `\(20\)` percent of all other possible hosts in the community.
a) What is the probability of the data (i.e. no interaction) if the probability of a link is `\(0.2\)` ?
b) What is the probability of the observation (no interaction) if the species do not interact ?
c) What is the probability there is a link given the observation ?
---
# Exercise 1.6.
## Hint
The R function *dbinom(x, size, prob)* returns the probability of observing `\(x\)` events in `\(size\)` Bernoulli trials, each with probability `\(prob\)`.
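For readers who want to see what the function computes, here is a minimal Python sketch of the binomial probability mass function. The name `dbinom` is reused only to mirror the R function; this is an illustration, not R's implementation.

```python
from math import comb

def dbinom(x, size, prob):
    """P(X = x) for `size` independent Bernoulli trials with success probability `prob`."""
    return comb(size, x) * prob**x * (1 - prob)**(size - x)

# Illustration with a fair coin: probability of exactly 2 heads in 10 tosses.
print(dbinom(2, 10, 0.5))  # 0.0439453125
```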
</textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script src="macros.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "monokai",
"highlightLines": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
window.dispatchEvent(new Event('resize'));
});
(function(d) {
var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
if (!r) return;
s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
d.head.appendChild(s);
})(document);
(function(d) {
var el = d.getElementsByClassName("remark-slides-area");
if (!el) return;
var slide, slides = slideshow.getSlides(), els = el[0].children;
for (var i = 1; i < slides.length; i++) {
slide = slides[i];
if (slide.properties.continued === "true" || slide.properties.count === "false") {
els[i - 1].className += ' has-continuation';
}
}
var s = d.createElement("style");
s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
var deleted = false;
slideshow.on('beforeShowSlide', function(slide) {
if (deleted) return;
var sheets = document.styleSheets, node;
for (var i = 0; i < sheets.length; i++) {
node = sheets[i].ownerNode;
if (node.dataset["target"] !== "print-only") continue;
node.parentNode.removeChild(node);
}
deleted = true;
});
})();
(function() {
"use strict"
// Replace <script> tags in slides area to make them executable
var scripts = document.querySelectorAll(
'.remark-slides-area .remark-slide-container script'
);
if (!scripts.length) return;
for (var i = 0; i < scripts.length; i++) {
var s = document.createElement('script');
var code = document.createTextNode(scripts[i].textContent);
s.appendChild(code);
var scriptAttrs = scripts[i].attributes;
for (var j = 0; j < scriptAttrs.length; j++) {
s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
}
scripts[i].parentElement.replaceChild(s, scripts[i]);
}
})();
(function() {
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
links[i].target = '_blank';
}
}
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
const hlines = d.querySelectorAll('.remark-code-line-highlighted');
const preParents = [];
const findPreParent = function(line, p = 0) {
if (p > 1) return null; // traverse up no further than grandparent
const el = line.parentElement;
return el.tagName === "PRE" ? el : findPreParent(el, ++p);
};
for (let line of hlines) {
let pre = findPreParent(line);
if (pre && !preParents.includes(pre)) preParents.push(pre);
}
preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>
<script>
slideshow._releaseMath = function(el) {
var i, text, code, codes = el.getElementsByTagName('code');
for (i = 0; i < codes.length;) {
code = codes[i];
if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
text = code.textContent;
if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
/^\$\$(.|\s)+\$\$$/.test(text) ||
/^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
code.outerHTML = code.innerHTML; // remove <code></code>
continue;
}
}
i++;
}
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
if (location.protocol !== 'file:' && /^https?:/.test(script.src))
script.src = script.src.replace(/^https?:/, '');
document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
</body>
</html>