generated from UofUEpiBio/advanced-programming-midterm
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathslides.qmd
242 lines (183 loc) · 6.51 KB
/
slides.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
---
title: Meta-Analysis for the Binomial Model with Normal Random Effects
subtitle: Heirarchical and Empirical Bayesian Approaches
author: Sophie Huebler
format: revealjs
embed-resources: true
editor:
markdown:
wrap: 72
---
# Section 1: Introduction to the Problem
## The Goal
1) Create a computationally efficient implementation of the
gibbs-metropolis-hastings hybrid algorithm for a heirarchical
bayesian approach to meta analysis.
2) Create a computationally efficient function to compute mariginal
likelihood and optimize it to empirically estimate parameters for
random effects for an empirical bayesian approach to meta analysis.
## Meta-Analysis {.scrollable .smaller}
We will use a classic meta-analysis case to motivate this problem. Our
treatment effect of interest is the odds ratio of an event occurring
between treatment and control groups, and there are 7 studies which have
estimated this effect by recording the number of events and sample size
in a treatment sample and a control sample.
```{r}
#| label: some-code
library(MASS)
library(epiworldR)
library(tidyverse)
library(metafor)
```
```{r}
source("data/read_data.R")
```
```{r}
dat |>
dplyr::select(trial:nic)|>
kableExtra::kbl(digits = 2,
caption = "Data",
col.names = c("Trial",
"R_it", "N_it",
"R_ic", "N_ic"))|>
kableExtra::kable_classic_2(html_font = "Cambria")
```
## Meta-Analysis {.scrollable .smaller}
Each study therefore has a log odds ratio $\theta_i$ which estimates the
study specific treatment effect, and a standard error for that estimate.
We assume that due to differing study protocol and random chance, each
study is estimating a log odds ratio that is drawn from a normal
distribution with center $\mu$ and variance $\tau^2$. We will call this
the random effect population prior $\pi(\theta_i | \mu, \tau)$.
```{r}
dat |>
dplyr::select(trial:v_i)|>
kableExtra::kbl(digits = 2,
caption = "Data",
col.names = c("Trial",
"R_it", "N_it",
"R_ic", "N_ic",
"log(OR)", "Var(log(OR))"))|>
kableExtra::kable_classic_2(html_font = "Cambria")
```
## The Model
$$
\begin{aligned}
&i \in i,…,k \hspace{0.5cm} and \hspace{0.5cm} j \in \{C,T\}\\
& Y_{ij}| \pi_{ij},n_{ij} \sim Bin(n_{ij}, \pi_{ij})\\
&logit \pi_i^C | \gamma_i, \theta_i = \gamma_i - \theta_i/2\\
&logit \pi_i^T | \gamma_i, \theta_i = \gamma_i + \theta_i/2\\
& logit(\frac{\pi_{iT}}{\pi_{iC}}) = \theta_i | \mu, \tau^2 \sim N(\mu, \tau^2)\\
& \mu \sim N(a_1, b_1 )\\
& 1/\tau^2\sim gamma(a_2, b_2)\end{aligned}
$$
## Heirarichal Estimation
From the model, we can easily write out the joint distribution and the
full conditionals for $\mu$, $\tau^2$, $\bf{\theta}$, and $\bf{\gamma}$
(I leave this as an exercise to the listener). This allows us to use the
gibbs sampling procedure to iteratively update posteriors. It is however
important to note that the thetas and gammas do not have closed form
full conditionals. Therefore we must also incorporate a metropolis
hastings component wise updating procedure.
## Empirical Estimation
Instead of iteratively updating the $\mu$ and $\tau^2$ variables, we can
estimate them from the marginal likelihood using an MLE approach after
integrating out all other parameters, thus reducing the problem to a
single level model.
# Section 2: Solution Plan
## Heirarchical Bayesian Approach
- Full conditionals:
- Write in both R and c++
- Wrapper for the full algorithm:
- Rewrite with data table so that storage and manipulation is
easier
- Pre-allocate memory
- Use microbenchmark to comapre efficiency to original code
## Empirical Bayesian Approach
- Marignal likelihood
- Use parallel programming for the R implementation that currently
simulates average likelihood of the marginal likelihood
- Write the full likelihood function in C++ and then use GNU
scientific library for numeric integration
# Section 3: Preliminary Results
## Heirarchical Bayesian {.scrollable .smaller}
Currently all posterior functions are written in modularized format in R
and wrapped in a function that stores the results as a data table.
- Old function: 2 for loops, 4 lapply statements
- New function: 1 for loop
```{r}
source("Functions/mh_gibbs_original.R")
source("Functions/set_hyperparameters.R")
source("Functions/post_mu.R")
source("Functions/post_tau2.R")
source("Functions/post_gamma.R")
source("Functions/post_theta.R")
source("Functions/mh.R")
source("Functions/gibbs.R")
source("Functions/mh_gibbs.R")
```
```{r}
hyperparams <- set_hyperparameters()
```
```{r}
benched <- bench::mark(
mh_gibbs_original(),
mh_gibbs(),
check = FALSE,
relative = TRUE,
iterations = 1)
benched <- benched[,2:9] %>% as.matrix()
rownames(benched) <- c("original", "updated")
benched
```
## Marginal Likelihood {.scrollable .smaller}
Original code:
```{r, echo = T}
mlik_func <- function(mu,
tau2,
Y,
#simulate
sim = 1000
){
#Set up
#number of studies
k = nrow(Y)
#initialize
lik <- 0
#test <- 0
#Outer loop !!! This is where parallelization can be implemented
for(j in 2:sim){
#Simulate theta and gamma
#based on mu and tau2 to optimize and take average lik
theta <- rnorm(k, mean = mu, sd = sqrt(tau2))
gamma <- rnorm(k, mean = 0, sd = 10)
#Calculate odds (expit scale)
eta_C <- gamma - theta/2
eta_T <- gamma + theta/2
#Transform odds to probabilities (p scale)
p_C <- 1/(1+exp(-eta_C))
p_T <- 1/(1+exp(-eta_T))
#Construct priors (on theta and gamma only)
#Use log scale for easier computation so
#addition instead of multiplication for
#likelihood and priors
pi_theta <- dnorm(theta, mean = mu, sd = sqrt(tau2), log = T)
pi_gamma <- dnorm(gamma, mean = 0, sd = 10, log = T)
#Construct likelihood for both control
#and treatments, because the likelihood
#depends on the data (do not construct
#likelihood for the parameters being estimated
#or integrated out)
#Use probabilities found above
lik_C <- dbinom(Y[,"ric"], Y[,"nic"], p_C, log = T)
lik_T <- dbinom(Y[,"rit"], Y[,"nit"], p_T, log = T)
#Need to tranform back to exp scale
lik<- lik + exp(lik_C+lik_T+pi_theta+pi_gamma)
}#End outer loop
#Find average log likelihood
log_lik <- log(lik/sim)
#Return negative log likelihood
ret <- -sum(log_lik)
ret
}
```