---
title: "Revenue optimization"
author: "Rob Trangucci & Jonah Gabry"
date: "January 9, 2017"
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rstan)
library(dplyr)
library(reshape2)
library(bayesplot)
library(ggplot2)
```
## Problem set-up
Let's say that we're a data scientist at a large media company. We are told that
we have the power to set the price of every streaming movie however we like in
order to maximize revenue. How should we go about learning how to maximize
revenue for each movie? Our data look like the number of movies sold per
week, $y_{m,t}$, at a price $p_{m,t}$.
Well, one way we could do this is to use the data we have to estimate the demand curve for the
products. Luckily, our company has exclusive rights to each of the movies, and has
engaged in random pricing trials for the past 10 months. That means that on random weeks,
movies have had their prices randomly jittered by one dollar up or down. The prices can only be
set on a grid of whole dollars.
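To make the trial design concrete, here is a minimal sketch of what one jittered week of prices might look like; the base prices below are made up, and the actual randomization happened upstream of our data.
```{r, eval=FALSE}
# Hypothetical illustration of the +/- $1 jitter described above
set.seed(1)
base_price <- c(4, 5, 3, 6)   # whole-dollar base prices (made up)
base_price + sample(c(-1, 1), length(base_price), replace = TRUE)
```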
The decision problem we want to solve is the following:
\begin{align*}
\arg\max_{p \in P} \mathbb{E}_q[q(p) \, p]
\end{align*}
Because we have complete control over price, the random variable contributing to the expectation
is quantity. That's what we'll be modeling for these movies: the quantity sold as a function of price.
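To preview where we're headed, here is a hedged sketch of the decision rule: given posterior draws of quantity sold at each candidate price, we average revenue over the draws and pick the price with the largest expected revenue. Everything below (the price grid, the demand parameters, and `q_draws`) is made up for illustration.
```{r, eval=FALSE}
# Fake posterior draws of weekly quantity at each candidate price
price_grid <- 1:20
q_draws <- sapply(price_grid, function(p) rpois(1000, exp(4 - 0.15 * p)))
# Expected revenue at each price, and the revenue-maximizing price
expected_rev <- colMeans(q_draws) * price_grid
price_grid[which.max(expected_rev)]
```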
The data are in a file called $\texttt{movie_data_tt_new_set_20180717.RDS}$. Let's load the data and see what the structure
is:
```{r load-data}
movie_data <- readRDS('data/movie_data_tt_new_set_20180717.RDS')$df
head(movie_data)
```
We've got the number of units sold per week, the date of the Sunday ending each
week, the title, the price (in US dollars), the release year,
and the IMDB rating.
```{r describe-data}
N_titles <- length(unique(movie_data$title))
N_titles
```
We're working with `r N_titles` titles. First, we want to learn whether price is
related to quantity sold for our movies.
[filler about count data modeling and using Poisson outcome]
We'll start with a simple model and build it up slowly. First, we'll define our
likelihood to be the negative binomial probability mass function over the number
of units sold, denoted below as $\texttt{qty_sld}$.
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha + \beta \, p_{m,t}
\end{align*}
The negative binomial mass function we'll use is called
$\texttt{neg_binomial_2_log}(\text{ints} \, y, \text{reals} \, \eta, \text{reals} \, \phi)$ in Stan.
This is a negative binomial mass function parameterized in terms of its
log-mean, $\eta$, and its precision, $\phi$. This means that $\mathbb{E}[y] = \exp(\eta)$
and $\text{Var}[y] = \exp(\eta) + \exp(\eta)^2 / \phi$.
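As a quick sanity check on this parameterization (the values of $\eta$ and $\phi$ below are arbitrary), Stan's $\texttt{neg_binomial_2_log}$ corresponds to R's $\texttt{rnbinom()}$ with $\texttt{mu} = \exp(\eta)$ and $\texttt{size} = \phi$:
```{r, eval=FALSE}
# Simulate from the negative binomial and compare to the stated mean and variance
eta <- 3
phi <- 5
y_sim <- rnbinom(1e5, size = phi, mu = exp(eta))
c(mean = mean(y_sim), theoretical = exp(eta))
c(var = var(y_sim), theoretical = exp(eta) + exp(eta)^2 / phi)
```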
We can use this distribution for our outcome, qty_sld, because we have integer
outcomes (the number of movies sold per week). That alone, however, isn't a good
enough reason to prefer the negative binomial: we could also use a Poisson
distribution. The Poisson has only one parameter, which forces the mean of the
random variable to equal its variance, and our data appear to be overdispersed.
We can check this by taking the mean of qty_sld for each movie and comparing it
to the variance of qty_sld for the same movie. If the data were generated by a
Poisson distribution, each movie's variance would be roughly equal to its mean.
We wouldn't expect this to hold across movies, because different movies likely
have different average sales per week.
```{r}
agg_data <- movie_data %>% group_by(title) %>%
  summarise(mean_qty_sld = mean(units_sold),
            var_qty_sld = var(units_sold),
            rel_date = first(release_year))
head(agg_data)
```
So it does appear that the variance of the weekly sales data is greater than the mean, indicating
that our outcome would be better modeled by a negative binomial distribution than by a Poisson
distribution.
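A quick numeric summary of the same check (assuming the $\texttt{agg_data}$ frame computed above); ratios greater than one indicate overdispersion relative to the Poisson:
```{r, eval=FALSE}
# Variance-to-mean ratio of weekly sales by title
summary(with(agg_data, var_qty_sld / mean_qty_sld))
```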
Let's prep the data. Because we'll want to compare our predictions for specific
products, and we'd like to do that comparison in Stan, we first convert the
titles to integer indexes.
```{r prep-data}
t <- max(table(movie_data$title))      # number of weekly observations per title
J <- length(unique(movie_data$title))  # number of titles
movie_data <- movie_data %>%
  mutate(title_fac = as.factor(title),
         ttl_idx = as.integer(title_fac),          # integer index for each title
         ids = rep(1:t, J),
         wk_idx = lubridate::isoweek(date) - 1,    # week-of-year index
         mo_idx = lubridate::month(date))          # month index
rel_year <- unique(movie_data[, c('release_year', 'ttl_idx')]) %>%
  mutate(rel_year_std = (release_year - mean(release_year)) / sd(release_year))
meta_data <- unique(movie_data[, c('ttl_idx', 'release_year', 'rating')]) %>%
  arrange(ttl_idx) %>% select(release_year, rating) %>% as.matrix()
```
We create the list to pass into RStan as data. For this first model we only need
the outcome, the price, and the number of observations.
```{r stan-data}
stan_dat_simple <- with(movie_data, list(qty_sld = units_sold,
                                         price = price,
                                         N = length(price)))
```
Before we fit the first model we defined above, let's generate some fake data against which to
test our Stan program. If we are able to recover known parameter values, that is a good indication
that our Stan program is working.
Let's compile the fake-data generating code in the file simple_poisson_regression_dgp.stan.
```{r compNBdgp, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_simple <- stan_model('stan_programs/simple_poisson_regression_dgp.stan')
```
```{r, eval=FALSE}
set.seed(1123)
fake_gen_data <- stan_dat_simple
fake_gen_data$price <- rnorm(fake_gen_data$N)
```
```{r runpoissondgp, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_simple, data = fake_gen_data, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param', seed=123)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
```{r P_fake_stan_dat, eval=FALSE}
stan_dat_fake <- fake_gen_data
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
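Alternatively (a sketch, not part of the workflow above), we could simulate fake outcomes directly in R rather than via the Fixed_param Stan run; the "true" values of $\texttt{alpha}$ and $\texttt{beta}$ below are arbitrary, and $\texttt{stan_dat_fake_r}$ is a hypothetical name.
```{r, eval=FALSE}
# Simulate fake Poisson outcomes in R at the jittered (standardized) prices
set.seed(1123)
alpha_true <- 3
beta_true <- -0.2
stan_dat_fake_r <- fake_gen_data
stan_dat_fake_r$qty_sld <- rpois(fake_gen_data$N,
                                 exp(alpha_true + beta_true * fake_gen_data$price))
```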
```{r compNB, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_P <- stan_model('stan_programs/simple_poisson_regression.stan')
```
```{r runPoverfake, eval=FALSE}
fit_model_P <- sampling(comp_model_P, data = stan_dat_fake, chains = 4, cores = 4)
samps_P <- rstan::extract(fit_model_P)
samps_pars <- as.matrix(fit_model_P, pars = c('alpha','beta'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_hist(samps_pars, c(samps_dgp$alpha,samps_dgp$beta))
```
```{r resid_P_fake, eval=FALSE}
y_rep <- samps_P$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_fake$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r, eval=FALSE}
ppc_rootogram(stan_dat_fake$qty_sld, yrep = y_rep)
```
```{r fit_P_real_data, cache=T}
fit_P_real_data <- sampling(comp_model_P, data = stan_dat_simple, chains = 4, cores =4 )
```
```{r results_simple_P}
print(fit_P_real_data, pars = c('alpha','beta'))
```
OK, price is related to quantity sold, so let's see how well the model fits.
```{r ppc_dens_P_real}
samps_P_real_data <- rstan::extract(fit_P_real_data)
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_P_real_data$pp_y[1:200,])
```
```{r resid_P_real}
y_rep <- samps_P_real_data$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
Maybe there's some useful information that we're not using, like the
release year?
```{r}
rel_dat_lm <- lm(log1p(units_sold) ~ release_year, data = movie_data)
with(movie_data, plot(release_year, log1p(units_sold)))
abline(reg = rel_dat_lm)
```
```{r}
stan_dat_simple$rel_year <- movie_data$release_year
stan_dat_simple$rating <- movie_data$rating
```
```{r, eval=FALSE}
stan_dat_fake$rel_year <- rnorm(stan_dat_fake$N)
stan_dat_fake$rating <- rnorm(stan_dat_fake$N)
```
```{r compmultPDGP, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_multiple <- stan_model('stan_programs/multiple_poisson_regression_dgp.stan')
```
```{r runpoissondgp_mult, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_multiple, data = stan_dat_fake, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param', seed=123)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
```{r P_mult_fake_stan_dat, eval=FALSE}
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
```{r compPmult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_P_mult <- stan_model('stan_programs/multiple_poisson_regression.stan')
```
```{r runPmultoverfake, eval=FALSE}
fit_model_P_mult <- sampling(comp_model_P_mult, data = stan_dat_fake, chains = 4, cores = 4)
samps_pars <- as.matrix(fit_model_P_mult, pars = c('alpha','beta','beta_year','beta_rating'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_intervals(samps_pars,
c(samps_dgp$alpha,samps_dgp$beta,samps_dgp$beta_year,samps_dgp$beta_rating))
```
Now let's fit to real data.
[Note to Jonah: If we don't rescale the predictors, we get a bunch of max treedepth warnings, because our
priors aren't well matched to the scale of the data. I know this is something you like to talk about, so
we can do that here, but I didn't broach it because it'll take some work to build a model that fits poorly
but not egregiously so, in order to not get treedepth warnings]
```{r fit_mult_P_real_dat}
stan_dat_simple$rel_year <- scale(stan_dat_simple$rel_year)[,1]
stan_dat_simple$rating <- scale(stan_dat_simple$rating)[,1]
fit_model_P_mult_real <- sampling(comp_model_P_mult, data = stan_dat_simple, cores = 4, chains = 4)
```
```{r ppc_dens_P_mult_real}
samps_P_real_data <- rstan::extract(fit_model_P_mult_real)
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_P_real_data$pp_y[1:200,])
```
```{r resid_P_mult_real}
y_rep <- samps_P_real_data$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
The marginal posterior predictive checks above suggest that the Poisson likelihood is too
thin-tailed for this data. Recall the negative binomial model we wrote down earlier,
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi),
\end{align*}
which keeps the same log-linear predictor (price, release year, and rating) but adds the
precision parameter $\phi$ to absorb the extra-Poisson variation we saw in the per-movie
means and variances. Let's repeat the fake-data workflow with this likelihood.
```{r compNBdgp_mult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_mult_NB <- stan_model('stan_programs/multiple_NB_regression_dgp.stan')
```
We're going to generate one draw from the fake-data model so we can fit our
model to it and compare the known parameter values to their posterior distributions.
```{r runNBdgp, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_mult_NB, data = stan_dat_fake, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param',seed=1796604581)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
We create a new dataset with the fake outcome data replacing the true observed outcomes.
```{r fake_stan_dat, eval=FALSE}
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
```{r compNBmult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB <- stan_model('stan_programs/multiple_NB_regression.stan')
```
Now we run our NB regression over the fake data and extract the samples to
examine posterior predictive checks and to check whether we've sufficiently
recovered our known parameters, $\texttt{alpha}$, $\texttt{beta}$, and the other regression coefficients.
```{r runNBoverfake, eval=FALSE}
fitted_model_NB <- sampling(comp_model_NB, data = stan_dat_fake, chains = 4, cores = 4)
samps_NB <- rstan::extract(fitted_model_NB)
samps_pars <- as.matrix(fitted_model_NB, pars = c('alpha','beta','beta_year','beta_rating','inv_phi'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_intervals(samps_pars,
c(samps_dgp$alpha,
samps_dgp$beta,
samps_dgp$beta_year,samps_dgp$beta_rating,samps_dgp$inv_prec))
```
The parameter recovery looks good. Let's run the model over the real data and see
how our model fares!
```{r runNB, eval=FALSE}
fitted_model_NB <- sampling(comp_model_NB, data = stan_dat_simple, chains = 4, cores = 4)
samps_NB <- rstan::extract(fitted_model_NB)
```
Let's look at our predictions vs. the data.
```{r ppc-full, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB$pp_y[1:200,])
```
```{r resid_NB_real, eval=FALSE}
y_rep <- samps_NB$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep + mean_y_rep^2*mean(samps_NB$inv_phi))
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r, eval=FALSE}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
It appears that the data generating process we've encoded in our model doesn't match the
data we've been given: we're over-estimating the tails of the data and under-estimating the
small counts. Are there other statistics we can look at to determine what's missing
from the model?
We can look at the distribution of the means vs. our observed means by movie to
see whether we're at least modeling the means well at the movie level.
```{r ppc-group_means, eval=FALSE}
random_20_groups <- sample(1:N_titles, 20)
stan_dat_simple$ttl_idx <- movie_data$ttl_idx
sel_vec <- stan_dat_simple$ttl_idx %in% random_20_groups
bayesplot::ppc_stat_grouped(stan_dat_simple$qty_sld[sel_vec],
samps_NB$pp_y[1:200,sel_vec], group = stan_dat_simple$ttl_idx[sel_vec], stat = 'mean')
```
It appears that we aren't modeling the means well for each movie. Let's add a hierarchical
intercept parameter, $\alpha_m$, at the movie level to our model.
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha_m + \beta \, p_{m,t} + \beta_{\texttt{rel_year}} \, \texttt{rel_year}_{m} \\
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha})
\end{align*}
In our Stan model, $\alpha_m$ is the $m$-th element of the vector
$\texttt{alphas}$ declared in the parameters block. We've dropped the
rating from the regression. Let's put the release year into the
hierarchical prior for the movie-level intercept.
$$
\alpha_m \sim \text{Normal}(\alpha + \beta_{\texttt{rel_year}} \, \texttt{rel_year}_{m}, \sigma_{\alpha})
$$
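To get a feel for what this prior implies, here is a small prior simulation of movie-level intercepts; the values of $\alpha$, $\beta_{\texttt{rel_year}}$, and $\sigma_\alpha$ below are made up for illustration.
```{r, eval=FALSE}
# Draw movie-level intercepts from the hierarchical prior and look at the
# implied mean weekly sales when all terms in eta other than alpha_m are zero
set.seed(42)
alpha <- 2; beta_rel_year <- 0.3; sigma_alpha <- 0.5
rel_year_std <- rnorm(20)   # standardized release years for 20 hypothetical movies
alpha_m <- rnorm(20, alpha + beta_rel_year * rel_year_std, sigma_alpha)
round(exp(alpha_m), 1)
```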
```{r stan-data-hier}
stan_dat_hier <- with(movie_data, list(qty_sld = units_sold,
                                       price = price,
                                       N = length(price),
                                       J = J,
                                       t = t,
                                       M = length(unique(mo_idx)),
                                       W = length(unique(wk_idx)),
                                       xmas_ind = as.integer(date == '2015-12-27'),
                                       rel_year = rel_year$rel_year_std[order(rel_year$ttl_idx)],
                                       meta_data = meta_data,
                                       ttl_idx = ttl_idx,
                                       wk_idx = wk_idx,
                                       mo_idx = mo_idx))
```
```{r comp-NB-hier, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier <- stan_model('stan_programs/hier_NB_regression.stan')
```
[Jonah, the doc is good up until here, working on hierarchical NB models now]
```{r run-NB-hier, eval=FALSE}
fitted_model_NB_hier <- sampling(comp_model_NB_hier, data = stan_dat_hier, chains = 4, cores = 4)
```
Let's examine the fit of the new model.
```{r print-NB-hier, eval=FALSE}
print(fitted_model_NB_hier, pars = c('sigma_alpha','beta','alpha','phi','alphas'))
```
The effective sample sizes are quite low for many of the parameters. This indicates
that there are problems with our parameterization of the model. We can reparameterize
the random intercept $\alpha_m$, which is distributed:
\begin{align*}
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha})
\end{align*}
Instead, we should use the non-centered parameterization for $\alpha_m$. We
define a vector of auxiliary variables in the parameters block,
$\texttt{alphas_raw}$ that is given a $\text{Normal}(0, 1)$ prior in the model
block. We then make $\texttt{alphas}$ a transformed parameter:
```stan
transformed parameters {
  vector[J] alphas;
  alphas = alpha + sigma_alpha * alphas_raw;
}
```
This gives $\texttt{alphas}$ a $\text{Normal}(\alpha, \sigma_\alpha)$ distribution, but it
decouples the dependence of the density of each element of $\texttt{alphas}$ from
$\texttt{sigma_alpha}$ ($\sigma_\alpha$). hier_NB_regression_ncp.stan uses the non-centered
parameterization for $\texttt{alphas}$. We will examine the effective sample size of the
fitted model to see whether we've fixed the problem with our reparameterization.
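Before fitting, we can convince ourselves that the two parameterizations imply the same distribution for $\texttt{alphas}$ with a quick simulation; the values of $\alpha$ and $\sigma_\alpha$ below are arbitrary.
```{r, eval=FALSE}
set.seed(42)
n_sims <- 1e5
alpha <- 2; sigma_alpha <- 0.5
alphas_centered <- rnorm(n_sims, alpha, sigma_alpha)
alphas_ncp <- alpha + sigma_alpha * rnorm(n_sims, 0, 1)  # non-centered construction
c(mean(alphas_centered), mean(alphas_ncp))
c(sd(alphas_centered), sd(alphas_ncp))
```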
```{r comp-NB-hier-ncp, eval=FALSE}
comp_model_NB_hier_ncp <- stan_model('stan_programs/hier_NB_regression_ncp.stan')
```
```{r run-NB-hier-ncp, eval=FALSE}
fitted_model_NB_hier_ncp <- sampling(comp_model_NB_hier_ncp, data = stan_dat_hier, chains = 4, cores = 4)
```
Examining the fit of the new model
```{r n-eff-NB-hier-ncp-check, eval=FALSE}
print(fitted_model_NB_hier_ncp, pars = c('sigma_alpha','beta','alpha','phi','alphas'))
```
This seems to have improved the effective sample sizes of $\texttt{alphas}$, though it has
decreased the effective sample size of $\texttt{sigma_alpha}$. We'll stick with the
non-centered parameterization as long as it appears feasible when we examine our model
diagnostics.
```{r samps-full-hier, eval=FALSE}
samps_NB_hier_ncp <- rstan::extract(fitted_model_NB_hier_ncp, pars = c('pp_y'))
```
Extracting the draws from the model allows us to see how we've fitted the marginal
distribution of the data with draws from the marginal posterior predictive distribution.
That plot is below.
```{r ppc-full-hier, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_ncp$pp_y[1:200,]) + xlim(c(0,100))
```
This looks quite nice. If we've captured the movie-level means well, then the
posterior distribution of means by movie should match well with the observed
means of the quantity of movie sales by week.
```{r ppc-group_means-hier, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_simple$qty_sld[sel_vec],
samps_NB_hier_ncp$pp_y[1:200,sel_vec], group = stan_dat_simple$ttl_idx[sel_vec], stat = 'mean')
```
Ok, so these plots look \emph{much} better. Perhaps if the level of movie sales differs
by movie, the coefficient for price does too. We can add this to our model and observe the
fit.
\begin{align*}
\text{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha_m + \beta_m \, p_{m,t} \\
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha}) \\
\beta_m & \sim \text{Normal}(\beta, \sigma_{\beta})
\end{align*}
```{r comp-NB-hier-slopes, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_slopes <- stan_model('stan_programs/hier_slopes_NB_regression_ncp.stan')
```
```{r run-NB-hier-slopes, eval=FALSE}
fitted_model_NB_hier_slopes <- sampling(comp_model_NB_hier_slopes, data = stan_dat_hier, chains = 4, cores = 4)
samps_NB_hier_slopes <- rstan::extract(fitted_model_NB_hier_slopes, pars = c('sigma_beta',
                                                                             'pp_y'))
```
To see if the model infers movie-to-movie differences in the price coefficient, we can plot a histogram of our marginal
posterior distribution for $\texttt{sigma_beta}$.
```{r, eval=FALSE}
hist(samps_NB_hier_slopes$sigma_beta,breaks=50)
```
While the posterior can't specifically rule out zero, most of its mass is at small but
non-zero values, so we should keep the hierarchy over $\texttt{betas}$. Plotting the marginal
data density again, we can see the model still looks well calibrated.
```{r ppc-full-hier-slopes, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_slopes$pp_y[1:200,]) + xlim(c(0,100))
```
Perhaps we should add monthly random intercepts to our model. Let's look at the data to see if this
is warranted.
```{r ppc-group_max-hier-slopes-mean-by-mo, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_hier$qty_sld,
                            samps_NB_hier_slopes$pp_y[1:200,], group = stan_dat_hier$mo_idx, stat = 'mean')
```
We might be missing information that's in the month indicators. That makes sense: when it gets colder
out, there might be more of an incentive to watch movies (at least in the Northeast).
```{r comp-NB-hier-mos, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_mos <- stan_model('stan_programs/hier_slopes_NB_regression_mos.stan')
```
```{r run-NB-hier-slopes-mos, eval=FALSE}
fitted_model_NB_hier_mos <- sampling(comp_model_NB_hier_mos, data = stan_dat_hier, chains = 4, cores = 4)
```
We get divergent transitions in our run. Let's reparameterize our prior on the $\texttt{mos}$ random
intercept to be a centered hierarchical normal; we only observed divergent transitions after
adding the monthly random effect, so that's the term we'll change.
```{r comp-NB-hier-mos-cp, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_mos_cp <- stan_model('stan_programs/hier_slopes_NB_regression_mos_cp.stan')
```
```{r run-NB-hier-slopes-mos-cp, cache=TRUE, eval=FALSE}
fitted_model_NB_hier_mos_cp <- sampling(comp_model_NB_hier_mos_cp, data = stan_dat_hier, chains = 4, cores = 4, control = list(adapt_delta=0.99))
saveRDS(fitted_model_NB_hier_mos_cp,file = 'fitted_model_NB_hier_mos_cp.RDS')
samps_NB_hier_mos_cp <- rstan::extract(fitted_model_NB_hier_mos_cp, pars = c('pp_y'))
```
Note that in order to get this model to fit without divergent transitions we need to crank up
$\texttt{adapt_delta}$ from 0.8 (its default) to 0.99. This may indicate that our model doesn't
fit our data very well. How might you go about expanding the model? What are observable summary
statistics of the outcome measure, $\texttt{qty_sld}$, that would help you build additional
structure into the priors for our model?
In the interest of brevity, we won't go on expanding the model.
```{r ppc-full-hier-mos, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_mos_cp$pp_y[1:200,]) + xlim(c(0,100))
```
```{r, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_hier$qty_sld,
                            samps_NB_hier_mos_cp$pp_y[1:200,], group = stan_dat_hier$mo_idx, stat = 'mean')
```
As we can see, it appears that our monthly random intercept has captured a monthly
pattern across all the movies.
## Revenue forecasts
Let's modify the Stan program to generate revenue forecasts at different prices in the
last week of data.
We want to predict quantity at each price, and then calculate revenue for each price.
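The computation we'll add to the generated quantities block amounts to multiplying predicted quantities by a grid of candidate prices. Roughly, in R (all objects below are small stand-ins for what the Stan program will produce):
```{r, eval=FALSE}
# revenue[s, m, p] = price[p] * quantity[s, m, p], for posterior draw s,
# movie m, and candidate price p
price_grid <- 1:20
S <- 10; M <- 3   # pretend numbers of draws and movies
hypo_qty_sld <- array(rpois(S * M * length(price_grid), 30),
                      dim = c(S, M, length(price_grid)))
hypo_rev <- sweep(hypo_qty_sld, 3, price_grid, `*`)
dim(hypo_rev)
```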
```{r comp-rev, cache=T, results="hide", message=FALSE}
comp_rev <- stan_model('stan_programs/hier_slopes_NB_regression_mos_predict.stan')
```
```{r run-NB-hier-rev, cache=TRUE}
rev_model <- sampling(comp_rev, data = stan_dat_hier, chains = 4, cores = 4, iter = 2000,
control = list(adapt_delta = 0.90))
samps_rev <- rstan::extract(rev_model)
```
Below we've generated revenue curves for ten of the products: the five with the smallest and
the five with the largest posterior-mean price coefficients. These charts give us
a precise quantification of our uncertainty about the revenue projection at any price for
each product, which is immensely useful for a decision maker choosing how to set prices
for these products.
```{r rev-curves}
N_price <- 20
movie_map <- unique(movie_data[,c('title','ttl_idx')])
mean_rev <- apply(samps_rev$hypo_rev,c(2,3),mean)
lower_rev <- apply(samps_rev$hypo_rev,c(2,3),quantile,0.25)
upper_rev <- apply(samps_rev$hypo_rev,c(2,3),quantile,0.75)
mean_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),mean)
lower_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),quantile,0.25)
upper_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),quantile,0.75)
rev_df <- data.frame(rev = as.vector(t(mean_rev)),
                     lower = as.vector(t(lower_rev)),
                     upper = as.vector(t(upper_rev)),
                     price = rep(1:N_price, N_titles),
                     ttl_idx = as.vector(sapply(1:N_titles, rep, N_price)),
                     qty = as.vector(t(mean_qty)),
                     lower_qty = as.vector(t(lower_qty)),
                     upper_qty = as.vector(t(upper_qty)))
rev_df <- rev_df %>% left_join(movie_map, by = 'ttl_idx')
betas <- order(colMeans(samps_rev$betas))[c(1:5, (N_titles - 4):N_titles)]
sub_df <- rev_df %>% filter(ttl_idx %in% betas)
ggplot(data = sub_df, aes(x = price, y = rev)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill="grey70") + geom_line() +
facet_wrap(~ title, scales = 'free_y')
```
Perhaps we'd also like to see the demand curves for each product. Below we've
plotted the demand curves for the same products.
```{r}
ggplot(data = sub_df, aes(x = price, y = qty)) +
geom_ribbon(aes(ymin = lower_qty, ymax = upper_qty), fill="grey70") + geom_line() +
facet_wrap(~ title, scales = 'free_y')
```
Left as an exercise for the reader:
Let's say our utility function is revenue. If we want to maximize expected revenue, we can
take the expectation of revenue at each price for each product and choose the price that
maximizes it. We'll call this the maximum-revenue strategy.
How can we generate the distribution of portfolio revenue (i.e. the sum of
revenue across all the movies) under the maximum revenue strategy from the
posterior draws of $\texttt{hypo_rev}$ we already have from
$\texttt{samps_rev}$?
## Conclusion
We've walked through a full decision analysis problem with Stan. You should now have a better
sense of how to build hierarchical generalized linear models in Stan, and how to use the
output to make decisions under uncertainty.