---
title: "Revenue optimization"
author: "Rob Trangucci & Jonah Gabry"
date: "January 9, 2017"
output:
pdf_document: default
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rstan)
library(dplyr)
library(reshape2)
library(bayesplot)
library(ggplot2)
```
## Problem set-up
Let's say that we're a data scientist at a large media company. We are told that
we have the power to set the price of every streaming movie however we like in
order to maximize revenue. How should we go about learning how to maximize
revenue for each movie? Our data look like the number of movies sold per
week, $y_{m,t}$, at a price $p_{m,t}$.
Well, one way we could do this is to use the data we have to estimate the demand curve for the
products. Luckily, our company has exclusive rights to each of the movies, and has
engaged in random pricing trials for the past 10 months. That means that on random weeks,
movies have had their prices randomly jittered by one dollar up or down. The prices can only be
set on a grid of whole dollars.
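To make the trial design concrete, here is a minimal sketch of what one jittered week of prices might look like; the base prices below are made up, and the actual randomization happened upstream of our data.
```{r, eval=FALSE}
# Hypothetical illustration of the +/- $1 jitter described above
set.seed(1)
base_price <- c(4, 5, 3, 6)   # whole-dollar base prices (made up)
base_price + sample(c(-1, 1), length(base_price), replace = TRUE)
```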
The decision problem we want to solve is the following:
\begin{align*}
\arg\max_{p \in P} \mathbb{E}_q[q(p) \, p]
\end{align*}
Because we have complete control over price, the random variable contributing to the expectation
is quantity. That's what we'll be modeling for these movies: the quantity sold as a function of price.
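To preview where we're headed, here is a hedged sketch of the decision rule: given posterior draws of quantity sold at each candidate price, we average revenue over the draws and pick the price with the largest expected revenue. Everything below (the price grid, the demand parameters, and `q_draws`) is made up for illustration.
```{r, eval=FALSE}
# Fake posterior draws of weekly quantity at each candidate price
price_grid <- 1:20
q_draws <- sapply(price_grid, function(p) rpois(1000, exp(4 - 0.15 * p)))
# Expected revenue at each price, and the revenue-maximizing price
expected_rev <- colMeans(q_draws) * price_grid
price_grid[which.max(expected_rev)]
```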
The data are in a file called $\texttt{movie_data_tt_new_set_20180717.RDS}$. Let's load the data and see what the structure
is:
```{r load-data}
movie_data <- readRDS('data/movie_data_tt_new_set_20180717.RDS')$df
head(movie_data)
```
We've got the number of units sold per week, the date of the Sunday ending each
week, the title, the price (in US dollars), the release year,
and the IMDB rating.
```{r describe-data}
N_titles <- length(unique(movie_data$title))
N_titles
```
We're working with `r N_titles` titles. First, we want to learn whether price is
related to quantity sold for our movies.
[filler about count data modeling and using Poisson outcome]
We'll start with a simple model and build it up slowly. First, we'll define our
likelihood to be the negative binomial probability mass function over the number
of units sold, denoted below as $\texttt{qty_sld}$.
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha + \beta \, p_{m,t}
\end{align*}
The negative binomial mass function we'll use is called
$\texttt{neg_binomial_2_log}(\text{ints} \, y, \text{reals} \, \eta, \text{reals} \, \phi)$ in Stan.
This is a negative binomial mass function parameterized in terms of its
log-mean, $\eta$, and its precision, $\phi$. This means that $\mathbb{E}[y] = \exp(\eta)$
and $\text{Var}[y] = \exp(\eta) + \exp(\eta)^2 / \phi$.
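As a quick sanity check on this parameterization (the values of $\eta$ and $\phi$ below are arbitrary), Stan's $\texttt{neg_binomial_2_log}$ corresponds to R's $\texttt{rnbinom()}$ with $\texttt{mu} = \exp(\eta)$ and $\texttt{size} = \phi$:
```{r, eval=FALSE}
# Simulate from the negative binomial and compare to the stated mean and variance
eta <- 3
phi <- 5
y_sim <- rnbinom(1e5, size = phi, mu = exp(eta))
c(mean = mean(y_sim), theoretical = exp(eta))
c(var = var(y_sim), theoretical = exp(eta) + exp(eta)^2 / phi)
```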
We can use this distribution for our outcome, qty_sld, because we have integer
outcomes (the number of movies sold per week). That alone, however, isn't a good
enough reason to prefer the negative binomial: we could also use a Poisson
distribution. The Poisson has only one parameter, which forces the mean of the
random variable to equal its variance, and our data appear to be overdispersed.
We can check this by taking the mean of qty_sld for each movie and comparing it
to the variance of qty_sld for the same movie. If the data were generated by a
Poisson distribution, each movie's variance would be roughly equal to its mean.
We wouldn't expect this to hold across movies, because different movies likely
have different average sales per week.
```{r}
agg_data <- movie_data %>% group_by(title) %>%
  summarise(mean_qty_sld = mean(units_sold),
            var_qty_sld = var(units_sold),
            rel_date = first(release_year))
head(agg_data)
```
So it does appear that the variance of the weekly sales data is greater than the mean, indicating
that our outcome would be better modeled by a negative binomial distribution than by a Poisson
distribution.
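A quick numeric summary of the same check (assuming the $\texttt{agg_data}$ frame computed above); ratios greater than one indicate overdispersion relative to the Poisson:
```{r, eval=FALSE}
# Variance-to-mean ratio of weekly sales by title
summary(with(agg_data, var_qty_sld / mean_qty_sld))
```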
Let's prep the data. Because we'll want to compare our predictions for specific
products, and we'd like to do that comparison in Stan, we first convert the
titles to integer indexes.
```{r prep-data}
t <- max(table(movie_data$title))      # number of weekly observations per title
J <- length(unique(movie_data$title))  # number of titles
movie_data <- movie_data %>%
  mutate(title_fac = as.factor(title),
         ttl_idx = as.integer(title_fac),          # integer index for each title
         ids = rep(1:t, J),
         wk_idx = lubridate::isoweek(date) - 1,    # week-of-year index
         mo_idx = lubridate::month(date))          # month index
rel_year <- unique(movie_data[, c('release_year', 'ttl_idx')]) %>%
  mutate(rel_year_std = (release_year - mean(release_year)) / sd(release_year))
meta_data <- unique(movie_data[, c('ttl_idx', 'release_year', 'rating')]) %>%
  arrange(ttl_idx) %>% select(release_year, rating) %>% as.matrix()
```
We create the list to pass into RStan as data. For this first model we only need
the outcome, the price, and the number of observations.
```{r stan-data}
stan_dat_simple <- with(movie_data, list(qty_sld = units_sold,
                                         price = price,
                                         N = length(price)))
```
Before we fit the first model we defined above, let's generate some fake data against which to
test our Stan program. If we are able to recover known parameter values, that is a good indication
that our Stan program is working.
Let's compile the fake-data generating code in the file simple_poisson_regression_dgp.stan.
```{r compNBdgp, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_simple <- stan_model('stan_programs/simple_poisson_regression_dgp.stan')
```
```{r, eval=FALSE}
set.seed(1123)
fake_gen_data <- stan_dat_simple
fake_gen_data$price <- rnorm(fake_gen_data$N)
```
```{r runpoissondgp, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_simple, data = fake_gen_data, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param', seed=123)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
```{r P_fake_stan_dat, eval=FALSE}
stan_dat_fake <- fake_gen_data
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
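Alternatively (a sketch, not part of the workflow above), we could simulate fake outcomes directly in R rather than via the Fixed_param Stan run; the "true" values of $\texttt{alpha}$ and $\texttt{beta}$ below are arbitrary, and $\texttt{stan_dat_fake_r}$ is a hypothetical name.
```{r, eval=FALSE}
# Simulate fake Poisson outcomes in R at the jittered (standardized) prices
set.seed(1123)
alpha_true <- 3
beta_true <- -0.2
stan_dat_fake_r <- fake_gen_data
stan_dat_fake_r$qty_sld <- rpois(fake_gen_data$N,
                                 exp(alpha_true + beta_true * fake_gen_data$price))
```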
```{r compNB, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_P <- stan_model('stan_programs/simple_poisson_regression.stan')
```
```{r runPoverfake, eval=FALSE}
fit_model_P <- sampling(comp_model_P, data = stan_dat_fake, chains = 4, cores = 4)
samps_P <- rstan::extract(fit_model_P)
samps_pars <- as.matrix(fit_model_P, pars = c('alpha','beta'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_hist(samps_pars, c(samps_dgp$alpha,samps_dgp$beta))
```
```{r resid_P_fake, eval=FALSE}
y_rep <- samps_P$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_fake$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r, eval=FALSE}
ppc_rootogram(stan_dat_fake$qty_sld, yrep = y_rep)
```
```{r fit_P_real_data, cache=T}
fit_P_real_data <- sampling(comp_model_P, data = stan_dat_simple, chains = 4, cores =4 )
```
```{r results_simple_P}
print(fit_P_real_data, pars = c('alpha','beta'))
```
OK, price is related to quantity sold, so let's see how well the model fits.
```{r ppc_dens_P_real}
samps_P_real_data <- rstan::extract(fit_P_real_data)
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_P_real_data$pp_y[1:200,])
```
```{r resid_P_real}
y_rep <- samps_P_real_data$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
Maybe there's some useful information that we're not using, like the
release year?
```{r}
rel_dat_lm <- lm(log1p(units_sold) ~ release_year, data = movie_data)
with(movie_data, plot(release_year, log1p(units_sold)))
abline(reg = rel_dat_lm)
```
```{r}
stan_dat_simple$rel_year <- movie_data$release_year
stan_dat_simple$rating <- movie_data$rating
```
```{r, eval=FALSE}
stan_dat_fake$rel_year <- rnorm(stan_dat_fake$N)
stan_dat_fake$rating <- rnorm(stan_dat_fake$N)
```
```{r compmultPDGP, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_multiple <- stan_model('stan_programs/multiple_poisson_regression_dgp.stan')
```
```{r runpoissondgp_mult, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_multiple, data = stan_dat_fake, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param', seed=123)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
```{r P_mult_fake_stan_dat, eval=FALSE}
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
```{r compPmult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_P_mult <- stan_model('stan_programs/multiple_poisson_regression.stan')
```
```{r runPmultoverfake, eval=FALSE}
fit_model_P_mult <- sampling(comp_model_P_mult, data = stan_dat_fake, chains = 4, cores = 4)
samps_pars <- as.matrix(fit_model_P_mult, pars = c('alpha','beta','beta_year','beta_rating'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_intervals(samps_pars,
c(samps_dgp$alpha,samps_dgp$beta,samps_dgp$beta_year,samps_dgp$beta_rating))
```
Now let's fit to real data.
[Note to Jonah: If we don't rescale the predictors, we get a bunch of max treedepth warnings, because our
priors aren't well matched to the scale of the data. I know this is something you like to talk about, so
we can do that here, but I didn't broach it because it'll take some work to build a model that fits poorly
but not egregiously so, in order to not get treedepth warnings]
```{r fit_mult_P_real_dat}
stan_dat_simple$rel_year <- scale(stan_dat_simple$rel_year)[,1]
stan_dat_simple$rating <- scale(stan_dat_simple$rating)[,1]
fit_model_P_mult_real <- sampling(comp_model_P_mult, data = stan_dat_simple, cores = 4, chains = 4)
```
```{r ppc_dens_P_mult_real}
samps_P_real_data <- rstan::extract(fit_model_P_mult_real)
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_P_real_data$pp_y[1:200,])
```
```{r resid_P_mult_real}
y_rep <- samps_P_real_data$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep)
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
The marginal posterior predictive checks above suggest that the Poisson likelihood is too
thin-tailed for this data. Recall the negative binomial model we wrote down earlier,
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi),
\end{align*}
which keeps the same log-linear predictor (price, release year, and rating) but adds the
precision parameter $\phi$ to absorb the extra-Poisson variation we saw in the per-movie
means and variances. Let's repeat the fake-data workflow with this likelihood.
```{r compNBdgp_mult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_dgp_mult_NB <- stan_model('stan_programs/multiple_NB_regression_dgp.stan')
```
We're going to generate one draw from the fake-data model so we can fit our
model to it and compare the known parameter values to their posterior distributions.
```{r runNBdgp, eval=FALSE}
fitted_model_dgp <- sampling(comp_dgp_mult_NB, data = stan_dat_fake, chains = 1, cores = 1, iter = 1, algorithm='Fixed_param',seed=1796604581)
samps_dgp <- rstan::extract(fitted_model_dgp)
```
We create a new dataset with the fake outcome data replacing the true observed outcomes.
```{r fake_stan_dat, eval=FALSE}
stan_dat_fake$qty_sld <- samps_dgp$y_gen[1,]
```
```{r compNBmult, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB <- stan_model('stan_programs/multiple_NB_regression.stan')
```
Now we run our NB regression over the fake data and extract the samples to
examine posterior predictive checks and to check whether we've sufficiently
recovered our known parameters, $\texttt{alpha}$, $\texttt{beta}$, and the other regression coefficients.
```{r runNBoverfake, eval=FALSE}
fitted_model_NB <- sampling(comp_model_NB, data = stan_dat_fake, chains = 4, cores = 4)
samps_NB <- rstan::extract(fitted_model_NB)
samps_pars <- as.matrix(fitted_model_NB, pars = c('alpha','beta','beta_year','beta_rating','inv_phi'))
```
```{r, eval=FALSE}
bayesplot::mcmc_recover_intervals(samps_pars,
c(samps_dgp$alpha,
samps_dgp$beta,
samps_dgp$beta_year,samps_dgp$beta_rating,samps_dgp$inv_prec))
```
The parameter recovery looks good. Let's run the model over the real data and see
how our model fares!
```{r runNB, eval=FALSE}
fitted_model_NB <- sampling(comp_model_NB, data = stan_dat_simple, chains = 4, cores = 4)
samps_NB <- rstan::extract(fitted_model_NB)
```
Let's look at our predictions vs. the data.
```{r ppc-full, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB$pp_y[1:200,])
```
```{r resid_NB_real, eval=FALSE}
y_rep <- samps_NB$pp_y
mean_y_rep <- colMeans(y_rep)
std_resid <- (stan_dat_simple$qty_sld - mean_y_rep) / sqrt(mean_y_rep + mean_y_rep^2*mean(samps_NB$inv_phi))
plot(mean_y_rep, std_resid)
abline(h=1)
abline(h=-1)
```
```{r, eval=FALSE}
ppc_rootogram(stan_dat_simple$qty_sld, yrep = y_rep)
```
It appears that the data generating process we've encoded in our model doesn't match the
data we've been given: we're over-estimating the tails of the data and under-estimating the
small counts. Are there other statistics we can look at to determine what's missing
from the model?
We can look at the distribution of the means vs. our observed means by movie to
see whether we're at least modeling the means well at the movie level.
```{r ppc-group_means, eval=FALSE}
random_20_groups <- sample(1:N_titles, 20)
stan_dat_simple$ttl_idx <- movie_data$ttl_idx
sel_vec <- stan_dat_simple$ttl_idx %in% random_20_groups
bayesplot::ppc_stat_grouped(stan_dat_simple$qty_sld[sel_vec],
samps_NB$pp_y[1:200,sel_vec], group = stan_dat_simple$ttl_idx[sel_vec], stat = 'mean')
```
It appears that we aren't modeling the means well for each movie. Let's add a hierarchical
intercept parameter, $\alpha_m$, at the movie level to our model.
\begin{align*}
\texttt{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha_m + \beta \, p_{m,t} + \beta_{\texttt{rel_year}} \, \texttt{rel_year}_{m} \\
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha})
\end{align*}
In our Stan model, $\alpha_m$ is the $m$-th element of the vector
$\texttt{alphas}$ declared in the parameters block. We've dropped the
rating from the regression. Let's put the release year into the
hierarchical prior for the movie-level intercept.
$$
\alpha_m \sim \text{Normal}(\alpha + \beta_{\texttt{rel_year}} \, \texttt{rel_year}_{m}, \sigma_{\alpha})
$$
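To get a feel for what this prior implies, here is a small prior simulation of movie-level intercepts; the values of $\alpha$, $\beta_{\texttt{rel_year}}$, and $\sigma_\alpha$ below are made up for illustration.
```{r, eval=FALSE}
# Draw movie-level intercepts from the hierarchical prior and look at the
# implied mean weekly sales when all terms in eta other than alpha_m are zero
set.seed(42)
alpha <- 2; beta_rel_year <- 0.3; sigma_alpha <- 0.5
rel_year_std <- rnorm(20)   # standardized release years for 20 hypothetical movies
alpha_m <- rnorm(20, alpha + beta_rel_year * rel_year_std, sigma_alpha)
round(exp(alpha_m), 1)
```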
```{r stan-data-hier}
stan_dat_hier <- with(movie_data, list(qty_sld = units_sold,
                                       price = price,
                                       N = length(price),
                                       J = J,
                                       t = t,
                                       M = length(unique(mo_idx)),
                                       W = length(unique(wk_idx)),
                                       xmas_ind = as.integer(date == '2015-12-27'),
                                       rel_year = rel_year$rel_year_std[order(rel_year$ttl_idx)],
                                       meta_data = meta_data,
                                       ttl_idx = ttl_idx,
                                       wk_idx = wk_idx,
                                       mo_idx = mo_idx))
```
```{r comp-NB-hier, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier <- stan_model('stan_programs/hier_NB_regression.stan')
```
[Jonah, the doc is good up until here, working on hierarchical NB models now]
```{r run-NB-hier, eval=FALSE}
fitted_model_NB_hier <- sampling(comp_model_NB_hier, data = stan_dat_hier, chains = 4, cores = 4)
```
Let's examine the fit of the new model.
```{r print-NB-hier, eval=FALSE}
print(fitted_model_NB_hier, pars = c('sigma_alpha','beta','alpha','phi','alphas'))
```
The effective sample sizes are quite low for many of the parameters. This indicates
that there are problems with our parameterization of the model. We can reparameterize
the random intercept $\alpha_m$, which is distributed:
\begin{align*}
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha})
\end{align*}
Instead, we should use the non-centered parameterization for $\alpha_m$. We
define a vector of auxiliary variables in the parameters block,
$\texttt{alphas_raw}$ that is given a $\text{Normal}(0, 1)$ prior in the model
block. We then make $\texttt{alphas}$ a transformed parameter:
```stan
transformed parameters {
  vector[J] alphas;
  alphas = alpha + sigma_alpha * alphas_raw;
}
```
This gives $\texttt{alphas}$ a $\text{Normal}(\alpha, \sigma_\alpha)$ distribution, but it
decouples the dependence of the density of each element of $\texttt{alphas}$ from
$\texttt{sigma_alpha}$ ($\sigma_\alpha$). hier_NB_regression_ncp.stan uses the non-centered
parameterization for $\texttt{alphas}$. We will examine the effective sample size of the
fitted model to see whether we've fixed the problem with our reparameterization.
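Before fitting, we can convince ourselves that the two parameterizations imply the same distribution for $\texttt{alphas}$ with a quick simulation; the values of $\alpha$ and $\sigma_\alpha$ below are arbitrary.
```{r, eval=FALSE}
set.seed(42)
n_sims <- 1e5
alpha <- 2; sigma_alpha <- 0.5
alphas_centered <- rnorm(n_sims, alpha, sigma_alpha)
alphas_ncp <- alpha + sigma_alpha * rnorm(n_sims, 0, 1)  # non-centered construction
c(mean(alphas_centered), mean(alphas_ncp))
c(sd(alphas_centered), sd(alphas_ncp))
```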
```{r comp-NB-hier-ncp, eval=FALSE}
comp_model_NB_hier_ncp <- stan_model('stan_programs/hier_NB_regression_ncp.stan')
```
```{r run-NB-hier-ncp, eval=FALSE}
fitted_model_NB_hier_ncp <- sampling(comp_model_NB_hier_ncp, data = stan_dat_hier, chains = 4, cores = 4)
```
Examining the fit of the new model
```{r n-eff-NB-hier-ncp-check, eval=FALSE}
print(fitted_model_NB_hier_ncp, pars = c('sigma_alpha','beta','alpha','phi','alphas'))
```
This seems to have improved the effective sample sizes of $\texttt{alphas}$, though it has
decreased the effective sample size of $\texttt{sigma_alpha}$. We'll stick with the
non-centered parameterization as long as it appears feasible when we examine our model
diagnostics.
```{r samps-full-hier, eval=FALSE}
samps_NB_hier_ncp <- rstan::extract(fitted_model_NB_hier_ncp, pars = c('pp_y'))
```
Extracting the draws from the model allows us to see how we've fitted the marginal
distribution of the data with draws from the marginal posterior predictive distribution.
That plot is below.
```{r ppc-full-hier, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_ncp$pp_y[1:200,]) + xlim(c(0,100))
```
This looks quite nice. If we've captured the movie-level means well, then the
posterior distribution of means by movie should match well with the observed
means of the quantity of movie sales by week.
```{r ppc-group_means-hier, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_simple$qty_sld[sel_vec],
samps_NB_hier_ncp$pp_y[1:200,sel_vec], group = stan_dat_simple$ttl_idx[sel_vec], stat = 'mean')
```
Ok, so these plots look \emph{much} better. Perhaps if the level of movie sales differs
by movie, the coefficient for price does too. We can add this to our model and observe the
fit.
\begin{align*}
\text{qty_sld}_{m,t} & \sim \text{Neg-Binomial-log}(\eta_{m,t}, \phi) \\
\eta_{m,t} & = \alpha_m + \beta_m \, p_{m,t} \\
\alpha_m & \sim \text{Normal}(\alpha, \sigma_{\alpha}) \\
\beta_m & \sim \text{Normal}(\beta, \sigma_{\beta})
\end{align*}
```{r comp-NB-hier-slopes, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_slopes <- stan_model('stan_programs/hier_slopes_NB_regression_ncp.stan')
```
```{r run-NB-hier-slopes, eval=FALSE}
fitted_model_NB_hier_slopes <- sampling(comp_model_NB_hier_slopes, data = stan_dat_hier, chains = 4, cores = 4)
samps_NB_hier_slopes <- rstan::extract(fitted_model_NB_hier_slopes, pars = c('sigma_beta',
                                                                             'pp_y'))
```
To see if the model infers movie-to-movie differences in the price coefficient, we can plot a histogram of our marginal
posterior distribution for $\texttt{sigma_beta}$.
```{r, eval=FALSE}
hist(samps_NB_hier_slopes$sigma_beta,breaks=50)
```
While the posterior can't specifically rule out zero, most of its mass is at small but
non-zero values, so we should keep the hierarchy over $\texttt{betas}$. Plotting the marginal
data density again, we can see the model still looks well calibrated.
```{r ppc-full-hier-slopes, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_slopes$pp_y[1:200,]) + xlim(c(0,100))
```
Perhaps we should add monthly random intercepts to our model. Let's look at the data to see if this
is warranted.
```{r ppc-group_max-hier-slopes-mean-by-mo, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_hier$qty_sld,
                            samps_NB_hier_slopes$pp_y[1:200,], group = stan_dat_hier$mo_idx, stat = 'mean')
```
We might be missing information that's in the month indicators. That makes sense: when it gets colder
out, there might be more of an incentive to watch movies (at least in the Northeast).
```{r comp-NB-hier-mos, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_mos <- stan_model('stan_programs/hier_slopes_NB_regression_mos.stan')
```
```{r run-NB-hier-slopes-mos, eval=FALSE}
fitted_model_NB_hier_mos <- sampling(comp_model_NB_hier_mos, data = stan_dat_hier, chains = 4, cores = 4)
```
We get divergent transitions in our run. Let's reparameterize our prior on the $\texttt{mos}$ random
intercept to be a centered hierarchical normal; we only observed divergent transitions after
adding the monthly random effect, so that's the term we'll change.
```{r comp-NB-hier-mos-cp, cache=T, results="hide", message=FALSE, eval=FALSE}
comp_model_NB_hier_mos_cp <- stan_model('stan_programs/hier_slopes_NB_regression_mos_cp.stan')
```
```{r run-NB-hier-slopes-mos-cp, cache=TRUE, eval=FALSE}
fitted_model_NB_hier_mos_cp <- sampling(comp_model_NB_hier_mos_cp, data = stan_dat_hier, chains = 4, cores = 4, control = list(adapt_delta=0.99))
saveRDS(fitted_model_NB_hier_mos_cp,file = 'fitted_model_NB_hier_mos_cp.RDS')
samps_NB_hier_mos_cp <- rstan::extract(fitted_model_NB_hier_mos_cp, pars = c('pp_y'))
```
Note that in order to get this model to fit without divergent transitions we need to crank up
$\texttt{adapt_delta}$ from 0.8 (its default) to 0.99. This may indicate that our model doesn't
fit our data very well. How might you go about expanding the model? What are observable summary
statistics of the outcome measure, $\texttt{qty_sld}$, that would help you build additional
structure into the priors for our model?
In the interest of brevity, we won't go on expanding the model.
```{r ppc-full-hier-mos, eval=FALSE}
bayesplot::ppc_dens_overlay(stan_dat_simple$qty_sld,
samps_NB_hier_mos_cp$pp_y[1:200,]) + xlim(c(0,100))
```
```{r, eval=FALSE}
bayesplot::ppc_stat_grouped(stan_dat_hier$qty_sld,
                            samps_NB_hier_mos_cp$pp_y[1:200,], group = stan_dat_hier$mo_idx, stat = 'mean')
```
As we can see, it appears that our monthly random intercept has captured a monthly
pattern across all the movies.
## Revenue forecasts
Let's modify the Stan program to generate revenue forecasts at different prices in the
last week of data.
We want to predict quantity at each price, and then calculate revenue for each price.
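The computation we'll add to the generated quantities block amounts to multiplying predicted quantities by a grid of candidate prices. Roughly, in R (all objects below are small stand-ins for what the Stan program will produce):
```{r, eval=FALSE}
# revenue[s, m, p] = price[p] * quantity[s, m, p], for posterior draw s,
# movie m, and candidate price p
price_grid <- 1:20
S <- 10; M <- 3   # pretend numbers of draws and movies
hypo_qty_sld <- array(rpois(S * M * length(price_grid), 30),
                      dim = c(S, M, length(price_grid)))
hypo_rev <- sweep(hypo_qty_sld, 3, price_grid, `*`)
dim(hypo_rev)
```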
```{r comp-rev, cache=T, results="hide", message=FALSE}
comp_rev <- stan_model('stan_programs/hier_slopes_NB_regression_mos_predict.stan')
```
```{r run-NB-hier-rev, cache=TRUE}
rev_model <- sampling(comp_rev, data = stan_dat_hier, chains = 4, cores = 4, iter = 2000,
control = list(adapt_delta = 0.90))
samps_rev <- rstan::extract(rev_model)
```
Below we've generated revenue curves for ten of the products: the five with the smallest and
the five with the largest posterior-mean price coefficients. These charts give us
a precise quantification of our uncertainty about the revenue projection at any price for
each product, which is immensely useful for a decision maker choosing how to set prices
for these products.
```{r rev-curves}
N_price <- 20
movie_map <- unique(movie_data[,c('title','ttl_idx')])
mean_rev <- apply(samps_rev$hypo_rev,c(2,3),mean)
lower_rev <- apply(samps_rev$hypo_rev,c(2,3),quantile,0.25)
upper_rev <- apply(samps_rev$hypo_rev,c(2,3),quantile,0.75)
mean_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),mean)
lower_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),quantile,0.25)
upper_qty <- apply(samps_rev$hypo_qty_sld,c(2,3),quantile,0.75)
rev_df <- data.frame(rev = as.vector(t(mean_rev)),
                     lower = as.vector(t(lower_rev)),
                     upper = as.vector(t(upper_rev)),
                     price = rep(1:N_price, N_titles),
                     ttl_idx = as.vector(sapply(1:N_titles, rep, N_price)),
                     qty = as.vector(t(mean_qty)),
                     lower_qty = as.vector(t(lower_qty)),
                     upper_qty = as.vector(t(upper_qty)))
rev_df <- rev_df %>% left_join(movie_map, by = 'ttl_idx')
betas <- order(colMeans(samps_rev$betas))[c(1:5, (N_titles - 4):N_titles)]
sub_df <- rev_df %>% filter(ttl_idx %in% betas)
ggplot(data = sub_df, aes(x = price, y = rev)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill="grey70") + geom_line() +
facet_wrap(~ title, scales = 'free_y')
```
Perhaps we'd also like to see the demand curves for each product. Below we've
plotted the demand curves for the same products.
```{r}
ggplot(data = sub_df, aes(x = price, y = qty)) +
geom_ribbon(aes(ymin = lower_qty, ymax = upper_qty), fill="grey70") + geom_line() +
facet_wrap(~ title, scales = 'free_y')
```
Left as an exercise for the reader:
Let's say our utility function is revenue. If we want to maximize expected revenue, we can
take the expectation of revenue at each price for each product and choose the price that
maximizes it. We'll call this the maximum-revenue strategy.
How can we generate the distribution of portfolio revenue (i.e. the sum of
revenue across all the movies) under the maximum revenue strategy from the
posterior draws of $\texttt{hypo_rev}$ we already have from
$\texttt{samps_rev}$?
## Conclusion
We've walked through a full decision analysis problem with Stan. You should now have a better
sense of how to build hierarchical generalized linear models in Stan, and how to use the
output to make decisions under uncertainty.