diff --git a/04-case_study_er_injuries.Rmd b/04-case_study_er_injuries.Rmd
index 7b1f4e7..b5bf393 100644
--- a/04-case_study_er_injuries.Rmd
+++ b/04-case_study_er_injuries.Rmd
@@ -7,23 +7,34 @@
- Learn how to **create your app step-by-step**
- Get more comfortable **using the techniques you learned so far**
+![waffles?](images/04-case-study/waffle_irons.png)
+
+[image source](https://x.com/USCPSC/status/1033024826380771331)
+
## Introduction
-This Chapter is about building a more complex app with the tools we learned in the previous chapters.
+This chapter is about building a more complex app with the tools we learned in the previous chapters.
We're going to use the following packages:
```{r package-list, message=FALSE, warning=FALSE}
-library(shiny)
-library(vroom)
-library(tidyverse)
+library(shiny) #framework
+library(vroom) #to load the TSV files
+library(tidyverse) #data wrangling
```
+Session Info
+```{r}
+utils::sessionInfo()
+```
+
+
+
## The data
-We're exploring data from the National Electronic Injury Surveillance System (NEISS), which covers **accidents reported from a sample of hospitals in the US**.
+We're exploring data from the National Electronic Injury Surveillance System (NEISS), which covers **accidents reported from a sample of hospitals in the US**. The data set is from Hadley Wickham's [GitHub repository](https://github.com/hadley/mastering-shiny/raw/main/neiss/).
For every accident / injured person we have
@@ -42,7 +53,7 @@ as well as
Further we have a **weight** attribute for an estimation how may people the current case represents if the dataset was scaled to the entire US population.
-Code to download the data:
+Code to download the data:
```{r download-data}
dir.create("neiss")
@@ -57,7 +68,9 @@ download("population.tsv")
download("products.tsv")
```
-Main data:
+
+
+Main data tibbles:
```{r main-data, message=FALSE}
injuries <- vroom("neiss/injuries.tsv.gz")
@@ -68,6 +81,7 @@ Product names:
```{r product-data, message=FALSE}
products <- vroom("neiss/products.tsv")
+prod_codes <- setNames(products$prod_code, products$title)
products
```
@@ -79,7 +93,6 @@ population
```
-
## Exploration
As motivation for the app we want to build, we're going to explore the data.
@@ -88,10 +101,9 @@ Let's have a look at accidents related to toilets:
```{r no-toilets}
# product code for toilets is 649
-selected <- injuries %>%
- filter(prod_code == 649)
+selected <- injuries %>% filter(prod_code == 649)
-nrow(selected)
+# nrow(selected): 2993
```
We're interested in how many accidents related to toilets we see for different locations, body parts and diagnosis.
@@ -107,6 +119,13 @@ selected %>%
count(diag, wt = weight, sort = TRUE)
```
+Weights?
+
+* The [NEISS data dictionary](https://www.cpsc.gov/Research--Statistics/NEISS-Injury-Data) calls this column "Statistical Weight for National Estimates"
+* perhaps a form of [propensity weighting](https://www.pewresearch.org/methods/2018/01/26/how-different-weighting-methods-work/)
+
+
+
Next we'll we create a plot for the number of accidents for different age and sex:
```{r line-plot}
@@ -121,7 +140,48 @@ summary %>%
labs(y = "Injuries per 10,000 people")
```
-The goal is to build an app, which outputs the tables and the plot for different products, which the user selects.
+### Facet Wrap
+
+Let us briefly look at all of the injury types in the data set.
+
+```{r, echo = FALSE, eval = TRUE}
+injuries |>
+ group_by(prod_code) |>
+ count(age, sex, wt = weight) |>
+ left_join(y = population, by = c("age", "sex")) |>
+ mutate(rate = n / population * 1e4) |>
+ ggplot(mapping = aes(x = age, y = rate, color = sex)) +
+ geom_line(na.rm = TRUE) +
+ facet_wrap(vars(prod_code)) +
+ labs(title = "Accidents reported to emergency rooms in the US",
+ subtitle = "sample from 2013 to 2017",
+ caption = "Source: NEISS",
+ x = "Age",
+ y = "Injuries per 10,000 people")
+```
+
+Image code
+
+```{r, echo = TRUE, eval = FALSE}
+injuries |>
+ group_by(prod_code) |>
+ count(age, sex, wt = weight) |>
+ left_join(y = population, by = c("age", "sex")) |>
+ mutate(rate = n / population * 1e4) |>
+ ggplot(mapping = aes(x = age, y = rate, color = sex)) +
+ geom_line(na.rm = TRUE) +
+ facet_wrap(vars(prod_code)) +
+ labs(title = "Accidents reported to emergency rooms in the US",
+ subtitle = "sample from 2013 to 2017",
+ caption = "Source: NEISS",
+ x = "Age",
+ y = "Injuries per 10,000 people")
+```
+
+
+
+
+> The goal is to build an app, which **outputs the tables and the plot for different products**, which the *user selects*.
## Prototype
@@ -205,10 +265,11 @@ _Note:_ The reactive for plot data is only used once. You could also compute the
This prototype is available at https://hadley.shinyapps.io/ms-prototype/.
-Now we're going to improve the app step-by-step.
+![prototype of the app](images/04-case-study/app_prototype.png)
## Polish tables
+Now we're going to improve the app step-by-step.
The prototype version of the app has very long tables. To make it a little clearer we only want to show the top 5 and lump together all other categories in every table.
@@ -221,7 +282,41 @@ injuries %>%
summarise(n = as.integer(sum(weight)))
```
+### Exercise 4.8.2
+
+2. What happens if you flip `fct_infreq()` and `fct_lump()` in the code that reduces the summary tables?
+
+Answer
+```{r diag-table-alt}
+injuries %>%
+ mutate(diag = fct_infreq(fct_lump(diag, n = 5))) %>%
+ group_by(diag) %>%
+ summarise(n = as.integer(sum(weight)))
+```
+
+This order lumped the rarer conditions into "Other" and then did the sorting. However, since "Other" was the most frequent label, `fct_infreq()` then put "Other" at the top, which is less desirable.
+
+
+
+### Hadley's Fix
+
+Hadley's Code
+```{r, eval = FALSE}
+count_top <- function(df, var, n = 5) {
+ df %>%
+ mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
+ group_by({{ var }}) %>%
+ summarise(n = as.integer(sum(weight)))
+}
+
+output$diag <- renderTable(count_top(selected(), diag), width = "100%")
+output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
+output$location <- renderTable(count_top(selected(), location), width = "100%")
+```
+
+
+![polished tables](images/04-case-study/app_polish_tables.png)
## Rate vs count
@@ -322,6 +417,193 @@ server <- function(input, output, session) {
The resulting version of the app is available at https://hadley.shinyapps.io/ms-prototype/.
+
+## Exercises
+
+1. Draw the reactive graph for each app.
+
+2. What happens if you flip `fct_infreq()` and `fct_lump()` in the code that reduces the summary tables?
+
+Answer
+```{r diag-table-2}
+injuries %>%
+ mutate(diag = fct_infreq(fct_lump(diag)), n = 5) %>%
+ group_by(diag) %>%
+ summarise(n = as.integer(sum(weight)))
+```
+
+
+This order lumped the rarer conditions into "Other" and then did the sorting. However, since "Other" was the most frequent label, `fct_infreq()` then put "Other" at the top, which is less desirable.
+
+
+
+3. Add an input control that lets the user decide how many rows to show in the summary tables.
+
+Answer
+
+> refer to code in next section
+
+
+
+4. Provide a way to step through every narrative systematically with forward and backward buttons. Advanced: Make the list of narratives “circular” so that advancing forward from the last narrative takes you to the first.
+
+
+## DSLC Customs
+
+Some people in the cohort were brainstorming ideas on how to continue to improve this Shiny app. Here are some of the ideas.
+
+### UI
+
+```{r, eval = FALSE}
+library("gt")
+library("shiny")
+library("tidyverse")
+
+injuries <- readr::read_csv("injuries.csv")
+population <- readr::read_csv("population.csv")
+products <- readr::read_csv("products.csv")
+prod_codes <- setNames(products$prod_code, products$title)
+
+count_top <- function(df, var, n = 5) {
+ df |>
+ mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
+ group_by({{ var }}) %>%
+ summarise(n = as.integer(sum(weight))) |>
+
+ # gt table
+ gt() |>
+ cols_align(align = "center") |>
+ tab_style(
+ style = list(cell_fill(color = "#F9E3D6")),
+ locations = cells_body(columns = {{ var }})
+ ) |>
+ tab_style(
+ style = list(cell_fill(color = "lightcyan")),
+ locations = cells_body(columns = n)
+ )
+}
+
+ui <- fluidPage(
+ # choose product
+ fluidRow(
+ column(
+ width = 6,
+ selectInput(
+ inputId = "code",
+ label = "Product",
+ choices = prod_codes,
+ width = "100%"
+ )
+ ),
+ column(
+ width = 2,
+ selectInput(inputId = "y", label = "Y axis", choices = c("rate", "count"))
+ ),
+ column(
+ width = 4,
+ sliderInput(inputId = "n_products", label = "number of products",
+ min = 1, max = 20, value = 5, step = 1)
+ )
+ ),
+
+
+ # display tables
+ fluidRow(
+ column(width = 4, tableOutput(outputId = "diag")),
+ column(width = 4, tableOutput(outputId = "body_part")),
+ column(width = 4, tableOutput(outputId = "location"))
+ ),
+ # display plot
+ fluidRow(
+ column(width = 12, plotOutput(outputId = "age_sex"))
+ )
+
+ # narrative button
+ ,
+ fluidRow(
+ column(
+ width = 2,
+ actionButton(inputId = "story", label = "Tell me a story")
+ ),
+ column(width = 10, textOutput(outputId = "narrative"))
+ )
+)
+```
+
+
+### Server
+
+```{r, eval = FALSE}
+server <- function(input, output, session) {
+ # reactive for filtered data frame
+ selected <- reactive(
+ injuries %>%
+ filter(prod_code == input$code)
+ )
+ num_products <- reactive(input$n_products)
+
+ # retrieve injury type
+ prod_name <- reactive(products$title[products$prod_code == input$code])
+
+ # reactive for plot data
+ summary <- reactive(
+ selected() %>%
+ count(age, sex, wt = weight) %>%
+ left_join(y = population, by = c("age", "sex")) %>%
+ mutate(rate = n / population * 1e4)
+ )
+
+ output$diag <- render_gt(count_top(selected(), diag, num_products()),
+ width = "100%")
+ output$body_part <- render_gt(count_top(selected(), body_part, num_products()),
+ width = "100%")
+ output$location <- render_gt(count_top(selected(), location, num_products()),
+ width = "100%")
+
+ # render plot
+ output$age_sex <- renderPlot(
+ expr = {
+ if (input$y == "count") {
+ summary() %>%
+ ggplot(mapping = aes(x = age, y = n, colour = sex)) +
+ geom_line() +
+ labs(title = "Accidents reported to emergency rooms in the US",
+ subtitle = prod_name(),
+ caption = "Source: NEISS",
+ x = "Age",
+ y = "Estimated number of injuries") +
+ theme_minimal()
+
+ } else {
+ summary() %>%
+ ggplot(mapping = aes(x = age, y = rate, colour = sex)) +
+ geom_line(na.rm = TRUE) +
+ labs(title = "Accidents reported to emergency rooms in the US",
+ subtitle = prod_name(),
+ caption = "Source: NEISS",
+ x = "Age",
+ y = "Injuries per 10,000 people") +
+ theme_minimal()
+ }
+ },
+ res = 96
+ )
+
+ # narrative reactive
+ narrative_sample <- eventReactive(
+ eventExpr = list(input$story, selected()),
+ valueExpr = selected() %>%
+ pull(narrative) %>%
+ sample(1)
+ )
+ output$narrative <- renderText(narrative_sample())
+}
+
+# Run the application
+shinyApp(ui = ui, server = server)
+```
+
+
## Meeting Videos
### Cohort 1
diff --git a/DESCRIPTION b/DESCRIPTION
index a479c42..e954bfe 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -25,6 +25,7 @@ Imports:
auth0,
config,
sodium,
- testthat
+ testthat,
+ gt
Remotes:
hadley/emo
diff --git a/images/04-case-study/app_polish_tables.png b/images/04-case-study/app_polish_tables.png
new file mode 100644
index 0000000..d7d2141
Binary files /dev/null and b/images/04-case-study/app_polish_tables.png differ
diff --git a/images/04-case-study/app_prototype.png b/images/04-case-study/app_prototype.png
new file mode 100644
index 0000000..d7ef5b0
Binary files /dev/null and b/images/04-case-study/app_prototype.png differ
diff --git a/images/04-case-study/waffle_irons.png b/images/04-case-study/waffle_irons.png
new file mode 100644
index 0000000..1d44d3f
Binary files /dev/null and b/images/04-case-study/waffle_irons.png differ