Added presentation
Daniel Berry authored and Daniel Berry committed Nov 22, 2016
1 parent 5c6b580 commit be43b30
Showing 11 changed files with 688 additions and 26 deletions.
Binary file added cp
Binary file not shown.
Binary file added mlm
Binary file not shown.
26 changes: 26 additions & 0 deletions model.r
Original file line number Diff line number Diff line change
@@ -622,3 +622,29 @@ mlm <- glmer(desert ~ CTA_counts + crime + vacant_counts +


print(paste('AIC mlm:', AIC(mlm)))

summary(glm(desert ~ Birth.Rate +
General.Fertility.Rate +
Low.Birth.Weight +
Prenatal.Care.Beginning.in.First.Trimester +
Preterm.Births +
Teen.Birth.Rate +
Assault..Homicide. +
Breast.cancer.in.females +
Cancer..All.Sites. +
Colorectal.Cancer +
Diabetes.related +
Firearm.related +
Infant.Mortality.Rate +
Lung.Cancer +
Prostate.Cancer.in.Males +
Stroke..Cerebrovascular.Disease. +
Tuberculosis +
Below.Poverty.Level +
Crowded.Housing +
Dependency +
No.High.School.Diploma +
Per.Capita.Income +
Unemployment,
data = model_data_scale,
family = 'binomial'))
Binary file added np
Binary file not shown.
8 changes: 4 additions & 4 deletions paper.aux
@@ -24,7 +24,7 @@
\newlabel{ppresult}{{3}{8}}
\@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces Hierarchical Model Summary}}{8}}
\newlabel{mlm}{{4}{8}}
\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Model AICs}}{9}}
\newlabel{AICs}{{5}{9}}
\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Model Cross Validated MSEs}}{9}}
\newlabel{MSEs}{{6}{9}}
\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Model AICs}}{8}}
\newlabel{AICs}{{5}{8}}
\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Model Cross Validated MSEs}}{8}}
\newlabel{MSEs}{{6}{8}}
28 changes: 15 additions & 13 deletions paper.log
@@ -1,4 +1,4 @@
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex 2016.5.22) 22 NOV 2016 15:25
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex 2016.5.22) 22 NOV 2016 18:30
entering extended mode
restricted \write18 enabled.
file:line:error style messages enabled.
@@ -306,7 +306,7 @@ Underfull \hbox (badness 10000) in paragraph at lines 104--105
Here is how much of TeX's memory you used:
2630 strings out of 493014
36230 string characters out of 6133351
113228 words of memory out of 5000000
115228 words of memory out of 5000000
6147 multiletter control sequences out of 15000+600000
9369 words of font info for 34 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
@@ -316,18 +316,20 @@ r/local/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx10.pfb></usr
/local/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx12.pfb></usr/
local/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmex10.pfb></usr/l
ocal/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/lo
cal/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/loca
l/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/local/
texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb></usr/local/te
xlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr17.pfb></usr/local/texl
ive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb></usr/local/texlive
/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/local/texlive/
2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy7.pfb></usr/local/texlive/20
16/texmf-dist/fonts/type1/public/amsfonts/cm/cmti10.pfb>
Output written on paper.pdf (11 pages, 7826959 bytes).
cal/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi5.pfb></usr/loca
l/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/local/
texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb></usr/local/te
xlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb></usr/local/texl
ive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr17.pfb></usr/local/texliv
e/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr5.pfb></usr/local/texlive/2
016/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb></usr/local/texlive/2016
/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/local/texlive/2016/
texmf-dist/fonts/type1/public/amsfonts/cm/cmsy7.pfb></usr/local/texlive/2016/te
xmf-dist/fonts/type1/public/amsfonts/cm/cmti10.pfb>
Output written on paper.pdf (11 pages, 7843038 bytes).
PDF statistics:
99 PDF objects out of 1000 (max. 8388607)
63 compressed objects within 1 object stream
107 PDF objects out of 1000 (max. 8388607)
69 compressed objects within 1 object stream
0 named destinations out of 1000 (max. 500000)
26 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file modified paper.pdf
Binary file not shown.
22 changes: 13 additions & 9 deletions paper.tex
@@ -103,7 +103,6 @@ \subsubsection*{Neighborhood level data}
\paragraph{ Race by Community Area }
This file contains a record for every neighborhood in Chicago with the number of residents of each race who reside in that neighborhood. \\

We tried to gather data on crimes and use that information in the model, however the available dataset for crimes in chicago is rather large ($>$1 GB) and we didn't have time to finish extracting features from that model. We hypothesized that food deserts were more likely to be in high crime areas.

\subsection*{Generalized Linear Models}

@@ -124,33 +123,35 @@ \subsubsection*{Complete Pooling}

To begin we have the simplest model: ordinary regression using only the block-level variables. This model pools together every neighborhood as if the neighborhood distinctions don't matter.

$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \epsilon_{ij} \right) $$
$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} \right) $$

Where $\epsilon_{ij} \sim N(0, \sigma^2)$
% Where $\epsilon_{ij} \sim N(0, \sigma^2)$

\subsubsection*{No Pooling}

The next model has a different but nonrandom intercept for each neighborhood, a fixed effect for that neighborhood. This would correspond to our belief that the neighborhoods are each different from the others.

$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j + \epsilon_{ij} \right) $$
$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j \right) $$

Where $\epsilon_i \sim N(0, \sigma^2)$
% Where $\epsilon_i \sim N(0, \sigma^2)$

\subsubsection*{Partial pooling}

The next model has a random intercept for each neighborhood which corresponds to partially pooling the data together. For every neighborhood we use some of the information in other neighborhoods to estimate its intercept. That is, the intercepts in the previous model are shrunk toward the common mean.

$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} + \epsilon_i \right) $$
$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$

Where $\epsilon_i \sim N(0, \sigma^2)$ and $\alpha_j \sim N(0, \sigma^2_\alpha)$
Where % $\epsilon_i \sim N(0, \sigma^2)$ and
$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$

\subsubsection*{Hierarchical}

The final and most complicated model that was fit was a hierarchical model including the neighborhood level predictors in estimating the random intercept for each neighborhood.

$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} + \epsilon_i \right) $$
$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$

Where $\epsilon_i \sim N(0, \sigma^2)$ and $\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$
Where % $\epsilon_i \sim N(0, \sigma^2)$ and
$\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$


\subsection*{Model Comparison}
@@ -544,6 +545,8 @@ \subsection*{Model Comparison}
\end{tabular}
\end{table}

The variance ratio $\frac{\sigma^2_\alpha}{\sigma^2_y}$ in the hierarchical model is approximately 2.97, which means that for a neighborhood of more than $1/2.97 \approx 0.34$ city blocks, the within-neighborhood estimate is more informative than the pooled estimate. Since every neighborhood contains at least one block, little pooling takes place, which indicates that pooling and the hierarchical structure may not be completely effective.
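
As a hedged illustration of where this threshold comes from, consider the standard approximation for a linear varying-intercept model without predictors. This is an assumption made for illustration: our models are logistic, so the expression is an analogy rather than an exact result. The partially pooled intercept for neighborhood $j$ with $n_j$ blocks is roughly a precision-weighted average of the neighborhood mean $\bar{y}_j$ and the overall mean $\bar{y}_{\text{all}}$:

$$ \hat{\alpha}_j \approx \frac{ \frac{n_j}{\sigma^2_y} \bar{y}_j + \frac{1}{\sigma^2_\alpha} \bar{y}_{\text{all}} }{ \frac{n_j}{\sigma^2_y} + \frac{1}{\sigma^2_\alpha} } $$

The two weights are equal when $n_j = \sigma^2_y / \sigma^2_\alpha$, so with $\sigma^2_\alpha / \sigma^2_y \approx 2.97$ the neighborhood's own data dominates once $n_j > 1/2.97 \approx 0.34$.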

\section*{Conclusions}

In terms of cross validated accuracy: the hierarchical model was more accurate on average on new city blocks than the other 3 models indicating support for the hierarchical structure of the data. However, the evidence was not as strong as the author would have liked. Consider the model summary in table \ref{mlm}. We see that food deserts tend to be located in neighborhoods with higher incidences of all site cancer. Perhaps surprisingly, in the presence of the other information, a block in a neighborhood with higher incidences of diabetes was less likely to be in a food desert. City blocks in neighborhoods that are more populous (TOTAL.POPULATION) are less likely to be food deserts. Finally, blocks in neighborhoods with higher rates of dependency (\% of the population younger than 18 or older than 64) are more likely to be in food deserts.
@@ -555,5 +558,6 @@ \section*{Conclusions}
\section*{Future Work}
We did not have data on grocery stores outside the city limits, which could affect the food desert status of city blocks near the borders.

Additionally: the model isn't great and much more work would have to be done in order to make it useful.

\end{document}
Binary file added pp
Binary file not shown.
133 changes: 133 additions & 0 deletions presentation.Rmd
@@ -0,0 +1,133 @@
---
title: "Food Deserts in Chicago"
author: "Daniel Berry"
output: revealjs::revealjs_presentation

---

# Introduction

## What is a Food Desert?
- In general: a place where it is more difficult to access healthy food
- Definition used for this work: a city block located more than 1 mile from a supermarket
- Supermarket defined as a grocery store larger than 10000 sq ft
- Distance is great circle distance between center of grocery store and center of city block
- Other definitions exist that also cover rural areas and take into account car ownership (harder to travel w/o a car)
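
The distance rule in the list above can be sketched as follows. This is a minimal illustration, not the code used for the project: the function name `haversine_miles`, the mean Earth radius of 3958.8 miles, and the coordinates are all assumptions chosen for the example.

```r
# Great-circle (haversine) distance in miles between two points given in
# decimal degrees. The default radius is an assumed mean Earth radius.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical block centroid and supermarket location (illustrative only).
block <- c(lat = 41.8781, lon = -87.6298)
store <- c(lat = 41.9000, lon = -87.6298)

d <- haversine_miles(block["lat"], block["lon"], store["lat"], store["lon"])

# Under the definition used here, a block is a desert when its nearest
# supermarket is more than 1 mile away; this example has a single store.
is_desert <- unname(d > 1)
```

With many stores, the same function could be applied to every store and the minimum distance compared against the 1 mile cutoff.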

## Where are food deserts in Chicago?

```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("deserts_plot.png")
```

# Chicago Demographics

## Distribution of Black People

```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("pct_black_plot.png")
```

## Distribution of White People

```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("pct_white_plot.png")
```

## Income

```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("income_plot.png")
```

## Vacancy
```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("vacant_plot.png")
```

# Data

## Source
All data from the [Chicago Open Data Portal](https://data.cityofchicago.org). Several files:

- Crimes 2001 - Present
- 311 Service Requests - Vacant Buildings
- CTA Ridership Avg Weekly Boardings Oct 2010
- City Block Population
- Public Health Statistics Selected Indicators
- Census Data: Selected Socioeconomic Indicators
- Race by Community Area

# Models

## Complete Pooling

$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} \right) $$

## No Pooling

$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j \right) $$

## Partial Pooling

$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$

Where $\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$

## Hierarchical

$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$

Where $\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$
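
The four specifications above can be sketched in R on simulated data. This is an illustration under stated assumptions: the data, the variable names `x`, `z`, `g`, and `y`, and the formulas are hypothetical stand-ins, not the formulas fit in `model.r`. Only the object names `cp`, `np`, `pp`, and `mlm` follow the saved models loaded later in these slides.

```r
library(lme4)

# Simulated stand-in data: 20 neighborhoods with 30 blocks each.
set.seed(42)
n_groups <- 20
n_per <- 30
g <- rep(seq_len(n_groups), each = n_per)
z <- rnorm(n_groups)                                    # neighborhood-level predictor
alpha <- rnorm(n_groups, mean = -0.5 + 0.7 * z, sd = 0.5)  # neighborhood intercepts
x <- rnorm(n_groups * n_per)                            # block-level predictor
sim <- data.frame(
  g = factor(g),
  x = x,
  z = z[g],
  y = rbinom(n_groups * n_per, 1, plogis(alpha[g] + 0.8 * x))
)

# Complete pooling: one intercept, neighborhoods ignored.
cp <- glm(y ~ x, data = sim, family = binomial)
# No pooling: a separate fixed intercept per neighborhood.
np <- glm(y ~ x + g, data = sim, family = binomial)
# Partial pooling: random intercept per neighborhood.
pp <- glmer(y ~ x + (1 | g), data = sim, family = binomial)
# Hierarchical: random intercept plus a neighborhood-level predictor.
mlm <- glmer(y ~ x + z + (1 | g), data = sim, family = binomial)

aics <- c(cp = AIC(cp), np = AIC(np), pp = AIC(pp), mlm = AIC(mlm))
```

In `lme4`, a neighborhood-level predictor enters the formula like any other fixed effect; it shifts the mean of the random intercepts, which corresponds to the $X_N \beta_N$ term.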

# Model summaries

## Complete Pooling

```{r, echo = F}
library(lme4)
load('cp')
summary(cp)
```

## No Pooling

```{r, echo = F}
load('np')
summary(np)
```

## Partial Pooling

```{r, echo = F}
load('pp')
summary(pp)
```

## Hierarchical

```{r, echo = F}
load('mlm')
summary(mlm)
```

# Results

## Was pooling effective?

- Variance ratio $\sigma^2_\alpha / \sigma^2_y \approx 3$ indicates much higher variability between neighborhoods than within a neighborhood, so little pooling takes place.

- AICs for the random intercept models were higher than for the no pooling model.

- Cross validated MSEs (or [Brier Scores](https://en.wikipedia.org/wiki/Brier_score)) were a way to quantify how accurate we are on previously unseen city blocks within a neighborhood:
- Complete Pooling: 0.07382216
- No Pooling: 0.05328587
- Partial Pooling: 0.05329956
- Hierarchical: 0.05323632
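
A cross validated Brier score of the kind listed above can be computed along these lines. This is a sketch on simulated data with a plain logistic regression; the data, the variable names, and the choice of 5 folds are assumptions, not the procedure used to produce the numbers above.

```r
# Brier score: mean squared difference between the predicted probability
# and the observed 0/1 outcome (lower is better).
brier_score <- function(y, p) mean((y - p)^2)

# Simulated stand-in data; variable names are hypothetical.
set.seed(1)
n <- 500
sim <- data.frame(x = rnorm(n))
sim$desert <- rbinom(n, 1, plogis(-1 + 0.8 * sim$x))

# Assign each row to one of 5 folds at random.
folds <- sample(rep(1:5, length.out = n))

# Fit on four folds, score predictions on the held-out fold.
fold_scores <- sapply(1:5, function(k) {
  fit <- glm(desert ~ x, data = sim[folds != k, ], family = binomial)
  p <- predict(fit, newdata = sim[folds == k, ], type = "response")
  brier_score(sim$desert[folds == k], p)
})

cv_brier <- mean(fold_scores)
```

A score of 0 means perfect predictions, while always predicting 0.5 gives 0.25, which is a useful baseline when reading the values above.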

## Thoughts

- Models are an improvement over just using neighborhood level variables, but I'm not convinced that the hierarchical model is an improvement over simpler no pooling or partial pooling.

- Unfortunately, this project doesn't tell us anything about the causes of food deserts that we didn't already know. More importantly, it doesn't help resolve the issue.

# Questions?
497 changes: 497 additions & 0 deletions presentation.html

Large diffs are not rendered by default.
