diff --git a/cp b/cp new file mode 100644 index 0000000..295ce1a Binary files /dev/null and b/cp differ diff --git a/mlm b/mlm new file mode 100644 index 0000000..dd84aff Binary files /dev/null and b/mlm differ diff --git a/model.r b/model.r index 40201fd..9d6be05 100644 --- a/model.r +++ b/model.r @@ -622,3 +622,29 @@ mlm <- glmer(desert ~ CTA_counts + crime + vacant_counts + print(paste('AIC mlm:', AIC(mlm))) + +summary(glm(desert ~ Birth.Rate + + General.Fertility.Rate + + Low.Birth.Weight + + Prenatal.Care.Beginning.in.First.Trimester + + Preterm.Births + + Teen.Birth.Rate + + Assault..Homicide. + + Breast.cancer.in.females + + Cancer..All.Sites. + + Colorectal.Cancer + + Diabetes.related + + Firearm.related + + Infant.Mortality.Rate + + Lung.Cancer + + Prostate.Cancer.in.Males + + Stroke..Cerebrovascular.Disease. + + Tuberculosis + + Below.Poverty.Level + + Crowded.Housing + + Dependency + + No.High.School.Diploma + + Per.Capita.Income + + Unemployment, + data = model_data_scale, + family = 'binomial')) diff --git a/np b/np new file mode 100644 index 0000000..a76dd55 Binary files /dev/null and b/np differ diff --git a/paper.aux b/paper.aux index fba9a62..b1d5010 100644 --- a/paper.aux +++ b/paper.aux @@ -24,7 +24,7 @@ \newlabel{ppresult}{{3}{8}} \@writefile{lot}{\contentsline {table}{\numberline {4}{\ignorespaces Hierarchical Model Summary}}{8}} \newlabel{mlm}{{4}{8}} -\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Model AICs}}{9}} -\newlabel{AICs}{{5}{9}} -\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Model Cross Validated MSEs}}{9}} -\newlabel{MSEs}{{6}{9}} +\@writefile{lot}{\contentsline {table}{\numberline {5}{\ignorespaces Model AICs}}{8}} +\newlabel{AICs}{{5}{8}} +\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Model Cross Validated MSEs}}{8}} +\newlabel{MSEs}{{6}{8}} diff --git a/paper.log b/paper.log index ba90791..d7d47b9 100644 --- a/paper.log +++ b/paper.log @@ -1,4 +1,4 @@ -This is 
pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex 2016.5.22) 22 NOV 2016 15:25 +This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex 2016.5.22) 22 NOV 2016 18:30 entering extended mode restricted \write18 enabled. file:line:error style messages enabled. @@ -306,7 +306,7 @@ Underfull \hbox (badness 10000) in paragraph at lines 104--105 Here is how much of TeX's memory you used: 2630 strings out of 493014 36230 string characters out of 6133351 - 113228 words of memory out of 5000000 + 115228 words of memory out of 5000000 6147 multiletter control sequences out of 15000+600000 9369 words of font info for 34 fonts, out of 8000000 for 9000 1141 hyphenation exceptions out of 8191 @@ -316,18 +316,20 @@ r/local/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx10.pfb> -Output written on paper.pdf (11 pages, 7826959 bytes). +cal/texlive/2016/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi5.pfb> +Output written on paper.pdf (11 pages, 7843038 bytes). PDF statistics: - 99 PDF objects out of 1000 (max. 8388607) - 63 compressed objects within 1 object stream + 107 PDF objects out of 1000 (max. 8388607) + 69 compressed objects within 1 object stream 0 named destinations out of 1000 (max. 500000) 26 words of extra memory for PDF output out of 10000 (max. 10000000) diff --git a/paper.pdf b/paper.pdf index b948dbc..528e10e 100644 Binary files a/paper.pdf and b/paper.pdf differ diff --git a/paper.tex b/paper.tex index 05a1681..e3ae6dc 100644 --- a/paper.tex +++ b/paper.tex @@ -103,7 +103,6 @@ \subsubsection*{Neighborhood level data} \paragraph{ Race by Community Area } This file contains a record for every neighborhood in Chicago with the number of residents of each race who reside in that neighborhood. 
\\ -We tried to gather data on crimes and use that information in the model, however the available dataset for crimes in chicago is rather large ($>$1 GB) and we didn't have time to finish extracting features from that model. We hypothesized that food deserts were more likely to be in high crime areas. \subsection*{Generalized Linear Models} @@ -124,33 +123,35 @@ \subsubsection*{Complete Pooling} To begin we have the simplest model: ordinary regression using only the block-level variables. This model pools together every neighborhood as if the neighborhood distinctions don't matter. -$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \epsilon_{ij} \right) $$ +$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} \right) $$ -Where $\epsilon_{ij} \sim N(0, \sigma^2)$ +% Where $\epsilon_{ij} \sim N(0, \sigma^2)$ \subsubsection*{No Pooling} The next model has a different but nonrandom intercept for each neighborhood, a fixed effect for that neighborhood. This would correspond to our belief that the neighborhoods are each different from the others. -$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j + \epsilon_{ij} \right) $$ +$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j \right) $$ -Where $\epsilon_i \sim N(0, \sigma^2)$ +% Where $\epsilon_i \sim N(0, \sigma^2)$ \subsubsection*{Partial pooling} The next model has a random intercept for each neighborhood which corresponds to partially pooling the data together. For every neighborhood we use some of the information in other neighborhoods to estimate its intercept. That is, the intercepts in the previous model are shrunk toward the common mean. 
-$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} + \epsilon_i \right) $$ +$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$ -Where $\epsilon_i \sim N(0, \sigma^2)$ and $\alpha_j \sim N(0, \sigma^2_\alpha)$ +Where % $\epsilon_i \sim N(0, \sigma^2)$ and +$\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$ \subsubsection*{Hierarchical} The final and most complicated model that was fit was a hierarchical model including the neighborhood level predictors in estimating the random intercept for each neighborhood. -$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} + \epsilon_i \right) $$ +$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$ -Where $\epsilon_i \sim N(0, \sigma^2)$ and $\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$ +Where % $\epsilon_i \sim N(0, \sigma^2)$ and +$\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$ \subsection*{Model Comparison} @@ -544,6 +545,8 @@ \subsection*{Model Comparison} \end{tabular} \end{table} +The variance ratio $\frac{\sigma^2_\alpha}{\sigma^2_y}$ in the hierarchical model is approximately 2.97, which means that for a neighborhood of more than 1/2.97 $\approx$ .33 city blocks, the within-neighborhood model is more informative. This indicates that pooling and hierarchical structure may not be completely effective. + \section*{Conclusions} In terms of cross-validated accuracy, the hierarchical model was more accurate on average on new city blocks than the other 3 models, indicating support for the hierarchical structure of the data. However, the evidence was not as strong as the author would have liked. Consider the model summary in Table \ref{mlm}. We see that food deserts tend to be located in neighborhoods with higher incidences of all-site cancer. Perhaps surprisingly, in the presence of the other information, a block in a neighborhood with higher incidences of diabetes was less likely to be in a food desert. 
City blocks in neighborhoods that are more populous (TOTAL.POPULATION) are less likely to be food deserts. Finally, blocks in neighborhoods with higher rates of dependency (\% of the population younger than 18 or older than 64) are more likely to be in food deserts. @@ -555,5 +558,6 @@ \section*{Conclusions} \section*{Future Work} We lack data on grocery stores outside the city limits, which could affect the food desert status of city blocks near the borders. +Additionally, the model is not yet reliable, and much more work would be needed to make it useful. \end{document} \ No newline at end of file diff --git a/pp b/pp new file mode 100644 index 0000000..74ec993 Binary files /dev/null and b/pp differ diff --git a/presentation.Rmd b/presentation.Rmd new file mode 100644 index 0000000..7d50f9f --- /dev/null +++ b/presentation.Rmd @@ -0,0 +1,133 @@ +--- +title: "Food Deserts in Chicago" +author: "Daniel Berry" +output: revealjs::revealjs_presentation + +--- + +# Introduction + +## What is a Food Desert? 
+- In general: a place where it is more difficult to access healthy food +- Definition used for this work: a city block located more than 1 mile from a supermarket + - Supermarket defined as a grocery store larger than 10000 sq ft + - Distance is great circle distance between center of grocery store and center of city block + - Other definitions exist that also cover rural areas and take into account car ownership (harder to travel w/o a car) + +## Where are food deserts in Chicago? + +```{r, out.width = "600px", echo = FALSE} +knitr::include_graphics("deserts_plot.png") +``` + +# Chicago Demographics + +## Distribution of Black People + +```{r, out.width = "600px", echo = FALSE} +knitr::include_graphics("pct_black_plot.png") +``` + +## Distribution of White People + +```{r, out.width = "600px", echo = FALSE} +knitr::include_graphics("pct_white_plot.png") +``` + +## Income + +```{r, out.width = "600px", echo = FALSE} +knitr::include_graphics("income_plot.png") +``` + +## Vacancy +```{r, out.width = "600px", echo = FALSE} +knitr::include_graphics("vacant_plot.png") +``` + +# Data + +## Source +All data from the [Chicago Open Data Portal](https://data.cityofchicago.org). 
Several files: + +- Crimes 2001 - Present +- 311 Service Requests - Vacant Buildings +- CTA Ridership Avg Weekly Boardings Oct 2010 +- City Block Population +- Public Health Statistics Selected Indicators +- Census Data: Selected Socioeconomic Indicators +- Race by Community Area + +# Models + +## Complete Pooling + +$$ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} \right) $$ + +## No Pooling + +$$ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j \right) $$ + +## Partial Pooling + +$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$ + +Where $\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$ + +## Hierarchical + +$$ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) $$ +Where $\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)$ + +# Model summaries + +## Complete Pooling + +```{r, echo = F} +library(lme4) +load('cp') +summary(cp) +``` + +## No Pooling + +```{r, echo = F} +load('np') +summary(np) +``` + +## Partial Pooling + +```{r, echo = F} +load('pp') +summary(pp) +``` + +## Hierarchical + +```{r, echo = F} +load('mlm') +summary(mlm) +``` + +# Results + +## Was pooling effective? + +- Variance ratio $\sigma^2_\alpha / \sigma^2_y \approx 3$ indicates much higher variability between neighborhoods than within a neighborhood, so the neighborhood intercepts are pooled only weakly. + +- AICs for random intercept models were higher than no pooling model. + +- Cross validated MSEs (or [Brier Scores](https://en.wikipedia.org/wiki/Brier_score)) were a way to quantify how accurate we are on previously unseen city blocks within a neighborhood: + - Complete Pooling: 0.07382216 + - No Pooling: 0.05328587 + - Partial Pooling: 0.05329956 + - Hierarchical: 0.05323632 + +## Thoughts + +- Models are an improvement over just using neighborhood level variables, but I'm not convinced that the hierarchical model is an improvement over simpler no pooling or partial pooling. 
+ +- Unfortunately, this project doesn't tell us anything about the causes of food deserts that we didn't already know. More importantly, it doesn't help resolve the issue at all. + +# Questions? diff --git a/presentation.html b/presentation.html new file mode 100644 index 0000000..b49c13a --- /dev/null +++ b/presentation.html @@ -0,0 +1,497 @@ + + + + + + + Food Deserts in Chicago + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+

Food Deserts in Chicago

+

Daniel Berry

+
+ +

Introduction

+

What is a Food Desert?

+
    +
  • In general: a place where it is more difficult to access healthy food
  • +
  • Definition used for this work: a city block located more than 1 mile from a supermarket +
      +
    • Supermarket defined as a grocery store larger than 10000 sq ft
    • +
    • Distance is great circle distance between center of grocery store and center of city block
    • +
    • Other definitions exist that also cover rural areas and take into account car ownership (harder to travel w/o a car)
    • +
  • +
+
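The distance rule in the definition above can be sketched in R. This is a minimal illustration, not the project's actual code: the haversine formula, the Earth radius of 3958.8 miles, and the helper names `haversine_miles` and `is_food_desert` are all assumptions made for this example.

```r
# Great-circle distance in miles between (lat1, lon1) and (lat2, lon2),
# all given in decimal degrees. Uses the haversine formula (an assumption;
# the source only says "great circle distance").
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: a block is a food desert when its nearest
# supermarket is more than `cutoff` miles away.
is_food_desert <- function(block_lat, block_lon, store_lat, store_lon,
                           cutoff = 1) {
  d <- haversine_miles(block_lat, block_lon, store_lat, store_lon)
  min(d) > cutoff
}

# One block, two made-up store locations
is_food_desert(41.88, -87.63, c(41.90, 41.95), c(-87.63, -87.70))
```

Passing the store coordinates as vectors lets a single call check one block against every supermarket at once.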
+

Where are food deserts in Chicago?

+

+
+

Chicago Demographics

+

Distribution of Black People

+

+
+

Distribution of White People

+

+
+

Income

+

+
+

Vacancy

+

+
+

Data

+

Source

+

All data from the Chicago Open Data Portal. Several files:

+
    +
  • Crimes 2001 - Present
  • +
  • 311 Service Requests - Vacant Buildings
  • +
  • CTA Ridership Avg Weekly Boardings Oct 2010
  • +
  • City Block Population
  • +
  • Public Health Statistics Selected Indicators
  • +
  • Census Data: Selected Socioeconomic Indicators
  • +
  • Race by Community Area
  • +
+
+

Models

+

Complete Pooling

+

\[ y_{ij} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} \right) \]

+
+

No Pooling

+

\[ y_{i} = \text{logit}^{-1}\left( \alpha + X_{B}\beta_{B} + \gamma_j \right) \]
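To make the difference between the complete pooling and no pooling equations concrete, here is a small sketch on simulated data using base R's `glm`. The data, the single predictor `x`, and the neighborhood effects `gamma_j` are invented for illustration and are not the project's data.

```r
set.seed(1)

# Simulated stand-in data: 5 neighborhoods with 200 blocks each
J <- 5
n_per <- 200
neighborhood <- factor(rep(paste0("N", 1:J), each = n_per))
gamma_j <- c(-2, -1, 0, 1, 2)      # made-up neighborhood effects
x <- rnorm(J * n_per)              # one made-up block-level predictor
eta <- -1 + 0.5 * x + gamma_j[as.integer(neighborhood)]
y <- rbinom(J * n_per, size = 1, prob = plogis(eta))
sim <- data.frame(y, x, neighborhood)

# Complete pooling: one intercept shared by every neighborhood
cp_fit <- glm(y ~ x, family = binomial, data = sim)

# No pooling: a separate fixed intercept shift per neighborhood
np_fit <- glm(y ~ x + neighborhood, family = binomial, data = sim)

# The no pooling fit spends J - 1 extra coefficients on neighborhoods
length(coef(np_fit)) - length(coef(cp_fit))
AIC(cp_fit)
AIC(np_fit)
```

In the no pooling fit the first neighborhood is absorbed into the intercept, which is why only J - 1 neighborhood coefficients appear.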

+
+

Partial Pooling

+

\[ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) \]

+

Where \(\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)\)

+
+

Hierarchical

+

\[ y_{i} = \text{logit}^{-1}\left( \alpha_{j[i]} + X_{B}\beta_{B} \right) \] Where \(\alpha_j \sim N(X_N \beta_N, \sigma^2_\alpha)\)

+
+

Model summaries

+

Complete Pooling

+
## 
+## Call:
+## glm(formula = desert ~ CTA_counts + vacant_counts + crime, family = "binomial", 
+##     data = model_data_scale)
+## 
+## Deviance Residuals: 
+##     Min       1Q   Median       3Q      Max  
+## -1.0109  -0.5023  -0.3341  -0.1989   3.4291  
+## 
+## Coefficients:
+##               Estimate Std. Error z value Pr(>|z|)    
+## (Intercept)   -2.84947    0.02851 -99.959  < 2e-16 ***
+## CTA_counts    -1.58807    0.04031 -39.392  < 2e-16 ***
+## vacant_counts  0.40999    0.01766  23.218  < 2e-16 ***
+## crime          0.05639    0.01904   2.962  0.00306 ** 
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## (Dispersion parameter for binomial family taken to be 1)
+## 
+##     Null deviance: 22117  on 36869  degrees of freedom
+## Residual deviance: 19819  on 36866  degrees of freedom
+## AIC: 19827
+## 
+## Number of Fisher Scoring iterations: 6
+
+

No Pooling

+
## 
+## Call:
+## glm(formula = desert ~ CTA_counts + vacant_counts + crime + Neighborhood, 
+##     family = "binomial", data = model_data_scale)
+## 
+## Deviance Residuals: 
+##     Min       1Q   Median       3Q      Max  
+## -2.3398  -0.3087  -0.0001   0.0000   3.6919  
+## 
+## Coefficients:
+##                                      Estimate Std. Error z value Pr(>|z|)
+## (Intercept)                        -2.068e+01  7.574e+02  -0.027    0.978
+## CTA_counts                         -7.828e-01  6.087e-02 -12.860  < 2e-16
+## vacant_counts                      -4.084e-01  3.795e-02 -10.761  < 2e-16
+## crime                               1.011e-01  2.316e-02   4.364 1.28e-05
+## NeighborhoodArcher Heights         -3.755e-01  1.446e+03   0.000    1.000
+## NeighborhoodArmour Square           1.793e+01  7.574e+02   0.024    0.981
+## NeighborhoodAshburn                 1.832e+01  7.574e+02   0.024    0.981
+## NeighborhoodAuburn Gresham          1.414e+01  7.574e+02   0.019    0.985
+## NeighborhoodAustin                  1.695e+01  7.574e+02   0.022    0.982
+## NeighborhoodAvalon Park             1.600e+01  7.574e+02   0.021    0.983
+## NeighborhoodAvondale                4.744e-01  1.120e+03   0.000    1.000
+## NeighborhoodBelmont Cragin         -2.183e-01  9.506e+02   0.000    1.000
+## NeighborhoodBeverly                 1.702e+01  7.574e+02   0.022    0.982
+## NeighborhoodBridgeport              1.619e+01  7.574e+02   0.021    0.983
+## NeighborhoodBrighton Park          -3.120e-01  1.089e+03   0.000    1.000
+## NeighborhoodBurnside               -2.586e-02  2.105e+03   0.000    1.000
+## NeighborhoodCalumet Heights         1.551e+01  7.574e+02   0.020    0.984
+## NeighborhoodChatham                 1.763e+01  7.574e+02   0.023    0.981
+## NeighborhoodChicago Lawn            1.757e+01  7.574e+02   0.023    0.981
+## NeighborhoodClearing                2.017e+01  7.574e+02   0.027    0.979
+## NeighborhoodDouglas                 2.585e-01  1.787e+03   0.000    1.000
+## NeighborhoodDunning                 1.842e+01  7.574e+02   0.024    0.981
+## NeighborhoodEast Side              -9.362e-01  1.160e+03  -0.001    0.999
+## NeighborhoodEdgewater               3.130e-01  1.091e+03   0.000    1.000
+## NeighborhoodEdison Park             1.636e+01  7.574e+02   0.022    0.983
+## NeighborhoodEnglewood               1.967e+01  7.574e+02   0.026    0.979
+## NeighborhoodForest Glen             1.781e+01  7.574e+02   0.024    0.981
+## NeighborhoodFuller Park             1.853e+01  7.574e+02   0.024    0.980
+## NeighborhoodGage Park              -5.577e-03  1.116e+03   0.000    1.000
+## NeighborhoodGarfield Park           2.044e+01  7.574e+02   0.027    0.978
+## NeighborhoodGarfield Ridge          1.839e+01  7.574e+02   0.024    0.981
+## NeighborhoodGrand Boulevard         1.824e+01  7.574e+02   0.024    0.981
+## NeighborhoodGreater Grand Crossing  1.179e+00  1.060e+03   0.001    0.999
+## NeighborhoodHegewisch               2.201e+01  7.574e+02   0.029    0.977
+## NeighborhoodHermosa                -7.616e-02  1.318e+03   0.000    1.000
+## NeighborhoodHumboldt Park           1.437e+01  7.574e+02   0.019    0.985
+## NeighborhoodHyde Park               3.159e-01  1.585e+03   0.000    1.000
+## NeighborhoodIrving Park            -5.197e-02  1.000e+03   0.000    1.000
+## NeighborhoodJefferson Park         -6.124e-01  1.083e+03  -0.001    1.000
+## NeighborhoodKenwood                 1.622e-01  1.831e+03   0.000    1.000
+## NeighborhoodLake View               5.727e-01  9.968e+02   0.001    1.000
+## NeighborhoodLincoln Park            2.841e-01  1.091e+03   0.000    1.000
+## NeighborhoodLincoln Square          1.524e-01  1.087e+03   0.000    1.000
+## NeighborhoodLogan Square            3.736e-01  9.286e+02   0.000    1.000
+## NeighborhoodLoop                    4.593e+00  1.627e+03   0.003    0.998
+## NeighborhoodLower West Side        -1.791e-01  1.141e+03   0.000    1.000
+## NeighborhoodMcKinley Park          -4.250e-01  1.387e+03   0.000    1.000
+## NeighborhoodMontclaire             -6.654e-01  1.428e+03   0.000    1.000
+## NeighborhoodMorgan Park             1.430e+01  7.574e+02   0.019    0.985
+## NeighborhoodMount Greenwood        -1.035e+00  1.241e+03  -0.001    0.999
+## NeighborhoodNear North Side         1.286e+00  1.065e+03   0.001    0.999
+## NeighborhoodNear South Side         2.026e-01  1.645e+03   0.000    1.000
+## NeighborhoodNear West Side          1.981e+01  7.574e+02   0.026    0.979
+## NeighborhoodNew City                1.753e+01  7.574e+02   0.023    0.982
+## NeighborhoodNorth Center            2.488e-02  1.073e+03   0.000    1.000
+## NeighborhoodNorth Lawndale          2.014e+01  7.574e+02   0.027    0.979
+## NeighborhoodNorth Park             -4.848e-01  1.314e+03   0.000    1.000
+## NeighborhoodNorwood Park            1.586e+01  7.574e+02   0.021    0.983
+## NeighborhoodO'Hare                  1.737e+01  7.574e+02   0.023    0.982
+## NeighborhoodOakland                -3.329e-01  2.336e+03   0.000    1.000
+## NeighborhoodPortage Park            1.700e+01  7.574e+02   0.022    0.982
+## NeighborhoodPullman                 2.048e+01  7.574e+02   0.027    0.978
+## NeighborhoodRiverdale              -9.502e-01  2.643e+03   0.000    1.000
+## NeighborhoodRogers Park            -1.802e-01  1.140e+03   0.000    1.000
+## NeighborhoodRoseland                2.114e+01  7.574e+02   0.028    0.978
+## NeighborhoodSouth Chicago           4.705e-01  1.066e+03   0.000    1.000
+## NeighborhoodSouth Deering           1.846e+01  7.574e+02   0.024    0.981
+## NeighborhoodSouth Lawndale         -6.613e-02  1.038e+03   0.000    1.000
+## NeighborhoodSouth Shore             1.490e+01  7.574e+02   0.020    0.984
+## NeighborhoodUptown                  8.448e-01  1.290e+03   0.001    0.999
+## NeighborhoodWashington Heights      1.734e+01  7.574e+02   0.023    0.982
+## NeighborhoodWashington Park         1.985e+01  7.574e+02   0.026    0.979
+## NeighborhoodWest Elsdon            -3.566e-01  1.287e+03   0.000    1.000
+## NeighborhoodWest Lawn               1.803e+01  7.574e+02   0.024    0.981
+## NeighborhoodWest Pullman            2.043e+01  7.574e+02   0.027    0.978
+## NeighborhoodWest Ridge              1.843e+01  7.574e+02   0.024    0.981
+## NeighborhoodWest Town               5.658e-01  9.413e+02   0.001    1.000
+## NeighborhoodWoodlawn                1.751e+01  7.574e+02   0.023    0.982
+##                                       
+## (Intercept)                           
+## CTA_counts                         ***
+## vacant_counts                      ***
+## crime                              ***
+## NeighborhoodArcher Heights            
+## NeighborhoodArmour Square             
+## NeighborhoodAshburn                   
+## NeighborhoodAuburn Gresham            
+## NeighborhoodAustin                    
+## NeighborhoodAvalon Park               
+## NeighborhoodAvondale                  
+## NeighborhoodBelmont Cragin            
+## NeighborhoodBeverly                   
+## NeighborhoodBridgeport                
+## NeighborhoodBrighton Park             
+## NeighborhoodBurnside                  
+## NeighborhoodCalumet Heights           
+## NeighborhoodChatham                   
+## NeighborhoodChicago Lawn              
+## NeighborhoodClearing                  
+## NeighborhoodDouglas                   
+## NeighborhoodDunning                   
+## NeighborhoodEast Side                 
+## NeighborhoodEdgewater                 
+## NeighborhoodEdison Park               
+## NeighborhoodEnglewood                 
+## NeighborhoodForest Glen               
+## NeighborhoodFuller Park               
+## NeighborhoodGage Park                 
+## NeighborhoodGarfield Park             
+## NeighborhoodGarfield Ridge            
+## NeighborhoodGrand Boulevard           
+## NeighborhoodGreater Grand Crossing    
+## NeighborhoodHegewisch                 
+## NeighborhoodHermosa                   
+## NeighborhoodHumboldt Park             
+## NeighborhoodHyde Park                 
+## NeighborhoodIrving Park               
+## NeighborhoodJefferson Park            
+## NeighborhoodKenwood                   
+## NeighborhoodLake View                 
+## NeighborhoodLincoln Park              
+## NeighborhoodLincoln Square            
+## NeighborhoodLogan Square              
+## NeighborhoodLoop                      
+## NeighborhoodLower West Side           
+## NeighborhoodMcKinley Park             
+## NeighborhoodMontclaire                
+## NeighborhoodMorgan Park               
+## NeighborhoodMount Greenwood           
+## NeighborhoodNear North Side           
+## NeighborhoodNear South Side           
+## NeighborhoodNear West Side            
+## NeighborhoodNew City                  
+## NeighborhoodNorth Center              
+## NeighborhoodNorth Lawndale            
+## NeighborhoodNorth Park                
+## NeighborhoodNorwood Park              
+## NeighborhoodO'Hare                    
+## NeighborhoodOakland                   
+## NeighborhoodPortage Park              
+## NeighborhoodPullman                   
+## NeighborhoodRiverdale                 
+## NeighborhoodRogers Park               
+## NeighborhoodRoseland                  
+## NeighborhoodSouth Chicago             
+## NeighborhoodSouth Deering             
+## NeighborhoodSouth Lawndale            
+## NeighborhoodSouth Shore               
+## NeighborhoodUptown                    
+## NeighborhoodWashington Heights        
+## NeighborhoodWashington Park           
+## NeighborhoodWest Elsdon               
+## NeighborhoodWest Lawn                 
+## NeighborhoodWest Pullman              
+## NeighborhoodWest Ridge                
+## NeighborhoodWest Town                 
+## NeighborhoodWoodlawn                  
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## (Dispersion parameter for binomial family taken to be 1)
+## 
+##     Null deviance: 22117  on 36869  degrees of freedom
+## Residual deviance: 12814  on 36792  degrees of freedom
+## AIC: 12970
+## 
+## Number of Fisher Scoring iterations: 19
+
+

Partial Pooling

+
## Generalized linear mixed model fit by maximum likelihood (Laplace
+##   Approximation) [glmerMod]
+##  Family: binomial  ( logit )
+## Formula: desert ~ CTA_counts + vacant_counts + crime + (1 | Neighborhood)
+##    Data: model_data_scale
+## 
+##      AIC      BIC   logLik deviance df.resid 
+##  13139.4  13182.0  -6564.7  13129.4    36865 
+## 
+## Scaled residuals: 
+##     Min      1Q  Median      3Q     Max 
+## -3.7507 -0.2201 -0.0253 -0.0137 30.0781 
+## 
+## Random effects:
+##  Groups       Name        Variance Std.Dev.
+##  Neighborhood (Intercept) 15.97    3.996   
+## Number of obs: 36870, groups:  Neighborhood, 75
+## 
+## Fixed effects:
+##               Estimate Std. Error z value Pr(>|z|)    
+## (Intercept)   -6.30657    0.65670  -9.603  < 2e-16 ***
+## CTA_counts    -0.79401    0.06081 -13.058  < 2e-16 ***
+## vacant_counts -0.40397    0.03795 -10.644  < 2e-16 ***
+## crime          0.10091    0.02316   4.358 1.31e-05 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Correlation of Fixed Effects:
+##             (Intr) CTA_cn vcnt_c
+## CTA_counts   0.008              
+## vacant_cnts  0.026  0.067       
+## crime       -0.004 -0.007 -0.008
+
+

Hierarchical

+
## Generalized linear mixed model fit by maximum likelihood (Laplace
+##   Approximation) [glmerMod]
+##  Family: binomial  ( logit )
+## Formula: 
+## desert ~ CTA_counts + crime + vacant_counts + Cancer..All.Sites. +  
+##     Diabetes.related + Dependency + TOTAL.POPULATION + (1 | Neighborhood)
+##    Data: model_data_scale
+## Control: glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 1000))
+## 
+##      AIC      BIC   logLik deviance df.resid 
+##  13099.2  13175.8  -6540.6  13081.2    36861 
+## 
+## Scaled residuals: 
+##     Min      1Q  Median      3Q     Max 
+## -3.8351 -0.2207 -0.0313 -0.0075 29.1992 
+## 
+## Random effects:
+##  Groups       Name        Variance Std.Dev.
+##  Neighborhood (Intercept) 9.784    3.128   
+## Number of obs: 36870, groups:  Neighborhood, 75
+## 
+## Fixed effects:
+##                    Estimate Std. Error z value Pr(>|z|)    
+## (Intercept)        -6.22074    0.52232 -11.910  < 2e-16 ***
+## CTA_counts         -0.77853    0.06099 -12.765  < 2e-16 ***
+## crime               0.10207    0.02317   4.406 1.05e-05 ***
+## vacant_counts      -0.41319    0.03791 -10.899  < 2e-16 ***
+## Cancer..All.Sites.  3.28898    0.85192   3.861 0.000113 ***
+## Diabetes.related   -1.95373    0.80128  -2.438 0.014758 *  
+## Dependency          1.15782    0.69514   1.666 0.095794 .  
+## TOTAL.POPULATION   -0.14705    0.04757  -3.091 0.001992 ** 
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Correlation of Fixed Effects:
+##             (Intr) CTA_cn crime  vcnt_c C..A.S Dbts.r Dpndnc
+## CTA_counts   0.028                                          
+## crime       -0.004 -0.005                                   
+## vacant_cnts  0.019  0.068 -0.008                            
+## Cncr..Al.S. -0.338 -0.007  0.003 -0.012                     
+## Diabts.rltd  0.382 -0.019 -0.001 -0.011 -0.755              
+## Dependency  -0.360  0.033 -0.001 -0.004 -0.150 -0.218       
+## TOTAL.POPUL  0.001 -0.063 -0.014  0.009  0.006 -0.006  0.010
+## convergence code: 0
+## failure to converge in 1000 evaluations
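The neighborhood intercept variance reported in the summary above (9.784) can be turned into a variance ratio. The sketch below assumes the block-level variance is the latent logistic residual variance, pi^2 / 3, a common convention for logit models. The source does not say which value it used, although this choice gives a ratio close to 3.

```r
# Neighborhood (between-group) intercept variance from the summary above
sigma2_alpha <- 9.784

# Assumed block-level (within-group) variance on the latent logit scale
sigma2_y <- pi^2 / 3

# Between-to-within variance ratio
ratio <- sigma2_alpha / sigma2_y

# Share of latent-scale variance attributable to neighborhoods
icc <- sigma2_alpha / (sigma2_alpha + sigma2_y)

round(c(ratio = ratio, icc = icc), 3)
```

A ratio above 1 means neighborhoods differ from each other more than blocks differ within a neighborhood, so the neighborhood intercepts are shrunk only slightly toward the common mean.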
+
+

Results

+

Was pooling effective?

+
    +
  • Variance ratio \(\sigma^2_\alpha / \sigma^2_y \approx 3\) indicates much higher variability between neighborhoods than within a neighborhood, so the neighborhood intercepts are pooled only weakly.

  • +
  • AICs for random intercept models were higher than no pooling model.

  • +
  • Cross validated MSEs (or Brier Scores) were a way to quantify how accurate we are on previously unseen city blocks within a neighborhood: +
      +
    • Complete Pooling: 0.07382216
    • +
    • No Pooling: 0.05328587
    • +
    • Partial Pooling: 0.05329956
    • +
    • Hierarchical: 0.05323632
    • +
  • +
+
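The cross validated scores listed above are mean squared errors between predicted probabilities and 0/1 outcomes. A minimal sketch of that calculation in base R follows; the random k-fold split, the helper names `brier_score` and `cv_brier`, and the toy data are assumptions, since the source does not show how its folds were built.

```r
# Brier score: mean squared error of predicted probabilities
brier_score <- function(p, y) mean((p - y)^2)

# Hypothetical k-fold cross validation for a logistic regression
cv_brier <- function(formula, data, k = 5) {
  # Randomly assign each row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  outcome <- all.vars(formula)[1]
  scores <- vapply(seq_len(k), function(i) {
    fit <- glm(formula, family = binomial, data = data[fold != i, ])
    p <- predict(fit, newdata = data[fold == i, ], type = "response")
    brier_score(p, data[[outcome]][fold == i])
  }, numeric(1))
  mean(scores)
}

# Toy data, not the project's data
set.seed(42)
toy <- data.frame(x = rnorm(500))
toy$y <- rbinom(500, size = 1, prob = plogis(-1 + toy$x))
toy_score <- cv_brier(y ~ x, toy)
toy_score
```

Lower is better: a score of 0 means every held-out block was predicted with certainty and correctly, while always predicting 0.5 scores 0.25.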
+

Thoughts

+
    +
  • Models are an improvement over just using neighborhood level variables, but I’m not convinced that the hierarchical model is an improvement over simpler no pooling or partial pooling.

  • +
  • Unfortunately, this project doesn’t tell us anything about the causes of food deserts that we didn’t already know. More importantly, it doesn’t help resolve the issue at all.

  • +
+
+

Questions?

+
+
+ + + + + + + + + + + + +