index.Rmd

---
title: "rhodyRstats: GIS in R"
date: 2016-10-26
output: 
  html_notebook:
    toc: true
    toc_depth: 3
    toc_float: true
---

```{r setup, echo=FALSE, warning=FALSE, purl=FALSE, message=FALSE}
options(repos="http://cran.rstudio.com/")
pkgs <- c("sp","rgdal","rgeos","raster","knitr")
x<-lapply(pkgs, library, character.only = TRUE)
opts_chunk$set(tidy=T)
```
 
# Lesson 1: Setting up R to do GIS
Out of the box, R is not ready to do GIS analysis.  As such, we need to add a few packages that will provide most of the functionality you'd expect out of a GIS.  In this lesson we will introduce the bare minimum packages for doing GIS.

## Lesson Outline
- [Required packages](#required-packages)
- [Interacting with an external GIS](#interacting-with-an-external-gis)

## Lesson Exercises
- [Exercise 1.1](#exercise-11)

## Required packages
Within in the last several years there has been a lot of effort spent on adding spatial data handling and analysis capability to R.  Thanks to the very significant effort of the package authors we now have the foundation for doing GIS entirely within R. ***Aside:*** *R may not always be the best choice for a given GIS task, but at least now it is a possible choice.  This is a discussion though, for another time*

The four packages that provide this foundation are:

- [sp](https://cran.r-project.org/web/packages/sp/index.html)
- [rgdal](https://cran.r-project.org/web/packages/rgdal/index.html)
- [raster](https://cran.r-project.org/web/packages/raster/index.html)
- [rgeos](https://cran.r-project.org/web/packages/rgeos/index.html)

Let's dig in a bit deeper on each of these.

### sp
The `sp` package provides the primary spatial data structures for use in R.  Many other packages assume your data is stored as one of the `sp` data structures.  Getting into the details of these is beyond the scope of this workshop, but look at the [introduction to sp vignette for more details](https://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf).  That being said, we will be working mostly with `SpatialPointsDataFrame` and `SpatialPolygonsDataFrame`.  More on that later.

Getting `sp` added is no different than adding any other package that is on CRAN.

```{r add_sp, eval=FALSE}
install.packages("sp")
library("sp")
```

### rgdal
The `rgdal` package provides tools for reading and writing multiple spatial data formats.  It is based on the [Geospatial Data Abstraction Library (GDAL)](http://www.gdal.org/) which is a project of the Open Source Geospatial Foundation (OSGeo).  The primary use of `rgdal` is to read various spatial data formats into R and store them as `sp` objects.  In this workshop, we will be only using `rgdal` to read in shape files, but it has utility far beyond that.  

As before, nothing special to get set up with `rgdal` on windows.  Simply:

```{r add_rgdal, eval=FALSE}
install.packages("rgdal")
library("rgdal")
```

Getting set up on Linux or Mac requires more effort (i.e. need to have GDAL installed).  As this is for a USEPA audience the windows installs will work for most.  Thus, discussion of this is mostly beyond the scope of this workshop.  

### raster
Although `sp` and `rgdal` provide raster data capabilities, they do require that the full raster dataset be read into memory.  This can have some performance implications as well as limits the size of datasets you can readily work with.  The `raster` package works around this by working with raster data on the disk.  That too has some performance implications, but for the most part, in my opinion anyway, `raster` makes it easier to work with raster data.  It also provides several additional functions for analyzing raster data.  

To install, just do: 

```{r add_raster, eval=FALSE}
install.packages("raster")
library("raster")
```

### rgeos
The last of the core packages for doing GIS in R is `rgeos`.  Like `rgdal`, `rgeos` is a project of OSgeo.  It is a wrapper around the [Geometry Engine Open Source](https://trac.osgeo.org/geos/) c++ library and provides a suite of tools for conducting vector GIS analyses.  

To install on windows

```{r add_rgeos, eval=FALSE}
install.packages("rgeos")
library("rgeos")
```

For Linux and Mac the GEOS library will also need to be available.  As with `rgdal` this is a bit beyond the scope of this workshop.  One item to note for US EPA Linux users.  The official Linux OS is Red Hat 6.  There have been reports of problems with the version of GEOS available for Red Hat 6.  If this applies to you, contact [me](mailto::hollister.jeff@epa.gov) for details on how to solve this (assuming I can remember how I did it).

## Exercise 1.1
The first exercise won't be too thrilling, but we need to make sure everyone has the four packages installed. 

1.) Install `sp` and load `sp` into your library.
2.) Repeat, with `rgdal`, `raster`, and `rgeos`.


## Interacting with an external GIS
Although we won't be working with external GIS in this workshop, there are several packages that provide ways to move back and forth from another GIS and R.  

- [spgrass6](https://cran.r-project.org/web/packages/spgrass6/index.html): Provides an interface between R and [GRASS 6+](https://grass.osgeo.org/download/software/#g64x).  Allows for running R from within GRASS as well as running GRASS from within R.  
- [rgrass7](https://cran.r-project.org/web/packages/rgrass7/index.html): Same as `spgrass6`, but for the latest version of GRASS, [GRASS 7](https://grass.osgeo.org/download/software/#g70x).
- [RPyGeo](https://cran.r-project.org/web/packages/RPyGeo/index.html): A wrapper for accessing ArcGIS from R.  Utilizes intermediate python scripts to fire up ArcGIS.  Hasn't been updated in some time.
- [RSAGA](https://cran.r-project.org/web/packages/RSAGA/index.html): R interface to the command line version of [SAGA GIS](http://www.saga-gis.org/en/index.html).

# Lesson 2: Reading and Writing Raster and Vector Data
So, now that we have the base packages installed and loaded we can work on getting our data into and out of R.  While it is possible to store spatial data as R objects (e.g. via .Rda/Rdata files) that is probably not the best approach.  It is better to store spatial data in widely used files (e.g. shapefiles,.tiff, or geojson) or in spatial databases (e.g. file geodatabse or PostGIS) and then read that data into R for analysis then write the results back out to your file format of choice.  In this lesson we will explore several ways to read and write multiple vector and raster data types.

## Lesson Outline
- [Vector data: shapefiles](#vector-data-shapefiles)
- [Vector data: file geodatabase](#vector-data-file-geodatabase])
- [Vector data: geojson](#vector-data-geojson)
- [Raster data: GeoTIFF](#raster-data-geotiff)
- [Raster data: ASCII](#raster-data-arcinfo-ascii)
- [Writing rasters](#writing-rasters)
- [Geospatial data packages](#geospatial-data-packages)

## Lesson Exercises
- [Exercise 2.1](#exercise-21)
- [Exercise 2.2](#exercise-22)

## Get the example data
For this workshop, I have collected several example datasets to use and have included them in this repository.  So, let's first grab the dataset.  It is stored as a zip file.  You can download it [directly from this link](https://github.com/USEPA/aed_r/blob/master/meetings/data.zip?raw=true), or we could use R.  I prefer to use the `httr` package because base `download.file` can act funny on different platforms.

```{r download_zip,eval=FALSE}
library(httr)
url <- "https://github.com/USEPA/aed_r/blob/master/meetings/data.zip?raw=true"
GET(url,write_disk("data.zip",overwrite = TRUE))
```

Oh and while we are being a bit #rstats crazy...  Let unzip it with R too!

```{r unzip_it,eval=FALSE}
unzip("data.zip",overwrite = TRUE)
```
```{r unzip_it2,eval=FALSE, echo=FALSE}
unzip("../data.zip",overwrite = TRUE)
```

## Vector data: shapefiles
For many, shapefiles are going to be the most common way to interact with spatial data.  In R, there are many ways to read in shapefiles.  We are going to focus on using `rgdal` because it is flexible and provides a common interface to multiple file types.  But to be fair, I'll also quickly show a another option from `maptools`.

### Reading in Shapfiles
To read in a shapefile using `rgdal`, we want to use the `readOGR` function.  This function is the primary way to interact with vector data using `rgdal`.  There are many arguments to this function, but the two you need are the "dsn" and "layer".  For a shapefile the "dsn" is the path (in our case probably "data") and the "layer" is the name of the shapefile without any extension.  The function call to read the DC Metro shapefile from our example data looks like: 

```{r read_shp}
dc_metro <- readOGR("data","Metro_Lines")
```

We will get more into working with `sp` object and visualizing spatial data later, but just to prove that this did something:

```{r metro_chk}
summary(dc_metro)
plot(dc_metro)
```

As I mentioned earlier, there are other ways to read in shapefiles.  For example:

```{r maptools}
dc_metro_mt<-maptools::readShapeLines("data/Metro_Lines")
summary(dc_metro_mt)
```
 
Couple of notes on this  First the `maptools` functions require that you know your geometry type, whereas, `readOGR` will get that from the data.  I did test to see if the the `maptools::readShapeLines` was any quicker than `rgdal::readOGR` and in my huge sample of one, it was. Lastly, `readShapeLines` is a one-trick pony.  It reads in shapefiles and that is it.  As we will see, `readOGR` works across a range of vector data types and thus, is what I would recommend for most vector data I/O tasks.

### Writing shapefiles

Writing shapefiles is just as easy as reading them, assuming you have an `sp` object to work with.  We will just show this using `writeOGR`.

Before we do this, we can prove that the shapefile doesn't exist.

```{r clean_it,echo=FALSE}
x<-file.remove(list.files("data","dc_metro",full.names = TRUE))
```
```{r noshape}
list.files("data","dc_metro")
```

Now to write the shapefile:

```{r write_shp}
writeOGR(dc_metro,"data","dc_metro",driver="ESRI Shapefile")

#Is it there?
list.files("data","dc_metro")
```

So same "dsn" and "layer" arguments as before.  Only difference is that the first argument is the `sp` object you want to write out to a shapefile.  

## Vector data: file geodatabase
A recent addition to the GDAL world is the ability to read ESRI File Geodatabases.  This is easy to access on windows as the latest version of GDAL is wrapped up as part of the `rgdal` install and thus you get access to the appropriate drivers.  This is a bit more challenging on Linux (even more so on the antiquated RHEL 6 that is EPAs approved OS) as you need to have GDAL 1.11.x +.  In any event, if you use file geodatabases, you can read those directly into R with readOGR. Difference here is the "dsn" is the name of the file geodatabase (with path info if needed), and the "layer" is the feature class.

```{r read_fgdb}
#List feature classes
ogrListLayers("data/spx.gdb")
examp_fgdb <- readOGR(dsn = "data/spx.gdb", layer="polygons5")
```

And to be sure it worked:

```{r check_gdb}
summary(examp_fgdb)
plot(examp_fgdb)
```

Writing to a file geodatabase from R is not yet possible.

## Vector data: geojson

Last vector example we will show is geojson.  For most desktop GIS users this will not be encountered too often, but as more and more GIS moves to the web, geojson will become increasingly common.  We will still rely on `readOGR` for the geojson.

### Reading in geojson

To read in with `rgdal` we use "dsn" and "layer" a bit differently.  The "dsn" is the name (and path) of the file, and "layer" is going to be set as "OGRGeoJSON". 

```{r read_geojson}
dc_metro_sttn <- readOGR("data/metrostations.geojson", "OGRGeoJSON")
```

And to see that something is there...
 
```{r check_geojson}
#Let's use the defualt print method 
dc_metro_sttn
```
```{r plot_geojson,eval=TRUE}
#And add a few more things to our plot
plot(dc_metro)
plot(dc_metro_sttn, col = "red", add=TRUE)
```

### Writing geojson

Just as with shapefiles, writing to a geojson file can be accomplished with `writeOGR`.

```{r write_geojson,eval=F}
writeOGR(dc_metro_sttn,dsn="stations.gejson",layer="dc_metro_sttn",driver="GeoJSON")
```

Lastly, if you commonly work with geojson files, there is the `geojsonio` package from [rOpenSci](https://ropensci.org/) that provides a number of tools for reading, writing, and converting geojson files.  It is certainly worth exploring as it provides additional functionality beyond the `rgdal` toolset.

## Exercise 2.1
For this first exercise we will just focus on getting a shapefile read into R.  We will be using the sticky notes I handed out to let me know who needs help and who has finished the exercise.  Once everyone is done, we will move on.

1. Using `rgdal::readOGR` to read in the US Census Tiger Line Files of the state boundaries (tl_2015_us_state).  Assign it to an object called `us_states`.
2. Once it is read in use `summary` to look at some of the basics and then plot the data. 

## Raster data: GeoTIFF
We will just show a couple of examples as reading in rasters is a bit more straightforward than vector.  Our first examples will be GeoTIFF.

I will show one example with `rgdal`, but then we are going to switch to using `raster` for the remainder of the examples.  We'll see why pretty quickly with this example.

The `rgdal` function for reading in rasters is `readGDAL`.  For rasters, it essentially has a single argument we need to worry about, "fname" which is the filename.

```{r readGDAL}
dc_elev_gdal <- readGDAL("data/dc_ned.tif")
raster::print(dc_elev_gdal) #using the raster print method
```

Using `raster` is just as easy

```{r raster}
dc_elev <- raster("data/dc_ned.tif")
dc_elev
```

So it wasn't too obvious, but if we look closer ...

```{r time}
system.time(readGDAL("data/dc_ned.tif"))
system.time(raster("data/dc_ned.tif"))
```

The speed here is due to the fact that `raster` actually leaves the data on disk as opposed to pulling it all into memory.  Some operations will actually be faster on the `SpatialGrid` objects, but with bigger rasters reading in can be a challenge.  In addition, a lot of the typical raster operations come from the `raster` package and, in my opinion, it is just a bit easier to work with `raster` objects as opposed to `sp` for this.  Lastly, it is what I prefer, so there's that.  We will stick with `raster` for the rest of the workshop.

## Raster data: ArcInfo ASCII

Just to show another example, let's look at ASCII.

```{r ascii_examp}
dc_elev_ascii <- raster("data/dc_ned.asc")
dc_elev_ascii
```

That is really it for reading in rasters.

## Writing rasters:
Writing out to a raster file is done with `writeRaster`.  It has three arguments, "x" which is the `raster` object, "filename" which is the output file, and "format" which is the output raster format.  In practice, you can usually get away with not specifying the format as `raster` will try to infer the file format from the file name.  If you want to see the possible formats you can use `writeFormats()`.

To write out to a GeoTIFF:

```{r write_rast}
writeRaster(dc_elev,"dc_elev_example.tif", overwrite = T)
```

## Exercise 2.2
For this exercise let's get some practice with reading in raster data using the `raster` function.

1. Read in "dc_nlcd.tif". Assign it to an object names `dc_nlcd`.
2. Plot the object to make sure everything is working well.


## Geospatial data packages
There are a few packages on CRAN that provide access to spatial data. While this isn't necessarily data I/O, it is somewhat related.  We won't go into details as the intreface and data types for these are unique to the packages and a bit different than the more generic approach we are working on here.  That being said, these are useful and you should be aware of them.

A couple of interesting examples.

- `maps`: This has been around for a while and is still actively maintained so it's a good first place to look for spatial data. Contains mostly boundary datasets (e.g. counties) and has both US and international data. 
- `USCensus2010`:  Provides access to census data directly in R.  I haven't dug into this one much, so can't say too much.  There is also a package for the 2000 census. 

# Lesson 3: Basic GIS Analysis with R
We now have the required packages installed and know how to read data into R. Our next step is to start doing some GIS analysis with R. Throughout the course of this lesson will show how to do some basic manipulation of the `raster` and `sp` objects and then show a few examples of relatively straightforward analyses.  We will only be scratching the surface here, but hopefully this will provide a starting point for more work doing spatial analysis in R.  ***Note:*** *Much of this lesson assumes some familiarity with R and working with data frames.*

## Lesson Outline
- [Explore and manipulate](#explore-and-manipulate)
- [Projections](#projections)
- [Brief introduction to rgeos](#brief-introduction-to-rgeos)
- [Working with rasters](#working-with-rasters)
- [Other geospatial packages](#other-geospatial-packages)

## Lesson Exercises
- [Exercise 3.1](#exercise-31)
- [Exercise 3.2](#exercise-32)
- [Exercise 3.3](#exercise-33)

## Explore and manipulate
One of the nice things about `SpatialXDataFrame` objects is that many of the tricks you know for working with data frames will also work.  This allows us to subset our spatial data, summarize data, etc. in a very R like way.

Let's start working through some examples using the two Metro datasets.

```{r, echo=FALSE}
dc_metro <- readOGR("data","Metro_Lines")
dc_metro_sttn <- readOGR("data/metrostations.geojson", "OGRGeoJSON")
dc_elev <- raster("data/dc_ned.tif")
```

We've already seen how to use the default print statements to look at the basics

```{r}
dc_metro
dc_metro_sttn
```

We can get more info on the data with:

```{r}
head(dc_metro_sttn)
summary(dc_metro_sttn)
names(dc_metro_sttn)
#Look at individual columns
dc_metro_sttn$NAME
```

And to get into the guts of the `sp` objects:

```{r}
str(dc_metro)
```

Yikes!

Now for the fun part.  We can use indexing/subsetting tools we already know to pull out individual features based on the data stored in the `sp` objects data frame.  For instance:

```{r}
#select with base indexing
est_mrkt <- dc_metro_sttn[dc_metro_sttn$NAME == "Eastern Market",]
est_mrkt
#select with subset (plus a Lil Rhody Shout Out!)
ri <- subset(dc_metro_sttn,NAME == "Rhode Island Ave")
ri
#select multiple items
red_line_sttn <- subset(dc_metro_sttn,grepl("red",LINE))
red_line_sttn
```

Adding data is just the same as for adding data to data frames.  I found some ridership data for the different stations and summarized that, by station, into "station_rides.csv".  Let's pull that in, and add it to `dc_metro_sttn`.  

```{r}
station_rides <- read.csv("data/station_rides.csv")
dc_metro_sttn<-merge(dc_metro_sttn,station_rides,
                     by.x="NAME",by.y="Ent.Station",all.x=TRUE)
head(dc_metro_sttn)
```

So, now we can use these values to select.

```{r}
busy_sttn <- subset(dc_metro_sttn,avg_wkday >= 10000)
busy_sttn
```


## Projections
Although many GIS provide project-on-the-fly (jwh editorial: WORST THING EVER), R does not.  To get our maps to work and analysis to be correct, we need to know how to modify the projections of our data so that they match up.  A description of projections is way beyond the scope of this workshop, but these links provide some good background info and details:

- [USGS](http://egsc.usgs.gov/isb//pubs/MapProjections/projections.html)
- [NCEAS](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf)

And for more on projecting there's some good info in the [rOpenSci draft spatial data viz Task View](https://github.com/ropensci/maptools#projecting-data)

For our purposes we will be using `spTransform` to reproject data.  We need to supply two arguments, "x", the object we are transforming, and "CRSobj" which is the details of the new projection.  We will assume that we have good data read into R and that the original projection is already defined.  This is the case with all of the example data.

There are many ways to specify the "CRSobj".  We will be using [Proj.4](https://trac.osgeo.org/proj/) strings and the `CRS` function for this.  We can get the Proj.4 strings from other datasets, or specify them from scratch.  To get them from scratch, the easiest thing to do is search at [spatialreference.org](http://spatialreference.org/).  You can either search there, or just use Google.  For instance, if we want the [ESRI Albers Equal Area projection as Proj.4](www.google.com/search?q=ESRI Albers Equal Area projection as Proj.4) gets it as the first result.  Just select the [Proj4](http://spatialreference.org/ref/esri/usa-contiguous-albers-equal-area-conic/proj4/) link from the list.

So, if we want to reproject our data using this projection:

```{r}
esri_alb_p4 <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
dc_metro_alb <- spTransform(dc_metro,
                CRS(esri_alb_p4))
```

Luckily, it is pretty common to have several datasets with one of which in the projections you want to use.  We can then just pull the Proj4 string from that.

```{r}
dc_metro_sttn_prj <- spTransform(dc_metro_sttn,
                                 CRS(proj4string(dc_metro_alb))) 
```

Projecting rasters is a bit different.  We will use `raster::projectRaster` to accomplish this. Be aware that this is looking for a Proj4 string for the "crs", and not a CRSobj.  

```{r,eval=TRUE}
dc_elev_prj <- projectRaster(dc_elev,crs=proj4string(dc_metro_sttn_prj))
```

## Exercise 3.1
In this first exercise we will work on manipulating the Tiger Lines file of the states that we pulled in as part of lesson 2 and assigned to `us_states`.

1. Assign just the DC boundary to an object named `dc_bnd`.
2. Re-project `dc_bnd` to match the projection of `dc_nlcd`.  Assign this to an object named `dc_bnd_prj`.

## Brief introduction to rgeos
In this section we are going to start working with many of the "typical" GIS type analyses, specifically buffers and a few overlays. We will use mostly `rgeos` but will also look a bit at `sp::over`.

Let's start with a buffer. We will use the albers projected stations for these examples

```{r}
sttn_buff_500 <- gBuffer(dc_metro_sttn_prj,width=500)
plot(sttn_buff_500)
```

We can see that overlapping buffers merged in this case.  If we wanted a buffer for each station we can use the "byid" argument.

```{r}
sttn_buff_500_id <- gBuffer(dc_metro_sttn_prj,width=500,byid = TRUE)
plot(sttn_buff_500_id)
```

Now we get a 500 meter circle around each of the stations.  Let's move on to one of the overlay commands in `rgeos`,the difference.

```{r diff}
#Create something to take difference of
sttn_buff_100 <- gBuffer(dc_metro_sttn_prj,width = 100)
sttn_diff <- gDifference(sttn_buff_500,sttn_buff_100)
sttn_diff
#pulls into individual polygons, instead of a single multi-polygon.
sttn_diff <- disaggregate(sttn_diff)
sttn_diff
plot(sttn_diff)
```

Lastly, let's pull out some of the basic geographic info on our datasets using `rgeos`.  That is provided by `gArea` and `gLength`. Let's get the area and perimeter of the all the land 500 meters from a metro station

``` {r}
gLength(sttn_diff)
gArea(sttn_diff)
#likely want area of each poly
gArea(sttn_diff,byid=TRUE)
```

We have left most of `rgeos` untouched, but hopefully shown enough to get you started.  

## Exercise 3.2
We will work with the re-projected `dc_bnd_prj` lets set this up for some further analysis.

1. Buffer the DC boundary by 1000 meters. Save it to dc_bnd_1000
2. Assign an object that represents only the area 1000 meters outside of DC (hint: gDifference).
3. Determine the area of both the DC boundary as well as just the surrounding 1000 meters.

## Working with rasters
Let's move on to rasters.  We will be doing mostly work with base R to summarize information stored in rasters and use our vector datasets to interact with those rasters and then we will show a few functions from `raster`.

We've already seen how to get some of the basic info of a raster.  To re-hash:

```{r}
dc_elev
```

This gives us the basics.  There are many options for looking at the values stored in the raster.  I usually default to `values` which returns the values as a vector which we can then use in R functions.

For instance, mean elevation in `dc_elev` could be calculated with 

```{r}
mean(values(dc_elev),na.omit=T)
```

If our raster contains categorical data (e.g. LULC), we can work with that too.  We don't have a ready example so lets use another `raster` function to reclassify our elevation data and then look at some summary stats of that.

```{r}
#reclass elevation into H, M, L
elev_summ <- summary(values(dc_elev))
#this is the format for the look up table expected by reclassify
rcl <- matrix(c(-Inf,elev_summ[2],1,
                elev_summ[2],elev_summ[5],2,
                elev_summ[5],Inf,3),
              nrow=3,byrow=T)
dc_elev_class <- reclassify(dc_elev,rcl)
dc_elev_class
```

So now we have categorical data, we can do cross-tabs on the values and calculate percent in each category.

```{r}
elev_class_perc <- table(values(dc_elev_class))/
  length(values(dc_elev_class))
elev_class_perc
```

The last task we will show is using vector to data to clip out our raster data.  We can do this with crop and mask.  We do the crop first as it will subset our raster based on the extent.  In most cases this is a significantly smaller area than the full raster dataset and speeds up the subsequent mask. We will do this with the projected versions.

```{r}
dc_elev_crop <- crop(dc_elev_prj,sttn_buff_500)
plot(dc_elev_crop)
plot(sttn_buff_500,add=T)
```

So, with this limited to just the extent of our dataset we can now clip out the values for each of the circles with.

```{r}
dc_elev_sttns <- mask(dc_elev_crop,sttn_buff_500)
plot(dc_elev_sttns)
plot(sttn_buff_500,add=T,border="red",lwd=2)
```

That gives us just the elevation within 500 meters of the Metro stations.  Probably not really interesting information, but we have it!  It might be more interesting to get the average elevation of each metro station.  Our workflow would be different as we would need to look at this on a per-station basis.  Might require a loop or a different approach all together.  Certainly possible, but beyond what we have time for today.

## Exercise 3.3
Let's combine all of this together and calculate some landcover summary statistics

1. Clip out the NLCD from within the DC boundaries.
2. Clip out the NLCD from the surrounding 1000 meters.
3. Summarize the land use/land cover statistics and report percent of each landcover type both within the DC boundary and within the surrounding 1000 meters.

## Other Geospatial packages
In this section, I'll introduce a few other packages that I have used or know about that provide some common analyses that may not be readily available via the base packages.  For a complete annotated listing though, the [CRAN Spatial Analysis Task View](https://cran.r-project.org/web/views/Spatial.html) should be your first stop.  The task view provides a full list of packages for working with spatial data, geostatistics, spatial regression, etc.  

Some of the other packages I have used for various tasks have been:
- [gdistance](https://cran.r-project.org/web/packages/gdistance/index.html): Provides tools for calculating distances across a grid.  Computes things like cost distance, accumulate costs, shortest path, etc. The [vignette for gdistance](https://cran.r-project.org/web/packages/gdistance/vignettes/gdistance1.pdf) is a good place to start for an overview of the package.
- [geosphere](https://cran.r-project.org/web/packages/geosphere/index.html): `geosphere` provides tools for spherical trigonometry and allows working directly with latitude, longitude, and bearing.  For more, look at the [vignette](https://cran.r-project.org/web/packages/geosphere/vignettes/geosphere.pdf).
- [SDMTools](https://cran.r-project.org/web/packages/SDMTools/index.html): This package provides functions to work with species distribution models.  In addition though, it also has implementations of most of the metrics available in the venerable landscape ecology tool, [FRAGSTATS](http://www.umass.edu/landeco/research/fragstats/fragstats.html).

```{r data_setup, echo=FALSE,cache=TRUE,message=FALSE}
dc_metro <- readOGR("data","Metro_Lines",verbose=FALSE)
dc_metro_sttn <- readOGR("data/metrostations.geojson", "OGRGeoJSON",verbose = FALSE)
dc_elev <- raster("data/dc_ned.tif")
dc_metro_prj <- spTransform(dc_metro,
                CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"))
dc_metro_sttn_prj <- spTransform(dc_metro_sttn,
                                 CRS(proj4string(dc_metro_prj)))
dc_elev_prj <- projectRaster(dc_elev,crs=proj4string(dc_metro_sttn_prj)) 
sttn_buff_500 <- gBuffer(dc_metro_sttn_prj,width=500)
```

# Lesson 4: Visualizing Spatial Data in R
Visualizing spatial data in interactive and static forms is one of the defining characteristics of GIS.  With interactive visualization and analysis, R, admittedly, is not quite up to the standards of a stand-alone GIS like QGIS or ArcGIS.  That being said, it has come quite a long way in the last several years.  Static visualization (e.g. maps) in R are, in my opinion, on par with anything you can create with a dedicated GIS.  A few (now somewhat dated) examples of maps built with R show this:

- [London Bike Hires](http://spatialanalysis.co.uk/wp-content/uploads/2012/02/bike_ggplot.png)
- [Facebook Users](http://paulbutler.org/archives/visualizing-facebook-friends/facebook_map.png)

Now we won't get to this level in just an hour or so, but we will see how to build static maps, get access to simple interactivity, and then see some of the javascript based mapping packages.

## Lesson Outline
- [Visualizing spatial data with `sp` and `raster`](#visualizing-spatial-data-with-sp-and-raster)
- [Simple interactivity with `quickmapr`](#simple-interactivity-with-quickmapr)
- [Mapping with javascript: `leaflet`](#mapping-with-javascript-leaflet)
- [Other visualization options](#other-visualization-options)

## Lesson Exercises
- [Exercise 4.1](#exercise-41)
- [Exercise 4.2](#exercise-42)

## Visualizing spatial data with `sp` and `raster`
The default plotting tools from `sp` and `raster` are good enough for most of your needs and there have been many additional tools added that allow these to be acceptable for making static maps (e.g. [GISTools](https://cran.r-project.org/web/packages/GISTools/)).  We have already seen these functions in action.  We will show these again.

To create a plot of a single layer

```{r}
plot(dc_metro)
#Play with symbology
plot(dc_metro, col="red", lwd = 3)
#Use data to color
plot(dc_metro, col=dc_metro$NAME, lwd=dc_metro$GIS_ID)
```

To create a plot of a multiple layers, we can use the "add" argument.

```{r}
plot(dc_metro)
#Add stations, change color,size, and symbol
plot(dc_metro_sttn, add=T, col="red", pch=15, cex=1.2)
```

Add some raster data in.

```{r}
plot(dc_elev)
plot(dc_metro, add=T)
plot(dc_metro_sttn, add=T, col="red", pch=15,cex=1.2)
```

We can certainly get fancier with the final plot, but that means digging into the details of plotting with base R.  That'd be a workshop in and of itself!

## Simple interactivity with `quickmapr`
At the risk of being self-serving and tooting my own horn, the next package we are going to play with is [`quickmapr`](https://cran.r-project.org/web/packages/quickmapr/index.html).  

While building plots with the default plotting functions is fairly painless, I wanted something that was a bit more straightforward.  Additionally, the default plots are static and don't have any interactivity built into them and the interactive javascript solutions (coming up) expect unprojected data in latitude and longitude.  This is the other problem I wanted to address.  `quickmapr` is not meant as a replacement for default plotting nor is it meant to be used to create production quality maps.  It is for use during the course of an analysis.

And before we move on, keep in mind that this is currently version 0.1.1, so it has bugs, but it works well enough that I am willing to go out on a limb and have a large number of people try to break it!

First thing you will need to do is install it from CRAN and load into your library

```{r, eval=FALSE}
install.packages("quickmapr")
library(quickmapr)
```

This package is built around the `qmap` object.  All of the information for creating the plots are stored in this object and it is what allows for the interactivity.

To build this we use the function `qmap`. There are several options available, but all you need to create a plot with multiple layers is the layers to include in the plot.

```{r}
my_map <- qmap(dc_elev_prj,dc_metro_prj,dc_metro_sttn_prj)
```

So, not any different than the default plots (because it uses those!).  But now, we can do some other fun stuff.

We zoom with `zi`, `zo`, and `ze`. We can pan with `p`. We can identify with `i`, and we can get back to our original extent with `f`.

```{r,eval=FALSE}
zi(my_map)
p(my_map)
zo(my_map)
i(my_map,3)
f(my_map)
```

There are a few other tricks built in, but they are experimental.  For example, adding a base images from the National Map (only aerial and topo currently supported).

```{r}
my_map<-qmap(dc_metro_prj, dc_metro_sttn_prj, colors=c("yellow","green"), 
             basemap="topo",resolution = 800)
my_map<-qmap(dc_metro_prj, dc_metro_sttn_prj, colors=c("yellow","green"), 
             basemap="1m_aerial",resolution = 800)
```

Lastly, while this can handle large datasets, it is still slow.  This is because the default plotting functions slow down as your number of features get into the 10s of thousands.  It works, but isn't nearly as zippy and smooth as a stand-alone GIS.  In short, this provided handy tools for me and allowed me to stick with a single analysis environment.   

## Exercise 4.1
We will create a map of the data we've been working with, the NLCD and DC boundary.

1. Map your clipped landcover and the DC boundary using the default plotting tools from `sp` and `raster`.
2. Create the same map, but use `quickmapr`.  Try out some of the interactivity tools: zoom, pan, identify.

## Mapping with javascript: `leaflet`
Many of the visualization tasks (e.g. zoom, pan, identify) are implemented (and implemented well) in various javascript libraries.  As such, much of the development in R has been towards packages to access javascript libraries and allow the display of R objects. Our efforts are going to focus on the `leaflet` package which, unsurprisingly, allows us to access the leaflet javascript library.  The `leaflet` package is written and maintained through RStudio.  For more on how to use `leaflet`, check out [RStudio's tutorial](https://rstudio.github.io/leaflet/).

Before we build some maps, let's get everything we need installed and loaded.

```{r eval=FALSE}
install.packages("leaflet")
library(leaflet)
```

Although the maps we can create with `leaflet` are really nice, there is one downside.  It is expected that the data are all in unprojected latitude and longitude, so if you have projected data, that will need to be converted back in to geographic coordinates.  For us, we have examples of data that are already in the correct projection.

One of the nice things about the `leaflet` interface is that it is really easy to work iteratively and build your maps by adding data and options to an existing leaflet map. So lets start with the bare minimum.

```{r, eval=FALSE}
map <- leaflet()
map <- addTiles(map)
map <- addPolylines(map,data=dc_metro)
map
```

There are lots of tiles available to us.  The default is Open Street Map. We can try out some of the other available tiles. Full list of options available from <http://leaflet-extras.github.io/leaflet-providers/preview/>.

```{r, eval=FALSE}
map <- leaflet()
map <- addPolylines(map,data=dc_metro)
map <- addProviderTiles(map,"Esri.NatGeoWorldMap")
map
#or
map <- leaflet()
map <- addPolylines(map,data=dc_metro)
map <- addProviderTiles(map,"MapQuestOpen.Aerial")
map
```

And we can add other layers in and also change their styling.

```{r, eval=FALSE}
map <- leaflet()
map <- addTiles(map)
map <- addPolylines(map,data=dc_metro)
map <- addCircles(map, data=dc_metro_sttn,
                  color="red", weight = 7,
                  popup = dc_metro_sttn$NAME)

map
```

Lastly, we can add in rasters.

```{r, eval=FALSE}
map <- leaflet()
map <- addTiles(map)
map <- addPolylines(map,data=dc_metro)
#Note: Takes a while.  Does projection behind the scenes.
map <- addRasterImage(map, dc_elev)
map
```

## Exercise 4.2
For this exercise, we will create a leaflet map

1. Create a leaflet map and add in the DC boundary.  Look at the `addPolygons` help to get you started.
2. Add in the NLCD you clipped out as part of lesson 3.

## Other visualization options
What we have had the time to show in this workshop is just the beginning, as there are many packages that provide support for mapping spatial data.  The following are just a few of these.

- [mapview](https://cran.r-project.org/web/packages/mapview/index.html): This is a wrapper to leaflet that also greatly simplifies the creation of the maps by taking care of many of the settings behind the scenes (including, I believe, reprojecting data to work with leaflet).  
- [ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html): THE data viz package for R.  Also can be used to make maps (it is what I use for my static maps). Requires additional processing of the spatial data to create plots, but has almost unlimited possibilities for creating maps.
- [ggmap](https://cran.r-project.org/web/packages/ggmap/index.html): A `ggplot2` based package for creating maps.  Makes it a bit easier and has built in support for some basemaps (e.g. Google Maps).
- [cartographer](https://github.com/ropensci/cartographer): Not on CRAN and hasn't been actively developed in a while, but is interesting because it provides access to a different javascript library, d3 and d3-carto-maps. Similar in functionality to the leaflet solution, but d3 has support for projections built in so has possibility for better handling of projected data.