-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathmaptime_gps-tracks_in_r.rmd
390 lines (274 loc) · 15.7 KB
/
maptime_gps-tracks_in_r.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
---
title: "Exploring gps-tracks in R"
author: "Emiel van Loon"
date: "June 11, 2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
In this hands-on workshop we will take a look at methods that we typically use to
investigate animal gps-tracks. The data that is being used is coming from the research
location at Texel that was mentioned in the presentation.
Some of the material has been used at a summer course that is offered at the University of Amsterdam
by the Institute of Biodiversity and Ecosystem Dynamics. Especially Michael Kemp and Wouter Vansteelant have
contributed a lot to this material.
For convenience, an HTML-version of this file is available at http://horizon.science.uva.nl/public/maptime/
In this workshop we assume that you have some working knowledge of R but are not
necessarily very skilled in doing spatial analyses in R. For those who are more experienced
we have a few challenges at the end of each chapter.
We use functions from some libraries that are not installed by default in R.
To ensure that you have them available you can run this command in R:
```{r install_libs, eval=FALSE}
source('http://horizon.science.uva.nl/public/maptime/install_libs.R')
```
Before starting with the analysis we will also load two libraries:
the plotting library `ggmap` and a small set of functions to make some calculations
more convenient: `pt2pt_fxns.R`.
```{r load_libs}
library(ggmap) # to use Google Earth satellite images in R
source('http://horizon.science.uva.nl/public/maptime/pt2pt_fxns.R')
```
Before we start, let's make sure that you have the required data for this workshop
downloaded and stored in the directory which is also set as your working directory
we need the following files:
- tracking_data.csv
- migration_data.csv
- rGU.rda
- tracking_data.rda
To set your working directory, you can use the command `setwd('/path_to_this_file')`.
## Load and preprocess the data
Now that we have done all the preparation, it's time to start.
The following command loads the data. This data file contains the movement tracks
of eight lesser black-backed gulls over the period of one month (June 2010).
```{r load_csv}
gulls <- read.csv('tracking_data.csv')
```
If you are using RStudio, you see in the environment already some information
about the data just loaded. It is a data frame with 66146 observations for 10 variables.
With some basic R-commands we can take a look into the structure & contents of
this data.
```{r explore_data_structure}
str(gulls)
```
```{r explore_data}
summary(gulls)
head(gulls,n=10)
tail(gulls,n=5)
```
The column 'device_info_serial' contains the IDs of the tracking devices, so de facto
these are IDs for the individual birds.
Let's check if there are indeed eight individuals in the data set.
```{r how_many}
( gullun <- unique(gulls$device_info_serial) )
```
As you can see in the summary data above, the date-time information is stored as text.
To work more efficiently we do need to change this into a dedicated data format.
In addition we might want to consider the day of the month and hour of the day as
additional variables.
```{r convert_datetime}
gulls$date_time <- as.POSIXct(gulls$date_time, format="%Y-%m-%d %H:%M:%S", tz='UTC')
gulls$hr <- lubridate::hour(gulls$date_time) # returns hour (0-23)
gulls$day <- lubridate::day(gulls$date_time) # returns day (0-31)
```
Possibly the data is not ordered nicely. For some analyses later-on that's
required, so let's do this ordering already.
```{r ordering}
gulls <- gulls[order(gulls$device_info_serial,gulls$date_time),]
```
We also take the data apart by species because it is handy for analyses later on.
```{r mk_subsets}
ss.298 <- subset(gulls, gulls$device_info_serial == "298")
ss.311 <- subset(gulls, gulls$device_info_serial == "311")
ss.317 <- subset(gulls, gulls$device_info_serial == "317")
ss.320 <- subset(gulls, gulls$device_info_serial == "320")
ss.327 <- subset(gulls, gulls$device_info_serial == "327")
ss.329 <- subset(gulls, gulls$device_info_serial == "329")
ss.344 <- subset(gulls, gulls$device_info_serial == "344")
ss.355 <- subset(gulls, gulls$device_info_serial == "355")
```
## Visualise the tracking data
### Basic plotting functions
We use the function `get_map()` from package ggmap to download a satellite image
for Texel, and subsequently plot the gull tracks on top of it, with different
colours per bird, using ggplot2-functions.
```{r make_maps}
Texel <- get_map(location=c(4.7,53), zoom = 7, maptype ='satellite')
p1 <- ggmap(Texel) + xlim(2.8,5.65)+ ylim(52.35,54) +
geom_path(data=gulls,aes(x=longitude,y=latitude,group=device_info_serial,col=factor(device_info_serial)),size=.6)
p1
```
If you are not familiar with ggplot/ggmap, it is nice to explore the possibilities.
It has the option to update the existing plot through extra commands.
Below are some examples.
Specify the colour for each bird.
```{r}
p2 <- p1 + scale_colour_manual(name="Bird",values=c("298"="darkorange2","311"="cornflowerblue","317"="green","320"="yellow","327"="orange","329"="red","344"="purple","355"="pink"))
p2
```
Change the labels at the axes.
```{r}
p3 <- p2 + xlab("Long[°]") + ylab("Lat[°]")
```
Specify lay-out details, using the `theme()` function
```{r}
p4 <- p3 + theme(legend.position='bottom',
legend.direction='horizontal',
legend.title=element_text(size=10,face='bold'),
axis.text=element_text(size=8,face='italic'),
axis.title=element_text(size=10,face='bold'))
p4
```
Separate figures for each bird or time period using `facet_grid()` and `facet_wrap()`.
```{r}
p4 + facet_wrap(~device_info_serial,ncol=4)
```
### Calculating and visualising additional statistics
The gull data does contain a column 'speed_2d'. This is the speed measured by the GPs-tracker over a very short time-interval. While on average it gives a good estimate of the speed, it shows a large variation and does sometimes contain outliers.
Another way to calculate speed is by dividing distance traveled by the time between consecutive GPS-points. We can make this calculation using the functions `pt2pt.distance()` and `pt2pt.duration()`, but have to do this per bird. We call the result 'trajectory speed'.
This calculation is done below for three birds.
```{r, echo=FALSE}
ss.298$distance <- pt2pt.distance(ss.298$latitude, ss.298$longitude)
ss.311$distance <- pt2pt.distance(ss.311$latitude, ss.311$longitude)
ss.317$distance <- pt2pt.distance(ss.317$latitude, ss.317$longitude)
ss.298$duration <- pt2pt.duration(ss.298$date_time)
ss.311$duration <- pt2pt.duration(ss.311$date_time)
ss.317$duration <- pt2pt.duration(ss.317$date_time)
ss.298$traj_speed <- pt2pt.speed(ss.298$distance, ss.298$duration)
ss.311$traj_speed <- pt2pt.speed(ss.311$distance, ss.311$duration)
ss.317$traj_speed <- pt2pt.speed(ss.317$distance, ss.317$duration)
```
If you compare the instantaneous speeds with the trajectory speed, you see indeed that there are more extremes in the instantaneous speed (more very low and very high speeds).
```{r}
summary(ss.298$speed_2d)
summary(ss.298$traj_speed)
hist(ss.298$speed_2d)
hist(ss.298$traj_speed)
```
Similar to what we did with speed, we can calculate the direction of each trajectory segment using the function `pt2pt.direction()`
```{r}
ss.298$direction <- pt2pt.direction(ss.298$latitude,ss.298$longitude)
ss.311$direction <- pt2pt.direction(ss.311$latitude,ss.311$longitude)
ss.317$direction <- pt2pt.direction(ss.317$latitude,ss.317$longitude)
```
This is an example of the result: a distribution of flight directions for bird 311.
```{r}
hist(ss.311$direction, col='steelblue1')
```
The values in the variable 'direction' are normal (linear) numbers in degrees, with North at 0 degrees (but also North at 360 degrees). We do need an additional step to ensure that this direction is understood properly as a circular variable (i.e. directions of 0 and 180 degrees are much further apart than 0 and 270 degrees).
```{r}
ss.298$c_direction <- circular::circular(ss.298$direction, units = 'degrees', template ='geographics')
ss.311$c_direction <- circular::circular(ss.311$direction, units = 'degrees', template ='geographics')
ss.317$c_direction <- circular::circular(ss.317$direction, units = 'degrees', template ='geographics')
```
Based on this, a circular histogram can be made.
```{r}
circular::plot.circular(ss.311$c_direction, stack=T, shrink=2, bins=90, cex=.1)
lines(density(ss.311$c_direction, bw=35, na.rm=T), col='red', lwd=3)
circular::axis.circular(at=circular::circular(seq(0, (2*pi)-(pi/2), pi/2)), labels=c("E","N","W","S"))
```
```{r}
circular::windrose(ss.311$c_direction, log(ss.311$traj_speed), increment=.5, fill.col=topo.colors(7))
```
### Challenges
We have only given a very brief look into ways to plot tracking data.
And in case you already are familiar with the type of plots shown in the previous
section you might be up for something more challenging. We provide three ideas for
extensions.
1. It would be nice to combine maps and statistical graphs to make cartograms, which would for instance visualise the distribution of a variate (e.g. the speed or altitude in specific spatial region) by histograms or windroses.
2. There are quite a few nice plotting platforms in R. Some of these contain interactivity like `leaflet` library. It would be nice to explore the plotting-options using leaflet.
## Explore time budgets
As you have seen in the presentation earlier this evening, time budgets are important to
learn more about the behaviour and ecology of our birds. A time-budget specifies how much time a bird is spending in different habitats or on different behaviours.
Here we will focus on how much time the gulls spend in different habitats during their breeding period in June. We will use a map with major geographical land-units for North-Holland with relevant units for the lesser black-backed gull. The spatial units are: Wadden Sea, North Sea, Texel, Mainland and Beach.
Let's load the data first.
```{r}
load('./rGU.rda')
```
The datafile rGU contains a map (`rGU`) which is stored in a commonly used format for spatial raster data in R (a `RasterLayer`). Let's have a look what this map contains:
at the very end you see the attributes listed.
```{r}
rGU
```
And also visualise the map.
```{r, echo=FALSE}
library(rasterVis)
```
```{r}
levelplot(rGU, col.regions=c("yellow","green","blue","red","brown"))
```
So how do we collect data from this map? First of all, this map is not in a latitude-longitude coordinate system (the GPS system is called `WGS84`) but the projected Dutch coordinate system (called `RDnew`).
So we have to either transform the GPS-trajectory data to the Map-coordinate system or vice versa. We choose for converting the GPS data to the Map.
```{r eval=FALSE}
library(sp)
library(rgdal)
sgulls <- SpatialPointsDataFrame( cbind(gulls$longitude,gulls$latitude),
gulls[,c('device_info_serial','date_time','altitude','speed_2d')] )
# specify that this data is in WSG84 lat-lon
proj4string(sgulls) <- CRS("+init=epsg:4326")
# transform the data to the Dutch coordinate system
# sgulls <- spTransform(sgulls, CRS("+init=epsg:28992")) # this short comment didn't work sometimes
sgulls <- spTransform(sgulls, CRS("+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889
+k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.417,50.3319,465.552,-0.398957,0.343988,-1.8774,4.0725 +units=m +no_defs"))
```
```{r echo=FALSE}
load('tracking_data.rda')
```
In case the above fails (if some components of R are not installed correctly or so)
we can also load the spatial data directly from an additional file: `load('tracking_data.rda')`.
If we check the transformed data in sgull, we see that the coordinates are indeed changed from latitude longitude values into the RD-values (which range from 0 to 300000 in the West-East direction and 300000 to 600000 in the South-North direction).
```{r}
head(sgulls)
```
With the data in the same coordinate system, we can record the codes of the geographical units that are visited by the gulls, using the function `extract()`
```{r}
gulls$GU <- extract(rGU,sgulls)
# the result in gulls$GU are integers, we turn these into categories
gulls$GUc <- factor(gulls$GU, levels=1:5,
labels= c('Beach', 'Mainland', 'North Sea', 'Texel', 'Wadden Sea') )
```
It is good to note that any points outside the range of the map are coded as NA.
In this case there are `r sum( is.na(gulls$GU) )` GPS points outside the range.
The resulting data in GU can now be used to quantify habitat use.
For example, if we assume that the duration of the time intervals is comparable between
the birds, we can simply count the number of GPS points in each geographical unit per bird.
This is done below.
```{r}
( bird_GU_time <- table(gulls$device_info_serial,gulls$GUc) )
```
This kind of table is hard to read, so let's transform it to a proportional table,
where the fractions sum to 1 per bird (the numbers are also rounded to 2 digits because otherwise they are dificult to read).
```{r}
( bird_GU_time_pr <- round( prop.table(bird_GU_time, margin=1),2) )
```
Follow-up steps in the analysis are often to investigate if:
- the observed difference in habitat use between individuals are in fact large or that they are doing more or less the same;
- the selected has been selected is proportional to what is available.
And have a statistical angle, because the idea is to generalize the results to the
population (not just the few birds that were tagged). So these questions are investigated
with statistical tools. Here we will consider the second question and show how it is commonly answered.
This analysis requires to first make a subjective choice of what spatial domain we consider to be 'available' to these birds. And once that is done, a statistical test can be applied
which compares the selected geographical units by the birds with the available
area in each of the units (a test for goodness of fit; we use the chi-squared test in this case).
```{r}
# note: the raw values in the rGU RasterLayer are stored in the 'slot' @data@values
available <- table(rGU@data@values)
selected <- table(gulls$GU)
prop_expected <- available/sum(available)
prop_selected <- selected/sum(selected)
chisq.test(selected,p=prop_expected)
```
The (extremely small) p-value of this test tells is that the birds do not use the landscape proportional to their availability but select for certain uses.
### Challenges
1. The comparison we made between 'used' and 'available' habitats is dominated by the fact that the birds spend a lot of time in the colony (which is on Texel). It is especially interesting to know where the birds spend their time when not in the colony. So we should first remove the points that fall in this colony and then redo the time-budget analysis.
2. In spite of what we assumed, the time-steps in the data set are not homogeneous over time and also not exactly equal among individuals. We can take this problem into account by calculating the time between different points (see the function `pt2pt.duration()` fro the previous chapter) and subsequently using this information to calculate habitat use more precise.
## Resources to learn more
If you would like to learn more about geocomputation in R this book is a great resource:
- https://geocompr.robinlovelace.net/index.html.
Also this online course by datacamp is a nice place to start (the first part is available for free):
- https://www.datacamp.com/courses/working-with-geospatial-data-in-r
For learning more about (animal) movement analysis in R, a look at the specific packages for his purpose is a good place to start:
- https://cran.r-project.org/web/packages/move/vignettes/move.pdf
- https://cran.r-project.org/web/packages/moveVis/moveVis.pdf
- https://cran.r-project.org/package=adehabitatLT/vignettes/adehabitatLT.pdf