forked from seananderson/ggplot2-FISH554
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathggplot2-exercises-answers.Rmd
230 lines (165 loc) · 9.68 KB
/
ggplot2-exercises-answers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
# An introduction to the ggplot2 package
Exercises written by Sean C. Anderson: sean "at" seananderson.ca.
For a UW FISH 544 self-study workshop in February 2014.
This is an R Markdown document --- it mixes R code with text. Install the knitr package to work with it.
See <http://www.rstudio.com/ide/docs/authoring/using_markdown> for more details.
Once knitr is installed, you can "knit" the document by clicking the "knit" button in RStudio, or running `knitr::knit2html("ggplot2-exercises.Rmd")` in an R console.
```{r install-knitr, echo=FALSE, eval=FALSE}
install.packages(knitr)
```
```{r knitr-options, cache=FALSE, echo=FALSE}
library(knitr)
opts_chunk$set(cache=TRUE, message=FALSE, tidy=FALSE)
```
First, load ggplot2:
```{r load-ggplot2, cache=FALSE}
library(ggplot2)
```
## Loading the data
We're going to work with morphological data from Galapagos finches, which is available from BIRDD: Beagle Investigation Return with Darwinian Data at <http://bioquest.org/birdd/morph.php>. It is originally from Sato et al. 2000 Mol. Biol. Evol. <http://mbe.oxfordjournals.org/content/18/3/299.full>.
I've taken the data and cleaned it up a bit for this exercise. I've removed some columns and made the column names lower case. I've also removed all but one island. You can do that with this code:
```{r}
morph <- read.csv("Morph_for_Sato.csv")
names(morph) <- tolower(names(morph)) # make columns names lowercase
morph <- subset(morph, islandid == "Flor_Chrl") # take only one island
morph <- morph[,c("taxonorig", "sex", "wingl", "beakh", "ubeakl")] # only keep these columns
names(morph)[1] <- "taxon"
morph <- data.frame(na.omit(morph)) # remove all rows with any NAs to make this simple
morph$taxon <- factor(morph$taxon) # remove extra remaining factor levels
morph$sex <- factor(morph$sex) # remove extra remaining factor levels
row.names(morph) <- NULL # tidy up the row names
```
Take a look at the data. There are columns for taxon, sex, wing length, beak height, and upper beak length:
```{r look-data, eval=FALSE}
head(morph)
str(morph)
```
## Part 1: geoms
First, read the included notes from the introduction to the end of the section on geoms.
Then, open the ggplot2 web documentation <http://docs.ggplot2.org/> and keep it open to refer back to it through these exercises.
First, let's experiment with some geoms using the `morph` dataset. I'll start by setting up a basic scatterplot of beak height vs. wing length:
```{r geom1}
ggplot(morph, aes(wingl, beakh)) + geom_point(alpha = 0.4)
```
Because there's lots of overplotting, I've set the alpha (opacity) value to 40%. Alternatively, we could have added some jittering. We'll do that below.
Experiment with ggplot2 geoms. Try applying at least 3 different geoms to the `morph` dataset. For example, try showing the distribution of wing length with `geom_histogram()` and `geom_density()`. You could also try showing the distribution of wing length for male and female birds by using `geom_violin()`. Remember to consult <http://docs.ggplot2.org/>.
```{r geom-examples}
ggplot(morph, aes(wingl)) + geom_histogram(binwidth=1)
ggplot(morph, aes(wingl)) + geom_density()
ggplot(morph, aes(sex, wingl)) + geom_violin()
ggplot(morph, aes(taxon, wingl)) + geom_violin() + coord_flip() # coord_flip() rotates 90 degrees
ggplot(morph, aes(taxon, wingl)) + geom_boxplot() + coord_flip()
```
## Part 2: aesthetics
Start by reading the section on aesthetics in the included notes.
Let's play with mapping some of our data to aesthetics. I'll start with one example. I'm going to map the male/female value to a colour in our scatterplot of wing length and beak height. This time I'll use jittering instead of transparency to deal with overplotting:
```{r aes-example}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(colour = sex),
position = position_jitter(width = 0.3, height = 0))
```
Explore the `morph` dataset yourself by applying some aesthetics. You can see all the available aesthetics for a give geom by looking at the documentation. Either see the website or, for example, `?geom_point`
Some suggestions:
- try the same scatterplot but show upper beak length (`ubeakl`) with size
- try the same scatterplot but show the taxon with colour
- try the same scatterplot but show the upper beak length with colour (note how ggplot treats `ubeakl` differently than `taxon` when it picks a colour scale)
- try the same scatterplot but show the sex with a different shape
- combine all these: colour for taxon, shape for sex, and size for upper beak length
Yes, this last version is a bit silly, but it illustrates how quickly you can explore multiple dimensions with ggplot2.
```{r aes1}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(size = ubeakl), alpha = 0.4)
```
```{r aes2, fig.width=9}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(colour = taxon),
position = position_jitter(width = 0.3, height = 0))
```
```{r aes3, fig.width=9}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(colour = ubeakl),
position = position_jitter(width = 0.3, height = 0))
```
```{r aes4}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(shape = sex),
position = position_jitter(width = 0.3, height = 0))
```{r aes5, fig.width=9}
ggplot(morph, aes(wingl, beakh)) +
geom_point(aes(shape = sex, size = ubeakl, colour = taxon),
alpha = 0.4)
```
## Part 3: facets (small multiples)
Read the notes section on small multiples.
Try a scatterplot of beak height against wing length with a different panel for each taxon. Use `facet_wrap`:
```{r facet1, fig.height=9, fig.width=9}
ggplot(morph, aes(wingl, beakh)) +
geom_point(alpha = 0.4) + facet_wrap(~taxon)
```
In some cases, it's useful to let the x or y axes have different scales for each panel. Try giving each panel a different axis here using `scales = "free"` in your call to `facet_wrap()`:
```{r facet2, fig.heigh=9, fig.width=9}
ggplot(morph, aes(wingl, beakh)) +
geom_point(alpha = 0.4) +
facet_wrap(~taxon, scales = "free")
```
Now try using `facet_grid` to explore the same scatterplot for each combination of sex and taxa. (Remove the `scales = "free"` code for simplicity.)
```{r facet3, fig.width=16, fig.height=3}
ggplot(morph, aes(wingl, beakh)) +
geom_point(alpha = 0.4) + facet_grid(sex~taxon)
```
As another example, let's look at the distribution of wing length by sex with different panels for each taxa. Use a boxplot or violin plot to show the distributions.
```{r facet4, fig.width=9, fig.height=9}
ggplot(morph, aes(sex, wingl)) + geom_violin() +
facet_wrap(~taxon)
```
## Part 4: customizing ggplot2
So far we've used the default settings. Let's try applying a theme and adjust some of the annotation of our plots.
Read the notes from the section on themes until the end.
Let's start by applying the black and white theme, `theme_bw()`. Try adding that on to the end of a plot showing the distribution of wing length by sex using `geom_violin()`. Save your plot to an object named `p` and `print()` it to show it. We'll re-use this plot in the next step.
```{r customizing1}
p <- ggplot(morph, aes(sex, wingl)) + geom_violin() + theme_bw()
print(p)
```
Let's go one step further and remove all grid lines. See the included notes and the help for `?theme`. Hint: setting an argument to `element_blank()` will remove it. Add these elements to the object `p` and print `p` again. Hint: see the note "Exploiting the object-oriented nature of ggplot2" in the "Random tips" section of the notes.
```{r customizing2}
p <- p + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
print(p)
```
And now let's set the x and y axis labels ourselves. Name them something more appropriate.
```{r customizing3}
p <- p + xlab("Sex") + ylab("Wing length")
print(p)
```
Use the function `ggsave()` to save your plot to a PDF file.
```{r customizing4}
ggsave(p, file = "wingl-violin.pdf", width = 6, height = 6)
```
## More examples
### Dot and line plots
ggplot2 can be useful for quickly making dot and line plots. For example, this is useful for coefficient plots. To illustrate how to make this style of plot, let's make a plot that shows a dot for the median and line segment for the quantiles. First, we'll calculate these values. Run the following code:
```{r manipulate}
# install.packages("dplyr")
library(dplyr)
morph_quant <- summarise(group_by(morph, taxon),
l = quantile(wingl, 0.25)[[1]],
m = median(wingl),
u = quantile(wingl, 0.75)[[1]])
# order taxa factor levels by the median for plotting:
morph_quant <- transform(morph_quant,
taxon = reorder(taxon, m, function(x) x)) # see ?reorder
```
Now make the plot. The final plot should have the taxa listed down the y-axis and the wing length value on the x axis. Use the geom `geom_pointrange()`. Because this geom only works for vertical line segments, you'll need to rotate the whole plot by adding `+ coord_flip()`. So, `ymax` and `ymin` refer to the maximum and minimum line segment values and `x` to the taxa, even though they will appear rotated in the end.
```{r pointrange-example}
ggplot(morph_quant, aes(x = taxon, y = m, ymin = l, ymax = u)) +
geom_pointrange() + coord_flip() +
ylab("Wing length") + xlab("")
```
### Adding model fits to the data
ggplot2 can add model fits to the data to help visualize patterns. For example, it can quickly add linear regression lines, GLMs, GAMs, and loess curves. Let's add loess curves to scatter plots of beak height and wing length with a panel (facets) for male and female. See `?stat_smooth`.
```{r loess-example}
ggplot(morph, aes(wingl, beakh)) +
geom_point(alpha = 0.4) + facet_wrap(~sex) +
stat_smooth(method = "loess")
```
You can add any model fit by specifying the function and the formula (see the examples in `?stat_smooth`). Alternatively, you can fit your own model, predict from it, and add the data with `geom_ribbon()`.