---
title: "Intro to course tools: VectorByte Dataset Access Functions"
format:
  html:
    toc: true
    toc-location: left
    html-math-method: katex
    css: styles.css
---
This section covers how to explore the [VecTraits](https://vectorbyte.crc.nd.edu/vectraits-explorer) database and the functions included in the `bayesTPC` package for accessing specific datasets.
## Installing `bayesTPC` package
Let's start by installing the `bayesTPC` package. Some packages should be installed in advance: `bayesTPC` uses functionality from the package `nimble`, and the package `remotes` is useful for installing `bayesTPC` from its repository. To install them, you can run the following code:
```{r, eval=FALSE}
# Run any of the following if you don't have them installed yet
install.packages("nimble")
install.packages("remotes")
```
To install the package `bayesTPC`, you can use the following line of code:
```{r, eval=FALSE}
remotes::install_github("johnwilliamsmithjr/bayesTPC")
```
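If you rerun this setup often, it can help to install only what is missing. The following is a minimal sketch, not part of `bayesTPC`: the helper names `missing_packages()` and `install_course_packages()` are hypothetical, and the sketch relies only on base R's `requireNamespace()` together with the install commands shown above.

```{r, eval=FALSE}
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the pieces that are missing
install_course_packages <- function() {
  to_install <- missing_packages(c("nimble", "remotes"))
  if (length(to_install) > 0) install.packages(to_install)
  if (length(missing_packages("bayesTPC")) > 0) {
    remotes::install_github("johnwilliamsmithjr/bayesTPC")
  }
  invisible(NULL)
}
```

Calling `install_course_packages()` once should leave packages that are already installed untouched.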
## Getting to know the VecTraits database
Explore the [VecTraits](https://vectorbyte.crc.nd.edu/vectraits-explorer) database. Notice that you can search datasets by filtering according to traits, variables, genus, etc. We encourage you to identify some datasets that meet your particular interests.
## Pulling data into `R` using the VecTraits API (Application Programming Interface) from `bayesTPC`
This section walks you through the functions included in `bayesTPC` to retrieve data from the [VecTraits](https://vectorbyte.crc.nd.edu/vectraits-explorer) database.
First, load the required packages:
```{r results='hide',message=FALSE}
library(nimble)
library(bayesTPC)
library(tidyverse)
```
Next, we are going to use some functions to extract specific datasets. We can use the function `get_dataset()` to extract a single one:
```{r}
dataset50 <- get_dataset(50) # Get dataset with ID 50
head(dataset50)
```
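Before working with a retrieved dataset, it can be worth confirming that the columns you plan to use are present. The sketch below uses a small made-up data frame (`toy_dataset`, not real VecTraits data) that borrows a few column names used later in this section; the helper `has_columns()` is hypothetical.

```{r}
# Toy stand-in for a retrieved dataset (values are invented for illustration)
toy_dataset <- data.frame(
  DatasetID = c(50, 50, 50),
  Interactor1Genus = c("GenusA", "GenusA", "GenusA"),
  Interactor1Temp = c(15, 20, 25),
  OriginalTraitValue = c(1.2, 2.5, 3.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if every requested column exists in the data frame
has_columns <- function(df, cols) all(cols %in% names(df))

dim(toy_dataset)    # number of rows and columns
names(toy_dataset)  # column names
has_columns(toy_dataset, c("Interactor1Temp", "OriginalTraitValue"))
```

The same three calls can be applied to a real object such as `dataset50` once it has been retrieved.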
## Retrieve Multiple Datasets
Get a list of datasets by their ID numbers.
Format:
`get_datasets(<Vector or sequence of IDs>)`
```{r}
list_of_dataframes <- get_datasets(c(1:10))
```
Use `do.call` and `lapply` to create a single dataframe containing all of the datasets you pulled in the previous step:
```{r}
datasets1to10 <- do.call(rbind,lapply(list_of_dataframes, data.frame, stringsAsFactors=FALSE))
head(datasets1to10)
```
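To see what the `do.call(rbind, lapply(...))` pattern does, here is a minimal sketch on two made-up data frames (`toy_list` is invented for illustration and is not VecTraits data). `lapply()` makes sure each element is a data frame, and `do.call(rbind, ...)` stacks them row by row, which requires that all of them share the same column names.

```{r}
# Two toy data frames with identical columns (values are invented)
toy_list <- list(
  data.frame(DatasetID = 1, OriginalTraitValue = c(0.5, 0.7)),
  data.frame(DatasetID = 2, OriginalTraitValue = c(1.1, 1.3, 1.6))
)

# Stack all list elements into one data frame
toy_combined <- do.call(rbind, lapply(toy_list, data.frame, stringsAsFactors = FALSE))

nrow(toy_combined)              # rows from both data frames together
unique(toy_combined$DatasetID)  # which datasets ended up in the result
```

Keeping a column such as `DatasetID` in every data frame is what lets you tell the original datasets apart after stacking.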
## Search for and list datasets on a specific genus
```{r}
list_of_sepedon <- find_datasets("Sepedon")
```
### Other options
Format:
`list_of_dataframes <- get_datasets(<Vector or Range of IDs>)`
Examples:
```{r}
list_of_dataframes <- get_datasets(c(10, 20, 35)) # Get datasets 10, 20, and 35
list_of_dataframes <- get_datasets(seq(200, 240, 10)) # Get datasets 200, 210, 220, 230, and 240
list_of_dataframes <- get_datasets(c(50, 1, 580, 20)) # Get datasets 1, 20, 50, and 580
list_of_dataframes <- get_datasets(320:324) # Get datasets 320, 321, 322, 323, and 324
```
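It can help to build and inspect the vector of IDs before passing it to `get_datasets()`. The sketch below uses only base R and makes no assumption about how `get_datasets()` itself treats ordering or duplicates; the variable names are illustrative.

```{r}
ids_seq <- seq(200, 240, 10)   # every 10th ID from 200 to 240
ids_range <- 320:324           # consecutive IDs
ids_clean <- sort(unique(c(50, 1, 580, 20, 50)))  # drop duplicates, then sort

ids_seq
ids_range
ids_clean
```

Printing the vector first makes it easy to catch a mistyped range before any data are requested.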
### Retrieve List of Datasets Related To Keyword
Format: `list_of_dataframes <- find_datasets(<Keyword>)`
Examples:
```{r, eval=FALSE}
list_of_tiger_dataframes <- find_datasets("tiger mosquito") # Get datasets related to tiger mosquitos
```
You may not be able to get the full list of datasets if the search matches a large number of them. If that is the case, you can use the argument `safety = FALSE`.
```{r}
list_of_tiger_dataframes <- find_datasets("tiger mosquito", safety = FALSE)
```
Another option is to refine the search by adding multiple keywords.
Format: `list_of_dataframes <- find_datasets(<Vector of Keywords>)`
Examples:
(All of the following retrieve datasets related to yellow fever and terrestrial areas in North Carolina; as the examples show, the keywords are not case-sensitive.)
```{r}
list_of_dataframes <- find_datasets(c("terrestrial", "north carolina", "yellow fever"))
list_of_dataframes <- find_datasets(c("TERRESTRIAL", "NORTH CAROLINA", "YELLOW FEVER"))
list_of_dataframes <- find_datasets(c("tERRestRiAL", "NoRTh CARoLina", "YELLow feVer"))
```
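The examples above suggest that the capitalization of keywords does not matter. As an illustration only (this is not how `bayesTPC` is known to implement its search), the base R sketch below shows two common ways to compare text while ignoring case: normalizing with `tolower()` and matching with `grepl(..., ignore.case = TRUE)`. The description string is made up.

```{r}
keywords_a <- c("terrestrial", "north carolina", "yellow fever")
keywords_b <- c("tERRestRiAL", "NoRTh CARoLina", "YELLow feVer")

# After lower-casing, the two keyword vectors are the same
same_keywords <- identical(tolower(keywords_a), tolower(keywords_b))
same_keywords

# Case-insensitive match of one keyword against an invented description
found <- grepl("yellow fever", "Yellow Fever mosquito survey", ignore.case = TRUE)
found
```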
### The `pick()` Function
If all the information above was too much and there’s no way you’re going to remember it, all you need to know is the `pick()` function. The `pick()` function displays a small menu and allows you to choose an option in order for you to find and retrieve whatever dataset(s) you may be looking for.
Format:
`x <- pick()`
### Access Dataframes In Lists
When multiple datasets are retrieved and stored in a list of dataframes, you can access individual datasets with the following format:
`first_dataframe <- list_of_dataframes[[1]]`
`second_dataframe <- list_of_dataframes[[2]]`
...and so on for every dataframe in the list.
Example:
To print a list of the names of those who contributed to the datasets related to tiger mosquitos:
```{r}
for (i in seq_along(list_of_tiger_dataframes)) {
  print(list_of_tiger_dataframes[[i]]$SubmittedBy[1])
}
```
or retrieve the IDs:
```{r}
for (i in seq_along(list_of_tiger_dataframes)) {
  print(list_of_tiger_dataframes[[i]]$DatasetID[1])
}
```
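If you want the results collected in a vector rather than printed one at a time, `vapply()` is one alternative to the loops above. This is a sketch on a made-up list (`toy_tiger_list` and its values are invented); it assumes `DatasetID` is numeric and `SubmittedBy` is character, which may not hold for every real dataset, in which case `sapply()` is more forgiving.

```{r}
# Toy list of data frames standing in for search results (values are invented)
toy_tiger_list <- list(
  data.frame(DatasetID = c(101, 101),
             SubmittedBy = c("Contributor A", "Contributor A"),
             stringsAsFactors = FALSE),
  data.frame(DatasetID = c(102, 102, 102),
             SubmittedBy = "Contributor B",
             stringsAsFactors = FALSE)
)

# Take the first entry of a column from every data frame in the list
toy_ids <- vapply(toy_tiger_list, function(df) df$DatasetID[1], numeric(1))
toy_submitters <- vapply(toy_tiger_list, function(df) df$SubmittedBy[1], character(1))

toy_ids
toy_submitters
```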
### Access Data In Dataframes
To access variable `x` in row `y` of a dataset stored as a dataframe, use the `$` symbol and square brackets `[]`:
`variable_value <- dataframe$x[y]`
For example, if you had a dataset stored as a dataframe and you wanted to know the Genus of Interactor 1 in the fifth row, you could get that with the following code:
```{r}
dataset50$Interactor1Genus[5]
```
The same format also works for dataframes in lists. For example, you could run the following to get the original trait value in row 5 of the third dataset in the list:
```{r}
list_of_dataframes[[3]]$OriginalTraitValue[5]
```
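The `$` and `[]` combination is one of several equivalent ways to index a data frame in base R. The sketch below shows three of them on a made-up data frame (`toy_df` and its values are invented for illustration).

```{r}
toy_df <- data.frame(
  Interactor1Genus = c("GenusA", "GenusA", "GenusB"),
  OriginalTraitValue = c(10.5, 12.0, 9.8),
  stringsAsFactors = FALSE
)

value_dollar <- toy_df$OriginalTraitValue[2]      # column via $, then row 2
value_matrix <- toy_df[2, "OriginalTraitValue"]   # [row, column] in one step
genus_double <- toy_df[["Interactor1Genus"]][3]   # column via [[ ]], then row 3

value_dollar
value_matrix
genus_double
```

The `[[ ]]` form is handy when the column name is stored in a variable, since `$` only accepts a literal name.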
The same format also works directly on the result of a function call, although it is recommended that retrieved datasets be stored in an `R` object right away:
```{r}
get_dataset(5)$SubmittedBy[1] # Name of person who submitted the dataset with ID 5
```
### Plotting datasets using `ggplot2`
Once a dataset is retrieved, its data can be explored and plotted. This example uses dataset 49, which contains data on fecundity (number of eggs per individual) at different temperatures for the species **Sepedon fuscipennis**:
```{r}
dataset49 <- get_dataset(49)
```
The data can be piped into `ggplot2` to plot the trait value against temperature, with error bars:
```{r}
dataset49 %>%
  ggplot(aes(Interactor1Temp, OriginalTraitValue)) +
  geom_point(col = "blue") +
  geom_errorbar(aes(ymax = OriginalTraitValue + OriginalErrorPos,
                    ymin = OriginalTraitValue - OriginalErrorNeg), width = 1) +
  theme_bw()
```
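The error bars in the plot run from the trait value minus `OriginalErrorNeg` to the trait value plus `OriginalErrorPos`. As a sketch of that arithmetic, the base R code below computes the same bounds on a made-up data frame (`toy_fecundity` and its values are invented). A row with a missing error value has no computable bounds, which is worth checking before plotting.

```{r}
toy_fecundity <- data.frame(
  Interactor1Temp = c(15, 20, 25),
  OriginalTraitValue = c(4, 9, 6),
  OriginalErrorPos = c(1, 2, NA),
  OriginalErrorNeg = c(1, 1.5, NA)
)

# Same bounds that the ymax and ymin aesthetics compute
toy_fecundity$upper <- toy_fecundity$OriginalTraitValue + toy_fecundity$OriginalErrorPos
toy_fecundity$lower <- toy_fecundity$OriginalTraitValue - toy_fecundity$OriginalErrorNeg

toy_fecundity[, c("Interactor1Temp", "lower", "upper")]
n_missing_bars <- sum(is.na(toy_fecundity$upper) | is.na(toy_fecundity$lower))
n_missing_bars  # rows for which no error bar can be drawn
```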
The same can be done when multiple datasets are retrieved. For this example we will use two datasets about fecundity (number of eggs per individual) at different temperatures.
```{r}
longevity_temp <- get_datasets(c(71,111))
longevity_temp <- do.call(rbind,lapply(longevity_temp, data.frame, stringsAsFactors=FALSE))
```
Now we can display both datasets on the same plot:
```{r}
longevity_temp %>%
  mutate(DatasetID = factor(DatasetID)) %>%
  ggplot(aes(Interactor1Temp, OriginalTraitValue, col = DatasetID)) +
  geom_point() +
  geom_errorbar(aes(ymax = OriginalTraitValue + OriginalErrorPos,
                    ymin = OriginalTraitValue - OriginalErrorNeg), width = 2) +
  theme_bw()
```
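The `mutate(DatasetID = factor(DatasetID))` step matters for the colour mapping: if `DatasetID` is stored as a number, `ggplot2` maps it to a continuous colour gradient, whereas a factor gives one distinct colour per dataset. The sketch below shows the conversion on a made-up vector of IDs (`toy_plot_ids` is illustrative only).

```{r}
toy_plot_ids <- c(71, 71, 111, 111)
class(toy_plot_ids)   # numeric: would be treated as a continuous variable

toy_id_factor <- factor(toy_plot_ids)
levels(toy_id_factor)   # one level per dataset
nlevels(toy_id_factor)  # number of distinct datasets, i.e. colours in the legend
```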
<br>