-
Notifications
You must be signed in to change notification settings - Fork 47
/
Copy path01-intro-to-vroom.Rmd
123 lines (88 loc) · 3.2 KB
/
01-intro-to-vroom.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
```{r intro-to-vroom, include = FALSE}
eval_vroom <- FALSE
if(Sys.getenv("GLOBAL_EVAL") != "") eval_vroom <- Sys.getenv("GLOBAL_EVAL")
```
```{r, include = FALSE, eval = eval_vroom}
library(vroom)
library(fs)
library(purrr)
library(dplyr)
```
# Introduction to `vroom`
## `vroom` basics
*Load data into R using `vroom`*
1. Load the `vroom()` library
```{r, eval = eval_vroom}
library(vroom)
```
2. Use the `vroom()` function to read the **transactions_1.csv** file from the **/usr/share/class/files** folder
```{r, eval = eval_vroom}
vroom("/usr/share/class/files/transactions_1.csv")
```
3. Use the `id` argument to add the file name to the data frame. Use **file_name** as the argument's value
```{r, eval = eval_vroom}
```
4. Load the prior command into a variable called `vr_transactions`
```{r, eval = eval_vroom}
```
5. Load the file spec into a variable called `vr_spec`, using the `spec()` command
```{r, eval = eval_vroom}
vr_spec <- spec(vr_transactions)
vr_spec
```
## Load multiple files
1. Load the `fs` and `dplyr` libraries
```{r, eval = eval_vroom}
library(fs)
library(dplyr)
```
2. List files in the **/usr/share/class/files** folder using the `dir_ls()` function
```{r, eval = eval_vroom}
dir_ls("/usr/share/class/files")
```
3. In the `dir_ls()` function, use the `glob` argument to pass a wildcard to list CSV files only. Load to a variable named `files`
```{r, eval = eval_vroom}
```
4. Pass the `files` variable to `vroom`. Set the `n_max` argument to 1,000 to limit the data load for now
```{r, eval = eval_vroom}
```
5. Add a `col_types` argument with `vr_specs` as its value
```{r, eval = eval_vroom}
```
6. Use the `col_select` argument to pass a `list` object containing the following variables: order_id, date, customer_name, and price
```{r, eval = eval_vroom}
```
## Load and modify multiple files
*For files that are too large to have in memory, keep a summarization*
1. Use a `for()` loop to print the content of each vector inside `files`
```{r, eval = eval_vroom}
for(i in seq_along(files)) {
print(files[i])
}
```
2. Switch the `print()` command with the `vroom` command, using the same arguments, except the file name. Use the `files` variable. Load the results into a variable called `transactions`.
```{r, eval = eval_vroom}
for(i in seq_along(files)) {
}
```
3. Group `transactions` by `order_id` and get the total of `price` and the number of records. Name them `total_sales` and `no_items` respectively. Name the new variable `orders`
```{r, eval = eval_vroom}
for(i in seq_along(files)) {
}
```
4. Define the `orders` variable as `NULL` prior to the for loop and add a `bind_rows()` step to `orders` to preserve each summarized view.
```{r, eval = eval_vroom}
orders <- NULL
for(i in seq_along(files)) {
}
```
5. Remove the `transactions` variable at the end of each cycle
```{r, eval = eval_vroom}
orders <- NULL
for(i in seq_along(files)) {
}
```
6. Preview the `orders` variable
```{r, eval = eval_vroom}
orders
```