-
Notifications
You must be signed in to change notification settings - Fork 47
/
Copy path05-db-analysis.Rmd
152 lines (111 loc) · 3.54 KB
/
05-db-analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
```{r db-analysis, include = FALSE}
eval_db <- FALSE
if(Sys.getenv("GLOBAL_EVAL") != "") eval_db <- Sys.getenv("GLOBAL_EVAL")
```
```{r, eval = eval_db, include = FALSE}
library(connections)
library(RSQLite)
library(dplyr)
library(dbplyr)
library(config)
```
# Databases and `dplyr`
## Intro to `connections`
*Use `connections` to open open a database connection*
1. Load the `connections` package
```{r, eval = eval_db}
library(connections)
library(config)
```
2. Use `connection_open()` to open a Database connection
```{r, eval = eval_db}
con <- connection_open(
RPostgres::Postgres(),
host = "localhost",
user = get("user", config = "dev"),
password = get("pwd", config = "dev"),
port = 5432,
dbname = "postgres",
bigint = "integer"
)
```
3. The RStudio Connections pane should show the tables in the database
## Table reference
*Use the `dplyr`'s `tbl()` command*
1. Load the `dplyr` package
```{r, eval = eval_db}
library(dplyr)
```
2. Add `in_schema()` as an argument to `tbl()` to specify the schema
```{r, eval = eval_db}
tbl(con, in_schema("retail", "customer"))
```
3. Load the results from the `tbl()` command that points the table called **orders** to a variable called `orders`
```{r, eval = eval_db}
```
4. Use the `class` function to determine the object type of `orders`
```{r}
```
## Under the hood
*Use `show_query()` to preview the SQL statement that will be sent to the database*
1. Use `show_query()` to preview SQL statement that actually runs when we run `orders` as a command
```{r, eval = eval_db}
```
2. When executed, `orders` returns the first 1000 rows of the remote **orders** table
```{r}
orders
```
3. Full results of a remote query can be brought into R with `collect`
```{r}
local_orders <- collect(orders)
```
4. Easily view the resulting query by adding `show_query()` in another piped command
```{r, eval = eval_db}
```
5. Insert `head()` in between the two statements to see how the SQL changes
```{r, eval = eval_db}
```
6. Queries can be assigned to variables. Create a variable called `orders_head` that contains the previous query
```{r}
```
7. Use `sql_render()` and `simulate_mssql()` to see how the SQL statement changes from vendor to vendor
```{r, eval = eval_db}
```
8. Use `explain()` to explore the query plan
```{r, eval = eval_db}
```
## Un-translated R commands
*Review of how `dbplyr` handles R commands that have not been translated into a like-SQL command*
1. Preview how `mean` is translated
```{r}
orders %>%
mutate(avg_id = mean(order_id, na.rm = TRUE)) %>%
show_query()
```
2. Preview how `Sys.Date()` is translated
```{r, eval = eval_db}
```
3. Use PostgreSQL native commands, in this case `date`
```{r, eval = eval_db}
orders %>%
mutate(today = date('now')) %>%
show_query()
```
4. Run the `dplyr` code to confirm it works
```{r, eval = eval_db}
```
## Using bang-bang
*Intro on passing unevaluated code to a dplyr verb*
1. Preview how `Sys.Date()` is translated when prefixing `!!`
```{r, eval = eval_db}
```
2. View resulting table when `Sys.Date()` is translated when prefixing `!!`
```{r, eval = eval_db}
orders %>%
mutate(today = !!Sys.Date()) %>%
head()
```
3. Disconnect from the database using `connection_close`
```{r}
connection_close(con)
```