Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/add date-related functions #643

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

IoannaNika
Copy link

Added support for:

  1. date_build() - clock package
  2. difftime() - base package

Fixed:

  1. add_days()
  2. add_years()

The issue is shown below. When the functions are used with a variable being passed for the interval number to be added, the translation is wrong. Parenthesis are needed to support variables (see documentation https://duckdb.org/docs/sql/functions/date.html#date_adddate-interval)

library(clock)
library(DBI)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
con <- DBI::dbConnect(duckdb::duckdb())

date_df <- dplyr::tibble(
  date1 = as.Date(c("2000-12-01", "2000-12-01", "2000-12-01", "2000-12-01")),
  date2 = as.Date(c("2001-12-01", "2001-12-02", "2001-11-30", "2001-01-01")),
  y = 2000L, m = 10L, d = 11L
)

DBI::dbWriteTable(con,  "tmpdate", date_df)
date_tbl <- dplyr::tbl(con,  "tmpdate")


df <- dplyr::mutate(date_tbl,
  date7 = as.Date(add_years(date1, m)),
  date8 = as.Date(add_days(date1, m))
) 

df %>% show_query()
#> <SQL>
#> SELECT
#>   tmpdate.*,
#>   CAST(DATE_ADD(date1, INTERVAL 'm year') AS DATE) AS date7,
#>   CAST(DATE_ADD(date1, INTERVAL 'm day') AS DATE) AS date8
#> FROM tmpdate

df
#> Error in `collect()`:
#> ! Failed to collect lazy table.
#> Caused by error in `duckdb_result()`:
#> ! rapi_execute: Failed to run query
#> Error: Conversion Error: Could not convert string 'm year' to INTERVAL

sql <- "SELECT
   tmpdate.*,
   CAST(DATE_ADD(date1, INTERVAL (m) year) AS DATE) AS date7,
   CAST(DATE_ADD(date1, INTERVAL (m) day) AS DATE) AS date8
 FROM tmpdate"

dbGetQuery(conn = con, sql)
#>        date1      date2    y  m  d      date7      date8
#> 1 2000-12-01 2001-12-01 2000 10 11 2010-12-01 2000-12-11
#> 2 2000-12-01 2001-12-02 2000 10 11 2010-12-01 2000-12-11
#> 3 2000-12-01 2001-11-30 2000 10 11 2010-12-01 2000-12-11
#> 4 2000-12-01 2001-01-01 2000 10 11 2010-12-01 2000-12-11

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS 15.0
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Amsterdam
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4 DBI_1.2.3   clock_0.7.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.3         knitr_1.48        rlang_1.1.4      
#>  [5] xfun_0.48         purrr_1.0.2       generics_0.1.3    glue_1.8.0       
#>  [9] dbplyr_2.5.0.9000 htmltools_0.5.8.1 fansi_1.0.6       rmarkdown_2.28   
#> [13] evaluate_1.0.0    tibble_3.2.1      tzdb_0.4.0        fastmap_1.2.0    
#> [17] yaml_2.3.10       lifecycle_1.0.4   duckdb_1.1.3-1    compiler_4.4.1   
#> [21] blob_1.2.4        fs_1.6.4          pkgconfig_2.0.3   rstudioapi_0.16.0
#> [25] digest_0.6.37     R6_2.5.1          tidyselect_1.2.1  reprex_2.1.1     
#> [29] utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3    tools_4.4.1      
#> [33] withr_3.0.2

Created on 2024-12-12 with reprex v2.1.1

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The bug fix and the new features could be two separate PRs. Also, medium-term, I'd like to switch to snapshot tests for the SQL translations, this PR could be a start.

@@ -275,10 +275,10 @@ sql_translation.duckdb_connection <- function(con) {

# clock
add_days = function(x, n, ...) {
build_sql("DATE_ADD(", !!x, ", INTERVAL '", n ," day')")
build_sql("DATE_ADD(", !!x, ", INTERVAL (", n ,") day)")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now wonder why this says !!x with the bang-bang and n without.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess is a choice, I think x can be a function to be evaluated in this case, while n can't

@@ -302,6 +302,17 @@ sql_translation.duckdb_connection <- function(con) {
build_sql("DATEDIFF('day', ", !!start, ", " ,!!end, ")")

},
date_build = function(year, month = 1L, day = 1L, ..., invalid = NULL) {
dbplyr:::check_unsupported_arg(invalid, allow_null = TRUE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not access a function private to dbplyr here. How do other packages handle this? As a last resort, we could vendor a variant of check_unsupported_arg() here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would you implement such a function?

date_build = function(year, month = 1L, day = 1L, ..., invalid = NULL) {
dbplyr:::check_unsupported_arg(invalid, allow_null = TRUE)
rlang::check_dots_empty()
build_sql("MAKE_DATE(CAST(", year, " AS INTEGER), CAST(", month, " AS INTEGER), CAST(", day, " AS INTEGER))")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how SELECT MAKE_DATE(2022.0, 3.0, 8.0); fails in duckdb. Still, casting shouldn't be a concern to MAKE_DATE() .

difftime = function(time1, time2, tz, units = "days") {
dbplyr:::check_unsupported_arg(tz)
dbplyr:::check_unsupported_arg(units, allowed = "days")
build_sql("DATEDIFF('day', ", !!time2, ", " ,!!time1, ")")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use the recommended sql_expr() in the translation here and above.

@@ -156,6 +157,10 @@ test_that("custom clock functions translated correctly", {
precision = "day")),
sql(r"{DATEDIFF('day', start, end)}"))

expect_equal(translate(difftime(time1 = "time1", time2 = "time2", units = "days")), sql(r"{DATEDIFF('day', time2, time1)}"))
expect_equal(translate(date_build(year = 2000, month = 08, day = 08)), sql(r"{MAKE_DATE(CAST(2000.0 AS INTEGER), CAST(8.0 AS INTEGER), CAST(8.0 AS INTEGER))}"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a proper test:

Suggested change
expect_equal(translate(date_build(year = 2000, month = 08, day = 08)), sql(r"{MAKE_DATE(CAST(2000.0 AS INTEGER), CAST(8.0 AS INTEGER), CAST(8.0 AS INTEGER))}"))
expect_equal(translate(date_build(year = 2000L, month = 8L, day = 8L)), sql(r"{MAKE_DATE(CAST(2000.0 AS INTEGER), CAST(8.0 AS INTEGER), CAST(8.0 AS INTEGER))}"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants