Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sub-day precision Date class must be converted to integer by floor, not trunc #517

Open
eitsupi opened this issue Oct 23, 2024 · 4 comments · May be fixed by #981
Open

Sub-day precision Date class must be converted to integer by floor, not trunc #517

eitsupi opened this issue Oct 23, 2024 · 4 comments · May be fixed by #981
Labels
bug an unexpected problem or unintended behavior

Comments

@eitsupi
Copy link
Contributor

eitsupi commented Oct 23, 2024

Same as apache/arrow#44502 and apache/arrow-nanoarrow#665

Conversion of dates prior to the Posix origin is not correct.

(d <- as.Date(c(-1.1, -0.1, 0, 0.1, 1.1)))
#> [1] "1969-12-30" "1969-12-31" "1970-01-01" "1970-01-01" "1970-01-02"
df <- data.frame(d = d)
con <- DBI::dbConnect(duckdb::duckdb())
duckdb::duckdb_register(con, "df", df)
duckdb:::sql("from df", con)$d
#> [1] "1969-12-31" "1970-01-01" "1970-01-01" "1970-01-01" "1970-01-02"

Created on 2024-10-23 with reprex v2.1.1

With duckplyr:

(df <- tibble::tibble(d = as.Date(c(-1.1, -0.1, 0, 0.1, 1.1))))
#> # A tibble: 5 × 1
#>   d
#>   <date>
#> 1 1969-12-30
#> 2 1969-12-31
#> 3 1970-01-01
#> 4 1970-01-01
#> 5 1970-01-02

df |>
  duckplyr::as_duckplyr_tibble() |>
  dplyr::select(d)
#> The duckplyr package is configured to fall back to dplyr when it encounters an
#> incompatibility. Fallback events can be collected and uploaded for analysis to
#> guide future development. By default, no data will be collected or uploaded.
#> → Run `duckplyr::fallback_sitrep()` to review the current settings.
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [d as d]
#>   r_dataframe_scan(0x56267c113fd8)
#>
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - d (DATE)
#>
#> # A tibble: 5 × 1
#>   d
#>   <date>
#> 1 1969-12-31
#> 2 1970-01-01
#> 3 1970-01-01
#> 4 1970-01-01
#> 5 1970-01-02

Created on 2024-10-23 with reprex v2.1.1

@eitsupi eitsupi changed the title Sub-day precision Date class must be converted to integer by floor, not ceil Sub-day precision Date class must be converted to integer by floor, not trunc Oct 23, 2024
@krlmlr
Copy link
Collaborator

krlmlr commented Oct 28, 2024

Thanks, confirmed. What behavior would you expect with such dates, given that they can't be properly represented in duckdb (or so it seems)?

df1 <- tibble::tibble(
  delta = c(-1.1, -0.1, 0, 0.1, 1.1),
  d = as.Date(c(-1.1, -0.1, 0, 0.1, 1.1), origin = "1970-01-01"),
  dd = d + 0.5,
)
df1
#> # A tibble: 5 × 3
#>   delta d          dd        
#>   <dbl> <date>     <date>    
#> 1  -1.1 1969-12-30 1969-12-31
#> 2  -0.1 1969-12-31 1970-01-01
#> 3   0   1970-01-01 1970-01-01
#> 4   0.1 1970-01-01 1970-01-01
#> 5   1.1 1970-01-02 1970-01-02

duckdb <- asNamespace("duckdb")
drv <- duckdb::duckdb()
con <- DBI::dbConnect(drv)
experimental <- FALSE

"select"
#> [1] "select"
rel1 <- duckdb$rel_from_df(con, df1, experimental = experimental)
"select"
#> [1] "select"
rel2 <- duckdb$rel_project(
  rel1,
  list(
    {
      tmp_expr <- duckdb$expr_reference("delta")
      duckdb$expr_set_alias(tmp_expr, "delta")
      tmp_expr
    },
    {
      tmp_expr <- duckdb$expr_reference("d")
      duckdb$expr_set_alias(tmp_expr, "d")
      tmp_expr
    },
    {
      tmp_expr <- duckdb$expr_reference("dd")
      duckdb$expr_set_alias(tmp_expr, "dd")
      tmp_expr
    }
  )
)
rel2
#> DuckDB Relation: 
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> Projection [delta as delta, d as d, dd as dd]
#>   r_dataframe_scan(0x1145c1888)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - delta (DOUBLE)
#> - d (DATE)
#> - dd (DATE)
duckdb$rel_to_altrep(rel2)
#>   delta          d         dd
#> 1  -1.1 1969-12-31 1970-01-01
#> 2  -0.1 1970-01-01 1970-01-01
#> 3   0.0 1970-01-01 1970-01-01
#> 4   0.1 1970-01-01 1970-01-01
#> 5   1.1 1970-01-02 1970-01-02

Created on 2024-10-28 with reprex v2.1.1

@krlmlr krlmlr added the bug an unexpected problem or unintended behavior label Oct 28, 2024
@eitsupi
Copy link
Contributor Author

eitsupi commented Oct 28, 2024

What behavior would you expect with such dates, given that they can't be properly represented in duckdb (or so it seems)?

I'm sorry, but I don't understand your question.
I was just pointing out the fact that it is unreasonable for Dates based on a double type to be trunced.
For a simple double, it makes sense to switch before and after 0 because 0 has meaning, but in the case of Date, 1970-01-01 has no meaning, so I don't think the operation should switch before or after it.

@krlmlr
Copy link
Collaborator

krlmlr commented Oct 28, 2024

I see that flooring is the more correct operation here. Would you like to contribute? I'm releasing 1.1.2 very soon, but 1.1.3 is imminent.

@eitsupi
Copy link
Contributor Author

eitsupi commented Oct 28, 2024

Sorry, but I have no experience with C++ and don't know where to fix it.

@toppyy toppyy linked a pull request Jan 8, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants