-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New C callables to support tibble #1679
Comments
Alternatively, vctrs could provide a trimmed-down version of tibble subsetting, in C, without tibble's compatibility code, based on https://github.com/DavisVaughan/standardize. We could evaluate beforehand what happens when we remove the compatibility code from tibble -- replacing all |
To support this case here is a benchmarks from my original issue in tibble: f <- function(x, n = 10e3) {
for (i in seq(n)) {
x[["x"]] <- 1L
}
}
t <- tibble::tibble(x = 1L)
df <- data.frame(x = 1L)
l <- list(x = 1L)
bench::mark(
tibble = f(t),
dataframe = f(df),
list = f(l)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 tibble 523.22ms 523.22ms 1.91 211.5KB 15.3
#> 2 dataframe 78.31ms 80.04ms 12.0 80.9KB 18.0
#> 3 list 1.35ms 1.52ms 560. 0B 31.9 Created on 2022-08-25 with reprex v2.0.2 The difference between tibble and a bare list are so big that for me it made more sense to use the following workflow:
See tidyverse/tidyr#1386 (comment) for a real life example where the performance loss was so drastic that I had to use this approach. |
|
Subsetting and subset assignment are basic operations that are also used in loops or repeatedly. The styler package got 30 % faster when replacing tibble subsetting with data.frame subsetting: r-lib/styler#1007. I'd like to make tibble subsetting faster than data.frame subsetting. I wouldn't even mind to move the entire subsetting to vctrs to achieve this. |
@lionel-: Are you suggesting to use |
Subsetting and subassignment is up to 5x slower than the corresponding data frame operations. I don't see potential to optimize at the R level without compromising readability. To implement efficiently in C, we'd need the following C callables:
Ideally, these C callables would return an error object instead of throwing an error. Do we have a good pattern for this?
The text was updated successfully, but these errors were encountered: