Duckdb equivalent to dplyr's separate() or separate_wider_delim()? #581

Open
adamschwing opened this issue Nov 6, 2024 · 1 comment

Comments

@adamschwing

Hello!

I would like to take a comma-separated string and put each element in its own row. This is easy to do in dplyr using the separate() or separate_wider_delim() functions. However, my dataset is very large: each string has thousands of elements, and the dataset contains thousands of these strings across many columns and rows. Doing this separation purely in dplyr is therefore impractical.

Is there an equivalent function in duckdb-r or duckplyr for this?

@nbc

nbc commented Nov 21, 2024

Something like this?

library(duckdb)
#> Loading required package: DBI
con <- dbConnect(duckdb())

cat(readr::read_file('/tmp/split.csv'))
#> str1;str2
#> string;a1,a2,a3
#> string;a4,a5,a6
dbGetQuery(con, "SELECT str1, str_split(str2, ',').UNNEST() FROM read_csv('/tmp/split.csv', delim=';')")
#>     str1 unnest(str_split(str2, ','))
#> 1 string                           a1
#> 2 string                           a2
#> 3 string                           a3
#> 4 string                           a4
#> 5 string                           a5
#> 6 string                           a6

Created on 2024-11-21 with reprex v2.1.0
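
For anyone who wants to try this without the CSV file, here is a minimal sketch of the same idea on an in-memory table. The table name `split_demo`, the output column names `element`, `p1`, `p2`, `p3`, and the sample data are assumptions made up for illustration; `string_split()`, `unnest()` and `split_part()` are DuckDB SQL functions. The first query gives the long form (one element per row, as in the reprex above), the second gives a wide form (one column per element), which only makes sense when the number of elements is small and known in advance.

```r
library(DBI)
library(duckdb)

con <- dbConnect(duckdb())

# Hypothetical sample data mirroring the CSV used in the reprex above
df <- data.frame(
  str1 = c("string", "string"),
  str2 = c("a1,a2,a3", "a4,a5,a6"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "split_demo", df)

# Long form: split each string into a list, then expand the list into rows
long <- dbGetQuery(con, "
  SELECT str1, unnest(string_split(str2, ',')) AS element
  FROM split_demo
")

# Wide form: pick elements by position (split_part is 1-based)
wide <- dbGetQuery(con, "
  SELECT str1,
         split_part(str2, ',', 1) AS p1,
         split_part(str2, ',', 2) AS p2,
         split_part(str2, ',', 3) AS p3
  FROM split_demo
")

print(long)
print(wide)

dbDisconnect(con, shutdown = TRUE)
```

Row order is not guaranteed without an ORDER BY, so add one if the order of the split elements matters. For data that is already in a file, querying it directly with read_csv() as in the reprex avoids loading it into R first.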
