With lazy tables being a relatively new concept for most users, it is very easy to accidentally `dplyr::collect()` too much data into memory.
Perhaps the package could warn novice users when they create a DuckDB lazy table covering more than X days of data. The warning would tell them not to `dplyr::collect()` such a table into memory, and instead to learn to use `DuckDB`/`Arrow` with `dplyr` and pull only the results of aggregation.
Expert users would be able to silence this warning with a package option.
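A minimal sketch of how such a warning could work, written in R to match the package. The function name `warn_if_large_lazy_table()`, the option names `mypkg.max_days_before_warning` and `mypkg.quiet_collect_warning`, and the default threshold of 7 days are hypothetical placeholders (the issue leaves X open). The sketch only counts the distinct dates requested and does not interact with DuckDB itself.

```r
# Hypothetical helper: the function and option names below are placeholders,
# not part of any existing package API.
warn_if_large_lazy_table <- function(dates) {
  # Expert users can silence the warning via a package option
  if (isTRUE(getOption("mypkg.quiet_collect_warning", FALSE))) {
    return(invisible(FALSE))
  }
  # "X days" from the issue; 7 is an arbitrary placeholder default
  max_days <- getOption("mypkg.max_days_before_warning", 7)
  n_days <- length(unique(as.Date(dates)))
  if (n_days > max_days) {
    warning(
      sprintf(
        paste(
          "This lazy table covers %d days of data.",
          "Avoid dplyr::collect() on it; aggregate with dplyr first",
          "and collect only the summarised result."
        ),
        n_days
      ),
      call. = FALSE
    )
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Usage: 30 days exceeds the default threshold, so this emits a warning
dates <- seq(as.Date("2022-01-01"), by = "day", length.out = 30)
warn_if_large_lazy_table(dates)

# An expert user silences it; options() returns the old value for restoring
old <- options(mypkg.quiet_collect_warning = TRUE)
warn_if_large_lazy_table(dates)
options(old)
```

Keeping the check in a single helper would let every function that returns a lazy table call it, while one `options()` call turns it off globally.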
I think this is a good idea, but also that we shouldn't spend too much time on 'defensive programming'. If we state in the docs that the package can download a lot of data and that we expect users to take care with their settings, that could save some lines of code, right?
Sure, I'm just noting down some nice-to-haves so that I don't forget. It's not a big priority, and I guess we already explain in the README how to work with lazy tables. A side effect of the package is that we are educating users about cutting-edge approaches to working with large datasets on their laptops.