Hi, I think we can improve the speed of create_age_groups quite a bit and also remove the dependency on the 'utils' package.
We can do this by avoiding cut(), which is inefficient at creating factors because it goes through unnecessary unique() + match() steps internally.
Our cleaned age breaks are already unique and sorted, so we can skip cut() and call .bincode() directly. .bincode() is the low-level binning routine that cut() itself uses internally; it returns integer bin codes.
To get a character vector, all that's needed is to index the age-group labels (built from our age breaks) by the bin codes.
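As a rough illustration of the idea, here is a minimal sketch assuming left-closed groups with an open-ended last group ("90+"). The helper name `age_groups_bincode` and the label format are assumptions made for this example, not the package's actual code.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# Assumes `breaks` is numeric, unique and sorted in ascending order.
age_groups_bincode <- function(x, breaks) {
  stopifnot(is.numeric(x), is.numeric(breaks), length(breaks) >= 2)
  n <- length(breaks)

  # Labels such as "0-4", "5-9", ..., with an open-ended final group like "90+"
  labels <- c(
    paste0(breaks[-n], "-", breaks[-1] - 1),
    paste0(breaks[n], "+")
  )

  # Integer bin codes for left-closed intervals [b_i, b_{i+1});
  # appending Inf closes the final, open-ended bin
  codes <- .bincode(x, breaks = c(breaks, Inf), right = FALSE)

  # Indexing the labels by the codes gives the character vector;
  # NA codes (missing or out-of-range ages) become NA
  labels[codes]
}

ages <- c(0, 4, 5, 17, 89, 90, 104, NA)
res <- age_groups_bincode(ages, breaks = seq(0, 90, by = 5))
print(res)
```

Because the labels are indexed directly, no intermediate factor is built, which is where the saving over cut() would be expected to come from.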
I'm always keen on speed/memory improvements (especially if they work on larger datasets). My usual approach when I've made changes like this in the past to other functions is to start by expanding the tests for the function(s) so I'm 100% sure there are no unintended regressions or behaviour changes.
I've also been interested in https://lorenzwalthert.github.io/touchstone/ for a while, which is meant for exactly these types of improvements. It involves a bit of setup, but then you get a benchmark comment added to PRs.
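One way such a regression check could look is sketched below in base R. `old_impl` and `new_impl` are hypothetical stand-ins for a cut()-based and a .bincode()-based version, and the edge cases are assumptions about what is worth covering; in the package these would more likely be written as testthat expectations.

```r
# Hypothetical stand-ins for illustration; neither is the package's actual code.
make_labels <- function(breaks) {
  n <- length(breaks)
  c(paste0(breaks[-n], "-", breaks[-1] - 1), paste0(breaks[n], "+"))
}

# cut()-based reference behaviour
old_impl <- function(x, breaks) {
  as.character(
    cut(x, breaks = c(breaks, Inf), labels = make_labels(breaks), right = FALSE)
  )
}

# .bincode()-based candidate
new_impl <- function(x, breaks) {
  make_labels(breaks)[.bincode(x, breaks = c(breaks, Inf), right = FALSE)]
}

breaks <- seq(0, 90, by = 5)

# Inputs chosen to exercise boundaries, missing values and out-of-range ages
cases <- list(
  boundaries   = c(0, 4, 5, 89, 90),
  missing      = c(NA_real_, 10, NA_real_),
  out_of_range = c(-1, 150),
  non_integer  = c(4.9, 5.1),
  empty        = numeric(0)
)

# TRUE for each case where both versions return identical output
same <- vapply(
  cases,
  function(x) identical(old_impl(x, breaks), new_impl(x, breaks)),
  logical(1)
)
print(same)
```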
On the topic of cut() inefficiency, there is a Stack Overflow thread I opened a while ago: https://stackoverflow.com/questions/76867914/can-cut-be-improved

Proposed function and benchmark:
Created on 2024-09-19 with reprex v2.0.2
This obviously relates to issues #93 and #54, which I think are also worthwhile, but as a subsequent step.