Release 0.10.3 #371
nnansters announced in Announcements
Hey there!
We hope you've enjoyed the end-of-year festivities and are ready for an exciting new year. We at NannyML sure are! We've kicked off the year with a couple of small releases. We'll provide a quick overview here; check out the full release notes in our changelog!
So, without further ado, let's dive into NannyML 0.10.3!
Installing/upgrading
You can get this latest version by using pip:
pip install -U nannyml
Or conda:
conda install -c conda-forge nannyml
What’s new?
Domain classifier
We're very happy to announce that we now support using a Domain Classifier for multivariate drift detection!
Here's how it works, in a nutshell. For each chunk, we combine the reference data with the current chunk data. We then use cross-validation to train a model to discriminate between chunk rows and reference rows. The model's predictions on the validation folds are used to measure its performance (via AUROC).
A high score means the two are easy to tell apart: there are significant differences between the reference data and the chunk data, so drift was detected!
A low score indicates the opposite: it is difficult to tell whether a row belongs to the reference data or the chunk data, so the two must be very much alike.
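To make that mechanism concrete, here's a rough sketch of the idea using scikit-learn. This is an illustration only, not NannyML's actual implementation; the classifier choice and fold count are placeholders, and it assumes numeric feature columns.

```python
# Illustration of the Domain Classifier idea (not NannyML's implementation):
# label reference rows 0 and chunk rows 1, train a discriminator with
# cross-validation, and read the AUROC of the out-of-fold predictions
# as a drift score.
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def discrimination_auroc(reference: pd.DataFrame, chunk: pd.DataFrame) -> float:
    X = pd.concat([reference, chunk], ignore_index=True)
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(chunk))])
    # Out-of-fold probabilities: every row is scored by a model that never saw it.
    proba = cross_val_predict(
        HistGradientBoostingClassifier(), X, y, cv=5, method="predict_proba"
    )[:, 1]
    # ~0.5 means reference and chunk are indistinguishable; close to 1.0 means drift.
    return roc_auc_score(y, proba)
```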
The following snippet illustrates how to use the Domain Classifier:
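A minimal sketch along those lines; the synthetic dataset and column names below are placeholders for your own data, and only a subset of the available parameters is shown, so check the tutorial for the full API.

```python
import nannyml as nml

# Placeholder data: NannyML ships synthetic datasets for experimenting.
reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

feature_column_names = [
    'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length',
    'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure',
]

calc = nml.DomainClassifierCalculator(
    feature_column_names=feature_column_names,
    timestamp_column_name='timestamp',
    chunk_size=5000,
)
calc.fit(reference_df)                 # learn what "reference" looks like
results = calc.calculate(analysis_df)  # score each analysis chunk

print(results.filter(period='analysis').to_df())
results.plot().show()
```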
Find out more about the DomainClassifierCalculator in the tutorial and the "how it works" documentation.
Distribution calculators
Our distribution plots have been part of NannyML since our very first release, as part of the UnivariateDriftCalculator. We felt the time had come to optimize them a bit and give them a space of their own in the library.
We've added the ContinuousDistributionCalculator and the CategoricalDistributionCalculator. They calculate the distributions for a list of continuous or categorical features, surprisingly.
We've also tweaked the implementation to be more resource-efficient than the previous version: the calculators no longer store the entire reference data set during fitting, only some properties of the reference data distribution. This improves both computation speed and memory usage.
The results of these calculators also support plotting, yielding the same nice "joyplots over time" or "bars over time" visualizations as before!
Here's an example of how to use them:
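A rough sketch, assuming both calculators are exposed on the top-level nannyml namespace and accept the same column_names and chunking parameters as the other calculators; the dataset and column names are placeholders, so check the tutorial for the exact import path and arguments.

```python
import nannyml as nml

# Placeholder data and column names; substitute your own.
reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

continuous_calc = nml.ContinuousDistributionCalculator(
    column_names=['car_value', 'debt_to_income_ratio'],
    timestamp_column_name='timestamp',
    chunk_size=5000,
)
continuous_calc.fit(reference_df)
continuous_calc.calculate(analysis_df).plot().show()   # "joyplots over time"

categorical_calc = nml.CategoricalDistributionCalculator(
    column_names=['salary_range', 'repaid_loan_on_prev_car'],
    timestamp_column_name='timestamp',
    chunk_size=5000,
)
categorical_calc.fit(reference_df)
categorical_calc.calculate(analysis_df).plot().show()  # "bars over time"
```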
The old distribution implementation embedded in the UnivariateDriftCalculator was left untouched for now, so you can continue using it as before. We'll be evaluating its role and implementation in the future.
What's changed?
We've made a lot of fixes; here are the highlights:
When a calculation raises an exception for a given chunk, we now emit an np.NaN value for that chunk and then proceed with the next chunk. Previously this kind of exception would just shut down the calculator.
We've dropped the p-value-based thresholds for Chi2 univariate drift detection. This was the only place where p-values were being used. They caused a lot of confusion in the plots because the alerts would not visually align with "crossing the threshold". All univariate drift methods now use standard deviation-based thresholds.
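If the default sensitivity doesn't suit your data, thresholds can still be tuned per method. Here's a rough sketch, assuming the StandardDeviationThreshold helper from nannyml.thresholds and the thresholds argument on the UnivariateDriftCalculator; the multipliers and column names are illustrative only.

```python
import nannyml as nml
from nannyml.thresholds import StandardDeviationThreshold

reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

calc = nml.UnivariateDriftCalculator(
    column_names=['car_value', 'salary_range'],
    timestamp_column_name='timestamp',
    continuous_methods=['jensen_shannon'],
    categorical_methods=['chi2'],
    # Widen the alert band for Chi2 to 4 standard deviations around the reference mean.
    thresholds={'chi2': StandardDeviationThreshold(std_lower_multiplier=4,
                                                   std_upper_multiplier=4)},
)
calc.fit(reference_df)
results = calc.calculate(analysis_df)
results.plot(kind='drift').show()
```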
Some honorary mentions
A big thank you to @Kishan Savant for having our documentation's back on every release. Implying we forget about something with every release, whoops.
What's next?
We've been pretty busy working on our NannyML Cloud product, featuring some novel algorithms like the improved, multi-calibrated version of CBPE and our reverse concept drift algorithms to estimate the effect of concept drift on your model.
In the meantime, we're "weighing down" on an alternative for performance estimation and implementing a very popular (or should I say "population") drift detection method.
We hope you enjoy this new release. Any feedback is, as always, most welcome!
All the best,
Niels