Release 0.7.0 #139
nnansters announced in Announcements
Hi all,
Niels from engineering here to announce our new 0.7.0 release!
We're focusing on eliminating some technical debt we've built up over time and have some exciting developments waiting for you. Let's dive in!
Installing / upgrading
You can get this latest version by using pip:
pip install -U nannyml
Or conda:
conda install -c conda-forge nannyml
What's new?
Refactoring the drift module
We created an elaborate structure of calculators when we first implemented drift detection. We had calculators for model inputs, scores, predictions, and targets. We thought it would make things simple. User research showed us it didn't.
We have now created a univariate drift calculator that is simpler and more streamlined. It lets you detect drift on model inputs, scores, predictions, and targets. It supports multiple methods to do so.
This new design is not only more user-friendly, but it is also extensible. It lets us introduce new drift detection methods smoothly. But I'm getting ahead of myself.
The following snippet shows off the new univariate drift calculator.
You can read more in the univariate drift calculator documentation.
Introducing Jensen-Shannon
We haven't only been refactoring. We also added the Jensen-Shannon distance as a new univariate drift detection method. According to our experiments, Jensen-Shannon can detect drift when KS or CHI2 tests would miss it. You can read more on this in the docs. And it works for both continuous and categorical variables!
The following snippet shows how to use it.
You can find out more in the univariate drift calculator documentation.
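For intuition about the metric itself, the sketch below computes the Jensen-Shannon distance between two already-binned distributions in plain Python. It is not NannyML's implementation: the function name `jensen_shannon_distance`, the base-2 logarithm, and the example frequencies are assumptions made for illustration.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    p and q are relative frequencies over the same bins or categories,
    each summing to 1. With a base-2 logarithm the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same bins")

    # Mixture distribution: the midpoint of p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl_divergence(x, y):
        # Terms where x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical category frequencies for a reference and an analysis period.
reference = [0.5, 0.3, 0.2]
analysis = [0.2, 0.3, 0.5]
print(jensen_shannon_distance(reference, analysis))
```

Because the inputs are frequencies rather than raw values, the same function applies to binned continuous variables and to categorical ones.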
Refactoring results
The Result classes were another bit of debt to tackle. Storing the output of a calculator with multiple metrics, sometimes even for many columns, was a challenge.
We would encode the name of metrics and features within the column names of the DataFrames we use for storage. But when plotting these results, we'd have to decode all of these again.
The naming conventions were not consistent across calculators, to make things worse. It made it difficult for users to understand where to find specific data. The refactor aims to solve multiple problems. We introduced multilevel indexes to deal with hierarchies of many columns and metrics elegantly.
We ensure consistency across calculators with a new paradigm for filtering result data and turning them into DataFrames. And if you don't like multilevel indexes, you can always turn them off.
The following snippet shows how it works.
You can read more about it in the working with results documentation.
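To make the motivation concrete, here is a small plain-Python sketch contrasting the two storage schemes. It does not use NannyML or pandas: the column names, method names, and values are made up, and tuple keys stand in for a multilevel column index.

```python
# Old scheme: column and method are encoded into one flat name.
flat = {
    "salary_range_chi2_value": 12.4,
    "distance_from_office_jensen_shannon_value": 0.08,
}

# Decoding by splitting on "_" is ambiguous: where does the column name
# end and the method name begin?
print("distance_from_office_jensen_shannon_value".split("_"))

# New scheme: each level of the hierarchy is its own key component.
multilevel = {
    ("salary_range", "chi2", "value"): 12.4,
    ("salary_range", "chi2", "alert"): True,
    ("distance_from_office", "jensen_shannon", "value"): 0.08,
    ("distance_from_office", "jensen_shannon", "alert"): False,
}


def filter_results(results, column=None, method=None):
    """Select entries by column and/or method without parsing any names."""
    return {
        key: value
        for key, value in results.items()
        if (column is None or key[0] == column)
        and (method is None or key[1] == method)
    }


def flatten(results):
    """Collapse tuple keys into flat names, for readers who prefer one level."""
    return {"_".join(key): value for key, value in results.items()}


print(filter_results(multilevel, column="salary_range"))
print(flatten(filter_results(multilevel, method="jensen_shannon")))
```

Filtering works on the key components directly, and flattening is a one-way convenience applied at the very end rather than something that has to be reversed before plotting.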
Exporting results
We envision NannyML as one of many tools in an MLOps toolchain. To achieve that, results should be able to live outside of NannyML.
We could already write results to disk. We've now added exporting to a pickle file or a database.
We've already created some fun integration scenarios that combine a database and Grafana with the NannyML container. Check out our examples repository for more information.
This code snippet shows you how to export results in code.
You can read more about it in the API reference and CLI documentation.
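The sketch below shows the general idea behind both export targets using only the Python standard library. It is not NannyML's writer API: the `rows` list, the `drift_results` table, and the file names are hypothetical stand-ins for real calculator output.

```python
import pickle
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical result rows: (chunk, column, method, value, alert).
rows = [
    ("chunk_0", "salary_range", "chi2", 2.1, 0),
    ("chunk_1", "salary_range", "chi2", 14.7, 1),
]


def export_to_pickle(data, path):
    """Serialize the rows to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def export_to_database(data, path):
    """Write the rows to a SQLite table that other tools can query."""
    connection = sqlite3.connect(str(path))
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS drift_results "
                "(chunk TEXT, column_name TEXT, method TEXT, "
                "value REAL, alert INTEGER)"
            )
            connection.executemany(
                "INSERT INTO drift_results VALUES (?, ?, ?, ?, ?)", data
            )
    finally:
        connection.close()


output_dir = Path(tempfile.mkdtemp())
pickle_path = output_dir / "results.pkl"
database_path = output_dir / "results.db"

export_to_pickle(rows, pickle_path)
export_to_database(rows, database_path)

# Read both back to confirm the results now live outside the process.
with open(pickle_path, "rb") as handle:
    restored = pickle.load(handle)

connection = sqlite3.connect(str(database_path))
alerts = connection.execute(
    "SELECT chunk FROM drift_results WHERE alert = 1"
).fetchall()
connection.close()

print(restored == rows, alerts)
```

A dashboarding tool such as Grafana would then query the database table directly instead of going through NannyML.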
What's changed?
Updated Poetry to 1.2.0. There are some breaking changes in the pyproject.toml. Be sure to upgrade Poetry if you want to build from source locally.
We've improved how the SizeBasedChunker deals with leftover data. You can now choose to drop it, allocate it to a new chunk, or append it to the last complete one. The default behavior has changed from drop to append.
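The three options for leftover data can be sketched in plain Python as follows. This illustrates the behavior described above and is not the SizeBasedChunker source: the function `split_by_size` and the option names "drop", "keep", and "append" are assumptions, as is the choice to keep the leftover as its own chunk when there is no complete chunk to append to.

```python
def split_by_size(rows, chunk_size, incomplete="append"):
    """Split rows into chunks of chunk_size and handle the remainder.

    incomplete:
      "drop"   - discard the leftover rows
      "keep"   - put the leftover rows in a new, smaller chunk
      "append" - add the leftover rows to the last complete chunk
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if incomplete not in ("drop", "keep", "append"):
        raise ValueError(f"unknown option: {incomplete!r}")

    full_count = len(rows) // chunk_size
    chunks = [
        rows[i * chunk_size:(i + 1) * chunk_size] for i in range(full_count)
    ]
    leftover = rows[full_count * chunk_size:]

    if not leftover or incomplete == "drop":
        return chunks
    if incomplete == "append" and chunks:
        chunks[-1] = chunks[-1] + leftover
    else:
        # "keep", or "append" with no complete chunk to extend.
        chunks.append(leftover)
    return chunks


data = list(range(10))
for option in ("drop", "keep", "append"):
    print(option, split_by_size(data, chunk_size=4, incomplete=option))
```

With ten rows and a chunk size of four, "drop" loses the last two rows, "keep" yields a third chunk of two rows, and "append" yields a final chunk of six rows. If you relied on the old default, note that chunk sizes are no longer guaranteed to be identical.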
What's next?
As we continue paying off technical debt, we'll tackle plotting next. Our current implementation lacks the flexibility we envision, but we have some ideas to improve it.
We hope you're excited about these new changes. Don't hesitate to give us your feedback and help us build a better NannyML!