Python Polars 0.20.16
🚀 Performance improvements
- add new when-then-otherwise kernels (#15089)
- Coerce sorted flag of unit arrays during concat (#15104)
- Use sorted flag for `(first|last)_non_null` (#15050)
- OOC sort improvements (#14994)
✨ Enhancements
- improved dtype inference/refinement for `read_database` results (#15126)
- raise if both `closed` and `by` are passed to `rolling_*` aggregations (#15108)
- raise informative error for rolling_* aggs with `by` of invalid dtype (#15088)
- add `non_existent` arg to `replace_time_zone` (#15062)
- Support single nested row encodings (#15105)
- make ooc sort configurable (#15084)
- Make `register_plugin` a standalone function and include shared lib discovery (#14804)
- Expose `infer_schema_length` parameter on `read_database` (#15076)
- Async parquet: Decode parquet on a blocking thread pool (#15083)
- let "ambiguous" take "null" value (#14961)
- Raise informative error message when join would introduce duplicate column name (#15042)
- Allow cast of decimal to boolean (#15015)
- Add `strict` parameter to `DataFrame` constructor to allow non-strict construction (#15034)
- Support Array statistics in parquet (#15031)
- Support decimal groupby (#15000)
- Add thread names to rayon thread pool (#15024)
- Support decimal uniq (#15001)
- expose timings in verbose state of OOC sort (#14979)
🐞 Bug fixes
- Support BinaryView in row decoder to prevent a panic in streaming group by (#15117)
- Binview chunked gather; don't modify inlined view (#15124)
- Fix chunked_id gather for binview buffers (#15123)
- Don't cache HTTP object stores as they maintain URL state (#15121)
- Output `u32` when `sum_horizontal` provided with single boolean column (#15114)
- Propagate error instead of panicking when calling `product` on an invalid type (#15093)
- Raise error when casting Array to different width (#14995)
- Fix file scan bugs for ipc, csv and parquet that occur with combinations of glob paths, row indices and predicates (#15065)
- Incorrectly preserved sorted flag when concatenating sorted series containing nulls (#15082)
- Return largest non-NaN value for `max()` on sorted float arrays if it exists instead of NaN (#15060)
- return NaN for all-NaN min/max (#15066)
- Prevent "index out of range for slice" error in parquet reader (#15021)
- Respect `nulls_last` in streaming sort (#15061)
- Fix Series construction from nested list with mixed data types (#15046)
- Don't count nulls in streaming `count` agg (#15051)
- agg_list on decimal lost scale (#15054)
- Block predicate pushdown on equalities that are used in a join (#15055)
- Enum equality based on categories (#15053)
- Don't panic in `string_addition_to_linear_concat` (#15006)
- CSV do utf8-validation after escaping fields (#15004)
- Use primitive constructors to create a Series of lists when dtype is provided (#15002)
- replace_time_zone with single-null-element "ambiguous" was panicking (#14971)
📖 Documentation
- Update write_database code blocks in user guide (#15106)
- Add missing docstring examples in the Struct namespace (#15071)
- Improve API reference landing page (#14888)
- improve join_asof example (#14993)
- Fix inadvertent swap of `new` and `old` parameters in `replace` description (#15019)
🛠️ Other improvements
- Extend and speed up scan tests (#15127)
- Add parameterized-scan-tests (#15057)
- Simplify streaming execution (#15039)
- Ensure we hit the spilled source path in ooc sort test (#15010)
- Refactor constructor code (#15009)
- fix features (#14977)
- Revert pinning PyPI publish action (#14975)
Thank you to all our contributors for making this release possible!
@JackRolfe, @MKisilyov, @MarcoGorelli, @alexander-beedie, @c-peters, @flisky, @jqnatividad, @mcrumiller, @mickvangelderen, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @stinodego and @trueb2