Releases: pola-rs/polars
Python Polars 0.20.1
🐞 Bug fixes
- repeat_by should not raise if by contains nulls (#13105)
- [csv] raise on single quote char (#13104)
- Raise if scan zstd compressed csv file (#13102)
- allow timeunit-less dtype in `pl.lit` creation (#12997)
- Don't check map length if input is literal (#13098)
- rolling_quantile can get incorrect state (#13088)
🛠️ Other improvements
- Fix column name in `contains_any` example (#13090)
- update user-defined-functions for 0.19.x (#13071)
- Fix some links, and make `map_batches` warning more evident (#13081)
- Linting updates (#13069)
- take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
- add Enum to dtype list (#13080)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego
Python Polars 0.20.0
This version includes quite a few breaking changes. We are preparing for the `1.0` release and aim to make the upgrade from `0.20` to `1.0` as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with `1.0`.
Check out the upgrade guide for help navigating the upgrade to this version.
Please bear with us while we continue to make Polars the best tool it can be!
🏆 Highlights
- Add new `Enum` categorical data type which allows a fixed set of categories (#11822)
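To make the idea of a fixed category set concrete, here is a minimal pure-Python sketch. It does not use Polars, and the class name `FixedCategories` and its methods are hypothetical. It only mimics the behaviour named in the highlight: the set of categories is declared up front, and any value outside it is rejected rather than silently added.

```python
from typing import List, Sequence


class FixedCategories:
    """Hypothetical stand-in for a categorical type with a fixed category set."""

    def __init__(self, categories: Sequence[str]) -> None:
        if len(set(categories)) != len(categories):
            raise ValueError("categories must be unique")
        self.categories = list(categories)
        # Map each category to its position in the declared order.
        self._code_of = {cat: code for code, cat in enumerate(self.categories)}

    def encode(self, values: Sequence[str]) -> List[int]:
        """Turn strings into integer codes, rejecting unknown categories."""
        codes = []
        for value in values:
            if value not in self._code_of:
                raise ValueError(f"{value!r} is not one of {self.categories}")
            codes.append(self._code_of[value])
        return codes

    def decode(self, codes: Sequence[int]) -> List[str]:
        """Turn integer codes back into their category strings."""
        return [self.categories[code] for code in codes]


sizes = FixedCategories(["small", "medium", "large"])
codes = sizes.encode(["large", "small", "medium"])
```

Because the codes follow the declared order, comparing two codes is one plausible way to give such a type a meaningful `<` and `>`. Whether Polars orders its `Enum` values exactly this way is an assumption here, not something these notes state.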
💥 Breaking changes
- Use Object Store instead of fsspec for `read_parquet` (#13044)
- Reimplement `replace` expression on the Rust side (#13002)
- Preserve left and right join keys in outer joins (#12963)
- Update `update` signature (#12986)
- Update `Expr.count` to ignore null values by default (#12934)
- Scheduled removal of previously deprecated functionality (#12885)
- Allow all `DataType` objects to be instantiated (#12470)
- Change `value_counts` resulting column name from `counts` to `count` (#12506)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (#12840)
- Default to exact checking for integers in assertion utils (#12331)
- Set default dtype for Series to `Null` when no data is present (#12807)
- Update `lit` behavior for list/tuple inputs (#12559)
- Change `DataType.is_nested` from property to classmethod (#12453)
- Update constructors for Array and Decimal (#12837)
- Smaller integer data types for datetime components (#12070)
- Fix `NaN` ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
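The `Expr.count` change above is about the difference between counting values and counting rows. A small plain-Python sketch of that distinction follows, with `None` standing in for a null. The helper names are hypothetical and this is not Polars code.

```python
from typing import Optional, Sequence


def count_non_null(values: Sequence[Optional[int]]) -> int:
    """Count only the entries that hold a value (nulls are skipped)."""
    return sum(1 for value in values if value is not None)


def count_rows(values: Sequence[Optional[int]]) -> int:
    """Count every entry, null or not."""
    return len(values)


column = [10, None, 30, None, 50]
non_null = count_non_null(column)  # 3 entries carry a value
rows = count_rows(column)          # 5 entries in total
```

Code that relied on a count including nulls would see smaller numbers after such a change, so it is worth checking any logic that divides by a count or compares it to a row total.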
⚠️ Deprecations
- Rename `write_database` parameter `if_exists` to `if_table_exists` (#12783)
🚀 Performance improvements
- Avoid dispatching to expression engine for various `Series` methods (#13010)
- Elide allocation in outer join materialization (#12992)
- Avoid dispatching `Series.head/tail` to the expression engine (#12946)
- Ensure we reduce for `any/all_horizontal` (#12976)
- Add fast paths for UTC in `truncate` (#12965)
- Use `select_seq` for expression dispatch (#12962)
- Improve `rolling_median` algorithm (#12704)
- Use fast path for non-null data in new SQL-like null matching (#12874)
- Optimize `DataFrame.iter_rows` for smaller buffer sizes (#12804)
- Speed up initializing `Series` from a list of NumPy arrays (#12785)
✨ Enhancements
- Add `str.contains_any` and `str.replace_many` (Aho-Corasick algorithms) (#13073)
- Auto-infer credentials from `.aws` folder (#13062)
- Support private cloud S3 storage in `scan_parquet` (#13060)
- Use Object Store instead of fsspec for `read_parquet` (#13044)
- Avoid dispatching to expression engine for various `Series` methods (#13010)
- Allow order operators (<,>,>=,<=) on Enum types (#12982)
- Reimplement `replace` expression on the Rust side (#13002)
- Expand set of NumPy functions which emit `inefficient map_*` warning (#13039)
- Use tokio semaphore for concurrency handling (#13026)
- Improve and expressify `hist` (#13014)
- Update `describe` to use new `count` implementation (#12990)
- Add default `to_struct` Series name consistent with the usual default Series name (empty string) (#12998)
- Preserve left and right join keys in outer joins (#12963)
- Clarify "inefficient `map_elements`" warning message (#12978)
- Allow `end` before `start` in `date/time_range` (#12964)
- Update `update` signature (#12986)
- Minor update to `Array` data type repr (#12973)
- Implement group-tuples for `Null` dtype (#12975)
- Cast to an enum from int (#12954)
- Move categorical ordering into dtype (#12911)
- Avoid importing interchange module by default (#12927)
- Update `Expr.count` to ignore null values by default (#12934)
- Raise if expression passed as scalar to DataFrame constructor (#12916)
- Update `repr` of `Struct` data type class (#12922)
- Enable partial predicate pushdown past window expressions (#12710)
- Add `merge` mode to `write_delta` and remove pyarrow to delta conversions (#12392)
- Add `str.reverse` (#12878)
- Allow all `DataType` objects to be instantiated (#12470)
- Specific performance warnings from Rust to Python (#12802)
- Change `value_counts` resulting column name from `counts` to `count` (#12506)
- Implement `std` and `var` for `Duration` columns (#12865)
- Change default `join` behavior with regard to nulls, add `join_nulls` parameter to keep existing behavior (#12840)
- Enhance `write_database` return (indicate the number of rows affected by the operation) (#12830)
- Add dedicated `Decimal` selector (#12852)
- Preserve base dtype when raising to `UInt` power (#10446)
- Default to exact checking for integers in assertion utils (#12331)
- Improve `__repr__` implementation for `Expr` (#12770)
- Support SQL subqueries for `JOIN` and `FROM` (#12819)
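The `join` change in the list above concerns whether null keys match each other. The sketch below shows the two behaviours in plain Python, with `None` as the null. It assumes that the new default is SQL-like (null keys never match) and that `join_nulls=True` restores matching. The function `inner_join` and its row representation are hypothetical, not the Polars implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], on: str,
               join_nulls: bool = False) -> List[Row]:
    """Hash-based inner join; null (None) keys match only if join_nulls is True."""
    # Build a lookup from key to all right-hand rows carrying that key.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        key = row[on]
        if key is None and not join_nulls:
            continue  # SQL-like: a null key can never match anything
        index.setdefault(key, []).append(row)

    joined: List[Row] = []
    for lrow in left:
        key = lrow[on]
        if key is None and not join_nulls:
            continue
        for rrow in index.get(key, []):
            merged = dict(lrow)
            merged.update({k: v for k, v in rrow.items() if k != on})
            joined.append(merged)
    return joined


left_rows = [{"k": 1, "a": "x"}, {"k": None, "a": "y"}]
right_rows = [{"k": 1, "b": 10}, {"k": None, "b": 20}]

sql_like = inner_join(left_rows, right_rows, on="k")
nulls_match = inner_join(left_rows, right_rows, on="k", join_nulls=True)
```

With these inputs, `sql_like` contains only the row for key `1`, while `nulls_match` also pairs the two null-keyed rows.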
🐞 Bug fixes
- Fix off-by-one error in `quantile(method="nearest")` (#13058)
- Fix incorrect schema inference on nested columns (#13057)
- Don't raise for `datetime_range` if starting on ambiguous datetime and earliest was specified (#13050)
- Parse `json_decode` per max buffer length (#13029)
- Parse `00:00` time zone as UTC (#13034)
- Fix timeout errors in concurrent downloads (#13023)
- Streamline `align_frames` and fix edge-case where the identical frame object appears more than once (#13007)
- Fix SQL substring indexing (#13016)
- Allow broadcasting in `ranges` (#11900)
- Prevent deadlock in `sink_csv` (#12991)
- Don't get mutable if buffer is sliced (#12979)
- Support parameterized `read_database` calls against cursors that only take positional args (#12967)
- Fix `truncate` when truncating by multiple weeks (#12948)
- Fix segfault / memory corruption after plugins return `Err` result (#12953)
- Raise a proper python typed exception when IO writers try to write to a non-existent folder (#12936)
- Don't panic when `ambiguous` parameter is not Utf8 (#12913)
- Raise a proper python typed exception when the CSV writer tries to write to a non-existent folder (#12919)
- Patch `rolling_var`/`rolling_std` numerical stability (#12909)
- Fix incorrect Int16 `min`/`max` due to incorrect SIMD mask construction (#12908)
- Improve handling of decimal conversion with `to_numpy` in the absence of pyarrow (#12888)
- Fix OOB error in list set operations on empty frame (#12845)
- Fix error message for uninstantiated `Enum` types (#12886)
- Fix repr of `Expr.gather` (which was still showing deprecated take) (#12864)
- Fix `Array` dtype equality (#12853)
- Fix `nan_min/max` incorrectly aggregating chunks with addition (#12848)
- Revert type hint change on expression inputs (#12792)
- More accurate type hinting for `collect_all` functions (#12796)
- Use total float ordering in is_in (#12800)
- Handle aggregation for all-NaN groups in `group_by` (#12304)
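Several fixes in this release refer to a total ordering on floats, in which NaN compares greater than every other float and equal to itself (see the NaN entry under Breaking changes). The following plain-Python sketch illustrates that ordering and what it means for a membership test. The helper names are hypothetical, and the code illustrates the semantics only, not the Rust implementation.

```python
import math
from typing import Iterable, Tuple


def total_order_key(x: float) -> Tuple[int, float]:
    """Sort key under which NaN sorts after every other float, including +inf."""
    return (1, 0.0) if math.isnan(x) else (0, x)


def total_eq(a: float, b: float) -> bool:
    """Equality under the total order: NaN is equal to NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def is_in(value: float, candidates: Iterable[float]) -> bool:
    """Membership test that uses total_eq instead of IEEE 754 equality."""
    return any(total_eq(value, candidate) for candidate in candidates)


nan = float("nan")
ordered = sorted([nan, 1.0, float("inf"), -2.0], key=total_order_key)
found = is_in(float("nan"), [1.0, float("nan")])
```

Under plain IEEE 754 comparison, `float("nan") == float("nan")` is false, so a NaN would never be found by an equality-based lookup. Under the total order it is found, and a sort places NaN consistently at the end.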
🛠️ Other improvements
- Update version switcher for `0.20` (#12844)
- Add upgrade guide for Python Polars 0.20 (#12872)
- Run doctests before other tests (#13047)
- Update `describe` calculation of min/max (#13027)
- Minor typo fix (#13003)
- Resolve two interchange tests failing locally (#12999)
- Update outdated links to API in Expressions/Functions page (#12981)
- Expand docstrings for `count` (#12960)
- Fix issue with docs for `group_by_dynamic` (#12906)
- Prefer explicit `--no-cov` flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
- Scheduled removal of previously deprecated functionality (#12885)
- Fix references in deprecation notes (#12877)
- Fix typo in `hash` docstring (#12879)
- Fix docstring for deprecated `list.take` (#12873)
- Note that `list.take` is deprecated (#12867)
- Fix failing tests (#12859)
- Add quotes to `pip install` with dependencies (#12799)
- Fix parameter name reference in `update` docstring #12797
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange
Python Polars 0.19.19
✨ Enhancements
- Parquet support required deltabyte encoding (#12836)
🐞 Bug fixes
- Fix incorrect values from parquet RLE decoding (#12818)
- Write only one dict page per row group (#12831)
Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego
Python Polars 0.19.18
✨ Enhancements
- support nested null in vstack/append/extend/concat (#12771)
- Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
- determine mode parallelism depending on current tasks (#12764)
- enable slice push down past `with_columns` (#12742)
- Improve `write_database`, accounting for latest `adbc` fixes/updates (#12713)
🐞 Bug fixes
- don't use streaming engine if aggregate is unknown (#12769)
- Enable special casing of sequence in list_to_struct (#12759)
- hold align_chunks_invariant (#12738)
- allow leading zero and plus in integer parsing (#12744)
- csv lines iter, always return remainder (#12739)
- fix oob in set operations (#12736)
- undo regression in ability to read certain parquet files (#12731)
🛠️ Other improvements
- Use latest `atoi_simd` release (#12748)
- Fix invalid references to `xlsx2csv` dependency (#12741)
- Remove pinned `aiohttp` dependency (#12733)
Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 0.19.17
✨ Enhancements
- Automatically wrap NumPy array as lit (#12709)
- Add `DataFrame.iter_columns` (#12653)
- favour showing "adbc_driver_manager" over "adbc_driver_sqlite" in `show_versions` (#12690)
🐞 Bug fixes
- corr return nan if denominator is invalid (#12708)
- parquet decimal statistics and schema (#12705)
- support `append`/`extend` with null series (#11824) (#12686)
- address a numpy ndarray init regression (#12701)
- fix carrying over infinity into other windows (#12685)
🛠️ Other improvements
- Update URI prefix in examples (prefer "postgresql" to "postgres") (#12707)
- now that `scan_parquet` supports hive partitioning, remove note pointing to `scan_pyarrow_dataset` (#12706)
- Minor docstring fixes (#12688)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela
Python Polars 0.19.16
⚠️ Deprecations
- Rename `series_equal`/`frame_equal` to `equals` (#12618)
- Rename `map_dict` to `replace` and change default behavior (#12599)
🚀 Performance improvements
- order(s) of magnitude speedup when initialising `List` dtype `Series` from 2D numpy array (#12672)
- improve `merge_local_rhs_categorical` traversal (#12660)
- make values_size estimate correct for sliced arrays (#12658)
- improve parquet utf8 validation (#12655)
- parquet pre-allocate buffer in binary plain encode (#12652)
- optimize dict binary decoding in parquet (#12648)
- ensure we only check the values within bounds (#12633)
- parquet; elide recursion in hot path (#12625)
- improve cov/corr algorithm (#12590)
✨ Enhancements
- Join operations on local categoricals (#12657)
- Implement `PySeries.from_buffer` for boolean buffers (#12654)
- Implement `PySeries.from_buffer` for numeric types (#12646)
- use RLE_DICTIONARY for integers in parquet (#12647)
- extend recent `filter` syntax upgrades to `when/then` construct (#12603)
- implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
- implement 'DeltaByteArray' decoding for parquet (#12602)
🐞 Bug fixes
- json null inference (#12677)
- cov/corr respect f32 type (#12676)
- fix ternary zip_with null broadcast (#12668)
- support negative slice on eager frame (#12644)
- fix concurrency budget assertion (#12641)
- fix oob in set operations (#12640)
- panic reading parquet nested struct column (#12614)
- Fix deprecation message for `DataFrame.sum` (#12619)
- features: `performant,lazy,random` (#12600)
🛠️ Other improvements
- Use `range` instead of `np.arange` in constructors (#12621)
- update custom allocator instructions to include macOS (#12593)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 0.19.15
⚠️ Deprecations
- Rename `str.json_extract` to `str.json_decode` (#12586)
🚀 Performance improvements
- apply left side predicate pushdown also to right side on semi join (#12565)
- ensure streaming parquet download remains concurrent `~7x` (#12552)
✨ Enhancements
- warn if `by` column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
- struct -> json encoding expression (#12583)
- Implement support for multi-character comments in `read_csv` (#12519)
- Implement `LazyFrame.sink_ndjson` (#10786)
- use JEMALLOC on all unix architectures (#12568)
- improve concurrency parameters (#12567)
- In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)
🐞 Bug fixes
- error when invalid list to array is given (#12584)
- parquet: do not extend existing nested that is already complete (#12569)
- accidental panic if predicate selects no files (#12575)
- fix lazy parquet slice with nested columns (#12558)
- ensure stats-evaluator exists (#12566)
- list schema of list `eval` (#12563)
- ensure concurrency budget never locks (#12555)
- Fix lazy schema for `group_by_dynamic` and `rolling` (#12551)
- address overflow on vec capacity calculation for `int_ranges` with negative step (#12548)
🛠️ Other improvements
- convert all recursive parquet deserialize to iterative (#12560)
- Minor cleanup in Expr class (#12549)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii
Rust Polars 0.35.0
🏆 Highlights
- improve join performance through radix partitioned join (#12270)
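As a rough illustration of what a partitioned hash join does, the sketch below splits both inputs into partitions by key hash and then joins each pair of partitions independently. It is a simplified plain-Python sketch, not the Polars implementation. The function names are hypothetical, the partition is chosen with a modulo rather than by reading bits of the hash, and the partitions are processed one after another instead of in parallel.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Any]  # (join key, payload)


def partition(rows: List[Pair], n_partitions: int) -> List[List[Pair]]:
    """Send each row to the partition selected by the hash of its key."""
    parts: List[List[Pair]] = [[] for _ in range(n_partitions)]
    for key, payload in rows:
        parts[hash(key) % n_partitions].append((key, payload))
    return parts


def partitioned_inner_join(left: List[Pair], right: List[Pair],
                           n_partitions: int = 4) -> List[Tuple[Hashable, Any, Any]]:
    """Inner join where each partition pair is joined with its own small hash table."""
    out = []
    left_parts = partition(left, n_partitions)
    right_parts = partition(right, n_partitions)
    # Equal keys hash to the same partition, so partitions never need to interact.
    for lpart, rpart in zip(left_parts, right_parts):
        table: Dict[Hashable, List[Any]] = defaultdict(list)
        for key, payload in rpart:
            table[key].append(payload)
        for key, lpayload in lpart:
            for rpayload in table.get(key, []):
                out.append((key, lpayload, rpayload))
    return out


left_rows = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
right_rows = [(2, "x"), (5, "y"), (7, "z")]
result = sorted(partitioned_inner_join(left_rows, right_rows))
```

The appeal of this layout is that each per-partition hash table is small and independent, which in a real engine allows the partitions to be built and probed by different threads with better cache behaviour.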
💥 Breaking changes
- Rename cumulative functions `cumsum -> cum_sum` and similar (#12513)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- plugins add version and context (#12433)
- Fix `scan_csv` error type (#12355)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Rename `is_signed` to `is_signed_integer` (#12220)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
🚀 Performance improvements
- speed up cov/corr with SIMD + strength-reduction `~3x 0.19.13/ ~2x numpy` (#12471)
- apply predicates and statistics of parquet files in streaming mode (#12439)
- use online algorithm for cov/corr `~2x` (#12412)
- indexvec in group-by (#12371)
- reduce allocations in hash join (#12368)
- change concurrency parameters (#12321)
- improve join performance through radix partitioned join (#12270)
- remove extra multiplication in hash_to_partition (#12233)
- allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
- improve parquet downloading (#12061)
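The list above mentions an online algorithm for cov/corr. For readers unfamiliar with the term, the sketch below shows a standard single-pass (Welford-style) update for the sample covariance in plain Python. It illustrates the general technique only and is not claimed to match the kernel used in Polars. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def online_covariance(pairs: Iterable[Tuple[float, float]]) -> float:
    """Sample covariance computed in one pass, updating running means as it goes."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)
    for x, y in pairs:
        n += 1
        dx = x - mean_x              # deviation from the previous mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        co_moment += dx * (y - mean_y)  # uses the updated mean of y
    if n < 2:
        raise ValueError("need at least two observations")
    return co_moment / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov = online_covariance(zip(xs, ys))
```

Compared with the textbook two-pass formula, this needs only one scan of the data and constant extra memory, and it avoids subtracting two large, nearly equal sums.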
✨ Enhancements
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- support http scan_parquet (#12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (#12253)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- Allow comparison of two local categories with the same hash (#12503)
- more changes for versioned plugins (#12504)
- plugins add version and context (#12433)
- include i128 in more primitive functions (#12413)
- write rolling functions as private expressions. (#12379)
- Add `round_sig_figs` expression for rounding to significant figures (#11959)
- change concurrency parameters (#12321)
- deprecate `_saturating` in duration string language, make it the default (#12301)
- auto infer `ambiguous` for truncate and round (#12204)
- Rename `is_signed` to `is_signed_integer` (#12220)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- allow non-aggregation predicate in ternary groupby (#12286)
- Add `name=` in `.write_avro` to set schema name (#12255)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- start prefetching all files immediately (#12201)
- Add `.list.to_array` expression (#12192)
- consolidate & improve all casting failure error messages (#12168)
- tunable concurrency (#12171)
- support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- add concurrency budget (#12117)
- Introduce ignore_nulls for str.concat (#12108)
- casting utf8 to temporal (#12072)
- Add supertype for `List`/`Array` (#12016)
- enable eq and neq for array dtype (#12020)
- Expressify n of shift (#12004)
- add dedicated `name` namespace for operations that affect expression names (#11973)
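Rounding to significant figures, as added by `round_sig_figs` in the list above, differs from rounding to a fixed number of decimals: the number of digits kept is counted from the first non-zero digit. A plain-Python sketch of the idea follows. The helper `round_to_sig_figs` is hypothetical and is not the Polars expression; edge-case behaviour of the real expression may differ.

```python
import math


def round_to_sig_figs(x: float, digits: int) -> float:
    """Round x so that only `digits` significant figures remain."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if x == 0 or not math.isfinite(x):
        return x  # zero, inf and NaN have no meaningful magnitude
    # Position of the leading digit, e.g. 5 for 123456 and -3 for 0.001234.
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - magnitude)


large = round_to_sig_figs(123456.0, 2)
small = round_to_sig_figs(0.001234, 2)
```

Here `large` keeps the two leading digits of 123456 (giving 120000.0), while `small` keeps the two leading non-zero digits of 0.001234.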
🐞 Bug fixes
- fix incorrect ternary agg states (#12538)
- fix and improve ternary evaluation on groups (#12529)
- saturating sub in debug msg (#12525)
- fix panic when writing `Decimal` type to parquet (#12532)
- pre-fetch struct columns in async projection pd (#12514)
- rechunk cross join output in streaming (#12511)
- fix as_list logical types (#12507)
- fix streaming cross join on empty df (#12491)
- dont overflow when calculating date range over very long periods (#12479)
- Allow append/zip_with/extend on local categoricals (#12369)
- Do not panic if time is invalid (#12466)
- empty csv no-raise (#12434)
- Fix `scan_csv` error type (#12355)
- binary operations in aggregation context on literals (#12430)
- update groups state after binary aggregation (#12415)
- Remove extra `\n` when reading file-like object wi… (#12333)
- revert ternary special broadcast, ensure broadcast is always to max height (#12395)
- ensure first/last return null if empty (#12401)
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- uint64 should be correctly extracted from python object (#12338)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in `date_range` (#12317)
- scan empty csv miss row_count (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (#12267)
- Raise more informative error on invalid `reshape` input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- Update `null_count` after arithmetic (#12280)
- fix ambiguous aggregation type (#12269)
- Consistently propagate nulls for `numpy` ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Raise if *_horizontal without inputs (#12106)
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- str.concat on empty list (#12066)
- binary agg should group aware if literal not a scalar (#12043)
- Use Arrow schema for file readers (#12048)
- Error on duplicates in hive partitioning (#12040)
- display fmt for str split (#12039)
- sum_horizontal should not always cast to int (#12031)
- fix apply_to_inner's dtype (#12010)
- Fix padding for non-ASCII strings (#12008)
- inline parts of unstable unicode module for stable (#12003)
- fix dot visualization of anonymous scans (#12002)
- SQL table aliases (#11988)
🛠️ Other improvements
- Rename cumulative functions `cumsum -> cum_sum` and similar (#12513)
- fix and improve ternary evaluation on groups (#12529)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Add `polars-ds` to list of community plugins (#12527)
- add schema test (#12523)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- add test for previous commit (#12510)
- Support Python 3.12 (#12094)
- Fix some typos (#12485)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- update rustc (#12468)
- rename the `DataType` in the polars-arrow crate to `ArrowDataType` for clarity, preventing conflation with our own/native `DataType` (#12459)
- Replace outdated dev dependency `tempdir` (#12462)
- move cov/corr to polars-ops (#12411)
- use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
- dprint/markdown link checker minor updates (#12409)
- replace as_u64 with dirty_hash (#12327)
- Fix ruff linting invocation (#12350)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Build and verify Rust examples in docs (#12334)
- Fix some feature flags (#12325)
- Organize Cargo.toml (#12323)
- remove fxhash (#12322)
- Run rustfmt on doc examples (#12319)
- Consolidate "getting started" and "user guide" sections (#12246)
- deprecate `_saturating` in duration string language, make it the default (#12301)
- simplify expr checking in predicate push down (#12287)
- Replace dev dependency `avro-rs` with `apache-avro` (#12295)
- Run `clippy` on all targets (#12293)
- Add top-level `make clippy`, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL `.venv` dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- remove unwrap from group_by (#12263)
- update object_store (#12006) (#12273)
- Remove recommended setting from IDE docs (#12275)
- Add feature flag for `list.eval` (#12254)
- factor out some shared code in `truncate_impl` (#12229)
- update Cargo.lock (#12226)
- Make all functions in string namespace non-anonymous (#12215)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- use enum for Ambiguous (#12193)
- Standardize project name formatting across docs (#12185)
- Update `sqlparser` to `0.39` (#12173)
- pin ring (#12176)
- Refactor `FunctionExpr` module (#12162)
- Fix tests for pyarrow 14 (#12170)
- Fix triggers for docs deployment (#12159)
- Make all functions in binary namespace non-anonymous (#12126)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor improvements to the docs website (#12084)
- reshape and repeat_by non-anonymous (#12064)
- upgrade zstd to 0.13 in `polars-parquet` (#12062)
- Direct CONTRIBUTING to the docs website (#12042)
- inline parquet2 (#12026)
- remove parquet logic from `polars-arrow` and consolidate logic in `polars-parquet` crat...
Python Polars 0.19.14
🏆 Highlights
⚠️ Deprecations
- Rename DataFrame column index methods (#12542)
- Rename `Series.set_at_idx` to `scatter` (#12540)
- Deprecate `Series.view` (#12539)
- Rename cumulative functions `cumsum -> cum_sum` and similar (#12513)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Deprecate `Series.inner_dtype` property (#12494)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- Deprecate DataType method `is_not` (#12458)
- Deprecate Series methods `is_boolean` and `is_utf8` (#12457)
- Add `DataType.is_integer` and other dtype groups (#12200)
🚀 Performance improvements
- speed up parquet download of streaming engine (#12544)
- speed up cov/corr with SIMD + strength-reduction `~3x 0.19.13/ ~2x numpy` (#12471)
- apply predicates and statistics of parquet files in streaming mode (#12439)
- use online algorithm for cov/corr `~2x` (#12412)
- make 1D numpy to polars conversion zero-copy for numeric data (#12403)
✨ Enhancements
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- support http scan_parquet (#12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (#12253)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- more changes for versioned plugins (#12504)
- plugins add version and context (#12433)
- Add `DataType.is_integer` and other dtype groups (#12200)
- include i128 in more primitive functions (#12413)
- write rolling functions as private expressions. (#12379)
🐞 Bug fixes
- fix incorrect ternary agg states (#12538)
- fix and improve ternary evaluation on groups (#12529)
- saturating sub in debug msg (#12525)
- fix panic when writing `Decimal` type to parquet (#12532)
- pre-fetch struct columns in async projection pd (#12514)
- rechunk cross join output in streaming (#12511)
- Ensure behaviour of `Series` comparison with `timedelta` matches that of other types (#12497)
- fix as_list logical types (#12507)
- fix streaming cross join on empty df (#12491)
- dont overflow when calculating date range over very long periods (#12479)
- Allow append/zip_with/extend on local categoricals (#12369)
- Do not panic if time is invalid (#12466)
- ensure explicit "return_dtype" is respected by `map_dicts` (#12436)
- empty csv no-raise (#12434)
- Fix `scan_csv` error type (#12355)
- binary operations in aggregation context on literals (#12430)
- raw HTML output alignment was incorrect for dtype in header (#12422)
- update groups state after binary aggregation (#12415)
- Remove extra `\n` when reading file-like object wi… (#12333)
- Issue correct `PolarsInefficientMapWarning` for lshift/rshift operations (#12385)
- revert ternary special broadcast, ensure broadcast is always to max height (#12395)
- ensure first/last return null if empty (#12401)
🛠️ Other improvements
- fix and improve ternary evaluation on groups (#12529)
- Add `polars-ds` to list of community plugins (#12527)
- Future-proof consortium standard test (#12524)
- add schema test (#12523)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- add test for previous commit (#12510)
- Update `polars-hash` reference (#12505)
- Add note on hash stability and mention `polars-hash` (#12496)
- Support Python 3.12 (#12094)
- Improved `import polars` timing test; now much more consistent/reliable (#12478)
- Use `.with_columns()` in all `.list` namespace examples (#12475)
- update rustc (#12468)
- Fix docs trigger (#12449)
- Update for new maturin release (#12437)
- Remove 'experimental' tag for auto-structify setting (#12435)
- make "DataFrame" and "Series" case more consistent across docs/comments/errors (#12428)
- dprint/markdown link checker minor updates (#12409)
- Use `manylinux_2_17` for building `x86-64` wheel (#12408)
- Use manylinux 2.24 instead of 2.28 for compatibility reasons (#12397)
- use with_columns in is_in example, and fix some bullet points not rendering (#12383)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @abstractqqq, @alexander-beedie, @c-peters, @cmdlineluser, @hirohira9119, @ion-elgreco, @jerome3o, @nameexhaustion, @reswqa, @ritchie46, @stinodego and @uchiiii
Python Polars 0.19.13
🏆 Highlights
- improve join performance through radix partitioned join (#12270)
⚠️ Deprecations
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Deprecate `_saturating` in duration string language, make it the default (#12301)
- Switch args for `Decimal` and set default `scale=0` (#12224)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Deprecate `DataFrame.as_dict` positional input (#12131)
🚀 Performance improvements
- indexvec in group-by (#12371)
- Reduce allocations in hash join (#12368)
- Change concurrency parameters (#12321)
- Improve join performance through radix partitioned join (#12270)
- Remove extra multiplication in hash_to_partition (#12233)
- Allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
✨ Enhancements
- Updated `BytecodeParser` for Python 3.12 (#12348)
- Add `round_sig_figs` expression for rounding to significant figures (#11959)
- Change concurrency parameters (#12321)
- Deprecate `_saturating` in duration string language, make it the default (#12301)
- Auto-infer `ambiguous` for truncate and round (#12204)
- Allow construction of `Datetime` series from `datetime.date` array (#12175)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- Allow non-aggregation predicate in ternary groupby (#12286)
- Add `name=` in `.write_avro` to set schema name (#12255)
- Update `write_delta` to write large arrow types without casting (#12260)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- Start prefetching all files immediately (#12201)
- Expose more options to plugin registration (#12197)
- Add `.list.to_array` expression (#12192)
- Consolidate & improve all casting failure error messages (#12168)
- Add Binary dtype to hypothesis tests (#12140)
- Tunable concurrency (#12171)
- Support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- Support decimals in assert utils (#12119)
- Add concurrency budget (#12117)
- Improved support for use of file-like objects with `DataFrame` "write" methods (#12113)
- Introduce ignore_nulls for str.concat (#12108)
🐞 Bug fixes
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- Ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- `UInt64` should be correctly extracted from python object (#12338)
- Ignore IDE-mediated DeprecationWarning when debugging tests under 3.12 (#12343)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in `date_range` (#12317)
- Scan empty csv miss row_count (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- Fix invalid DeprecationWarning generated from `date_range` defined with 'saturating' interval (#12311)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (#12267)
- Raise more informative error on invalid `reshape` input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- typo in exception message (#12278)
- fix ambiguous aggregation type (#12269)
- return frames from `read_excel` in the originally specified order (#12243)
- Consistently propagate nulls for `numpy` ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- fix plugins system on Windows (#12230)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Fix interchange protocol boolean buffer size (#12177)
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- Update `null_count` after arithmetic (#12280)
🛠️ Other improvements
- Workaround for maturin issue (#12370)
- Fix incorrect boundary column name in `group_by_dynamic` docstrings (#12366)
- Fix typo in `rolling_*` docstrings (#12362)
- Fix ruff linting invocation (#12350)
- Clean up conversion utils (#11789)
- Organize Cargo.toml (#12323)
- Consolidate "getting started" and "user guide" sections (#12246)
- Minor updates to prepare for Python 3.12 support (#12314)
- Move script for testing map warning (#12306)
- simplify expr checking in predicate push down (#12287)
- Remove external link (#12223)
- Fix rebase issue breaking CI (#12296)
- Add top-level `make clippy`, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL `.venv` dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- Remove recommended setting from IDE docs (#12275)
- Clean up Python test workflow (#12261)
- clarify contains selector (#12265)
- Add `py-polars` to Cargo workspace (#12256)
- Use `.with_columns` in some docstrings (#12250)
- Add test for `scan_csv` plus `slice` (#12239)
- Fix emphasis formatting in docstring (#12240)
- Fix emphasis formatting in docstring (#12237)
- add deprecation notices to the docs for expressions moved into the new `name` namespace (#12236)
- update Cargo.lock (#12226)
- make sort test work with unstable sort (#12221)
- Build Python wheels on `manylinux_2_28` (#12211)
- Include `rust-toolchain.toml` with sdist/wheels (#12184)
- Standardize project name formatting across docs (#12185)
- Update `sqlparser` to `0.39` (#12173)
- pin ring (#12176)
- Improve `strip_{prefix, suffix}` & `strip_chars_{start, end}` (#12161)
- Fix tests for pyarrow 14 (#12170)
- Fix rendering of note in `DataFrame.fold` (#12164)
- Fix triggers for docs deployment (#12159)
- Refactor some tests (#12121)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Render docstring text in single backticks as code (#12096)
- use more ergonomic syntax in select/with_columns where possible (#12101)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor tweak in code example in section Expressions/Aggregation (#12033)
- Minor tweak in code example in section Expressions/Missing data (#12080)
- Minor improvements to the docs website (#12084)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @alexander-beedie, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jrycw, @mcrumiller, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego and @wsyxbcl