Releases: pola-rs/polars
Python Polars 0.19.13-rc.1
⚠️ Deprecations
- Deprecate `DataFrame.as_dict` positional input (#12131)
🚀 Performance improvements
- Reduce compute in error message for failed datetime parsing (#12147)
✨ Enhancements
- tunable concurrency (#12171)
- support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- Support decimals in assert utils (#12119)
- add concurrency budget (#12117)
- improved support for use of file-like objects with `DataFrame` "write" methods (#12113)
- Introduce ignore_nulls for str.concat (#12108)
🐞 Bug fixes
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
🛠️ Other improvements
- pin ring (#12176)
- Improve `strip_{prefix, suffix}` & `strip_chars_{start, end}` (#12161)
- Fix tests for pyarrow 14 (#12170)
- Fix rendering of note in `DataFrame.fold` (#12164)
- Fix triggers for docs deployment (#12159)
- Refactor some tests (#12121)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Render docstring text in single backticks as code (#12096)
- use more ergonomic syntax in select/with_columns where possible (#12101)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor tweak in code example in section Expressions/Aggregation (#12033)
- Minor tweak in code example in section Expressions/Missing data (#12080)
- Minor improvements to the docs website (#12084)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Priyansh121096, @alexander-beedie, @dependabot, @dependabot[bot], @jrycw, @moritzwilksch, @nameexhaustion, @reswqa, @ritchie46, @stefmolin and @stinodego
Python Polars 0.19.12
⚠️ Deprecations
- Deprecate `nans_compare_equal` parameter in assert utils (#12019)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
- Deprecate `shift_and_fill` in favor of `shift` (#11955)
- Deprecate `clip_min`/`clip_max` in favor of `clip` (#11961)
🚀 Performance improvements
- improve parquet downloading (#12061)
- fix regression non-null asof join (#11984)
- drastically improve performance of limit on async parquet datasets (#11965)
✨ Enhancements
- Add supertype for `List`/`Array` (#12016)
- enable eq and neq for array dtype (#12020)
- Expressify n of shift (#12004)
- add dedicated `name` namespace for operations that affect expression names (#11973)
- optimize asof_join and allow null/string keys (#11712)
- limit concurrent downloads in async parquet (#11971)
- sample fraction can take an expr (#11943)
- Add `infer_schema_length` to `pl.read_json` (#11724)
🐞 Bug fixes
- Fix `get_index`/iteration for `Array` types (#12047)
- improved xlsx2csv defaults for `read_excel` (#12081)
- str.concat on empty list (#12066)
- fix issue with invalid `Mapping` objects used as schema being silently ignored (#12027)
- improve ingest from `numpy` scalar values (#12025)
- binary agg should be group-aware if literal is not a scalar (#12043)
- Use Arrow schema for file readers (#12048)
- Error on duplicates in hive partitioning (#12040)
- display fmt for str split (#12039)
- sum_horizontal should not always cast to int (#12031)
- fix apply_to_inner's dtype (#12010)
- Allow inexact checking of nested integers (#12037)
- Fix padding for non-ASCII strings (#12008)
- fix dot visualization of anonymous scans (#12002)
- SQL table aliases (#11988)
- fix streaming multi-column/multi-dtype sort (#11981)
- ensure streaming parquet datasets deal with limits (#11977)
- implement proper hash for identifier in cse (#11960)
- fix take return dtype in group context. (#11949)
- fix panic in format of anonymous scans (#11951)
- sql In should work without specific ops (#11947)
- construct list series from any values subject to dtype (#11944)
🛠️ Other improvements
- minor updates to lint-related dependencies (#12073)
- Add Excel page to user guide (#12055)
- Direct CONTRIBUTING to the docs website (#12042)
- Replace `black` with `ruff format` (#11996)
- Further assert utils refactor (#12015)
- Remove stacklevels checker utility script (#11962)
- Disable type checking for `dataframe_api_compat` dependency (#11997)
- Fix release tag (#11994)
- optimize asof_join and allow null/string keys (#11712)
- Add `Development` and `Releases` sections to the documentation (#11932)
- include the "build" dir when running `make clean` for docs (#11970)
- make cloning `PyExpr` consistent (#11956)
- fix take return dtype in group context. (#11949)
- warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
- Add `set_fmt_table_cell_list_len` to API docs (#11942)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Rohxn16, @alexander-beedie, @braaannigan, @brayanjuls, @messense, @nameexhaustion, @orlp, @reswqa, @ritchie46, @squnit, @stinodego and @universalmind303
Rust Polars 0.34.0
🏆 Highlights
- postfix `rolling` expression as a special case of window functions (#11445)
- support 'hive partitioning' aware readers (#11284)
💥 Breaking changes
- Rename `.list.lengths` and `.str.lengths` (#11613)
- Rename `write_csv` parameter `quote` to `quote_char` (#11583)
- Add `disable_string_cache` (#11020)
🚀 Performance improvements
- fix regression non-null asof join (#11984)
- drastically improve performance of limit on async parquet datasets (#11965)
- support multiple files in a single scan parquet node. (#11922)
- fix accidental quadratic behavior; cache null_count (#11889)
- fix quadratic behavior in append sorted check (#11893)
- properly push down slice before left/asof join (#11854)
- Improve performance of `cot` (cotangent) (#11717)
- rechunk before grouping on multiple keys (#11711)
- process parquet statistics before downloading row-group (#11709)
- push down predicates that refer to group_by keys (#11687)
- slightly faster float equality (#11652)
- actually use projection information in async parquet reader (#11637)
- improve performance and fix panic in async parquet reader (#11607)
- use try_binary_elementwise over try_binary_elementwise_values (#11596)
- skip empty chunks in concat (#11565)
- improve sparse sample performance (#11544)
- early return in replace_time_zone if target and source time zones match (#11478)
- greatly improve parquet cloud reading (#11479)
- ensure we download row-groups concurrently. (#11464)
- don't load N metadata files when globbing N files (#11422)
- remove double memcopy (#11365)
- address perf regression (#11354)
- improve dynamic_groupby_iter (#11341)
- improve and fix rolling windows by linear scanning (#11326)
- improve outer join materialization (#11241)
- use ryu and itoa for primitive serialization (#11193)
- use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
- Using cache for str.contains regex compilation (#11183)
✨ Enhancements
- optimize asof_join and allow null/string keys (#11712)
- limit concurrent downloads in async parquet (#11971)
- sample fraction can take an expr (#11943)
- Add `infer_schema_length` to `pl.read_json` (#11724)
- improve error handling in scan_parquet and deal with file limits (#11938)
- support multiple files in a single scan parquet node. (#11922)
- error instead of panic in unsupported sinks (#11915)
- Introduce list.sample (#11845)
- don't require empty config for cloud scan_parquet (#11819)
- Expressify pct_change and move to ops (#11786)
- add `DATE` function for SQL (#11541)
- right-align numeric columns (#7475)
- Add config setting to control how many List items are printed (#11409)
- allow specifying schema in `pl.scan_ndjson` (#10963)
- easier arrow2/arrow-rs conversion (#11666)
- support multiple sources in scan_file (#11661)
- allow coalesce in streaming (#11633)
- Implement `schema`, `schema_override` for `pl.read_json` with array-like input (#11492)
- add SQL support for `UNION [ALL] BY NAME`, add "diagonal_relaxed" strategy for `pl.concat` (#11597)
- improve performance and fix panic in async parquet reader (#11607)
- add time_unit argument to duration, default to "us" (#11586)
- elide overflow checks on i64 (#11563)
- add `INITCAP` string function for SQL (#9884)
- Use IPC for (un)pickling dataframes/series (#11507)
- support left and right anti/semi joins from the SQL interface (#11501)
- expressify peak_min/peak_max (#11482)
- `IN(subquery)` and SQL Subquery Infrastructure (#11218)
- Format null arrays in Series (#11289)
- postfix `rolling` expression as a special case of window functions (#11445)
- allow for "by" column to be of dtype Date in rolling_* functions (#11004)
- support 'abfss' for azure (#11413)
- multi-threaded async runtime (#11411)
- async parquet. (#11403)
- fail fast when invalid cloud settings; introduce retries arg (#11380)
- modernize CPU features (#11351)
- introduce 'label' instead of 'truncate' in group_by_dynamic, which can take `label='right'` (#11337)
- Expressify list.shift (#11320)
- add gather_skip_nulls implementation (#11329)
- top_k and bottom_k supports pass an expr (#11344)
- support 'hive partitioning' aware readers (#11284)
- str.strip_chars supports take an expr argument (#11313)
- sample n can take an expr (#11257)
- Add `disable_string_cache` (#11020)
- clip supports expr arguments and physical numeric dtype (#11288)
- Introduce list.drop_nulls (#11272)
- str.splitn and split_exact can take an expr argument by (#11275)
- introduce ambiguous option for dt.round (#11269)
- improve binary helper so we don't need to rechunk. (#11247)
- Adds `NULLIF` and `COALESCE` SQL functions (#11124)
- better `tree-formatting` representation (#11176)
- Support `duration + date` (#11190)
- binary search and rechunk in chunked gather (#11199)
- Expressify str.strip_prefix & suffix (#11197)
- sql udfs (#10957)
- run cloud parquet reader in default engine (#11196)
- list.join's separator can be expression (#11167)
- argument every of datetime.truncate can be expression (#11155)
🐞 Bug fixes
- fix streaming multi-column/multi-dtype sort (#11981)
- ensure streaming parquet datasets deal with limits (#11977)
- implement proper hash for identifier in cse (#11960)
- fix take return dtype in group context. (#11949)
- sql In should work without specific ops (#11947)
- construct list series from any values subject to dtype (#11944)
- avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
- `read_csv` for empty lines (#11924)
- predicate push-down remove predicate refers to alias for more branch (#11887)
- use physical append (#11894)
- recursively apply `cast_unchecked` in lists (#11884)
- recursively check allowed streaming dtypes (#11879)
- fix project pushdown for double projection contains count (#11843)
- series.to_numpy fails with dtype=Null (#11858)
- panic on hive scan from cloud (#11847)
- Propagate validity when cast primitive to list (#11846)
- Edge cases for list count formatting (#11780)
- remove flag inconsistency 'map_many' (#11817)
- ensure projections containing only hive columns are projected (#11803)
- patch broken aHash AES intrinsics on ARM (#11801)
- fix key in object-store cache (#11790)
- handle logical types in plugins (#11788)
- make `PyLazyGroupby` reusable (#11769)
- only exclude final output names of group_by key expressions (#11768)
- fix ambiguity wrt list aggregation states (#11758)
- Correctly process subseconds in `pl.duration` (#11748)
- LazyFrame.drop_columns overflow issue when columns.len() > schema.len() (#11716)
- index_to_chunked_index's fast path is not correct (#11710)
- use actual number of read rows for hive materialization (#11690)
- return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
- fix seg fault in concat_str of empty series (#11704)
- Fix match on last item for `join_asof` with `strategy="nearest"` (#11673)
- fix display str for peak_max and top_k (#11657)
- Fix input replacement logic for slice (#11631)
- slice expr can be taken in cse (#11628)
- ensure nested logical types are converted to physical (#11621)
- correctly convert nullability of nested parquet fields to arrow (#11619)
- improve performance and fix panic in async parquet reader (#11607)
- expand all literals before group_by (#11590)
- mark take_group_last function as unsafe (#11587)
- handle unary operators applied to numbers used in SQL `IN` clauses (#11574)
- Align new_columns argument for `scan_csv` and `read_csv` (#11575)
- don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (#11576)
- incomplete reading of list types from parquet (#11578)
- respect identity in horizontal sum (#11559)
- bug in BitMask::get_u32 (#11560)
- take slice into account in parallel unions (#11558)
- correct schema empty df in hive partitioning read (#11557)
- ensure ListChunked::full_null uses physical types (#11554)
- respect 'hive_partitioning' argument in parquet (#11551)
- fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
- streamline `is_in` handling of mismatched dtypes and fix a minor regression (#11533)
- catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
- rework SQL join constraint processing to properly account for all `USING` columns (#11518)
- literal hash (#11508)
- Fix lazy schema for `cut`/`qcut` when `allow_breaks=True` (#11287)
- correct output schema of hive partition and projection at scan (#11499)
- correct projection pushdown in hive partitioned read (#11486)
- fix for `write_csv` when using non-default "quote" char (#11474)
- fix deserialization of parquets with large string list columns causing stack overflow (#11471)
- Fix SQL `ANY` and `ALL` behaviour (#10879)
- address multiple issues caused by implicit casting of `is_in` values to the column dtype being searched (#11427)
- raise on invalid sort_by group lengths (#11423)
- fix outer join on bools (#11417)
- fix categorical collect (#11414)
- Free bitmap when slicing into a non-null array (#11405)
- async parquet. (#11403)
- Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
- Fix empty check when building a list (#11378)
- more cloud urls (#11361)
- ensure cloud globbing can deal with spaces (#11360)
- recognize more cloud urls (#11357)
- Fix `Series.__contains__` for None values and implement `is_in` for null Series (#11345)
- don't panic on multi-nodes in streaming conversion (#11343)
- ensure trailing quote is written for temporal data when CSV `quote_style` is non-numeric (#11328)
- fix empty Series construction edge-case with Struct dtype (#11301)
- add missing feature flags on test...
Python Polars 0.19.12-rc.1
⚠️ Deprecations
- Deprecate `shift_and_fill` in favor of `shift` (#11955)
- Deprecate `clip_min`/`clip_max` in favor of `clip` (#11961)
🚀 Performance improvements
- fix regression non-null asof join (#11984)
- drastically improve performance of limit on async parquet datasets (#11965)
✨ Enhancements
- optimize asof_join and allow null/string keys (#11712)
- limit concurrent downloads in async parquet (#11971)
- sample fraction can take an expr (#11943)
- Add `infer_schema_length` to `pl.read_json` (#11724)
🐞 Bug fixes
- fix streaming multi-column/multi-dtype sort (#11981)
- ensure streaming parquet datasets deal with limits (#11977)
- implement proper hash for identifier in cse (#11960)
- fix take return dtype in group context. (#11949)
- fix panic in format of anonymous scans (#11951)
- sql In should work without specific ops (#11947)
- construct list series from any values subject to dtype (#11944)
🛠️ Other improvements
- optimize asof_join and allow null/string keys (#11712)
- Add `Development` and `Releases` sections to the documentation (#11932)
- include the "build" dir when running `make clean` for docs (#11970)
- make cloning `PyExpr` consistent (#11956)
- fix take return dtype in group context. (#11949)
- warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
- Add `set_fmt_table_cell_list_len` to API docs (#11942)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Rohxn16, @alexander-beedie, @messense, @orlp, @reswqa, @ritchie46, @squnit and @stinodego
Python Polars 0.19.11
🚀 Performance improvements
- support multiple files in a single scan parquet node. (#11922)
✨ Enhancements
- improve error handling in scan_parquet and deal with file limits (#11938)
- support multiple files in a single scan parquet node. (#11922)
- error instead of panic in unsupported sinks (#11915)
- upcast int->float and date->datetime for certain Series comparisons (#11779)
🐞 Bug fixes
- avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
- `read_csv` for empty lines (#11924)
- raise suitable error on invalid predicates passed to `filter` method (#11928)
- Fix `Array` data type initialization (#11907)
- set null_count on categorical append (#11914)
- predicate push-down remove predicate refers to alias for more branch (#11887)
- address DataFrame construction error with lists of `numpy` arrays (#11905)
- address issue with inadvertently shared options dict in `read_excel` (#11908)
- raise a suitable error from `read_excel` and/or `read_ods` when target sheet does not exist (#11906)
🛠️ Other improvements
- Fix typo in `read_excel` docstring (#11934)
- Fix docstring for `diff` methods (#11921)
- fix some typos and add polars-business to curated plugin list (#11916)
- add missing 'diagonal_relaxed' to `pl.concat` "how" param docstring signature (#11909)
Thank you to all our contributors for making this release possible!
@LaurynasMiksys, @alexander-beedie, @mcrumiller, @reswqa, @ritchie46, @romanovacca, @shenker, @stinodego and @uchiiii
Python Polars 0.19.10
⚠️ Deprecations
- Deprecate `DataType.is_nested` (#11844)
🚀 Performance improvements
- fix accidental quadratic behavior; cache null_count (#11889)
- fix quadratic behavior in append sorted check (#11893)
- optimise `read_database` Databricks queries made using SQLAlchemy connections (#11885)
- properly push down slice before left/asof join (#11854)
🐞 Bug fixes
- use physical append (#11894)
- Add `include_nulls` parameter to `update` (#11830)
- recursively apply `cast_unchecked` in lists (#11884)
- recursively check allowed streaming dtypes (#11879)
- Frame slicing single column (#11825)
- fix project pushdown for double projection contains count (#11843)
- Propagate validity when cast primitive to list (#11846)
- Edge cases for list count formatting (#11780)
🛠️ Other improvements
- Further assert utils refactor (#11888)
- load 40x40 avatar from github and add loading=lazy attribute. (#11886)
- Fix Cargo warning for parquet2 dependency (#11882)
- Allow manual trigger for docs deployment (#11881)
- add section about plugins (#11855)
- fix incorrect example of valid time zones (#11873)
- fix typo in code example in section Expressions - Basic operators (#11848)
- Bump docs dependencies (#11852)
- add missing polars-ops tests to CI (#11859)
- Assert utils refactor (#11813)
Thank you to all our contributors for making this release possible!
@Walnut356, @alexander-beedie, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jrycw, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @rjthoen, @romanovacca and @stinodego
Python Polars 0.19.9
🏆 Highlights
- extend `filter` capabilities with new support for `*args` predicates, `**kwargs` constraints, and chained boolean masks (#11740)
⚠️ Deprecations
- Deprecate non-keyword args for `ewm` methods (#11804)
- Deprecate `use_pyarrow` param for `Series.to_list` (#11784)
- Rename `group_by_rolling` to `rolling` (#11761)
🚀 Performance improvements
- Improve `DataFrame.get_column` performance by ~35% (#11783)
- rechunk before grouping on multiple keys (#11711)
- process parquet statistics before downloading row-group (#11709)
- push down predicates that refer to group_by keys (#11687)
- slightly faster float equality (#11652)
✨ Enhancements
- Expressify pct_change and move to ops (#11786)
- primitive kwargs in plugins (#11268)
- add `DATE` function for SQL (#11541)
- Add config setting to control how many List items are printed (#11409)
- Use `OrderedDict` for schemas (#11742)
- allow specifying schema in `pl.scan_ndjson` (#10963)
- add support for "outer" mode to frame `update` method (#11688)
- transparently support "qmark" parameterisation of SQLAlchemy queries in `read_database` (#11700)
- support multiple sources in scan_file (#11661)
- support batched frame iteration over `read_database` queries (#11664)
- column selector support for `DataFrame.melt` and `LazyFrame.unnest` (#11662)
🐞 Bug fixes
- ensure projections containing only hive columns are projected (#11803)
- patch broken aHash AES intrinsics on ARM (#11801)
- fix key in object-store cache (#11790)
- handle logical types in plugins (#11788)
- Fix values printed by `assert_*_equal` AssertionError when `exact=False` (#11781)
- make `PyLazyGroupby` reusable (#11769)
- only exclude final output names of group_by key expressions (#11768)
- Fix subsecond parsing in timedelta conversions (#11759)
- fix ambiguity wrt list aggregation states (#11758)
- Correctly process subseconds in `pl.duration` (#11748)
- use actual number of read rows for hive materialization (#11690)
- return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
- fix seg fault in concat_str of empty series (#11704)
- fix sort_by regression (#11679)
- Fix match on last item for `join_asof` with `strategy="nearest"` (#11673)
🛠️ Other improvements
- Bump lint dependencies (#11802)
- Minor updates to assertion utils and docstrings (#11798)
- Remove unused `_to_rust_syntax` util (#11795)
- Minor tweak in code example in section Coming from Pandas (#11764)
- Fix Exception module paths (#11785)
- Rename `IntegralType` to `IntegerType` (#11773)
- more granular polars-ops imports (#11760)
- Link to `expand_selector` in user guide (#11722)
- Add parametric test for `df.to_dict`/`series.to_list` (#11757)
- Minor fix in code example in section Coming from Pandas (#11745)
- Move tests for `group_by_dynamic` into one module (#11741)
- Update group_by_dynamic example (#11737)
- reorder pl.duration arguments (#11641)
- remove default features from some crates (#11680)
- *_horizontal dependent on reduce_expr to expression architecture (#11685)
- clarify that median is equivalent to the 50% percentile shown in `describe` metrics (#11694)
- update rustc and fix future (#11696)
- Publish release after uploading assets (#11686)
- upgrade pyo3 to 0.20.0 (#11683)
- better align `help` command output following addition of some longer options (#11681)
- sum_horizontal to expression architecture (#11659)
- add note about use of `polars-lts-cpu` for macOS x86-64/rosetta (#11660)
- improve rank implementation, especially around nulls (#11651)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @cmdlineluser, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @rancomp, @reswqa, @ritchie46, @romanovacca, @sd2k, @stinodego, @svaningelgem and @thomasjpfan
Python Polars 0.19.8
🏆 Highlights
- Enable additional flags for x86-64 wheels (#11487)
⚠️ Deprecations
- Rename `.list.lengths` and `.str.lengths` (#11613)
- Deprecate default value for `radix` in `parse_int` (#11615)
- Rename `write_csv` parameter `quote` to `quote_char` (#11583)
🚀 Performance improvements
- actually use projection information in async parquet reader (#11637)
- improve performance and fix panic in async parquet reader (#11607)
- use try_binary_elementwise over try_binary_elementwise_values (#11596)
- skip empty chunks in concat (#11565)
- improve sparse sample performance (#11544)
✨ Enhancements
- Standardize error message format (#11598)
- allow coalesce in streaming (#11633)
- Implement `schema`, `schema_override` for `pl.read_json` with array-like input (#11492)
- add SQL support for `UNION [ALL] BY NAME`, add "diagonal_relaxed" strategy for `pl.concat` (#11597)
- improve performance and fix panic in async parquet reader (#11607)
- add time_unit argument to duration, default to "us" (#11586)
- support `read_database` options passthrough to the underlying connection's `execute` method (enables parameterised SQL queries, etc) (#11562)
- elide overflow checks on i64 (#11563)
- add `INITCAP` string function for SQL (#9884)
🐞 Bug fixes
- Fix input replacement logic for slice (#11631)
- slice expr can be taken in cse (#11628)
- ensure nested logical types are converted to physical (#11621)
- correctly convert nullability of nested parquet fields to arrow (#11619)
- improve performance and fix panic in async parquet reader (#11607)
- normalize filepath in sink_parquet (#11605)
- parse time unit properly in pl.lit (#11573)
- expand all literals before group_by (#11590)
- fix as_dict with include_key=False for partition_by (#9865)
- mark take_group_last function as unsafe (#11587)
- handle unary operators applied to numbers used in SQL `IN` clauses (#11574)
- Align new_columns argument for `scan_csv` and `read_csv` (#11575)
- Add initialization support for python Timedeltas (#11566)
- incomplete reading of list types from parquet (#11578)
- respect identity in horizontal sum (#11559)
- bug in BitMask::get_u32 (#11560)
- take slice into account in parallel unions (#11558)
- correct schema empty df in hive partitioning read (#11557)
- ensure ListChunked::full_null uses physical types (#11554)
- respect 'hive_partitioning' argument in parquet (#11551)
- fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
- streamline `is_in` handling of mismatched dtypes and fix a minor regression (#11533)
- fix comparing tz-aware series with stdlib datetime (#11480)
- catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
- rework SQL join constraint processing to properly account for all `USING` columns (#11518)
🛠️ Other improvements
- Improved user guide for cloud functionality (#11646)
- Improve some docstrings (#11644)
- Disable clippy lint "too many arguments" for `py-polars` (#11616)
- Make backwardfill and forwardfill function expr non-anonymous (#11630)
- Make all expr in dt namespace non-anonymous (#11627)
- Fix changelog for language-specific breaking changes (#11617)
- Make value_counts and unique_counts function expr non-anonymous (#11601)
- Make arg_min(max), diff in list namespace non-anonymous (#11602)
- Rename `write_csv` parameter `quote` to `quote_char` (#11583)
- improve struct documentation (#11585)
- Remove `**kwargs` from `LazyFrame.collect()` (#11567)
- use a generic consistent total ordering, also for floats (#11468)
- fix lints (#11555)
- Remove toolchain specification workaround (#11552)
- Trigger Python release from Actions workflow dispatch (#11538)
- Enable additional flags for x86-64 wheels (#11487)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @TheDataScientistNL, @alexander-beedie, @andysham, @c-peters, @jhorstmann, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @romanovacca, @stinodego and @svaningelgem
Python Polars 0.19.7
🏆 Highlights
- Postfix `rolling` expression as a special case of window functions (#11445)
- Use IPC for (un)pickling dataframes/series (#11507)
🚀 Performance improvements
- early return in replace_time_zone if target and source time zones match (#11478)
- greatly improve parquet cloud reading (#11479)
- ensure we download row-groups concurrently. (#11464)
✨ Enhancements
- support left and right anti/semi joins from the SQL interface (#11501)
- Add `left_on` and `right_on` parameters to `df.update` (#11277)
- expressify peak_min/peak_max (#11482)
- `IN(subquery)` and SQL Subquery Infrastructure (#11218)
- add ODBC connection string support to `read_database` (#11448)
- postfix `rolling` expression as a special case of window functions (#11445)
- allow for "by" column to be of dtype Date in rolling_* functions (#11004)
- rework `ColumnFactory` to additionally support tab-complete for `col` in IPython (#11435)
🐞 Bug fixes
- literal hash (#11508)
- Fix lazy schema for `cut`/`qcut` when `allow_breaks=True` (#11287)
- correct output schema of hive partition and projection at scan (#11499)
- correct projection pushdown in hive partitioned read (#11486)
- fix for `write_csv` when using non-default "quote" char (#11474)
- fix deserialization of parquets with large string list columns causing stack overflow (#11471)
- enable `read_database` fallback for Snowflake warehouses/connections that don't support Arrow resultsets (#11447)
- Fix SQL `ANY` and `ALL` behaviour (#10879)
- partially address some PyCharm tooltip/signature issues with decorated methods (#11428)
- address multiple issues caused by implicit casting of `is_in` values to the column dtype being searched (#11427)
🛠️ Other improvements
- minor changes in peak-min/max (#11491)
- align cloud url regex in rust and python (#11481)
- Test sdist before releasing (#11494)
- Unpin maturin version, fix release workflow (#11483)
- More release workflow refactor (#11472)
- Set some env vars for release (#11463)
- move `repeat_by` to polars-ops (#11461)
- upgrade to nightly-10-02 (#11460)
- Update contributing guide to include memory requirement (#11458)
- add missing docs entry for rolling (#11456)
- use with_columns in shift examples (#11453)
- Add wheels as assets to GitHub release (#11452)
- Build more wheels for `polars-lts-cpu`/`polars-u64-idx` (#11430)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @ritchie46, @romanovacca, @stinodego, @svaningelgem and Romano Vacca
Python Polars 0.19.6
🚀 Performance improvements
- don't load N metadata files when globbing N files (#11422)
🐞 Bug fixes
- raise on invalid sort_by group lengths (#11423)
- fix outer join on bools (#11417)
- fix categorical collect (#11414)
- fix opaque python reader schema (#11412)
- async parquet. (#11403)
- Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
- handle ambiguous datetimes in pl.lit (#11386)
- fix panic in hive read of booleans (#11376)
🛠️ Other improvements
- Split Python release into build / release jobs (#11421)
- Refactor Python release workflow (#11382)
- clarify use of "batch_size" for `read_database` (#11377)
- large windows runner for release (#11370)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @bowlofeggs, @c-peters, @jonashaag, @orlp, @ritchie46 and @stinodego