Releases: pola-rs/polars

Python Polars 0.19.13-rc.1

01 Nov 22:33
5c72df9
Pre-release

⚠️ Deprecations

  • Deprecate DataFrame.as_dict positional input (#12131)

🚀 Performance improvements

  • Reduce compute in error message for failed datetime parsing (#12147)

✨ Enhancements

  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • Support decimals in assert utils (#12119)
  • add concurrency budget (#12117)
  • improved support for use of file-like objects with DataFrame "write" methods (#12113)
  • Introduce ignore_nulls for str.concat (#12108)

🐞 Bug fixes

  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when read from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)

🛠️ Other improvements

  • pin ring (#12176)
  • Improve strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
  • Fix tests for pyarrow 14 (#12170)
  • Fix rendering of note in DataFrame.fold (#12164)
  • Fix triggers for docs deployment (#12159)
  • Refactor some tests (#12121)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Render docstring text in single backticks as code (#12096)
  • use more ergonomic syntax in select/with_columns where possible (#12101)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor tweak in code example in section Expressions/Aggregation (#12033)
  • Minor tweak in code example in section Expressions/Missing data (#12080)
  • Minor improvements to the docs website (#12084)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Priyansh121096, @alexander-beedie, @dependabot, @dependabot[bot], @jrycw, @moritzwilksch, @nameexhaustion, @reswqa, @ritchie46, @stefmolin and @stinodego

Python Polars 0.19.12

28 Oct 15:44
4c6cc4c

⚠️ Deprecations

  • Deprecate nans_compare_equal parameter in assert utils (#12019)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
  • Deprecate shift_and_fill in favor of shift (#11955)
  • Deprecate clip_min/clip_max in favor of clip (#11961)

🚀 Performance improvements

  • improve parquet downloading (#12061)
  • fix regression non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)

✨ Enhancements

  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)
  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)

🐞 Bug fixes

  • Fix get_index/iteration for Array types (#12047)
  • improved xlsx2csv defaults for read_excel (#12081)
  • str.concat on empty list (#12066)
  • fix issue with invalid Mapping objects used as schema being silently ignored (#12027)
  • improve ingest from numpy scalar values (#12025)
  • binary agg should be group-aware if literal is not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Allow inexact checking of nested integers (#12037)
  • Fix padding for non-ASCII strings (#12008)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)
  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • fix panic in format of anonymous scans (#11951)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)

🛠️ Other improvements

  • minor updates to lint-related dependencies (#12073)
  • Add Excel page to user guide (#12055)
  • Direct CONTRIBUTING to the docs website (#12042)
  • Replace black by ruff format (#11996)
  • Further assert utils refactor (#12015)
  • Remove stacklevels checker utility script (#11962)
  • Disable type checking for dataframe_api_compat dependency (#11997)
  • Fix release tag (#11994)
  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • include the "build" dir when running make clean for docs (#11970)
  • make cloning PyExpr consistent (#11956)
  • fix take return dtype in group context. (#11949)
  • warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
  • Add set_fmt_table_cell_list_len to API docs (#11942)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Rohxn16, @alexander-beedie, @braaannigan, @brayanjuls, @messense, @nameexhaustion, @orlp, @reswqa, @ritchie46, @squnit, @stinodego and @universalmind303

Rust Polars 0.34.0

24 Oct 16:11
60adaef

🏆 Highlights

  • postfix rolling expression as a special case of window functions. (#11445)
  • support 'hive partitioning' aware readers (#11284)

💥 Breaking changes

  • Rename .list.lengths and .str.lengths (#11613)
  • Rename write_csv parameter quote to quote_char (#11583)
  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • fix regression non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)
  • support multiple files in a single scan parquet node. (#11922)
  • fix accidental quadratic behavior; cache null_count (#11889)
  • fix quadratic behavior in append sorted check (#11893)
  • properly push down slice before left/asof join (#11854)
  • Improve performance of cot (cotangent) (#11717)
  • rechunk before grouping on multiple keys (#11711)
  • process parquet statistics before downloading row-group (#11709)
  • push down predicates that refer to group_by keys (#11687)
  • slightly faster float equality (#11652)
  • actually use projection information in async parquet reader (#11637)
  • improve performance and fix panic in async parquet reader (#11607)
  • use try_binary_elementwise over try_binary_elementwise_values (#11596)
  • skip empty chunks in concat (#11565)
  • improve sparse sample performance (#11544)
  • early return in replace_time_zone if target and source time zones match (#11478)
  • greatly improve parquet cloud reading (#11479)
  • ensure we download row-groups concurrently. (#11464)
  • don't load N metadata files when globbing N files (#11422)
  • remove double memcopy (#11365)
  • address perf regression (#11354)
  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)

✨ Enhancements

  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)
  • improve error handling in scan_parquet and deal with file limits (#11938)
  • support multiple files in a single scan parquet node. (#11922)
  • error instead of panic in unsupported sinks (#11915)
  • Introduce list.sample (#11845)
  • don't require empty config for cloud scan_parquet (#11819)
  • Expressify pct_change and move to ops (#11786)
  • add DATE function for SQL (#11541)
  • right-align numeric columns (#7475)
  • Add config setting to control how many List items are printed (#11409)
  • allow specifying schema in pl.scan_ndjson (#10963)
  • easier arrow2/arrow-rs conversion (#11666)
  • support multiple sources in scan_file (#11661)
  • allow coalesce in streaming (#11633)
  • Implement schema, schema_override for pl.read_json with array-like input (#11492)
  • add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
  • improve performance and fix panic in async parquet reader (#11607)
  • add time_unit argument to duration, default to "us" (#11586)
  • elide overflow checks on i64 (#11563)
  • add INITCAP string function for SQL (#9884)
  • Use IPC for (un)pickling dataframes/series (#11507)
  • support left and right anti/semi joins from the SQL interface (#11501)
  • expressify peak_min/peak_max (#11482)
  • IN(subquery) and SQL Subquery Infrastructure (#11218)
  • Format null arrays in Series (#11289)
  • postfix rolling expression as a special case of window functions. (#11445)
  • allow for "by" column to be of dtype Date in rolling_* functions (#11004)
  • support 'abfss' for azure (#11413)
  • multi-threaded async runtime (#11411)
  • async parquet. (#11403)
  • fail fast when invalid cloud settings; introduce retries arg (#11380)
  • modernize CPU features (#11351)
  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • add gather_skip_nulls implementation (#11329)
  • top_k and bottom_k support passing an expr (#11344)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr for the by argument (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • improve binary helper so we don't need to rechunk. (#11247)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • Support duration + date (#11190)
  • binary search and rechunk in chunked gather (#11199)
  • Expressify str.strip_prefix & suffix (#11197)
  • sql udfs (#10957)
  • run cloud parquet reader in default engine (#11196)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)
  • avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
  • read_csv for empty lines (#11924)
  • predicate push-down remove predicate refers to alias for more branch (#11887)
  • use physical append (#11894)
  • recursively apply cast_unchecked in lists (#11884)
  • recursively check allowed streaming dtypes (#11879)
  • fix project pushdown for double projection contains count (#11843)
  • series.to_numpy fails with dtype=Null (#11858)
  • panic on hive scan from cloud (#11847)
  • Propagate validity when cast primitive to list (#11846)
  • Edge cases for list count formatting (#11780)
  • remove flag inconsistency 'map_many' (#11817)
  • ensure projections containing only hive columns are projected (#11803)
  • patch broken aHash AES intrinsics on ARM (#11801)
  • fix key in object-store cache (#11790)
  • handle logical types in plugins (#11788)
  • make PyLazyGroupby reusable (#11769)
  • only exclude final output names of group_by key expressions (#11768)
  • fix ambiguity wrt list aggregation states (#11758)
  • Correctly process subseconds in pl.duration (#11748)
  • LazyFrame.drop_columns overflow issue when columns.len()>schema.len() (#11716)
  • index_to_chunked_index's fast path is not correct (#11710)
  • use actual number of read rows for hive materialization (#11690)
  • return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
  • fix seg fault in concat_str of empty series (#11704)
  • Fix match on last item for join_asof with strategy="nearest" (#11673)
  • fix display str for peak_max and top_k (#11657)
  • Fix input replacement logic for slice (#11631)
  • slice expr can be taken in cse (#11628)
  • ensure nested logical types are converted to physical (#11621)
  • correctly convert nullability of nested parquet fields to arrow (#11619)
  • improve performance and fix panic in async parquet reader (#11607)
  • expand all literals before group_by (#11590)
  • mark take_group_last function as unsafe (#11587)
  • handle unary operators applied to numbers used in SQL IN clauses (#11574)
  • Align new_columns argument for scan_csv and read_csv (#11575)
  • don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (#11576)
  • incomplete reading of list types from parquet (#11578)
  • respect identity in horizontal sum (#11559)
  • bug in BitMask::get_u32 (#11560)
  • take slice into account in parallel unions (#11558)
  • correct schema empty df in hive partitioning read (#11557)
  • ensure ListChunked::full_null uses physical types (#11554)
  • respect 'hive_partitioning' argument in parquet (#11551)
  • fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
  • streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
  • catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
  • rework SQL join constraint processing to properly account for all USING columns (#11518)
  • literal hash (#11508)
  • Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
  • correct output schema of hive partition and projection at scan (#11499)
  • correct projection pushdown in hive partitioned read (#11486)
  • fix for write_csv when using non-default "quote" char (#11474)
  • fix deserialization of parquets with large string list columns causing stack overflow (#11471)
  • Fix SQL ANY and ALL behaviour (#10879)
  • address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)
  • raise on invalid sort_by group lengths (#11423)
  • fix outer join on bools (#11417)
  • fix categorical collect (#11414)
  • Free bitmap when slicing into a non-null array (#11405)
  • async parquet. (#11403)
  • Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
  • Fix empty check when building a list (#11378)
  • more cloud urls (#11361)
  • ensure cloud globbing can deal with spaces (#11360)
  • recognize more cloud urls (#11357)
  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • add missing feature flags on test...

Python Polars 0.19.12-rc.1

24 Oct 15:57
5a32aab
Pre-release

⚠️ Deprecations

  • Deprecate shift_and_fill in favor of shift (#11955)
  • Deprecate clip_min/clip_max in favor of clip (#11961)

🚀 Performance improvements

  • fix regression non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)

✨ Enhancements

  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)

🐞 Bug fixes

  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • fix panic in format of anonymous scans (#11951)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)

🛠️ Other improvements

  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • include the "build" dir when running make clean for docs (#11970)
  • make cloning PyExpr consistent (#11956)
  • fix take return dtype in group context. (#11949)
  • warn about scan_pyarrow_dataset's limitations and suggest scan_parquet instead (if possible) (#11952)
  • Add set_fmt_table_cell_list_len to API docs (#11942)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Rohxn16, @alexander-beedie, @messense, @orlp, @reswqa, @ritchie46, @squnit and @stinodego

Python Polars 0.19.11

22 Oct 20:44
80e860d

⚠️ Deprecations

  • Rename shift parameter from periods to n (#11923)
  • Fix Array data type initialization (#11907)

🚀 Performance improvements

  • support multiple files in a single scan parquet node. (#11922)

✨ Enhancements

  • improve error handling in scan_parquet and deal with file limits (#11938)
  • support multiple files in a single scan parquet node. (#11922)
  • error instead of panic in unsupported sinks (#11915)
  • upcast int->float and date->datetime for certain Series comparisons (#11779)

🐞 Bug fixes

  • avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
  • read_csv for empty lines (#11924)
  • raise suitable error on invalid predicates passed to filter method (#11928)
  • Fix Array data type initialization (#11907)
  • set null_count on categorical append (#11914)
  • predicate push-down remove predicate refers to alias for more branch (#11887)
  • address DataFrame construction error with lists of numpy arrays (#11905)
  • address issue with inadvertently shared options dict in read_excel (#11908)
  • raise a suitable error from read_excel and/or read_ods when target sheet does not exist (#11906)

🛠️ Other improvements

  • Fix typo in read_excel docstring (#11934)
  • Fix docstring for diff methods (#11921)
  • fix some typos and add polars-business to curated plugin list (#11916)
  • add missing 'diagonal_relaxed' to pl.concat "how" param docstring signature (#11909)

Thank you to all our contributors for making this release possible!
@LaurynasMiksys, @alexander-beedie, @mcrumiller, @reswqa, @ritchie46, @romanovacca, @shenker, @stinodego and @uchiiii

Python Polars 0.19.10

20 Oct 20:48
eb469b4

⚠️ Deprecations

  • Deprecate DataType.is_nested (#11844)

🚀 Performance improvements

  • fix accidental quadratic behavior; cache null_count (#11889)
  • fix quadratic behavior in append sorted check (#11893)
  • optimise read_database Databricks queries made using SQLAlchemy connections (#11885)
  • properly push down slice before left/asof join (#11854)

✨ Enhancements

  • Introduce list.sample (#11845)
  • don't require empty config for cloud scan_parquet (#11819)

🐞 Bug fixes

  • use physical append (#11894)
  • Add include_nulls parameter to update (#11830)
  • recursively apply cast_unchecked in lists (#11884)
  • recursively check allowed streaming dtypes (#11879)
  • Frame slicing single column (#11825)
  • fix project pushdown for double projection contains count (#11843)
  • Propagate validity when cast primitive to list (#11846)
  • Edge cases for list count formatting (#11780)

🛠️ Other improvements

  • Further assert utils refactor (#11888)
  • load 40x40 avatar from github and add loading=lazy attribute. (#11886)
  • Fix Cargo warning for parquet2 dependency (#11882)
  • Allow manual trigger for docs deployment (#11881)
  • add section about plugins (#11855)
  • fix incorrect example of valid time zones (#11873)
  • fix typo in code example in section Expressions - Basic operators (#11848)
  • Bump docs dependencies (#11852)
  • add missing polars-ops tests to CI (#11859)
  • Assert utils refactor (#11813)

Thank you to all our contributors for making this release possible!
@Walnut356, @alexander-beedie, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jrycw, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @rjthoen, @romanovacca and @stinodego

Python Polars 0.19.9

17 Oct 19:22
8d29d3c

🏆 Highlights

  • extend filter capabilities with new support for *args predicates, **kwargs constraints, and chained boolean masks (#11740)

⚠️ Deprecations

  • Deprecate non-keyword args for ewm methods (#11804)
  • Deprecate use_pyarrow param for Series.to_list (#11784)
  • Rename group_by_rolling to rolling (#11761)

🚀 Performance improvements

  • Improve DataFrame.get_column performance by ~35% (#11783)
  • rechunk before grouping on multiple keys (#11711)
  • process parquet statistics before downloading row-group (#11709)
  • push down predicates that refer to group_by keys (#11687)
  • slightly faster float equality (#11652)

✨ Enhancements

  • Expressify pct_change and move to ops (#11786)
  • primitive kwargs in plugins (#11268)
  • add DATE function for SQL (#11541)
  • Add config setting to control how many List items are printed (#11409)
  • Use OrderedDict for schemas (#11742)
  • allow specifying schema in pl.scan_ndjson (#10963)
  • add support for "outer" mode to frame update method (#11688)
  • transparently support "qmark" parameterisation of SQLAlchemy queries in read_database (#11700)
  • support multiple sources in scan_file (#11661)
  • support batched frame iteration over read_database queries (#11664)
  • column selector support for DataFrame.melt and LazyFrame.unnest (#11662)

🐞 Bug fixes

  • ensure projections containing only hive columns are projected (#11803)
  • patch broken aHash AES intrinsics on ARM (#11801)
  • fix key in object-store cache (#11790)
  • handle logical types in plugins (#11788)
  • Fix values printed by assert_*_equal AssertionError when exact=False (#11781)
  • make PyLazyGroupby reusable (#11769)
  • only exclude final output names of group_by key expressions (#11768)
  • Fix subsecond parsing in timedelta conversions (#11759)
  • fix ambiguity wrt list aggregation states (#11758)
  • Correctly process subseconds in pl.duration (#11748)
  • use actual number of read rows for hive materialization (#11690)
  • return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
  • fix seg fault in concat_str of empty series (#11704)
  • fix sort_by regression (#11679)
  • Fix match on last item for join_asof with strategy="nearest" (#11673)

🛠️ Other improvements

  • Bump lint dependencies (#11802)
  • Minor updates to assertion utils and docstrings (#11798)
  • Remove unused _to_rust_syntax util (#11795)
  • Minor tweak in code example in section Coming from Pandas (#11764)
  • Fix Exception module paths (#11785)
  • Rename IntegralType to IntegerType (#11773)
  • more granular polars-ops imports (#11760)
  • Link to expand_selector in user guide (#11722)
  • Add parametric test for df.to_dict/series.to_list (#11757)
  • Minor fix in code example in section Coming from Pandas (#11745)
  • Move tests for group_by_dynamic into one module (#11741)
  • Update group_by_dynamic example (#11737)
  • reorder pl.duration arguments (#11641)
  • remove default features from some crates (#11680)
  • *_horizontal dependent on reduce_expr to expression architecture (#11685)
  • clarify that median is equivalent to the 50% percentile shown in describe metrics (#11694)
  • update rustc and fix future (#11696)
  • Publish release after uploading assets (#11686)
  • upgrade pyo3 to 0.20.0 (#11683)
  • better align help command output following addition of some longer options (#11681)
  • sum_horizontal to expression architecture (#11659)
  • add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
  • improve rank implementation, especially around nulls (#11651)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @cmdlineluser, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @rancomp, @reswqa, @ritchie46, @romanovacca, @sd2k, @stinodego, @svaningelgem and @thomasjpfan

Python Polars 0.19.8

10 Oct 16:38
9524b29

🏆 Highlights

  • Enable additional flags for x86-64 wheels (#11487)

⚠️ Deprecations

  • Rename .list.lengths and .str.lengths (#11613)
  • Deprecate default value for radix in parse_int (#11615)
  • Rename write_csv parameter quote to quote_char (#11583)

🚀 Performance improvements

  • actually use projection information in async parquet reader (#11637)
  • improve performance and fix panic in async parquet reader (#11607)
  • use try_binary_elementwise over try_binary_elementwise_values (#11596)
  • skip empty chunks in concat (#11565)
  • improve sparse sample performance (#11544)

✨ Enhancements

  • Standardize error message format (#11598)
  • allow coalesce in streaming (#11633)
  • Implement schema, schema_override for pl.read_json with array-like input (#11492)
  • add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
  • improve performance and fix panic in async parquet reader (#11607)
  • add time_unit argument to duration, default to "us" (#11586)
  • support read_database options passthrough to the underlying connection's execute method (enables parameterised SQL queries, etc) (#11562)
  • elide overflow checks on i64 (#11563)
  • add INITCAP string function for SQL (#9884)

🐞 Bug fixes

  • Fix input replacement logic for slice (#11631)
  • slice expr can be taken in cse (#11628)
  • ensure nested logical types are converted to physical (#11621)
  • correctly convert nullability of nested parquet fields to arrow (#11619)
  • improve performance and fix panic in async parquet reader (#11607)
  • normalize filepath in sink_parquet (#11605)
  • parse time unit properly in pl.lit (#11573)
  • expand all literals before group_by (#11590)
  • fix as_dict with include_key=False for partition_by (#9865)
  • mark take_group_last function as unsafe (#11587)
  • handle unary operators applied to numbers used in SQL IN clauses (#11574)
  • Align new_columns argument for scan_csv and read_csv (#11575)
  • Add initialization support for python Timedeltas (#11566)
  • incomplete reading of list types from parquet (#11578)
  • respect identity in horizontal sum (#11559)
  • bug in BitMask::get_u32 (#11560)
  • take slice into account in parallel unions (#11558)
  • correct schema empty df in hive partitioning read (#11557)
  • ensure ListChunked::full_null uses physical types (#11554)
  • respect 'hive_partitioning' argument in parquet (#11551)
  • fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
  • streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
  • fix comparing tz-aware series with stdlib datetime (#11480)
  • catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
  • rework SQL join constraint processing to properly account for all USING columns (#11518)

🛠️ Other improvements

  • Improved user guide for cloud functionality (#11646)
  • Improve some docstrings (#11644)
  • Disable clippy lint "too many arguments" for py-polars (#11616)
  • Make backwardfill and forwardfill function expr non-anonymous (#11630)
  • Make all expr in dt namespace non-anonymous (#11627)
  • Fix changelog for language-specific breaking changes (#11617)
  • Make value_counts and unique_counts function expr non-anonymous (#11601)
  • Make arg_min(max), diff in list namespace non-anonymous (#11602)
  • Rename write_csv parameter quote to quote_char (#11583)
  • improve struct documentation (#11585)
  • Remove **kwargs from LazyFrame.collect() (#11567)
  • use a generic consistent total ordering, also for floats (#11468)
  • fix lints (#11555)
  • Remove toolchain specification workaround (#11552)
  • Trigger Python release from Actions workflow dispatch (#11538)
  • Enable additional flags for x86-64 wheels (#11487)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @TheDataScientistNL, @alexander-beedie, @andysham, @c-peters, @jhorstmann, @mcrumiller, @nameexhaustion, @orlp, @reswqa, @ritchie46, @romanovacca, @stinodego and @svaningelgem

Python Polars 0.19.7

04 Oct 12:31
4fce242

🏆 Highlights

  • Postfix rolling expression as a special case of window functions. (#11445)
  • Use IPC for (un)pickling dataframes/series (#11507)

🚀 Performance improvements

  • early return in replace_time_zone if target and source time zones match (#11478)
  • greatly improve parquet cloud reading (#11479)
  • ensure we download row-groups concurrently. (#11464)

✨ Enhancements

  • support left and right anti/semi joins from the SQL interface (#11501)
  • Add left_on and right_on parameters to df.update (#11277)
  • expressify peak_min/peak_max (#11482)
  • IN(subquery) and SQL Subquery Infrastructure (#11218)
  • add ODBC connection string support to read_database (#11448)
  • postfix rolling expression as a special case of window functions. (#11445)
  • allow for "by" column to be of dtype Date in rolling_* functions (#11004)
  • rework ColumnFactory to additionally support tab-complete for col in IPython (#11435)

🐞 Bug fixes

  • literal hash (#11508)
  • Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
  • correct output schema of hive partition and projection at scan (#11499)
  • correct projection pushdown in hive partitioned read (#11486)
  • fix for write_csv when using non-default "quote" char (#11474)
  • fix deserialization of parquets with large string list columns causing stack overflow (#11471)
  • enable read_database fallback for Snowflake warehouses/connections that don't support Arrow resultsets (#11447)
  • Fix SQL ANY and ALL behaviour (#10879)
  • partially address some PyCharm tooltip/signature issues with decorated methods (#11428)
  • address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)

🛠️ Other improvements

  • minor changes in peak-min/max (#11491)
  • align cloud url regex in rust and python (#11481)
  • Test sdist before releasing (#11494)
  • Unpin maturin version, fix release workflow (#11483)
  • More release workflow refactor (#11472)
  • Set some env vars for release (#11463)
  • move repeat_by to polars-ops (#11461)
  • upgrade to nightly-10-02 (#11460)
  • Update contributing guide to include memory requirement (#11458)
  • add missing docs entry for rolling (#11456)
  • use with_columns in shift examples (#11453)
  • Add wheels as assets to GitHub release (#11452)
  • Build more wheels for polars-lts-cpu/polars-u64-idx (#11430)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @ritchie46, @romanovacca, @stinodego, @svaningelgem and Romano Vacca

Python Polars 0.19.6

29 Sep 19:18
dcd0229

🚀 Performance improvements

  • don't load N metadata files when globbing N files (#11422)

🐞 Bug fixes

  • raise on invalid sort_by group lengths (#11423)
  • fix outer join on bools (#11417)
  • fix categorical collect (#11414)
  • fix opaque python reader schema (#11412)
  • async parquet. (#11403)
  • Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
  • handle ambiguous datetimes in pl.lit (#11386)
  • fix panic in hive read of booleans (#11376)

🛠️ Other improvements

  • Split Python release into build / release jobs (#11421)
  • Refactor Python release workflow (#11382)
  • clarify use of "batch_size" for read_database (#11377)
  • large windows runner for release (#11370)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @bowlofeggs, @c-peters, @jonashaag, @orlp, @ritchie46 and @stinodego