Releases: pola-rs/polars

Python Polars 0.20.1

18 Dec 16:58
2f676fb

🐞 Bug fixes

  • repeat_by should not raise if by contains nulls (#13105)
  • [csv] raise on single quote char (#13104)
  • Raise if scan zstd compressed csv file (#13102)
  • allow timeunit-less dtype in pl.lit creation (#12997)
  • Don't check map length if input is literal (#13098)
  • rolling_quantile can get incorrect state (#13088)

🛠️ Other improvements

  • Fix column name in contains_any example (#13090)
  • update user-defined-functions for 0.19.x (#13071)
  • Fix some links, and make map_batches warning more evident (#13081)
  • Linting updates (#13069)
  • take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
  • add Enum to dtype list (#13080)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego

Python Polars 0.20.0

16 Dec 15:31
f96d2cd

This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.

Check out the upgrade guide for help navigating the upgrade to this version.

Please bear with us while we continue to make Polars the best tool it can be!

🏆 Highlights

  • Add new Enum categorical data type which allows a fixed set of categories (#11822)
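To give a rough idea of what a fixed set of categories implies, the plain-Python sketch below models a column type that rejects values outside a predeclared list. `FixedCategories`, `encode` and `less_than` are hypothetical names invented for this illustration; the sketch does not use or reproduce the Polars `Enum` API, and ordering categories by their declared position is an assumption of the sketch.

```python
class FixedCategories:
    """Hypothetical stand-in for a fixed-category (enum-like) column type."""

    def __init__(self, categories):
        if len(set(categories)) != len(categories):
            raise ValueError("categories must be unique")
        # Map each category to its declared position.
        self._index = {cat: i for i, cat in enumerate(categories)}

    def encode(self, values):
        """Return integer codes; None (null) passes through, unknown values raise."""
        codes = []
        for value in values:
            if value is None:
                codes.append(None)
            elif value in self._index:
                codes.append(self._index[value])
            else:
                raise ValueError(f"{value!r} is not a declared category")
        return codes

    def less_than(self, left, right):
        """Compare two categories by declared position (assumption of this sketch)."""
        return self._index[left] < self._index[right]


sizes = FixedCategories(["small", "medium", "large"])
codes = sizes.encode(["large", "small", None, "medium"])
```

Unlike a free-form categorical, the category list here is fixed up front, so an unexpected value fails loudly instead of silently becoming a new category.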

💥 Breaking changes

  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Reimplement replace expression on the Rust side (#13002)
  • Preserve left and right join keys in outer joins (#12963)
  • Update update signature (#12986)
  • Update Expr.count to ignore null values by default (#12934)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Allow all DataType objects to be instantiated (#12470)
  • Change value_counts resulting column name from counts to count (#12506)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Default to exact checking for integers in assertion utils (#12331)
  • Set default dtype for Series to Null when no data is present (#12807)
  • Update lit behavior for list/tuple inputs (#12559)
  • Change DataType.is_nested from property to classmethod (#12453)
  • Update constructors for Array and Decimal (#12837)
  • Smaller integer data types for datetime components (#12070)
  • Fix NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
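The change to null handling in joins (#12840) is probably the easiest of these to trip over. The plain-Python sketch below illustrates the idea, assuming the new default follows SQL-style semantics in which null keys never match. `inner_join` is a hypothetical helper written for this illustration, with `None` standing in for null; only the parameter name `join_nulls` comes from the release notes, and this is not the Polars implementation.

```python
def inner_join(left, right, join_nulls=False):
    """Inner-join two lists of (key, value) pairs on key.

    None stands in for a null key. With join_nulls=False, null keys never
    match; with join_nulls=True, a null key matches other null keys.
    """
    rows = []
    for lkey, lval in left:
        for rkey, rval in right:
            if lkey is None or rkey is None:
                # Null keys only match each other, and only when requested.
                matched = join_nulls and lkey is None and rkey is None
            else:
                matched = lkey == rkey
            if matched:
                rows.append((lkey, lval, rval))
    return rows


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]

sql_like = inner_join(left, right)                # null keys dropped
legacy = inner_join(left, right, join_nulls=True)  # null keys matched
```

Code that relied on null keys matching would need to opt in explicitly after upgrading.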

⚠️ Deprecations

  • Rename write_database parameter if_exists to if_table_exists (#12783)

🚀 Performance improvements

  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Elide allocation in outer join materialization (#12992)
  • Avoid dispatching Series.head/tail to the expression engine (#12946)
  • Ensure we reduce for any/all_horizontal (#12976)
  • Add fast paths for UTC in truncate (#12965)
  • Use select_seq for expression dispatch (#12962)
  • Improve rolling_median algorithm (#12704)
  • Use fast path for non-null data in new SQL-like null matching (#12874)
  • Optimize DataFrame.iter_rows for smaller buffer sizes (#12804)
  • Speed up initializing Series from a list of NumPy arrays (#12785)

✨ Enhancements

  • Add str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
  • Auto-infer credentials from .aws folder (#13062)
  • Support private cloud S3 storage in scan_parquet (#13060)
  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Allow order operators (<,>,>=,<=) on Enum types (#12982)
  • Reimplement replace expression on the Rust side (#13002)
  • Expand set of NumPy functions which emit inefficient map_* warning (#13039)
  • Use tokio semaphore for concurrency handling (#13026)
  • Improve and expressify hist (#13014)
  • Update describe to use new count implementation (#12990)
  • Add default to_struct Series name consistent with the usual default Series name (empty string) (#12998)
  • Preserve left and right join keys in outer joins (#12963)
  • Clarify "inefficient map_elements" warning message (#12978)
  • Allow end before start in date/time_range (#12964)
  • Update update signature (#12986)
  • Minor update to Array data type repr (#12973)
  • Implement group-tuples for Null dtype (#12975)
  • Cast to an enum from int (#12954)
  • Move categorical ordering into dtype (#12911)
  • Avoid importing interchange module by default (#12927)
  • Update Expr.count to ignore null values by default (#12934)
  • Raise if expression passed as scalar to DataFrame constructor (#12916)
  • Update repr of Struct data type class (#12922)
  • Enable partial predicate pushdown past window expressions (#12710)
  • Add merge mode to write_delta and remove pyarrow to delta conversions (#12392)
  • Add str.reverse (#12878)
  • Allow all DataType objects to be instantiated (#12470)
  • Specific performance warnings from Rust to Python (#12802)
  • Change value_counts resulting column name from counts to count (#12506)
  • Implement std and var for Duration columns (#12865)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Enhance write_database return (indicate the number of rows affected by the operation) (#12830)
  • Add dedicated Decimal selector (#12852)
  • Preserve base dtype when raising to UInt power (#10446)
  • Default to exact checking for integers in assertion utils (#12331)
  • Improve __repr__ implementation for Expr (#12770)
  • Support SQL subqueries for JOIN and FROM (#12819)

🐞 Bug fixes

  • Fix off-by-one error in quantile(method="nearest") (#13058)
  • Fix incorrect schema inference on nested columns (#13057)
  • Don't raise for datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
  • Parse json_decode per max buffer length (#13029)
  • Parse 00:00 time zone as UTC (#13034)
  • Fix timeout errors in concurrent downloads (#13023)
  • Streamline align_frames and fix edge-case where the identical frame object appears more than once (#13007)
  • Fix SQL substring indexing (#13016)
  • Allow broadcasting in ranges (#11900)
  • Prevent deadlock in sink_csv (#12991)
  • Don't get mutable if buffer is sliced (#12979)
  • Support parameterized read_database calls against cursors that only take positional args (#12967)
  • Fix truncate when truncating by multiple weeks (#12948)
  • Fix segfault / memory corruption after plugins return Err result (#12953)
  • Raise a proper python typed exception when IO writers try to write to a non-existent folder (#12936)
  • Don't panic when ambiguous parameter is not Utf8 (#12913)
  • Raise a proper python typed exception when the CSV writer tries to write to a non-existent folder (#12919)
  • Patch rolling_var/rolling_std numerical stability (#12909)
  • Fix incorrect Int16 min/max due to incorrect SIMD mask construction (#12908)
  • Improve handling of decimal conversion with to_numpy in the absence of pyarrow (#12888)
  • Fix OOB error in list set operations on empty frame (#12845)
  • Fix error message for uninstantiated Enum types (#12886)
  • Fix repr of Expr.gather (which was still showing deprecated take) (#12864)
  • Fix Array dtype equality (#12853)
  • Fix nan_min/max incorrectly aggregating chunks with addition (#12848)
  • Revert type hint change on expression inputs (#12792)
  • More accurate type hinting for collect_all functions (#12796)
  • Use total float ordering in is_in (#12800)
  • Handle aggregation for all-NaN groups in group_by (#12304)
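Several entries in this release concern NaN handling (#12721, #12800, #12304). The plain-Python sketch below shows one way to express a total float ordering in which NaN compares greater than every other float and equal to itself. `total_order_key` and `total_eq` are hypothetical helpers for illustration only, not Polars functions.

```python
import math


def total_order_key(x):
    """Sort key that places NaN after every other float, including +inf."""
    is_nan = math.isnan(x)
    # Replace NaN with a dummy value so the tuple comparison is well defined.
    return (is_nan, 0.0 if is_nan else x)


def total_eq(a, b):
    """Equality under the total ordering: NaN equals NaN."""
    return (math.isnan(a) and math.isnan(b)) or a == b


values = [2.0, float("nan"), float("-inf"), float("inf"), -1.5]
ordered = sorted(values, key=total_order_key)
```

Under IEEE 754 comparison, `nan == nan` is false and sorting with NaN present is not well defined, which is why an explicit key is used here.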

🛠️ Other improvements

  • Update version switcher for 0.20 (#12844)
  • Add upgrade guide for Python Polars 0.20 (#12872)
  • Run doctests before other tests (#13047)
  • Update describe calculation of min/max (#13027)
  • Minor typo fix (#13003)
  • Resolve two interchange tests failing locally (#12999)
  • Update outdated links to API in Expressions/Functions page (#12981)
  • Expand docstrings for count (#12960)
  • Fix issue with docs for group_by_dynamic (#12906)
  • Prefer explicit --no-cov flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Fix references in deprecation notes (#12877)
  • Fix typo in hash docstring (#12879)
  • Fix docstring for deprecated list.take (#12873)
  • Note that list.take is deprecated (#12867)
  • Fix failing tests (#12859)
  • Add quotes to pip install with dependencies (#12799)
  • Fix parameter name reference in update docstring (#12797)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange

Python Polars 0.19.19

01 Dec 19:21

✨ Enhancements

  • Parquet support required deltabyte encoding (#12836)

🐞 Bug fixes

  • Fix incorrect values from parquet RLE decoding (#12818)
  • Write only one dict page per row group (#12831)

Thank you to all our contributors for making this release possible!
@nameexhaustion, @ritchie46 and @stinodego

Python Polars 0.19.18

29 Nov 17:39
d3ecfe1

✨ Enhancements

  • support nested null in vstack/append/extend/concat (#12771)
  • Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
  • determine mode parallelism depending on current tasks (#12764)
  • enable slice push down past with_columns (#12742)
  • Improve write_database, accounting for latest adbc fixes/updates (#12713)

🐞 Bug fixes

  • don't use streaming engine if aggregate is unknown (#12769)
  • Enable special casing of sequence in list_to_struct (#12759)
  • hold align_chunks_invariant (#12738)
  • allow leading zero and plus in integer parsing (#12744)
  • csv lines iter, always return remainder (#12739)
  • fix oob in set operations (#12736)
  • undo regression in ability to read certain parquet files (#12731)

🛠️ Other improvements

  • Use latest atoi_simd release (#12748)
  • Fix invalid references to xlsx2csv dependency (#12741)
  • Remove pinned aiohttp dependency (#12733)

Thank you to all our contributors for making this release possible!
@0siride, @PierreAttard, @RoDmitry, @alexander-beedie, @dependabot, @dependabot[bot], @eitsupi, @kszlim, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 0.19.17

27 Nov 13:24
38d016b

✨ Enhancements

  • Automatically wrap NumPy array as lit (#12709)
  • Add DataFrame.iter_columns (#12653)
  • favour showing "adbc_driver_manager" over "adbc_driver_sqlite" in show_versions (#12690)

🐞 Bug fixes

  • corr return nan if denominator is invalid (#12708)
  • parquet decimal statistics and schema (#12705)
  • support append/extend with null series (#11824) (#12686)
  • address a numpy ndarray init regression (#12701)
  • fix carrying over infinity into other windows (#12685)

🛠️ Other improvements

  • Update URI prefix in examples (prefer "postgresql" to "postgres") (#12707)
  • now that scan_parquet supports hive partitioning, remove note pointing to scan_pyarrow_dataset (#12706)
  • Minor docstring fixes (#12688)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @ritchie46, @stinodego and @tkarabela

Python Polars 0.19.16

25 Nov 12:39
de2a5ef

⚠️ Deprecations

  • Rename series_equal/frame_equal to equals (#12618)
  • Rename map_dict to replace and change default behavior (#12599)

🚀 Performance improvements

  • order(s) of magnitude speedup when initialising List dtype Series from 2D numpy array (#12672)
  • improve merge_local_rhs_categorical traversal (#12660)
  • make values_size estimate correct for sliced arrays (#12658)
  • improve parquet utf8 validation (#12655)
  • parquet pre-allocate buffer in binary plain encode (#12652)
  • optimize dict binary decoding in parquet (#12648)
  • ensure we only check the values within bounds (#12633)
  • parquet; elide recursion in hot path (#12625)
  • improve cov/corr algorithm (#12590)

✨ Enhancements

  • Join operations on local categoricals (#12657)
  • Implement PySeries.from_buffer for boolean buffers (#12654)
  • Implement PySeries.from_buffer for numeric types (#12646)
  • use RLE_DICTIONARY for integers in parquet (#12647)
  • extend recent filter syntax upgrades to when/then construct (#12603)
  • implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
  • implement 'DeltaByteArray' decoding for parquet (#12602)

🐞 Bug fixes

  • json null inference (#12677)
  • cov/corr respect f32 type (#12676)
  • fix ternary zip_with null broadcast (#12668)
  • support negative slice on eager frame (#12644)
  • fix concurrency budget assertion (#12641)
  • fix oob in set operations (#12640)
  • panic reading parquet nested struct column (#12614)
  • Fix deprecation message for DataFrame.sum (#12619)
  • features: performant,lazy,random (#12600)

🛠️ Other improvements

  • Use range instead of np.arange in constructors (#12621)
  • update custom allocator instructions to include macOS (#12593)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @cardoso, @dmitrybugakov, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 0.19.15

20 Nov 14:33
2adc669

⚠️ Deprecations

  • Rename str.json_extract to str.json_decode (#12586)

🚀 Performance improvements

  • apply left side predicate pushdown also to right side on semi join (#12565)
  • ensure streaming parquet download remains concurrent ~7x (#12552)

✨ Enhancements

  • warn if by column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
  • struct -> json encoding expression (#12583)
  • Implement support for multi-character comments in read_csv (#12519)
  • Implement LazyFrame.sink_ndjson (#10786)
  • use JEMALLOC on all unix architectures (#12568)
  • improve concurrency parameters (#12567)
  • In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)

🐞 Bug fixes

  • error when invalid list to array is given (#12584)
  • parquet: do not extend existing nested that is already complete (#12569)
  • accidental panic if predicate selects no files (#12575)
  • fix lazy parquet slice with nested columns (#12558)
  • ensure stats-evaluator exists (#12566)
  • list schema of list eval (#12563)
  • ensure concurrency budget never locks (#12555)
  • Fix lazy schema for group_by_dynamic and rolling (#12551)
  • address overflow on vec capacity calculation for int_ranges with negative step (#12548)

🛠️ Other improvements

  • convert all recursive parquet deserialize to iterative (#12560)
  • Minor cleanup in Expr class (#12549)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Qqwy, @alexander-beedie, @dmitrybugakov, @fernandocast, @gab23r, @itamarst, @nameexhaustion, @ritchie46, @stinodego and @uchiiii

Rust Polars 0.35.0

17 Nov 13:31
b13afbe

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)

💥 Breaking changes

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate parse_int in favor of to_integer (#12464)
  • plugins add version and context (#12433)
  • Fix scan_csv error type (#12355)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Rename is_signed to is_signed_integer (#12220)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Rename ljust/rjust to pad_end/pad_start (#11975)

🚀 Performance improvements

  • speed up cov/corr with SIMD + strength-reduction ~3x 0.19.13/ ~2x numpy (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • indexvec in group-by (#12371)
  • reduce allocations in hash join (#12368)
  • change concurrency parameters (#12321)
  • improve join performance through radix partitioned join (#12270)
  • remove extra multiplication in hash_to_partition (#12233)
  • allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)
  • improve parquet downloading (#12061)

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • Allow comparison of two local categories with the same hash (#12503)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • change concurrency parameters (#12321)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • auto infer ambiguous for truncate and round (#12204)
  • Rename is_signed to is_signed_integer (#12220)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • start prefetching all files immediately (#12201)
  • Add .list.to_array expression (#12192)
  • consolidate & improve all casting failure error messages (#12168)
  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • add concurrency budget (#12117)
  • Introduce ignore_nulls for str.concat (#12108)
  • casting utf8 to temporal (#12072)
  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • prefetch struct columns in async projection pd (#12514)
  • rechunk cross join output in streaming (#12511)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • empty csv no-raise (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)
  • Do not cast lit if has same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • uint64 should be correctly extracted from python object (#12338)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • scan empty csv miss row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet, add row_count to empty file materialization (#12310)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Update null_count after arithmetic (#12280)
  • fix ambiguous aggregation type (#12269)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterate (#12189)
  • Raise if *_horizontal without inputs (#12106)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when read from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
  • str.concat on empty list (#12066)
  • binary agg should be group aware if literal is not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Fix padding for non-ASCII strings (#12008)
  • inline parts of unstable unicode module for stable (#12003)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)

🛠️ Other improvements

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • fix and improve ternary evaluation on groups (#12529)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Add polars-ds to list of community plugins (#12527)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • add test for previous commit (#12510)
  • Support Python 3.12 (#12094)
  • Fix some typos (#12485)
  • Deprecate parse_int in favor of to_integer (#12464)
  • update rustc (#12468)
  • rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
  • Replace outdated dev dependency tempdir (#12462)
  • move cov/corr to polars-ops (#12411)
  • use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
  • dprint/markdown link checker minor updates (#12409)
  • replace as_u64 with dirty_hash (#12327)
  • Fix ruff linting invocation (#12350)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Build and verify Rust examples in docs (#12334)
  • Fix some feature flags (#12325)
  • Organize Cargo.toml (#12323)
  • remove fxhash (#12322)
  • Run rustfmt on doc examples (#12319)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • simplify expr checking in predicate push down (#12287)
  • Replace dev dependency avro-rs with apache-avro (#12295)
  • Run clippy on all targets (#12293)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • remove unwrap from group_by (#12263)
  • update object_store (#12006) (#12273)
  • Remove recommended setting from IDE docs (#12275)
  • Add feature flag for list.eval (#12254)
  • factor out some shared code in truncate_impl (#12229)
  • update Cargo.lock (#12226)
  • Make all functions in string namespace non-anonymous (#12215)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • use enum for Ambiguous (#12193)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Refactor FunctionExpr module (#12162)
  • Fix tests for pyarrow 14 (#12170)
  • Fix triggers for docs deployment (#12159)
  • Make all functions in binary namespace non-anonymous (#12126)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor improvements to the docs website (#12084)
  • reshape and repeat_by non-anonymous (#12064)
  • upgrade zstd to 0.13 in polars-parquet (#12062)
  • Direct CONTRIBUTING to the docs website (#12042)
  • inline parquet2 (#12026)
  • remove parquet logic from polars-arrow and consolidate logic in polars-parquet crat...

Python Polars 0.19.14

17 Nov 19:17
0c56f9b

🏆 Highlights

  • Support Python 3.12 (#12094)
  • make 1D numpy to polars conversion zero-copy for numeric data (#12403)

⚠️ Deprecations

  • Rename DataFrame column index methods (#12542)
  • Rename Series.set_at_idx to scatter (#12540)
  • Deprecate Series.view (#12539)
  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate Series.inner_dtype property (#12494)
  • Deprecate parse_int in favor of to_integer (#12464)
  • Deprecate DataType method is_not (#12458)
  • Deprecate Series methods is_boolean and is_utf8 (#12457)
  • Add DataType.is_integer and other dtype groups (#12200)

🚀 Performance improvements

  • speed up parquet download of streaming engine (#12544)
  • speed up cov/corr with SIMD + strength-reduction ~3x 0.19.13/ ~2x numpy (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • make 1D numpy to polars conversion zero-copy for numeric data (#12403)

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • Add DataType.is_integer and other dtype groups (#12200)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • prefetch struct columns in async projection pd (#12514)
  • rechunk cross join output in streaming (#12511)
  • Ensure behaviour of Series comparison with timedelta matches that of other types (#12497)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • ensure explicit "return_dtype" is respected by map_dicts (#12436)
  • empty csv no-raise (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • raw HTML output alignment was incorrect for dtype in header (#12422)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • Issue correct PolarsInefficientMapWarning for lshift/rshift operations (#12385)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)

🛠️ Other improvements

  • fix and improve ternary evaluation on groups (#12529)
  • Add polars-ds to list of community plugins (#12527)
  • Future-proof consortium standard test (#12524)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • add test for previous commit (#12510)
  • Update polars-hash reference (#12505)
  • Add note on hash stability and mention polars-hash (#12496)
  • Support Python 3.12 (#12094)
  • Improved import polars timing test; now much more consistent/reliable (#12478)
  • Use .with_columns() in all .list namespace examples (#12475)
  • update rustc (#12468)
  • Fix docs trigger (#12449)
  • Update for new maturin release (#12437)
  • Remove 'experimental' tag for auto-structify setting (#12435)
  • make "DataFrame" and "Series" case more consistent across docs/comments/errors (#12428)
  • dprint/markdown link checker minor updates (#12409)
  • Use manylinux_2_17 for building x86-64 wheel (#12408)
  • Use manylinux 2.24 instead of 2.28 for compatibility reasons (#12397)
  • use with_columns in is_in example, and fix some bullet points not rendering (#12383)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @abstractqqq, @alexander-beedie, @c-peters, @cmdlineluser, @hirohira9119, @ion-elgreco, @jerome3o, @nameexhaustion, @reswqa, @ritchie46, @stinodego and @uchiiii

Python Polars 0.19.13

10 Nov 16:23
24b6a54

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)

⚠️ Deprecations

  • Rename write_csv parameter has_header to include_header (#12351)
  • Deprecate _saturating in duration string language, make it the default (#12301)
  • Switch args for Decimal and set default scale=0 (#12224)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Deprecate DataFrame.as_dict positional input (#12131)

🚀 Performance improvements

  • indexvec in group-by (#12371)
  • Reduce allocations in hash join (#12368)
  • Change concurrency parameters (#12321)
  • Improve join performance through radix partitioned join (#12270)
  • Remove extra multiplication in hash_to_partition (#12233)
  • Allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)

✨ Enhancements

  • Updated BytecodeParser for Python 3.12 (#12348)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • Change concurrency parameters (#12321)
  • Deprecate _saturating in duration string language, make it the default (#12301)
  • Auto-infer ambiguous for truncate and round (#12204)
  • Allow construction of Datetime series from datetime.date array (#12175)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • Allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Update write_delta to write large arrow types without casting (#12260)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • Start prefetching all files immediately (#12201)
  • Expose more options to plugin registration (#12197)
  • Add .list.to_array expression (#12192)
  • Consolidate & improve all casting failure error messages (#12168)
  • Add Binary dtype to hypothesis tests (#12140)
  • Tunable concurrency (#12171)
  • Support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • Support decimals in assert utils (#12119)
  • Add concurrency budget (#12117)
  • Improved support for use of file-like objects with DataFrame "write" methods (#12113)
  • Introduce ignore_nulls for str.concat (#12108)
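For readers unfamiliar with rounding to significant figures (the round_sig_figs entry above), here is a minimal pure-Python sketch of the idea. It is not how Polars implements the expression, and the handling of zero, non-finite values, and invalid digit counts is an assumption made for this example.

```python
import math


def round_sig_figs(x, digits):
    """Round x to `digits` significant figures (pure-Python illustration)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if x == 0 or not math.isfinite(x):
        return x
    # Position of the leading digit: 1234.5 -> 3, 0.012345 -> -2.
    magnitude = math.floor(math.log10(abs(x)))
    # Keep `digits` digits counted from the leading one.
    return round(x, digits - 1 - magnitude)


print(round_sig_figs(1234.5, 2))    # 1200.0
print(round_sig_figs(0.012345, 3))  # 0.0123
```

Unlike rounding to a fixed number of decimal places, the number of digits kept here is counted from the first non-zero digit, so values of very different magnitude retain the same relative precision.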

🐞 Bug fixes

  • Do not cast lit if has same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • Ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • UInt64 should be correctly extracted from python object (#12338)
  • Ignore IDE-mediated DeprecationWarning when debugging tests under 3.12 (#12343)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • Scanning an empty CSV misses row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet, add row_count to empty file materialization (#12310)
  • Fix invalid DeprecationWarning generated from date_range defined with 'saturating' interval (#12311)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • typo in exception message (#12278)
  • fix ambiguous aggregation type (#12269)
  • return frames from read_excel in the originally specified order (#12243)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • fix plugins system on Windows (#12230)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterating (#12189)
  • Fix interchange protocol boolean buffer size (#12177)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when reading from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
  • Update null_count after arithmetic (#12280)

🛠️ Other improvements

  • Workaround for maturin issue (#12370)
  • Fix incorrect boundary column name in group_by_dynamic docstrings (#12366)
  • Fix typo in rolling_* docstrings (#12362)
  • Fix ruff linting invocation (#12350)
  • Clean up conversion utils (#11789)
  • Organize Cargo.toml (#12323)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • Minor updates to prepare for Python 3.12 support (#12314)
  • Move script for testing map warning (#12306)
  • simplify expr checking in predicate push down (#12287)
  • Remove external link (#12223)
  • Fix rebase issue breaking CI (#12296)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Remove recommended setting from IDE docs (#12275)
  • Clean up Python test workflow (#12261)
  • clarify contains selector (#12265)
  • Add py-polars to Cargo workspace (#12256)
  • Use .with_columns in some docstrings (#12250)
  • Add test for scan_csv plus slice (#12239)
  • Fix emphasis formatting in docstring (#12240)
  • Fix emphasis formatting in docstring (#12237)
  • add deprecation notices to the docs for expressions moved into the new name namespace (#12236)
  • update Cargo.lock (#12226)
  • make sort test work with unstable sort (#12221)
  • Build Python wheels on manylinux_2_28 (#12211)
  • Include rust-toolchain.toml with sdist/wheels (#12184)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Improve strip_{prefix, suffix} & strip_chars_{start, end} (#12161)
  • Fix tests for pyarrow 14 (#12170)
  • Fix rendering of note in DataFrame.fold (#12164)
  • Fix triggers for docs deployment (#12159)
  • Refactor some tests (#12121)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Render docstring text in single backticks as code (#12096)
  • use more ergonomic syntax in select/with_columns where possible (#12101)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor tweak in code example in section Expressions/Aggregation (#12033)
  • Minor tweak in code example in section Expressions/Missing data (#12080)
  • Minor improvements to the docs website (#12084)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @alexander-beedie, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jrycw, @mcrumiller, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego and @wsyxbcl