Releases: pola-rs/polars

Python Polars 0.19.5

27 Sep 15:35
b83bf67

🚀 Performance improvements

  • remove double memcopy (#11365)
  • address perf regression (#11354)

🐞 Bug fixes

  • revert invalid runtime check (#11363)
  • more cloud urls (#11361)
  • ensure cloud globbing can deal with spaces (#11360)
  • recognize more cloud urls (#11357)

🛠️ Other improvements

  • Disable version warning banner for now (#11359)
  • Fix error message reference to infer_schema_length (#11358)
  • Mark some tests as slow (#11350)
  • improve parametric tests for group_by_rolling by skipping overflowing cases (#11286)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @jonashaag, @orlp, @ritchie46 and @stinodego

Python Polars 0.19.4

27 Sep 10:01
66f0a6d

🏆 Highlights

⚠️ Deprecations

  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • faster init from pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)

✨ Enhancements

  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • top_k and bottom_k support passing an expr (#11344)
  • add "pyxlsb" engine support to read_excel (for excel binary workbook files) (#11248)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr argument (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • natively support reading parquet for aws, gcp and azure (#11210)
  • Expressify str.strip_prefix & suffix (#11197)
  • Add support for Iceberg (#10375)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • clarify has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • DataFrame init from collections.namedtuple values (#11314)
  • Exclude functools wrapper frames in find_stacklevel (#11292)
  • set partitions independent of thread pool (#11304)
  • address VSCode issue with autocomplete on selector expressions in editor/console (#11235)
  • consume duplicates in rolling_by window (#11261)
  • handle url encoded paths in objectpath creation (#11240)
  • use POOL when writing csv (#11222)
  • don't conflate saved Config JSON string with file path (#11098)
  • is_in for bool evaluated has_false incorrectly (#11217)
  • improve handling of database drivers that can return arrow data (#11201)
  • fix nullable filter mask in group_by (#11207)
  • replace n-th in filter (#11206)
  • fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
  • address unexpected expression name from use of unary - or + operators (#11158)
  • impl hash for more function expr (#11182)
  • list.join's separator can be expression (#11167)
  • Add some missing expr type hint for series (#11171)
  • consistently use negative every as the default for offset in group_by_dynamic (#11164)
  • Make pl.struct serializable (#11169)
  • only raise on actual parameter collision when "dtypes" specified in read_excel "read_csv_options" (#11162)
  • propagate null value for str/binary starts/ends_with and contains (#11141)

🛠️ Other improvements

  • simplify/clarify group_by_dynamic examples (#11335)
  • tighten assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
  • unify display for namespaced function expr (#11342)
  • add lazy pivot example (#11325)
  • Use GITHUB_TOKEN to get contributor information for docs (#11321)
  • Enable version warning banner (#11322)
  • cross-reference null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
  • Pin pydantic in dev requirements <2.4.0 (#11312)
  • remove default auto-explode for map_many_private (#11270)
  • Add type alias IntoExprColumn (#11296)
  • update a few dependencies (#11283)
  • Properly skip ADBC test (#11282)
  • Fix some minor Makefile issues (#11276)
  • update sponsors (#11271)
  • parametric tests for group_by_rolling (#11262)
  • Make some list function expr non-anonymous (#11230)
  • Mention the performant feature only once (#11223)
  • remove unneeded indirection (#11233)
  • remove unneeded mutex around object-store (#11224)
  • clarify every/period/offset in group_by_dynamic (#11175)
  • Fix read_database batch_size docstring (#11132)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303

Rust Polars 0.33.0

17 Sep 16:31
7f8cd7d

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

  • empty product returns identity (#10842)
  • return f64 for rank when method="average" (#10734)
  • Rename groupby to group_by (#10654)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
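
As a rough illustration of what "Kleene logic" means for the all/any entry above (#10564), here is a minimal pure-Python sketch of three-valued AND/OR, with None standing in for a null (unknown) value. The function names kleene_all and kleene_any are hypothetical, and the sketch does not reproduce the Polars API, its defaults, or its implementation.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND over an iterable; None means 'unknown'."""
    saw_unknown = False
    for v in values:
        if v is False:
            return False  # one definite False decides the result
        if v is None:
            saw_unknown = True
    # No False seen: the result is unknown if any input was unknown.
    return None if saw_unknown else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR over an iterable; None means 'unknown'."""
    saw_unknown = False
    for v in values:
        if v is True:
            return True  # one definite True decides the result
        if v is None:
            saw_unknown = True
    # No True seen: the result is unknown if any input was unknown.
    return None if saw_unknown else False
```

Under these rules a null only makes the result unknown when no other value has already decided it: kleene_all([False, None]) is False, while kleene_all([True, None]) is None.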

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)

🚀 Performance improvements

  • improve performance of fast projection (#10945)
  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • json_extract supports extracting static and string values to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • add support for nextest (#11048)
  • Add literal for str count_match (#10996)
  • More dtypes support casting to list (#11025)
  • ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • Add datetime_range expression function (#10213)
  • add proper cache for Regex compilation (#10934)
  • implementation of array_to_string (#10839)
  • apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
  • accept expr in str.count_match (#10900)
  • accept expressions in .offset_by (#9967)
  • implement drop as special case of select (#10885)
  • Supports is_last operation (#10760)
  • activate cse for group_by (again) (#10749)
  • add pairwise float sum implementation (#10756)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Rename groupby to group_by (#10654)
  • preserve whitespace in notebook output (#10644)
  • Read/write support for IPC streams in DataFrames (#10606)
  • improve binary (arity) generics (#10622)
  • propagate null in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Add failed column to cast exception (#10507)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • fix the construction of List<Null> (#10969)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)
  • Allow exactly one value in start/end for int_range (#10914)
  • count was falsely tagged as cse in group by (#10917)
  • Retain original dtype when deserializing an empty list (#10893)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • fix conversion from decimal to float (#10776)
  • Add broadcasting for list comparisons (#10857)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • tag amortized iter unsafe and add safe alternatives (#10881)
  • use pool in dataframe arithmetic (#10864)
  • remove debug println! from datetime fn (#10862)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
  • empty product returns identity (#10842)
  • never panic if hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)
  • get_single_leaf can't handle Expr::Count (#10790)
  • string to decimal parsing (#10712)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • fix unicode truncation in json parsing (#10761)
  • Error message of list unique should not display inner type (#10748)
  • create chunks_mut entry in vtable (#10745)
  • Prevent panic on sample_n with replacement from empty df (#10731)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • Build Series from empty Series vector (#10558)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • AllHorizontal format string (#10658)
  • List<null> chunked builder should take care of series name (#10642)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • join_asof missing tolerance implementation, address edge-cases (#10482)
  • Take input_schema to create physical expr for Selection (#10571)
  • fix serialization of empty lists (#10563)
  • Clear window cache after evaluate predication expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • fix build for wasm (#10536)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • fix build for wasm (#9502)
  • rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • complete intra-docs in api documentation (#11007)
  • genericize take implementation (#10976)
  • genericize PolarsDataType (#10952)
  • enhance internal crates readme with reference to main crate (#10928)
  • Add Duration method for checking full days (#10850)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • remove LocalProjection (#10886)
  • request verbose logging output of minimal reproducible examples (#10882)
  • Reorganize range expression module (#10871)
  • introduce with_name for Series/ChunkedArray (#10859)
  • Further refactor temporal range functions (#10844)
  • Refactor range related functions (#10830)
  • Fix the non-compiling black-box function parts in the polars lazy cookbook (#10809)
  • Fix some broken links / formatting (#10772)
  • Improve docs for polars-lazy (#10729)
  • update rustc nightly_2023-08-26 (#10467)
  • default to rust native flate2 lib (#10733)
  • Clear GitHub Actions caches weekly (#10715)
  • move 'is_in' to polars-ops (#10645)
  • Clean up schema calculation for date_range (#10653)
  • remove unused apply functions and add fallible generic apply functions (#10621)
  • Enforce up-to-date Cargo.lock (#10555)
  • make binary chunkedarray functions DRY (#10607)
  • bump MSRV to 1.65 (#10568)
  • genericize chunk implementation (#10506)
  • use ChunkArray::(try_)from_chunk_iter (#10497)
  • add VSCode rust-analyzer settings (#10498)
  • Update URLs for dev documentation (#10495)
  • Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @an...

Python Polars 0.19.3

15 Sep 15:32
e8949ff

🏆 Highlights

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)

🚀 Performance improvements

  • optimize _unpack_schema() (#11080)
  • optimize polars.utils._post_apply_columns() (#11086)
  • optimize polars.utils._post_apply_columns() (#11041)
  • optimize _unpack_schema() (#10960)
  • improve performance of fast projection (#10945)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Polars plugins (#10924)
  • better async_collect (#10912)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • add OpenOffice spreadsheet support via new pl.read_ods function (#11011)
  • json_extract supports extracting static and string values to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • Add literal for str count_match (#10996)
  • More dtypes support casting to list (#11025)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • improve read_excel table data identification (#10953)
  • Add from_dataframe fast path and improve typing (#10979)
  • add openpyxl as a new/optional engine for read_excel (#6183)
  • Add datetime_range expression function (#10213)

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Make Series.__getitem__ raise an IndexError (#11061)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • fix partial schema init with read_dicts and reduce latency of small-frame creation (#11047)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • ensure series_equal properly accounts for dtypes when strict=True (#11012)
  • fix the construction of List<Null> (#10969)
  • write_excel "hidden_columns" parameter fails when taking a selector (#10987)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)

🛠️ Other improvements

  • remove low contrast coloring from visited links (#11133)
  • Ignore matplotlib warning (#11129)
  • Do not run user guide examples by default (#11128)
  • Ignore matplotlib mypy warnings (#11126)
  • Add deprecation message in groupby docs (#11121)
  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove mentions of the deprecated random module (#11087)
  • simplify SchemaDefinition type alias (#11077)
  • put fetch explanation in a "notes" block to better highlight it in the docs (#11058)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • warn that "by" argument must be sorted for results to be correct in rolling_* functions (#11013)
  • Adds missing method refs in LazyDataFrame API docs (#11027)
  • Add lint for boolean trap (#11010)
  • Add private LazyFrame method for setting sink optimizations (#10988)
  • Enable a few more ruff lints (#10998)
  • document polars string duration language in temporal range functions (#10978)
  • Additional tests for interchange get_data_buffer (#10966)
  • genericize PolarsDataType (#10952)
  • Document that filter, drop_nulls, left join preserve order (#10955)
  • add note about adbc flight sql driver (#10949)
  • Revert pydantic >= 2.0.0 requirement (#10944)
  • note that pl.duration represents fixed durations, point to offset_by for non-fixed (#10927)
  • Test S3 functionality using moto server (#10164)

Thank you to all our contributors for making this release possible!
@I8dNLo, @KacpiW, @MarcoGorelli, @Object905, @Qqwy, @TNieuwdorp, @alexander-beedie, @antoniocali, @bvanelli, @cjackal, @henrikig, @jakob-keller, @mrogowski11, @nameexhaustion, @orlp, @reswqa, @ritchie46, @s-banach, @stinodego, @svaningelgem and @thomasjpfan

Python Polars 0.19.2

05 Sep 14:33
5aa9d04

🏆 Highlights

  • Add syntactic sugar for col("foo") -> col.foo (#10874)
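
One way attribute-style sugar like this can be built in Python is with __getattr__, which is only called when normal attribute lookup fails. The sketch below is a hypothetical stand-in (the Col class and its string return value are assumptions made for illustration); it is not the Polars implementation and does not build real expressions.

```python
class Col:
    """Hypothetical stand-in for a column factory; returns plain strings."""

    def __call__(self, name: str) -> str:
        # Call style: col("foo")
        return f'col("{name}")'

    def __getattr__(self, name: str) -> str:
        # Attribute style: col.foo, reached only when normal lookup fails.
        # Private and dunder names are rejected so that tools probing for
        # special methods (copy, pickle, ...) see a regular AttributeError.
        if name.startswith("_"):
            raise AttributeError(name)
        return self(name)


col = Col()
```

With this sketch, col.foo and col("foo") produce the same value. Attribute access only works for names that are valid Python identifiers, so the call form remains necessary for column names containing spaces or other special characters.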

⚠️ Deprecations

  • Rename Expr.is_not() to not_() (#10838)

✨ Enhancements

  • allow individual Config options to be easily reset to their default value (#10922)
  • accept expr in str.count_match (#10900)
  • allow additional glimpse customisation, fix strings repr (#10895)
  • accept expressions in .offset_by (#9967)
  • support schema overrides for frames created from databases (#10884)
  • Add syntactic sugar for col("foo") -> col.foo (#10874)
  • support negative indexing in set_at_idx (#10891)
  • implement drop as special case of select (#10885)
  • raise a more helpful error when non-query statements passed to read_database (#10851)

🐞 Bug fixes

  • Allow exactly one value in start/end for int_range (#10914)
  • raise error when function didn't receive any inputs (#8635)
  • count was falsely tagged as cse in group by (#10917)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • use pool in dataframe arithmetic (#10864)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)

🛠️ Other improvements

  • Set minimum version for pydantic to 2.0.0 (#10923)
  • fix and clarify docs for Expr.map_elements (#10647)
  • fix rendering of bullet points in dt.round (#10911)
  • add test for 10875 (#10913)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • request verbose logging output of minimal reproducible examples (#10882)
  • add a note about read_database connection/cursor behaviour (#10873)
  • introduce with_name for Series/ChunkedArray (#10859)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @MarcoGorelli, @alexander-beedie, @c-peters, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @jeroenjanssens, @orlp, @ritchie46, @stinodego and @wdoppenberg

Python Polars 0.19.1

01 Sep 05:31
ad73217

💥 Breaking changes

  • empty product returns identity and product ignores nulls (#10842)
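
The behaviour named in this entry can be sketched in plain Python: the product of no values is the multiplicative identity 1, and nulls (None here) are skipped rather than propagated. The helper name product_ignoring_nulls is hypothetical and this is only an illustration of the semantics, not Polars code.

```python
import math
from typing import Iterable, Optional


def product_ignoring_nulls(values: Iterable[Optional[float]]) -> float:
    """Multiply all non-null values; an empty input yields the identity 1."""
    # math.prod returns its start value (1 by default) for an empty iterable.
    return math.prod(v for v in values if v is not None)
```

Returning the identity keeps products composable: the product of a concatenation equals the product of the parts' products, even when one part is empty or all-null.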

✨ Enhancements

  • add binary, boolean, categorical, date, object, and time selectors (#10806)
  • Supports is_last operation (#10760)
  • minor typing improvement for DataFrame.__iter__ (#10825)
  • Add custom error for allow_copy=False (#10822)

🐞 Bug fixes

  • empty product returns identity (#10842)
  • never panic if hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
  • behaviour of reversed(df) (#10823)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)

🛠️ Other improvements

  • Refactor range related functions (#10830)
  • map-related docstring updates (#10779)
  • Move sink tests to streaming module (#10821)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @orlp, @reswqa, @ritchie46 and @stinodego

Python Polars 0.19.0

30 Aug 14:04
b1f60cd

An upgrade guide is available on our website.

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • Rename groupby to group_by (#10656)

💥 Breaking changes

  • return f64 for rank when method="average" (#10734)
  • Update a lot of error types (#10637)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Improve consistency of parsing expression input (#9512)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)

⚠️ Deprecations

  • Rename map to map_batches (#10801)
  • Rename GroupBy.apply to map_groups (#10799)
  • Rename DataFrame.apply to map_rows (#10797)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)
  • Rename Series/Expr.apply to map_elements (#10678)
  • Rename groupby to group_by (#10656)
  • Deprecate some parameters of cut/qcut (#10484)

🚀 Performance improvements

  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • activate cse for group_by (again) (#10749)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Explicitly implement Protocol for interchange classes (#10688)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • csv: add schema argument (#10665)
  • Support DataFrame init from queries against users' existing database connections (#10649)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Update a lot of error types (#10637)
  • preserve whitespace in notebook output (#10644)
  • Remove deprecated behavior from vertical aggregations (#10602)
  • support selector usage in write_excel arguments (#10589)
  • Add LazyFrame.collect_async and pl.collect_all_async (#10616)
  • Read/write support for IPC streams in DataFrames (#10606)
  • propagate null in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Improve consistency of parsing expression input (#9512)
  • Add failed column to cast exception (#10507)
  • allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
  • Remove deprecated get_idx_type - use get_index_type instead (#10556)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
  • Improve some error types and messages (#10470)
  • suggest str.to_datetime instead of apply and stdlib strptime (#10266)

🐞 Bug fixes

  • get_single_leaf can't handle Expr::Count (#10790)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • whitespace CSS in Notebook HTML updated to use pre-wrap instead of pre (#10739)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • use time zone from dtype to overwrite output time zone when initialising Series (#10689)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • raise exception with invalid on arg type for join_asof (#10690)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • Correctly handle time zones in write_delta (#10633)
  • fix apply for empty series in threading mode (#10651)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • Take input_schema to create physical expr for Selection (#10571)
  • Clear window cache after evaluate predication expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • Fix write_delta with schema in delta_write_options (#10541)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • respect pl.Config options relating to shape, column names, and types when rendering HTML (#10449)

🛠️ Other improvements

  • update cargo.lock (#10800)
  • Create .venv in repo root (#10789)
  • refactored write_database unit tests to properly separate concerns (#10773)
  • Fix some broken links / formatting (#10772)
  • Document chained when-then behaviour more prominently (#10759)
  • Fix test failing due to new adbc release (#10763)
  • Unpin connectorx and bump other Python dependencies (#10753)
  • add note to testing docs about module import (#10741)
  • Clear GitHub Actions caches weekly (#10715)
  • Update for new pyarrow 13.0.0 behavior (#10691)
  • Fix minor issue with sink_parquet docs (#10669)
  • Remove deprecate_renamed_methods util (#10537)
  • add "see also" entries to ne/eq_missing and update related examples (#10667)
  • fix potential memory leak from usage of inspect.currentframe (#10630)
  • give more relevant example for polars.apply (#10631)
  • Bump ruff and enable new setting (#10626)
  • Add docstrings for Expr.meta namespace (#10617)
  • Enforce up-to-date Cargo.lock (#10555)
  • deprecate DataFrame.replace (#10600)
  • ensure that make requirements fully refreshes unpinned packages/deps (#10591)
  • fix out-of-date explain default parameter (#10566)
  • Fix expr_dispatch decorator to work on methods with decorators (#10549)
  • Fix link to source code (#10542)
  • Add title to index page (#10539)
  • Disable SIM108 lint (#10519)
  • Keep versioned docs (#10500)
  • switch to pyo3/maturin-action (#10503)
  • Update URLs for dev documentation (#10495)
  • Skip failing test (#10496)
  • Add version switcher to API reference (#10488)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj

Python Polars 0.18.15

15 Aug 07:01
0357177

🐞 Bug fixes

  • rollback cse in groupby: python 0.18.15 (#10491)

🛠️ Other improvements

  • Mark import timing check as slow (#10487)
  • Gather all streaming tests (#10485)
  • Bump maturin to version 1.2.1 (#10479)

Thank you to all our contributors for making this release possible!
@ritchie46 and @stinodego

Rust Polars 0.32.0

14 Aug 13:49
ec0c91f

🏆 Highlights

  • common subexpression elimination (#9632)
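
To give a feel for the idea behind this highlight, the sketch below finds subexpressions that occur more than once across a set of expression trees, so that an optimizer could compute each of them once and reuse the result. Expressions are modelled as nested tuples of the form (operator, arg, ...), which is an assumption made for illustration; the function names are hypothetical and this is not how the Polars optimizer is written.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Set


def subexpressions(expr: Hashable) -> Iterator[tuple]:
    """Yield every compound (tuple) subexpression of expr, children first."""
    if isinstance(expr, tuple):
        for arg in expr[1:]:  # expr[0] is the operator name
            yield from subexpressions(arg)
        yield expr


def common_subexpressions(exprs: Iterable[Hashable]) -> Set[tuple]:
    """Return the compound subexpressions that appear more than once."""
    counts: Counter = Counter()
    for expr in exprs:
        # Tuples are hashable, so identical subtrees land on the same key.
        counts.update(subexpressions(expr))
    return {sub for sub, n in counts.items() if n > 1}


# Two projections that both contain a * b.
shared = common_subexpressions([
    ("add", ("mul", "a", "b"), 1),
    ("sub", ("mul", "a", "b"), 2),
])
```

Here shared contains only ("mul", "a", "b"); leaf values such as column names and literals are deliberately not counted, since reusing them saves no work.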

💥 Breaking changes

  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)

⚠️ Deprecations

  • renaming approx_unique as approx_n_unique (#10290)
  • remove/deprecate cache and its logic (#10066)
  • Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

  • pre-alloc int_ranges (#10399)
  • use hash as CSE Identifier (#10385)
  • re-use regex capture allocation (#10302) (#10335)
  • don't parallelize literal expressions (#10321)
  • fix O(n^2) in sorted check during append (#10241)
  • speedup mode on sorted data (#10084)
  • speedup boolean apply (#10073)
  • shrink alp/lp ~2.5x (#10039)
  • Remove fused arithmetic from expressions with literals (#10011)

✨ Enhancements

  • quote style option for csv writer (#10422)
  • add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
  • be more permissive on predicate pushdown to left side of left join (#10442)
  • add use_earliest to to_datetime / strptime (#10426)
  • {any/all}_horizontal to expression architecture (#10412)
  • serialize flags (#10140)
  • allow unaligned pointers in arrow FFI (#10403)
  • add line_terminator option to write_csv (#10373)
  • Add is_local and to_local to categorical namespace (#10372)
  • cse for groupby.agg and reduced cse collisions (#10381)
  • re-use regex capture allocation (#10302) (#10335)
  • Add Series.cat.uses_lexical_ordering (#10325)
  • improve datetime parsing error message (#10332)
  • allow sequential runners in select/with_columns (#10322)
  • improve err msg parsing time, date, datetime (#10298)
  • Add str.extract_groups (#10179)
  • add extra build profiles (#10268)
  • Extend datetime expression function with time zone/time unit parameters (#10235)
  • added gcs to gcp cloud schema in polars-core::cloud, see #10206 (#10207)
  • support writing duration type in json (#10112)
  • inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
  • Move transpose naming to Rust (#10009)
  • cse in groupby's (#10062)
  • Adds sql CASE statement expressions (#10065)
  • Add date_ranges/time_ranges expression functions (#10005)
  • comm_subexpr_elim in streaming 'select/with_columns' (#10050)
  • common subexpression elimination (#9632)
  • Let qcut create evenly spaced probabilities (#9960)
  • sorted flag on singletons (#9933)
  • maintain sorted flag after partition_by (#9944)
  • keep sorted flag in streaming left join (#9932)
  • Add cloudpickle for serializing python UDFs (#9921)

🐞 Bug fixes

  • Fix incorrect handling of VisitRecursion::Skip. (#10452)
  • fix negative decimal parsing (#10444)
  • ensure sorted_sink hash equals the default path (#10464)
  • fix sum agg (#10459)
  • ensure last aggregation deals with default chunk (#10453)
  • fix cse input schema (#10450)
  • fix list groupby of array dtype (#10408)
  • correct AnyValue::hash (#10391)
  • finalize cast in partitioned groupby (#10359)
  • fix oob in 'last' (#10329)
  • fix categorical lexical sort (#10318)
  • Fix join validation (#10257)
  • Set correct dtype for .extract_groups() (#10306)
  • clear window cache and run windows on proper runners (#10303)
  • fix sorted fast path in streaming groupby wrt nulls (#10289)
  • fix nan aggregation in groupby (#10287)
  • check dtypes of single-column 'by' parameter in asof-join (#10284)
  • fix pyo3 link errors on macos (#10256)
  • fix empty streaming parquet file (#10252)
  • fix logical columns of streaming multi-column sort (#10250)
  • fix date/datetime parsing for short inputs with exact=False (#10231)
  • correct agg_sum for ChunkedArray. (#10243)
  • don't panic in wildcard apply (#10240)
  • fix cse profile (#10239)
  • correct struct null counts (#10142)
  • no cse in groupby until fixed (#10216)
  • fix is_in on empty series (#10195)
  • fix cse windows (#10197)
  • block predicate pushdown is_in and null producing … (#10194)
  • prevent re-ordering of dict keys inside .apply (#10172)
  • initialize fixed null values (#10192)
  • ensure window functions run partitioned when cse is hit (#10170)
  • adjust for null values in str.replace fast path (#10132)
  • clear bit settings in list iteration (#10131)
  • use row-encoded for struct::is_sorted (#10129)
  • don't run file-caching in streaming mode (#10117)
  • Allow initialization of pl.Array in DataFrame using schema alone (#10100)
  • don't panic if masked out values are invalid in temporal kernels (#10114)
  • Fix struct get field by index out of bounds error. (#10097)
  • fix ub in simd-json (#10093)
  • fix invalid access when groupby rolling produces empty sets (#10109)
  • respect null_on_oob=False in list.take when pa… (#10105)
  • fix is_sorted for structs (#10099)
  • add file path to io error in scan_csv (#10076)
  • fix false positive in parquet stats evaluation (#10087)
  • fix error message from cast-timezone to replace-time-zone (#10089)
  • Address .col(regex).exclude() operations not executing. (#10025)
  • fix Boolean::isin(null values) (#10074)
  • predicate pushdown #10058 (#10071)
  • Fix weighted quantile for 0 weights (#10051)
  • fix incorrect state in projection pushdown with joins (#9987)
  • don't pass predicates referring to renamed literal… (#9965)
  • fix regression in regex expansion (#9952)
  • potential SO in csv infer schema (#9950)
  • raise on unsupported transpose and object types (#9946)
  • Fix as-of join when by groups are interleaved (#9938)

🛠️ Other improvements

  • fix and run polars-plan tests (#10465)
  • Simplify flag methods (#10429)
  • match_block_trailing_comma (#10414)
  • implement ChunkArray::(try_)from_chunk_iter (#10395)
  • add test for 10401 (#10405)
  • Bump some dependencies (#10396)
  • Move dependency version info to workspace level (#10295)
  • patch reedline until fix released (#10382)
  • remove wasm-timer dependency (#10347)
  • write down invariants of ChunkedArray (#10334)
  • fix typo in lib.rs (#10313)
  • Exclude examples from workspace default (#10309)
  • Update CODEOWNERS (#10261)
  • avoid outputting docs of dependencies (#10292)
  • Do not keep history in gh-pages branch (#10282)
  • Use workspace package info / organize dependencies section (#10279)
  • fix dead links in Rust documentation (#10251)
  • Fix make pre-commit command (#10205)
  • Fix make integration-tests command (#10202)
  • Replace "question" issues with link to Stack Overflow (#10230)
  • Update dependabot config (#10222)
  • Fix LICENSE symlink for moved crates (#10150)
  • Re-organize folder structure for Rust crates (#10141)
  • update to rustc nightly-2023-07-27 (#10139)
  • temporarily turn off fail-fast so that ubuntu tests run (#10133)
  • Refactor when/then/otherwise internals (#9922)
  • move replace_time_zone to polars-ops (#10078)
  • remove unneeded branch (#10082)
  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)
  • fix typo in contribution example (#10038)
  • correct example in API reference (#10032)
  • add developer contribution examples (#10013)
  • Update autolabeler again (#9984)
  • fix docs build and add to CI (#9904)
  • Minor makeover for Rust Makefile (#9874)

Thank you to all our contributors for making this release possible!
@0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj

Python Polars 0.18.14

14 Aug 13:48
ec0c91f

🏆 Highlights

  • Native implementation of dataframe interchange protocol (#10267)

⚠️ Deprecations

  • Deprecate behavior of list/tuple inputs for lit (#10461)

🚀 Performance improvements

  • optimise retrieval of values from df.item (~4-5x speedup) (#10411)
  • pre-alloc int_ranges (#10399)
  • use hash as CSE Identifier (#10385)

✨ Enhancements

  • quote style option for csv writer (#10422)
  • add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
  • add use_earliest to to_datetime / strptime (#10426)
  • add new "header_format" option for write_excel (#10392)
  • {any/all}_horizontal to expression architecture (#10412)
  • Native implementation of dataframe interchange protocol (#10267)
  • allow unaligned pointers in arrow FFI (#10403)
  • add line_terminator option to write_csv (#10373)
  • add explicit selector variants for signed/unsigned integers (#10384)
  • Add is_local and to_local to categorical namespace (#10372)
  • enhance selectors expansion function, so it can operate on a schema as well as a frame (#10341)
  • Order percentiles in describe (#10378)
  • cse for groupby.agg and reduced cse collisions (#10381)
  • improve take_every(0) exception (#10352)
  • add offset and length to get_ptr (#10361)

🐞 Bug fixes

  • fix pyarrow write_to_dataset wrt check_not_directory parameter (#10471)
  • fix negative decimal parsing (#10444)
  • ensure sorted_sink hash equals the default path (#10464)
  • address inconsistency in init from square numpy arrays with/without an explicit schema (#10445)
  • ensure last aggregation deals with default chunk (#10453)
  • fix cse input schema (#10450)
  • Fix by argument handling in join_asof (#10447)
  • fix potential OverflowError in testing asserts with huge UInt64 diffs (#10437)
  • Create delta compatible schema during writing (#10165)
  • fix list groupby of array dtype (#10408)
  • correct AnyValue::hash (#10391)
  • finalize cast in partitioned groupby (#10359)

🛠️ Other improvements

  • add vertical_relaxed example for pl.concat (#10472)
  • Run all streaming tests on the same test runner (#10469)
  • Organize OOC tests (#10463)
  • add test for 10417 (#10420)
  • Clean up some Sphinx settings (#10400)
  • add test for 10401 (#10405)
  • Address Ruff per file ignores (#10258)
  • Small improvement for PySeries.get_buffer (#10363)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @OndrejSlamecka, @alexander-beedie, @c-peters, @cmdlineluser, @drgif, @ion-elgreco, @lfn3, @orlp, @potzenhotz, @rea1bacon, @reswqa, @ritchie46, @stinodego and @zundertj