Rust Polars 0.27.0
🏆 Highlights
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
⚠️ Breaking changes
- error on string <-> date cmp (#6498)
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
- show where error messages originated (#6482)
- `str.strip` with multiple chars (#5929)
🚀 Performance improvements
- update string replacement codepaths following new benchmarking (#6777)
- improve dynamic groupby performance on sorted keys (#6599)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
- Improve rechunk check (#6268)
- reuse allocated scratches in ipc writer (#6287)
- use dedicated writer thread for sink_parquet (#6285)
- first check rev-map on categorical equality check (#6085)
- ensure set_at_idx is O(1) (#5977)
- use iterator instead of loop in `polars_io::csv::parser::skip_condition` (#5157)
✨ Enhancements
- accept separator for pivot and to_dummies (#6780)
- rename 'tz' to 'time_zone' in convert_time_zone and replace_time_zone (#6784)
- rename with_time_zone to convert_time_zone and cast_time_zone to replace_time_zone (#6768)
- support timezone in csv writer (#6722)
- implement series abstractions for `Int128Type` (#6679)
- parse timezone from Datetime (#6766)
- formally support duration division (#6758)
- add argmin/max for utf8 data (#6746)
- Support an ignore_nulls param for EWM calculations. (#5749) (#6742)
- deprecate tz_localize (#6693)
- guarantee schema-stable `col(dtype)` selection (#6674)
- better-characterise `NotFound` exceptions (#6670)
- disallow with_time_zone from/to tz-naive (#6659)
- let cast_time_zone work on tz-naive and deprecate tz-localize (#6649)
- implement fill_null for list data (#6635)
- expression functions should be nullable (#6629)
- add streamable udfs (#6614)
- is_first for struct dtype (#6595)
- add from_str_radix method to StringNameSpace for parsing strings in any base to i32 (#6570)
- improve predicate pushdown (#6579)
- raise error on invalid binary cmp (#6564)
- let cast_time_zone accept None (#6539)
- add utc parameter to strptime (#6496)
- add meta 'has_multiple_outputs', 'is_regex_projec… (#6500)
- error on string <-> date cmp (#6498)
- show where error messages originated (#6482)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
- allow expr in str.contains (#6443)
- add float formatting option (#6432)
- allow expressions as arguments in `str.ends_with` (#6361)
- accept expr in `str.starts_with` (#6355)
- add strict parameter to decoding expressions (#6342)
- allow unordered struct creating from anyvalues (#6321)
- parse abbrev month name (#6314)
- add `dt.combine` for combining date and time components (#6121)
- add sink_ipc (#6286)
- ensure ooc sort works ooc with all-constant values (#6235)
- The 1 billion row sort (#6156)
- optionally treat missing UTF8 values as the empty string at CSV parse-time (#6203)
- When moving error out of `LogicalPlan`, leave behind String with error message instead of `None` (#6199)
- generalize the cloud storage builders (#5972)
- Implement `DataFrame.unique(keep="none")` (#6169)
- add `arr.take` expression (#6116)
- allow `extend_constant` to work with date literals (#6114)
- allow nested categorical cast (#6113)
- add a `rounded_corners` modifier to `pl.Config.set_tbl_formatting` (#6108)
- Get polars to compile to wasm target (#6050)
- add search_sorted for arrays and utf8 dtype (#6083)
- improve error message when writing nested data to… (#6040)
- updated default table format from "UTF8_FULL" to "UTF8_FULL_CONDENSED" (#5967)
- `str.strip` with multiple chars (#5929)
- support glob in parquet object_storage (#5928)
- read decimal as f64 (#5938)
- improve query plan scan formatting (#5937)
- allow all null cast (#5933)
- truncate by calendar weeks (#5759)
- merge sorted dataframes (#5817)
- impl hex and base64 for binary (#5892)
- streaming parquet from object_stores (#5871)
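To illustrate the idea behind the from_str_radix enhancement (#6570), here is a minimal sketch that uses only the Rust standard library, not the Polars API. The helper name `parse_column_radix` is hypothetical, and mapping unparseable values to `None` is an assumption made for illustration; the actual Polars method may report invalid input differently.

```rust
/// Hypothetical helper: parse each string as a signed 32-bit integer
/// written in `radix` (which must be in 2..=36, otherwise the standard
/// library function panics).
/// Values that cannot be parsed become `None`, standing in for a null.
fn parse_column_radix(values: &[&str], radix: u32) -> Vec<Option<i32>> {
    values
        .iter()
        // `i32::from_str_radix` returns a Result; `.ok()` turns errors into None.
        .map(|s| i32::from_str_radix(s, radix).ok())
        .collect()
}
```

Calling `parse_column_radix(&["ff", "7f", "xyz"], 16)` yields `[Some(255), Some(127), None]`, since "xyz" is not valid hexadecimal.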
🐞 Bug fixes
- always rechunk if n_chunks > n_rows (#6786)
- fix ndjson empty array parsing (#6785)
- make some list expressions aware of groupby context (#6776)
- use explicit drop function node (#6769)
- don't set sorted flag if we reverse sort the left … (#6772)
- handle edge-case with string-literal replacement when the replace value looks like a capture group (#6765)
- respect skip_rows in glob parsing csv (#6754)
- Improve error message in DataFrame constructor (#6715)
- arrow map dtype conversion (#6732)
- dedicated `rename` implementation. (#6688)
- return correct display/repr names for NaN-related expressions (#6721)
- strftime with time zone directive (#6673)
- improve error message in date_range with invalid units (#6671)
- remove uses of rayon global thread pool (#6682)
- true-divide output type (#6665)
- cast to and from fixed offsets (#6602)
- raise error on string numeric arithmetic (#6601)
- partially assert sortedness in groupby dynamic (#6593)
- raise oob if negative index given to take (#6590)
- fix predicate pushdown key check (#6577)
- fix schema of apply with many inputs on empty df (#6571)
- let lhs determine struct order in supertype (#6572)
- validate utc, fmt, and tz-aware in strptime (#6550)
- add strptime to filter boundary (#6560)
- list eval all null array (#6545)
- implement ser/de for BinaryChunked (#6543)
- raise if tz_localize called on UTC-aware (#6526)
- make concat_list group aware (#6527)
- error on invalid expanding expression (#6521)
- create from dicts directly as struct categorical (#6520)
- fix oob in arr.get by expressions (#6519)
- fix cse schema (#6518)
- panic when max_len -1 is reached (#6494)
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
- validate tz in with_time_zone (#6417)
- faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
- use consistent floor division for floats/ints (#6460)
- split semi/anti join optimization (#6459)
- fix doc comment in ParallelStrategy (#6444)
- fix projection pushdown on double semi join (#6440)
- cumulative_eval ensure output dtype is respected (#6435)
- auto-detect %+ as tz-aware (#6434)
- correct error message in cast_time_zone (#6411)
- only use float simd on specific alignment (#6427)
- no early escape when window is equal to len in rolling_float (#6408)
- raise error on invalid sort_by argument (#6382)
- take offset into account with str.explode (#6384)
- Return empty batch for pl.read_csv_batched().next_… (#6381)
- implement ser/de for StructChunked (#6359)
- series of empty structs (#6347)
- don't cast nulls before trying normal cast (#6339)
- expand all nested wildcards in functions (#6334)
- fix groupby rolling by_key if groups are empty (#6333)
- parse abbrev month name (#6314)
- disallow alias in inline join expressions (#6312)
- feature flag "get_sink" ipc (#6306)
- block proj-pd and pred-pd on swapping rename (#6303)
- convert nested dictionary with i64 keys (#6299)
- fix panic dynamic_groupby on empty dataframe (#6294)
- Parse negative dates with polars parser (#6256)
- Add list inner dtype when printing Series (#6233)
- fix when then otherwise with arity and aggregation… (#6224)
- pass name to value counts in aggregation (#6221)
- don't set fast_explode on list of structs (#6220)
- explode of empty nullable list (#6190)
- fix empty streaming joins (#6149)
- fix streaming joins where the join order has been … (#6143)
- write tz-aware datetimes to csv (#6135)
- Print error message on mmap IPC file only in verbose mode (#6098)
- fix invalid dtype in chunked array after struct cast (#6093)
- don't run cse cache_states if no projections found (#6087)
- Update `read_csv` error message (#6082)
- propagate nulls in binary arithmetic/aggregation (#6076)
- deal with unnest schema expansion in projection pd (#6063)
- correct output dtype for cummin/cumsum/cummax (#6062)
- block streaming on literal series/range (#6058)
- ndjson struct inference (#6049)
- deal with empty structs (#6039)
- fix aggregation that filters out all data (#6036)
- fix diff overflow (#6033)
- keep column names in is_null/is_not_null (#6032)
- keep name when sorting categorical in lexical order (#6029)
- properly set null anyvalue if categorical is neste… (#6025)
- make weekday tz-aware (#5989)
- fix categorical in struct anyvalue issue (#5987)
- fix invalid boolean simplification (#5976)
- allow empty sort on any dtype (#5975)
- properly deal with categoricals in streaming queries (#5974)
- don't panic on ignored context (#5958)
- don't allow named expression in arr.eval (#5957)
- fix panic in join expressions (#5954)
- block ordered predicates before explode (#5951)
- adhere to schema in arr.eval of empty list (#5947)
- fix arrow nested null conversion (#5946)
- allow None in arr.slice length (#5934)
- fix time to duration cast (#5932)
- error on addition with datetime/time (#5931)
- don't create categoricals in streaming (#5926)
- object filter should keep single chunk (#5913)
- csv, read escaped "" as missing (#5912)
- fix pivot of signed integers (#5909)
- fix latest oob in streaming conversion (#5902)
- fix `date + duration` offsets outside of nanosecond datetime bounds (#5889)
- adapt k to len in topk (#5888)
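As a rough illustration of what "consistent floor division for floats/ints" (#6460) means, the sketch below shows floor division for both kinds of number using only the Rust standard library. It is not the Polars implementation, and the function names `floor_div_i64` and `floor_div_f64` are hypothetical. The point is that both round toward negative infinity, whereas Rust's built-in integer `/` truncates toward zero.

```rust
/// Hypothetical helper: integer division rounded toward negative infinity.
/// Panics on division by zero and on i64::MIN / -1, like the `/` operator.
fn floor_div_i64(a: i64, b: i64) -> i64 {
    let q = a / b; // truncates toward zero
    // When the division is inexact and the operands have opposite signs,
    // truncation rounded up, so step down by one to get the floor.
    if a % b != 0 && ((a < 0) != (b < 0)) {
        q - 1
    } else {
        q
    }
}

/// Hypothetical helper: float division rounded toward negative infinity.
fn floor_div_f64(a: f64, b: f64) -> f64 {
    (a / b).floor()
}
```

With these definitions, `floor_div_i64(-7, 2)` and `floor_div_f64(-7.0, 2.0)` agree (-4 and -4.0), while plain `-7 / 2` on integers gives -3.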
🛠️ Other improvements
- propagate error in date_range with invalid time zone (#6759)
- update arrow to 0.16 (#6748)
- remove unreachable path in write_anyvalue (#6727)
- add groupby_dynamic to docs (#6725)
- disallow chunked datetime with_time_zone on tz-naive, remove unnecessary with_time_zone (#6681)
- update required Rust version from 1.58 to 1.62 (#6680)
- add test for raising error in apply (#6664)
- Minor documentation fix (#6657)
- Add release flow info to contributing guide (#6480)
- address todo and use regex in tz_aware check (#6479)
- Address chrono deprecation warnings (#6478)
- fix doc comment in ParallelStrategy (#6444)
- move binary to polars-ops (#6401)
- fix a typo in csv read example (#6389)
- remove roundtrip to builder (#6383)
- update rustc to 2023-01-19 (#6341)
- run cse optimization only if joins and caches… (#6337)
- update base64 requirement from 0.13 to 0.21 (#6249)
- Remove benches and criterion dependency (#6273)
- update chrono-tz requirement from 0.6 to 0.8 (#6255)
- Enable Dependabot (#5036)
- Add missing feature attributes for csv-file (#6229)
- don't set aggregated flag on null propagated aggregation. (#6191)
- Revert "Use auto_doc_cfg" (#6164)
- remove vertical take (#6112)
- add single threaded sort internally (#6103)
- mark `from_chunks` as unsafe (#6094)
- replace exact instances of Option/Result combinators (#6088)
- ensure reverse indices exist in global string cache (#5970)
- refactored describe (#5922)
- don't decode into utf8 (#5910)
- remove unused deps (#5903)
Thank you to all our contributors for making this release possible!
@2-5, @AnatolyBuga, @ChayimFriedman2, @MarceColl, @MarcoGorelli, @MatveyF, @abalkin, @alexander-beedie, @c-peters, @cannero, @chitralverma, @cojmeister, @dannyvankooten, @dependabot, @dependabot[bot], @flowlight0, @gab23r, @gam-phon, @ghuls, @gitkwr, @huitseeker, @jgmartin, @jjerphan, @johngunerli, @josh, @jvanbuel, @n8henrie, @ozgrakkurt, @papparapa, @phaile2, @plaflamme, @rben01, @ritchie46, @romanovacca, @ropoctl, @sorhawell, @stinodego, @universalmind303, @winding-lines, @yuntai and @zundertj