Releases: pola-rs/polars
Python Polars 0.19.5
🚀 Performance improvements
🐞 Bug fixes
- revert invalid runtime check (#11363)
- more cloud urls (#11361)
- ensure cloud globbing can deal with spaces (#11360)
- recognize more cloud urls (#11357)
🛠️ Other improvements
- Disable version warning banner for now (#11359)
- Fix error message reference to `infer_schema_length` (#11358)
- Mark some tests as slow (#11350)
- improve parametric tests for group_by_rolling by skipping overflowing cases (#11286)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @jonashaag, @orlp, @ritchie46 and @stinodego
Python Polars 0.19.4
🏆 Highlights
- support 'hive partitioning' aware readers (#11284)
- natively support reading parquet for aws, gcp and azure (#11210)
- Add support for Iceberg (#10375)
- The great expressification by @reswqa (#11320, #11344, #11313, #11257, #11288, #11275, #11197, #11167, #11155)
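As an illustration of what "hive partitioning" refers to, here is a minimal sketch assuming the common layout where directory names take the form `key=value`. The helper name and the example path are hypothetical, and this is not how Polars itself implements its readers; it only shows the information a partition-aware reader can recover from a path.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a hive-style path (hypothetical helper)."""
    values = {}
    # Only directory components are inspected; the file name itself is skipped.
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


example = hive_partition_values("dataset/year=2023/month=09/part-0.parquet")
print(example)  # {'year': '2023', 'month': '09'}
```

A partition-aware reader can expose such values as columns and skip directories that cannot match a filter.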
⚠️ Deprecations
- Add `disable_string_cache` (#11020)
🚀 Performance improvements
- improve dynamic_groupby_iter (#11341)
- improve and fix rolling windows by linear scanning (#11326)
- faster init from `pydantic` models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
- improve outer join materialization (#11241)
- use ryu and itoa for primitive serialization (#11193)
- use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
- Using cache for str.contains regex compilation (#11183)
✨ Enhancements
- introduce 'label' instead of 'truncate' in group_by_dynamic, which can take `label='right'` (#11337)
- Expressify list.shift (#11320)
- top_k and bottom_k supports pass an expr (#11344)
- add "pyxlsb" engine support to `read_excel` (for excel binary workbook files) (#11248)
- support 'hive partitioning' aware readers (#11284)
- str.strip_chars supports take an expr argument (#11313)
- sample n can take an expr (#11257)
- Add `disable_string_cache` (#11020)
- clip supports expr arguments and physical numeric dtype (#11288)
- Introduce list.drop_nulls (#11272)
- str.splitn and split_exact can take an expr argument (#11275)
- introduce ambiguous option for dt.round (#11269)
- Adds `NULLIF` and `COALESCE` SQL functions (#11124)
- better `tree-formatting` representation (#11176)
- natively support reading parquet for aws, gcp and azure (#11210)
- Expressify str.strip_prefix & suffix (#11197)
- Add support for Iceberg (#10375)
- list.join's separator can be expression (#11167)
- argument every of datetime.truncate can be expression (#11155)
🐞 Bug fixes
- Fix `Series.__contains__` for None values and implement `is_in` for null Series (#11345)
- don't panic on multi-nodes in streaming conversion (#11343)
- ensure trailing quote is written for temporal data when CSV `quote_style` is non-numeric (#11328)
- clarify `has_validity` docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of `null` values (#11319)
- fix empty Series construction edge-case with Struct dtype (#11301)
- DataFrame init from `collections.namedtuple` values (#11314)
- Exclude functools wrapper frames in `find_stacklevel` (#11292)
- set partitions independent of thread pool (#11304)
- address VSCode issue with autocomplete on `selector` expressions in editor/console (#11235)
- consume duplicates in rolling_by window (#11261)
- handle url encoded paths in objectpath creation (#11240)
- use POOL when writing csv (#11222)
- don't conflate saved `Config` JSON string with file path (#11098)
- is_in for bool evaluate has_false incorrectly (#11217)
- improve handling of database drivers that can return arrow data (#11201)
- fix nullable filter mask in group_by (#11207)
- replace n-th in filter (#11206)
- fix translation of Series-nested datetime/date values for `scan_pyarrow` predicates (#11195)
- address unexpected expression name from use of unary `-` or `+` operators (#11158)
- impl hash for more function expr (#11182)
- list.join's separator can be expression (#11167)
- Add some missing expr type hint for series (#11171)
- consistently use negative every as the default for offset in group_by_dynamic (#11164)
- Make pl.struct serializable (#11169)
- only raise on actual parameter collision when "dtypes" specified in `read_excel` "read_csv_options" (#11162)
- propagate null value for str/binary starts/ends_with and contains (#11141)
🛠️ Other improvements
- simplify/clarify group_by_dynamic examples (#11335)
- tighten `assert_frame_equal` for LazyFrames (don't collect until after the schema has been checked) (#11331)
- unify display for namespaced function expr (#11342)
- add lazy pivot example (#11325)
- Use `GITHUB_TOKEN` to get contributor information for docs (#11321)
- Enable version warning banner (#11322)
- cross-reference `null_count` from `has_validity` (clarifies the correct way to check for nulls) (#11323)
- Pin pydantic in dev requirements `<2.4.0` (#11312)
- remove default auto-explode for map_many_private (#11270)
- Add type alias `IntoExprColumn` (#11296)
- update a few dependencies (#11283)
- Properly skip ADBC test (#11282)
- Fix some minor Makefile issues (#11276)
- update sponsors (#11271)
- parametric tests for group_by_rolling (#11262)
- Make some list function expr non-anonymous (#11230)
- Mention the `performant` feature only once (#11223)
- remove unneeded indirection (#11233)
- remove unneeded mutex around object-store (#11224)
- clarify every/period/offset in group_by_dynamic (#11175)
- Fix `read_database` `batch_size` docstring (#11132)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303
Rust Polars 0.33.0
🏆 Highlights
- implementing sink_csv for LazyFrame (#10682)
💥 Breaking changes
- empty product returns identity (#10842)
- return `f64` for `rank` when `method="average"` (#10734)
- Rename `groupby` to `group_by` (#10654)
- Read/write support for IPC streams in DataFrames (#10606)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- remove fixed_seed and add pl.set_random_seed (#10388)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
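For readers unfamiliar with the Kleene logic mentioned in #10564, the following is a minimal plain-Python sketch of three-valued `all`/`any`, with `None` standing in for a null. The function names are hypothetical and this is not the Polars implementation; these notes also do not say whether Polars applies this behaviour by default or behind an option.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'all': a single False decides; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'any': a single True decides; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True
        if v is None:
            saw_null = True
    return None if saw_null else False


print(kleene_all([True, None]))   # None
print(kleene_all([False, None]))  # False
print(kleene_any([True, None]))   # True
print(kleene_any([False, None]))  # None
```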
⚠️ Deprecations
- Rename `is_first/last` to `is_first/last_distinct` (#11130)
- Rename `count_match` to `count_matches` (#11028)
- Rename `strip` to `strip_chars` (#10813)
- Add `datetime_range` expression function (#10213)
- Rename `Series/Expr.rolling_apply` to `rolling_map` (#10750)
🚀 Performance improvements
- improve performance of fast projection (#10945)
- parse time zones outside of downcast_iter() in replace_time_zone (#10713)
- use binary abstraction for atan2 (#10588)
- use binary abstraction in pow (#10562)
✨ Enhancements
- Expressify str.split argument. (#11117)
- Expressify argument of binary contains (#11091)
- dt.offset_by supports broadcasting lhs (#11095)
- Expressify argument of binary starts_with and ends_with (#11076)
- json_extract supports extract static and string value to list dtype (#11057)
- add quote_style="never" option for `write_csv` (#11015)
- add support for nextest (#11048)
- Add `literal` for str count_match (#10996)
- More dtypes supports cast to list (#11025)
- ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
- Add `strip_prefix` and `strip_suffix` to the string namespace (#10958)
- Add `datetime_range` expression function (#10213)
- add proper cache for Regex compilation (#10934)
- implementation of `array_to_string` (#10839)
- apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
- accept expr in `str.count_match` (#10900)
- accept expressions in `.offset_by` (#9967)
- implement drop as special case of `select` (#10885)
- Supports is_last operation (#10760)
- activate cse for group_by (again) (#10749)
- add pairwise float sum implementation (#10756)
- implementing sink_csv for LazyFrame (#10682)
- Supports series unique & arg_unique & n_unique for list (#10743)
- repeat_by should also support broadcasting of LHS (#10735)
- deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
- is_first also supports numeric list type. (#10727)
- improve slice pushdown in unions (#10723)
- Support min and max strategy for binary & str columns fill null (#10673)
- support broadcasting in list set operations (#10668)
- add `truncate_ragged_lines` (#10660)
- supports cast to list (#10623)
- Rename `groupby` to `group_by` (#10654)
- preserve whitespace in notebook output (#10644)
- Read/write support for IPC streams in DataFrames (#10606)
- improve binary (arity) generics (#10622)
- propagate null is in `is_in` and more generic array construction (#10614)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- frame-level `cast` support (#10504)
- Add failed column to cast exception (#10507)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
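The "pairwise float sum" entry (#10756) refers to a summation strategy that limits rounding-error growth. A minimal Python sketch of the general technique follows; the block size of 8 and the function name are assumptions made for illustration, and the Rust implementation in Polars may differ in every detail.

```python
import math


def pairwise_sum(values, block=8):
    """Sum floats by recursively splitting the input in half (illustrative sketch)."""
    n = len(values)
    if n <= block:
        # Small blocks are summed naively.
        total = 0.0
        for v in values:
            total += v
        return total
    mid = n // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)


data = [0.1] * 1_000_000

naive = 0.0
for v in data:
    naive += v

exact = math.fsum(data)  # correctly rounded reference sum
print("naive error:   ", abs(naive - exact))
print("pairwise error:", abs(pairwise_sum(data) - exact))
```

Rounding error in a naive running sum can grow with the number of elements, while the pairwise scheme grows roughly with the depth of the recursion, which is why it is a common default for float aggregation.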
🐞 Bug fixes
- Correct hash and fmt for struct expr (#11119)
- enforce sortedness of by argument in rolling_* functions (#11002)
- Filter on empty objectChunked should not throw error (#11073)
- ensure null_count statistics accounts for null array (#11070)
- toggle off cse if ext_context is used (#11051)
- Correct field dtype of string concat (#11055)
- pushed-down expr should be considered when evaluating ExternalContext (#11023)
- fix rolling_* functions when "by" has nanosecond resolution (#11005)
- Don't reuse member for Selector::Add (#11026)
- fix the construction of List<Null> (#10969)
- allow singular null in regex pattern (#10948)
- compute length of null array in explode (#10946)
- Allow exactly one value in start/end for `int_range` (#10914)
- count was falsy tagged as cse in group by (#10917)
- Retain original dtype when deserializing an empty list (#10893)
- CSE don't accept opaque functions (#10905)
- Make `int_range(s)` exclusive on the upper bound when step is negative (#10898)
- fix conversion from decimal to float (#10776)
- Add broadcasting for list comparisons (#10857)
- don't overflow length before checking limit (#10883)
- fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
- tag amortized iter unsafe and add safe alternatives (#10881)
- use pool in dataframe arithmetic (#10864)
- remove debug `println!` from datetime fn (#10862)
- repair polars_err string interpolation (#10863)
- make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
- empty product returns identity (#10842)
- never panic in hash/equality doesn't hold in cse (#10836)
- Improve bound checks on temporal ranges (#10837)
- var/std behavior around few elements (#10828)
- Fix divided by zero error when read empty csv in streaming mode (#10819)
- fix equality of quantile aggregation node (#10816)
- Reading an only-header csv file in streaming mode should not panic (#10810)
- get_single_leaf can't handle Expr::Count (#10790)
- string to decimal parsing (#10712)
- support groupby literal in streaming (#10771)
- `ORDER BY` on unselected columns (#10752)
- Fix is_in cannot cast list type for float (#10769)
- fix unicode truncation in json parsing (#10761)
- Error message of list unique should not display inner type (#10748)
- create `chunks_mut` entry in vtable (#10745)
- Prevent panic on sample_n with replacement from empty df (#10731)
- only preserve sortedness flag in replace_time_zone when safe (#10738)
- Error on `value_counts` on column named `"counts"` (#10737)
- Build Series from empty Series vector (#10558)
- return `f64` for `rank` when `method="average"` (#10734)
- Keep min/max and arg_min/arg_max consistent. (#10716)
- Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
- Cast small int type when scan csv in streaming mode. (#10679)
- Reused input series in rolling_apply should not be orderly (#10694)
- re-sort buffer when update window swap the whole buffer (#10696)
- Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
- Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
- `AllHorizontal` format string (#10658)
- List<null> chunked builder should take care of series name (#10642)
- respect 'ignore_errors=False' in csv parser (#10641)
- fix rename + projection pushdown (#10624)
- fix int/float downcast in `is_in` (#10620)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- Fix serialization for categorical chunked. (#10609)
- join_asof missing `tolerance` implementation, address edge-cases (#10482)
- Take input_schema to create physical expr for Selection (#10571)
- fix serialization of empty lists (#10563)
- Clear window cache after evaluate predication expr (#10505)
- Parsing regex col in Expr::Columns (#10551)
- sanitize column naming in boolean ops (#10531)
- fix build for wasm (#10536)
- remove fixed_seed and add pl.set_random_seed (#10388)
- fix build for wasm (#9502)
- rollback cse in groupby: python 0.18.15 (#10491)
🛠️ Other improvements
- Removed duplicated example (#11109)
- Add CODEOWNERS for docs folder (#11107)
- Refactor starts_with and ends_with for string (#11085)
- Integrate user guide (#11089)
- remove feature gate join/groupby in polars-core (#10965)
- Add Documentation issue type (#11042)
- complete intra-docs in api documentation (#11007)
- genericize take implementation (#10976)
- genericize PolarsDataType (#10952)
- enhance internal crates readme with reference to main crate (#10928)
- Add `Duration` method for checking full days (#10850)
- apply with_name in more places (#10899)
- never compare opaque functions (#10906)
- eliminate repetition in utf8 datetime functions (#10860)
- Fix issue templates for bug reports (#10896)
- remove `LocalProjection` (#10886)
- request verbose logging output of minimal reproducible examples (#10882)
- Reorganize `range` expression module (#10871)
- introduce with_name for Series/ChunkedArray (#10859)
- Further refactor temporal range functions (#10844)
- Refactor `range` related functions (#10830)
- Fix the un-compile Black box function parts in polars lazy cookbook (#10809)
- Fix some broken links / formatting (#10772)
- Improve docs for `polars-lazy` (#10729)
- update rustc nightly_2023-08-26 (#10467)
- default to rust native flate2 lib (#10733)
- Clear GitHub Actions caches weekly (#10715)
- move 'is_in' to polars-ops (#10645)
- Clean up schema calculation for `date_range` (#10653)
- remove unused apply functions and add fallible generic apply functions (#10621)
- Enforce up-to-date `Cargo.lock` (#10555)
- make binary chunkedarray functions DRY (#10607)
- bump MSRV to 1.65 (#10568)
- genericize chunk implementation (#10506)
- use ChunkArray::(try_)from_chunk_iter (#10497)
- add VSCode rust-analyzer settings (#10498)
- Update URLs for dev documentation (#10495)
- Update features for latest `flate2` release (#10492)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @an...
Python Polars 0.19.3
🏆 Highlights
- Polars plugins (#10924)
⚠️ Deprecations
- Rename `is_first/last` to `is_first/last_distinct` (#11130)
- Rename `count_match` to `count_matches` (#11028)
- Rename `strip` to `strip_chars` (#10813)
- Add `datetime_range` expression function (#10213)
🚀 Performance improvements
- optimize `_unpack_schema()` (#11080)
- optimize `polars.utils._post_apply_columns()` (#11086)
- optimize `polars.utils._post_apply_columns()` (#11041)
- optimize `_unpack_schema()` (#10960)
- improve performance of fast projection (#10945)
✨ Enhancements
- Expressify str.split argument. (#11117)
- Polars plugins (#10924)
- better async_collect (#10912)
- Expressify argument of binary contains (#11091)
- dt.offset_by supports broadcasting lhs (#11095)
- Expressify argument of binary starts_with and ends_with (#11076)
- add OpenOffice spreadsheet support via new `pl.read_ods` function (#11011)
- json_extract supports extract static and string value to list dtype (#11057)
- add quote_style="never" option for `write_csv` (#11015)
- Add `literal` for str count_match (#10996)
- More dtypes supports cast to list (#11025)
- Add `strip_prefix` and `strip_suffix` to the string namespace (#10958)
- improve `read_excel` table data identification (#10953)
- Add `from_dataframe` fast path and improve typing (#10979)
- add `openpyxl` as a new/optional engine for `read_excel` (#6183)
- Add `datetime_range` expression function (#10213)
🐞 Bug fixes
- Correct hash and fmt for struct expr (#11119)
- enforce sortedness of by argument in rolling_* functions (#11002)
- Make `Series.__getitem__` raise an IndexError (#11061)
- Filter on empty objectChunked should not throw error (#11073)
- ensure null_count statistics accounts for null array (#11070)
- toggle off cse if ext_context is used (#11051)
- Correct field dtype of string concat (#11055)
- fix partial schema init with `read_dicts` and reduce latency of small-frame creation (#11047)
- pushed-down expr should be considered when evaluating ExternalContext (#11023)
- fix rolling_* functions when "by" has nanosecond resolution (#11005)
- Don't reuse member for Selector::Add (#11026)
- ensure `series_equal` properly accounts for dtypes when strict=True (#11012)
- fix the construction of List<Null> (#10969)
- write_excel "hidden_columns" parameter fails when taking a selector (#10987)
- allow singular null in regex pattern (#10948)
- compute length of null array in explode (#10946)
🛠️ Other improvements
- remove low contrast coloring from visited links (#11133)
- Ignore matplotlib warning (#11129)
- Do not run user guide examples by default (#11128)
- Ignore matplotlib mypy warnings (#11126)
- Add deprecation message in groupby docs (#11121)
- Removed duplicated example (#11109)
- Add CODEOWNERS for docs folder (#11107)
- Refactor starts_with and ends_with for string (#11085)
- Integrate user guide (#11089)
- remove mentions of the deprecated random module (#11087)
- simplify `SchemaDefinition` type alias (#11077)
- put `fetch` explanation in a "notes" block to better highlight it in the docs (#11058)
- remove feature gate join/groupby in polars-core (#10965)
- Add Documentation issue type (#11042)
- warn that "by" argument must be sorted for results to be correct in rolling_* functions (#11013)
- Adds missing method refs in LazyDataFrame API docs (#11027)
- Add lint for boolean trap (#11010)
- Add private LazyFrame method for setting sink optimizations (#10988)
- Enable a few more ruff lints (#10998)
- document polars string duration language in temporal range functions (#10978)
- Additional tests for interchange `get_data_buffer` (#10966)
- genericize PolarsDataType (#10952)
- Document that filter, drop_nulls, left join preserve order (#10955)
- add note about adbc flight sql driver (#10949)
- Revert `pydantic >= 2.0.0` requirement (#10944)
- note that pl.duration represents fixed durations, point to offset_by for non-fixed (#10927)
- Test S3 functionality using moto server (#10164)
Thank you to all our contributors for making this release possible!
@I8dNLo, @KacpiW, @MarcoGorelli, @Object905, @Qqwy, @TNieuwdorp, @alexander-beedie, @antoniocali, @bvanelli, @cjackal, @henrikig, @jakob-keller, @mrogowski11, @nameexhaustion, @orlp, @reswqa, @ritchie46, @s-banach, @stinodego, @svaningelgem and @thomasjpfan
Python Polars 0.19.2
🏆 Highlights
- Add syntactic sugar for `col("foo")` -> `col.foo` (#10874)
⚠️ Deprecations
- Rename `Expr.is_not()` to `not_()` (#10838)
✨ Enhancements
- allow individual `Config` options to be easily reset to their default value (#10922)
- accept expr in `str.count_match` (#10900)
- allow additional `glimpse` customisation, fix strings repr (#10895)
- accept expressions in `.offset_by` (#9967)
- support schema overrides for frames created from databases (#10884)
- Add syntactic sugar for `col("foo")` -> `col.foo` (#10874)
- support negative indexing in set_at_idx (#10891)
- implement drop as special case of `select` (#10885)
- raise a more helpful error when non-query statements passed to `read_database` (#10851)
🐞 Bug fixes
- Allow exactly one value in start/end for `int_range` (#10914)
- fix(rust, python): raise error when function didn't receive any inputs (#8635)
- count was falsy tagged as cse in group by (#10917)
- CSE don't accept opaque functions (#10905)
- Make `int_range(s)` exclusive on the upper bound when step is negative (#10898)
- don't overflow length before checking limit (#10883)
- fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
- use pool in dataframe arithmetic (#10864)
- repair polars_err string interpolation (#10863)
- make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
🛠️ Other improvements
- Set minimum version for pydantic to `2.0.0` (#10923)
- fix and clarify docs for `Expr.map_elements` (#10647)
- fix rendering of bullet points in dt.round (#10911)
- add test for 10875 (#10913)
- apply with_name in more places (#10899)
- never compare opaque functions (#10906)
- eliminate repetition in utf8 datetime functions (#10860)
- Fix issue templates for bug reports (#10896)
- request verbose logging output of minimal reproducible examples (#10882)
- add a note about `read_database` connection/cursor behaviour (#10873)
- introduce with_name for Series/ChunkedArray (#10859)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @MarcoGorelli, @alexander-beedie, @c-peters, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @jeroenjanssens, @orlp, @ritchie46, @stinodego and @wdoppenberg
Python Polars 0.19.1
💥 Breaking changes
- empty product returns identity and product ignores nulls (#10842)
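To make the breaking change concrete: the multiplicative identity is 1, so a product over no values yields 1, and nulls are skipped and do not poison the result. The plain-Python sketch below mirrors that rule with `None` standing in for null; the helper name is hypothetical and this is not the Polars code.

```python
import math
from typing import Iterable, Optional


def product_ignoring_nulls(values: Iterable[Optional[float]]) -> float:
    """Multiply the non-null values; an empty input yields the identity, 1."""
    return math.prod(v for v in values if v is not None)


print(product_ignoring_nulls([]))            # 1
print(product_ignoring_nulls([2, None, 3]))  # 6
print(product_ignoring_nulls([None, None]))  # 1
```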
✨ Enhancements
- add `binary`, `boolean`, `categorical`, `date`, `object`, and `time` selectors (#10806)
- Supports is_last operation (#10760)
- minor typing improvement for DataFrame.__iter__ (#10825)
- Add custom error for `allow_copy=False` (#10822)
🐞 Bug fixes
- empty product returns identity (#10842)
- never panic in hash/equality doesn't hold in cse (#10836)
- Improve bound checks on temporal ranges (#10837)
- var/std behavior around few elements (#10828)
- Fix divided by zero error when read empty csv in streaming mode (#10819)
- behaviour of `reversed(df)` (#10823)
- fix equality of quantile aggregation node (#10816)
- Reading an only-header csv file in streaming mode should not panic (#10810)
🛠️ Other improvements
- Refactor `range` related functions (#10830)
- map-related docstring updates (#10779)
- Move sink tests to streaming module (#10821)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @orlp, @reswqa, @ritchie46 and @stinodego
Python Polars 0.19.0
An upgrade guide is available on our website.
🏆 Highlights
- implementing sink_csv for LazyFrame (#10682)
- Support `DataFrame` init from queries against users' existing database connections (#10649)
- Rename `groupby` to `group_by` (#10656)
💥 Breaking changes
- return `f64` for `rank` when `method="average"` (#10734)
- Update a lot of error types (#10637)
- Remove deprecated behavior from vertical aggregations (#10602)
- Read/write support for IPC streams in DataFrames (#10606)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- Improve consistency of parsing expression input (#9512)
- allow `from_arrow` to take a generator of RecordBatches, change error type to `TypeError` (#10529)
- remove fixed_seed and add pl.set_random_seed (#10388)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
- Improve some error types and messages (#10470)
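The change to `rank` with `method="average"` (#10734) follows from the arithmetic: tied values share the mean of the positions they occupy, which can be fractional. A minimal plain-Python sketch of average ranking, under the assumption of 1-based ranks (the function name is hypothetical and this is not the Polars implementation):

```python
def average_rank(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


print(average_rank([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values occupy positions 2 and 3 and so both receive 2.5, a value an integer column cannot hold.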
⚠️ Deprecations
- Rename `map` to `map_batches` (#10801)
- Rename `GroupBy.apply` to `map_groups` (#10799)
- Rename `DataFrame.apply` to `map_rows` (#10797)
- Rename `Series/Expr.rolling_apply` to `rolling_map` (#10750)
- Rename `Series/Expr.apply` to `map_elements` (#10678)
- Rename `groupby` to `group_by` (#10656)
- Deprecate some parameters of `cut`/`qcut` (#10484)
🚀 Performance improvements
- parse time zones outside of downcast_iter() in replace_time_zone (#10713)
- use binary abstraction for atan2 (#10588)
- use binary abstraction in pow (#10562)
✨ Enhancements
- activate cse for group_by (again) (#10749)
- implementing sink_csv for LazyFrame (#10682)
- Supports series unique & arg_unique & n_unique for list (#10743)
- repeat_by should also support broadcasting of LHS (#10735)
- deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
- is_first also supports numeric list type. (#10727)
- improve slice pushdown in unions (#10723)
- Explicitly implement `Protocol` for interchange classes (#10688)
- Support min and max strategy for binary & str columns fill null (#10673)
- support broadcasting in list set operations (#10668)
- csv: add schema argument (#10665)
- Support `DataFrame` init from queries against users' existing database connections (#10649)
- add `truncate_ragged_lines` (#10660)
- supports cast to list (#10623)
- Update a lot of error types (#10637)
- preserve whitespace in notebook output (#10644)
- Remove deprecated behavior from vertical aggregations (#10602)
- support selector usage in `write_excel` arguments (#10589)
- Add `LazyFrame.collect_async` and `pl.collect_all_async` (#10616)
- Read/write support for IPC streams in DataFrames (#10606)
- propagate null is in `is_in` and more generic array construction (#10614)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- frame-level `cast` support (#10504)
- Improve consistency of parsing expression input (#9512)
- Add failed column to cast exception (#10507)
- allow `from_arrow` to take a generator of RecordBatches, change error type to `TypeError` (#10529)
- Remove deprecated `get_idx_type` - use `get_index_type` instead (#10556)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
- Improve some error types and messages (#10470)
- suggest str.to_datetime instead of apply and stdlib strptime (#10266)
🐞 Bug fixes
- get_single_leaf can't handle Expr::Count (#10790)
- support groupby literal in streaming (#10771)
- `ORDER BY` on unselected columns (#10752)
- Fix is_in cannot cast list type for float (#10769)
- whitespace CSS in Notebook HTML updated to use `pre-wrap` instead of `pre` (#10739)
- only preserve sortedness flag in replace_time_zone when safe (#10738)
- Error on `value_counts` on column named `"counts"` (#10737)
- return `f64` for `rank` when `method="average"` (#10734)
- Keep min/max and arg_min/arg_max consistent. (#10716)
- use time zone from dtype to overwrite output time zone when initialising Series (#10689)
- Cast small int type when scan csv in streaming mode. (#10679)
- raise exception with invalid `on` arg type for join_asof (#10690)
- Reused input series in rolling_apply should not be orderly (#10694)
- re-sort buffer when update window swap the whole buffer (#10696)
- Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
- Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
- Correctly handle time zones in `write_delta` (#10633)
- fix apply for empty series in threading mode (#10651)
- respect 'ignore_errors=False' in csv parser (#10641)
- fix rename + projection pushdown (#10624)
- fix int/float downcast in `is_in` (#10620)
- Change behavior of `all` - fix Kleene logic implementation for `all`/`any` (#10564)
- Fix serialization for categorical chunked. (#10609)
- Take input_schema to create physical expr for Selection (#10571)
- Clear window cache after evaluate predication expr (#10505)
- Parsing regex col in Expr::Columns (#10551)
- sanitize column naming in boolean ops (#10531)
- Fix `write_delta` with schema in `delta_write_options` (#10541)
- remove fixed_seed and add pl.set_random_seed (#10388)
- respect `pl.Config` options relating to shape, column names, and types when rendering HTML (#10449)
🛠️ Other improvements
- update cargo.lock (#10800)
- Create `.venv` in repo root (#10789)
- refactored `write_database` unit tests to properly separate concerns (#10773)
- Fix some broken links / formatting (#10772)
- Document chained when-then behaviour more prominently (#10759)
- Fix test failing due to new `adbc` release (#10763)
- Unpin `connectorx` and bump other Python dependencies (#10753)
- add note to `testing` docs about module import (#10741)
- Clear GitHub Actions caches weekly (#10715)
- Update for new pyarrow `13.0.0` behavior (#10691)
- Fix minor issue with `sink_parquet` docs (#10669)
- Remove `deprecate_renamed_methods` util (#10537)
- add "see also" entries to ne/eq_missing and update related examples (#10667)
- fix potential memory leak from usage of `inspect.currentframe` (#10630)
- give more relevant example for polars.apply (#10631)
- Bump ruff and enable new setting (#10626)
- Add docstrings for `Expr.meta` namespace (#10617)
- Enforce up-to-date `Cargo.lock` (#10555)
- deprecate DataFrame.replace (#10600)
- ensure that `make requirements` fully refreshes unpinned packages/deps (#10591)
- fix out-of-date explain default parameter (#10566)
- Fix `expr_dispatch` decorator to work on methods with decorators (#10549)
- Fix link to source code (#10542)
- Add title to index page (#10539)
- Disable SIM108 lint (#10519)
- Keep versioned docs (#10500)
- switch to `pyo3/maturin-action` (#10503)
- Update URLs for dev documentation (#10495)
- Skip failing test (#10496)
- Add version switcher to API reference (#10488)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj
Python Polars 0.18.15
🐞 Bug fixes
- rollback cse in groupby: python 0.18.15 (#10491)
🛠️ Other improvements
- Mark import timing check as slow (#10487)
- Gather all streaming tests (#10485)
- Bump `maturin` to version 1.2.1 (#10479)
Thank you to all our contributors for making this release possible!
@ritchie46 and @stinodego
Rust Polars 0.32.0
🏆 Highlights
- common subexpression elimination (#9632)
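As background on the highlighted optimization, the sketch below shows the core idea in plain Python: find subexpressions that occur more than once across a set of expressions, so each can be computed once and reused. The tuple-based expression encoding and the function name are assumptions made for illustration only; the Polars optimizer works on its own query plan representation.

```python
from collections import Counter


def count_subexpressions(expr, counts):
    """Count every composite subexpression; expressions are nested tuples (op, *args)."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for child in expr[1:]:
            count_subexpressions(child, counts)
    return counts


# Two output expressions that both depend on a * b.
shared = ("mul", "a", "b")
exprs = [("add", shared, 1), ("sub", shared, 2)]

counts = Counter()
for e in exprs:
    count_subexpressions(e, counts)

# Anything seen more than once is a candidate to evaluate once and cache.
repeated = [e for e, n in counts.items() if n > 1]
print(repeated)  # [('mul', 'a', 'b')]
```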
💥 Breaking changes
- remove deprecate tz_localize, name CastTimezone to ReplaceTimeZone (#10070)
⚠️ Deprecations
- renaming `approx_unique` as `approx_n_unique` (#10290)
- remove/deprecate cache and its logic (#10066)
- Add `date_ranges`/`time_ranges` expression functions (#10005)
🚀 Performance improvements
- pre-alloc int_ranges (#10399)
- use hash as CSE Identifier (#10385)
- re-use regex capture allocation (#10302) (#10335)
- don't parallelize literal expressions (#10321)
- fix O(n^2) in sorted check during append (#10241)
- speedup mode on sorted data (#10084)
- speedup boolean apply (#10073)
- shrink alp/lp `~2.5x` (#10039)
- Remove fused arithmetic from expressions with literals (#10011)
✨ Enhancements
- quote style option for csv writer (#10422)
- add "raise_if_empty" flag to `read_excel`, `read_csv`, `scan_csv`, and `read_csv_batched` (#10409)
- be more permissive on predicate pushdown to left side of left join (#10442)
- add `use_earliest` to `to_datetime`/`strptime` (#10426)
- {any/all}_horizontal to expression architecture (#10412)
- serialize flags (#10140)
- allow unaligned pointers in arrow FFI (#10403)
- add line_terminator option to write_csv (#10373)
- Add `is_local` and `to_local` to categorical namespace (#10372)
- cse for groupby.agg and reduced cse collisions (#10381)
- re-use regex capture allocation (#10302) (#10335)
- Add
Series.cat.uses_lexical_ordering
(#10325) - improve datetime parsing error message (#10332)
- allow sequential runners in select/with_columns (#10322)
- improve err msg parsing time, date, datetime (#10298)
- Add str.extract_groups (#10179)
- add extra build profiles (#10268)
- Extend datetime expression function with time zone/time unit parameters (#10235)
- added gcs to gcp cloud schema in polars-core::cloud #10206. (#10207)
- support writing duration type in json (#10112)
- inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
- Move transpose naming to Rust (#10009)
- cse in groupby's (#10062)
- Adds sql CASE statement expressions (#10065)
- Add date_ranges/time_ranges expression functions (#10005)
- comm_subexpr_elim in streaming 'select/with_columns' (#10050)
- common subexpression elimination (#9632)
- Let qcut create evenly spaced probabilities (#9960)
- sorted flag on singletons (#9933)
- maintain sorted flag after partition_by (#9944)
- keep sorted flag in streaming left join (#9932)
- Add cloudpickle for serializing python UDFs (#9921)
🐞 Bug fixes
- Fix incorrect handling of VisitRecursion::Skip. (#10452)
- fix negative decimal parsing (#10444)
- ensure sorted_sink hash equals the default path (#10464)
- fix sum agg (#10459)
- ensure last aggregation deals with default chunk (#10453)
- fix cse input schema (#10450)
- fix list groupby of array dtype (#10408)
- correct AnyValue::hash (#10391)
- finalize cast in partitioned groupby (#10359)
- fix oob in 'last' (#10329)
- fix categorical lexical sort (#10318)
- Fix join validation (#10257)
- Set correct dtype for .extract_groups() (#10306)
- clear window cache and run windows on proper runners (#10303)
- fix sorted fast path in streaming groupby wrt nulls (#10289)
- fix nan aggregation in groupby (#10287)
- check dtypes of single-column 'by' parameter in asof-join (#10284)
- fix pyo3 link errors on macos (#10256)
- fix empty streaming parquet file (#10252)
- fix logical columns of streaming multi-column sort (#10250)
- fix date/datetime parsing for short inputs with exact=False (#10231)
- correct agg_sum for ChunkedArray. (#10243)
- don't panic in wildcard apply (#10240)
- fix cse profile (#10239)
- correct struct null counts (#10142)
- no cse in groupby until fixed (#10216)
- fix is_in on empty series (#10195)
- fix cse windows (#10197)
- block predicate pushdown is_in and null producing … (#10194)
- prevent re-ordering of dict keys inside .apply (#10172)
- initialize fixed null values (#10192)
- ensure window function run partitioned when cse is hit (#10170)
- adjust for null values in str.replace fast path (#10132)
- clear bit settings in list iteration (#10131)
- use row-encoded for struct::is_sorted (#10129)
- don't run file-caching in streaming mode (#10117)
- Allow initialize of pl.Array in Dataframe using schema alone (#10100)
- don't panic if masked out values are invalid in temporal kernels (#10114)
- Fix struct get field by index out of bounds error. (#10097)
- fix ub in simd-json (#10093)
- fix invalid access when groupby rolling produces empty sets (#10109)
- respect null_on_oob=False in list.take when pa… (#10105)
- fix is_sorted for structs (#10099)
- add file path to io error in scan_csv (#10076)
- fix false positive in parquet stats evaluation (#10087)
- fix error message from cast-timezone to replace-time-zone (#10089)
- Address .col(regex).exclude() operations not executing. (#10025)
- fix Boolean::isin(null values) (#10074)
- predicate pushdown #10058 (#10071)
- Fix weighted quantile for 0 weights (#10051)
- fix incorrect state in projection pushdown with joins (#9987)
- don't pass predicates referring to renamed literal… (#9965)
- fix regression in regex expansion (#9952)
- potential SO in csv infer schema (#9950)
- raise on unsupported transpose and object types (#9946)
- Fix as-of join when by groups are interleaved (#9938)
🛠️ Other improvements
- fix and run polars-plan tests (#10465)
- Simplify flag methods (#10429)
- match_block_trailing_comma (#10414)
- implement ChunkArray::(try_)from_chunk_iter (#10395)
- add test for 10401 (#10405)
- Bump some dependencies (#10396)
- Move dependency version info to workspace level (#10295)
- patch reedline until fix released (#10382)
- remove wasm-timer dependency (#10347)
- write down invariants of ChunkedArray (#10334)
- fix typo in lib.rs (#10313)
- Exclude examples from workspace default (#10309)
- Update CODEOWNERS (#10261)
- avoid outputting docs of dependencies (#10292)
- Do not keep history in gh-pages branch (#10282)
- Use workspace package info / organize dependencies section (#10279)
- fix dead links in Rust documentation (#10251)
- Fix make pre-commit command (#10205)
- Fix make integration-tests command (#10202)
- Replace "question" issues with link to Stack Overflow (#10230)
- Update dependabot config (#10222)
- Fix LICENSE symlink for moved crates (#10150)
- Re-organize folder structure for Rust crates (#10141)
- update to rustc nightly-2023-07-27 (#10139)
- temporarily turn off fail-fast so that ubuntu tests run (#10133)
- Refactor when/then/otherwise internals (#9922)
- move replace_time_zone to polars-ops (#10078)
- remove unneeded branch (#10082)
- remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)
- fix typo in contribution example (#10038)
- correct example in API reference (#10032)
- add developer contribution examples (#10013)
- Update autolabeler again (#9984)
- fix docs build and add to CI (#9904)
- Minor makeover for Rust Makefile (#9874)
Thank you to all our contributors for making this release possible!
@0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj
Python Polars 0.18.14
🏆 Highlights
- Native implementation of dataframe interchange protocol (#10267)
⚠️ Deprecations
- Deprecate behavior of list/tuple inputs for lit (#10461)
🚀 Performance improvements
- optimise retrieval of values from df.item (~4-5x speedup) (#10411)
- pre-alloc int_ranges (#10399)
- use hash as CSE Identifier (#10385)
✨ Enhancements
- quote style option for csv writer (#10422)
- add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
- add use_earliest to to_datetime/strptime (#10426)
- add new "header_format" option for write_excel (#10392)
- {any/all}_horizontal to expression architecture (#10412)
- Native implementation of dataframe interchange protocol (#10267)
- allow unaligned pointers in arrow FFI (#10403)
- add line_terminator option to write_csv (#10373)
- add explicit selector variants for signed/unsigned integers (#10384)
- Add is_local and to_local to categorical namespace (#10372)
- enhance selectors expansion function, so it can operate on a schema as well as a frame (#10341)
- Order percentiles in describe (#10378)
- cse for groupby.agg and reduced cse collisions (#10381)
- improve take_every(0) exception (#10352)
- add offset and length to get_ptr (#10361)
🐞 Bug fixes
- fix pyarrow write_to_dataset wrt check_not_directory parameter (#10471)
- fix negative decimal parsing (#10444)
- ensure sorted_sink hash equals the default path (#10464)
- address inconsistency in init from square numpy arrays with/without an explicit schema (#10445)
- ensure last aggregation deals with default chunk (#10453)
- fix cse input schema (#10450)
- Fix by argument handling in join_asof (#10447)
- fix potential OverflowError in testing asserts with huge UInt64 diffs (#10437)
- Create delta compatible schema during writing (#10165)
- fix list groupby of array dtype (#10408)
- correct AnyValue::hash (#10391)
- finalize cast in partitioned groupby (#10359)
🛠️ Other improvements
- add vertical_relaxed example for pl.concat (#10472)
- Run all streaming tests on the same test runner (#10469)
- Organize OOC tests (#10463)
- add test for 10417 (#10420)
- Clean up some Sphinx settings (#10400)
- add test for 10401 (#10405)
- Address Ruff per file ignores (#10258)
- Small improvement for PySeries.get_buffer (#10363)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @OndrejSlamecka, @alexander-beedie, @c-peters, @cmdlineluser, @drgif, @ion-elgreco, @lfn3, @orlp, @potzenhotz, @rea1bacon, @reswqa, @ritchie46, @stinodego and @zundertj