Releases: pola-rs/polars
Python Polars 0.20.6
🏆 Highlights
- new implementation for String/Binary type. (#13748)
⚠️ Deprecations
- Deprecate dtype_if_empty parameter for Series constructor (#13976)
🚀 Performance improvements
- improve string/binary reverse performance (#14016)
- add "calamine" support to read_excel, using fastexcel (~8-10x speedup) (#14000)
- optimize DataFrame.describe by presorting columns (#13822)
- elide redundant bound checks. (#13909)
- speedup boolean filter (#13905)
- speedup binview filter (#13902)
- allow python threads in read_ functions (#13886)
- improve binview filter (#13878)
- apply string view GC more conservatively (#13850)
- add optimized BinaryViewArray comparison kernels (#13839)
- lazy cache binview bytes len (#13830)
- fast-path for eager int_range (#13811)
- Optimize arr.sum for inner non-null bool (#13800)
✨ Enhancements
- Add UnstableWarning for unstable functionality (#13948)
- DataFrame supports explode by array column (#13958)
- add "calamine" support to read_excel, using fastexcel (~8-10x speedup) (#14000)
- improve binary formatting (#13981)
- preserve Enum information when going to IPC (#13943)
- support calling describe on a LazyFrame (#13982)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (#13944)
- support cast decimal to utf8 (#13829)
- add SQL support for timestamp precision modifier (#13936)
- support negative indexing and expressions for LEFT, RIGHT and SUBSTR SQL string funcs (#13888)
- Introduce explode for ArrayNameSpace (#13923)
- unify Series/DataFrame describe code (#13720)
- raise better error message for .dt.time on Date column (#13932)
- List set_operations supports float (#13920)
- Add ignore_nulls for arr.join (#13919)
- register 'set_sorted' as batch/elementwise (#13896)
- move Enum/Categorical categories to binview (#13882)
- Add ignore_nulls for list.join (#13701)
- Add ignore_nulls for pl.concat_str (#13877)
- Align int_range and int_ranges signatures (#13867)
- fix parquet for binview (#13873)
- support mmap for binview in OOC (#13872)
- implement ffi for binview (#13871)
- Support zero fill null strategy for binary and string columns (#13869)
- allow df.rename and lf.rename to take a renaming function (#13708)
- Implement/fix unary minus operator -pl.col(...) (#13776)
- extend SQL EXTRACT with "century", "millennium", and "timezone" parts (#13634)
- fix binview ipc format (#13842)
- add SQL support for numeric and/or decimal types (#13739)
- improve panic message (#13836)
- Expressify str.zfill (#13790)
- new implementation for String/Binary type. (#13748)
- Add typing to hvplot plot namespace (#13813)
- Add nulls_last for Series.sort (#13794)
- allow ftp URLs, improve URL check (#13781)
🐞 Bug fixes
- count matches on list categorical (#14021)
- list.min/max with empty and/or None elements (#14018)
- Make to_pandas() work for Dataframe and Series with dtype Object (#13910)
- raise for pl.concat(how="align") when no columns are shared between frames (#13941)
- Fix casting from categorical to numeric (#13957)
- read_csv preserve whitespace and newlines (#13934)
- omit implicit 'site' from import-timing test (#14009)
- append decimal with different scale (#13977)
- Use date_as_object=False as default for Series.to_pandas (just like DataFrame.to_pandas) (#13984)
- serialize decimal type (#13997)
- check input type for arr/list.contains (#13959)
- Fix max_colname_length formatting in glimpse() (#13969)
- Allow dtype merge when inner dtype is enum (#13938)
- recurse less in streaming shared sinks (#13930)
- ensure order is preserved if streaming from different sources (#13922)
- Fix is_not_null for Struct columns (#13921)
- convert object-dtyped NumPy str/bytes arrays to pl.String/pl.Binary instead of pl.Object (#13712)
- allow extract of numeric from str AnyValue (#13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (#13808)
- prune empty chunks before set operations (#13898)
- treat null columns as zero in sum_horizontal (#13880)
- include null count in rolling window validity with min_periods (#13863)
- Fix interchange protocol for new String type (#13881)
- parquet hybrid RLE encoding did not always align to bit width (#13883)
- Add ignore_nulls for list.join (#13701)
- .dt.time() was panicking for datetimes prior to unix epoch (#13812)
- allow list creation of decimals (#13851)
- ensure kwargs filter behaviour matches docstring (expect equivalence with eq) (#13864)
- Implement abs for Decimal, error on Date/Time/Datetime (#13821)
- rolling nested groups deadlock (#13835)
- gather_every should work on agg context (#13810)
- Fix segfault of is_in (#13814)
- don't panic on full null qcut (#13815)
- validate operator arithmetic with None, fix Series edge-case (#13780)
📖 Documentation
- Add missing doc entries (#14006)
- add missing len to rst file (#13999)
- Improve structure of user guide (#13951)
- Improve structure of user guide (#13639)
- Introduce ecosystem page in user guide (#13903)
- Mention deltalake write support in README (#13890)
- use proper argument names in the code blocks of api.rst (#13866)
🛠️ Other improvements
- make Enums an actual datatype (#14011)
- omit implicit 'site' from import-timing test (#14009)
- Constructor improvements - part 1 (#14001)
- Add glimpse test (#13979)
- Move PyO3 ChunkedArray conversion logic into its own module (#13973)
- Fix xdist streaming group (#13974)
- Fix spurious test failures (#13961)
- minor describe tidy-up, and slight rewording of some Exception docstrings (#13942)
- Fix pip warning filter return code (#13935)
- Minor refactor of PyO3 conversions module (#13929)
- move filter to polars-compute (#13897)
- Revert pandas warning filter (#13893)
- Make functions in expr/general non-anonymous (#13832)
- Fix doctests (#13831)
- Refactor Python release workflow (#13807)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @JulianCologne, @MarcoGorelli, @Wainberg, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @edavisau, @flisky, @ion-elgreco, @itamarst, @jacksonthall22, @kstoneriv3, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @stinodego, @taki-mekhalfa, @thomasaarholt and @valorien
Python Polars 0.20.6-rc.1
🏆 Highlights
- new implementation for String/Binary type. (#13748)
🚀 Performance improvements
- speedup boolean filter (#13905)
- speedup binview filter (#13902)
- allow python threads in read_ functions (#13886)
- improve binview filter (#13878)
- apply string view GC more conservatively (#13850)
- add optimized BinaryViewArray comparison kernels (#13839)
- lazy cache binview bytes len (#13830)
- fast-path for eager int_range (#13811)
- Optimize arr.sum for inner non-null bool (#13800)
✨ Enhancements
- register 'set_sorted' as batch/elementwise (#13896)
- move Enum/Categorical categories to binview (#13882)
- Add ignore_nulls for list.join (#13701)
- Add ignore_nulls for pl.concat_str (#13877)
- Align int_range and int_ranges signatures (#13867)
- fix parquet for binview (#13873)
- support mmap for binview in OOC (#13872)
- implement ffi for binview (#13871)
- Support zero fill null strategy for binary and string columns (#13869)
- allow df.rename and lf.rename to take a renaming function (#13708)
- Implement/fix unary minus operator -pl.col(...) (#13776)
- extend SQL EXTRACT with "century", "millennium", and "timezone" parts (#13634)
- fix binview ipc format (#13842)
- add SQL support for numeric and/or decimal types (#13739)
- improve panic message (#13836)
- Expressify str.zfill (#13790)
- new implementation for String/Binary type. (#13748)
- Add typing to hvplot plot namespace (#13813)
- Add nulls_last for Series.sort (#13794)
- allow ftp URLs, improve URL check (#13781)
🐞 Bug fixes
- prune empty chunks before set operations (#13898)
- treat null columns as zero in sum_horizontal (#13880)
- include null count in rolling window validity with min_periods (#13863)
- Fix interchange protocol for new String type (#13881)
- parquet hybrid RLE encoding did not always align to bit width (#13883)
- Add ignore_nulls for list.join (#13701)
- .dt.time() was panicking for datetimes prior to unix epoch (#13812)
- allow list creation of decimals (#13851)
- ensure kwargs filter behaviour matches docstring (expect equivalence with eq) (#13864)
- Implement abs for Decimal, error on Date/Time/Datetime (#13821)
- rolling nested groups deadlock (#13835)
- gather_every should work on agg context (#13810)
- Fix segfault of is_in (#13814)
- don't panic on full null qcut (#13815)
- validate operator arithmetic with None, fix Series edge-case (#13780)
📖 Documentation
- Mention deltalake write support in README (#13890)
- use proper argument names in the code blocks of api.rst (#13866)
🛠️ Other improvements
- move filter to polars-compute (#13897)
- Revert pandas warning filter (#13893)
- Make functions in expr/general non-anonymous (#13832)
- Fix doctests (#13831)
- Refactor Python release workflow (#13807)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @MarcoGorelli, @Wainberg, @alexander-beedie, @dependabot, @dependabot[bot], @edavisau, @flisky, @ion-elgreco, @itamarst, @kstoneriv3, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @reswqa, @ritchie46, @stinodego, @taki-mekhalfa and @thomasaarholt
Python Polars 0.20.5
⚠️ Deprecations
- Deprecate default delimiter value for str.concat (#13690)
- Rename pl.count() to pl.len() (#13719)
- Deprecate dt.with_time_unit in favor of cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone)) (#13667)
🚀 Performance improvements
- directly embed data ptr in Buffer (#13744)
✨ Enhancements
- Impl count_matches for array namespace (#13675)
- Add nulls_last for list/array.sort (#13795)
- convert fixed-offset timezones to respective Etc timezone from time zone database (#13738)
- allow read_excel to load from remote http locations (#13753)
- Expressify str.slice (#13747)
- implement binview for polars-row (#13736)
- implement binview for polars-json (#13737)
- add architecture for polars-flavored IPC (#13734)
- implement binview comparison kernels (#13715)
- raise default frame/series repr height from 8 to 10 (#13699)
🐞 Bug fixes
- do not read data for zero-length compressed buffer (#13791)
- Fix the non-null test of transpose (#13783)
- Raise error instead of panic when joining on wildcard/nth (#13742)
- str.concat correctly ignore single null value (#13751)
- Selectors by_name and by_dtype should allow empty list as input (#11024)
- Keep Series attributes docstrings when read by Sphinx (#13731)
- fix error message when creating DataFrame from 0-dimensional NumPy array (#13729)
- support corr() for single-column DataFrames (#13728)
- Use NonZeroUsize for batch_size parameter in write_csv/sink_csv/scan_ndjson (#13726)
- error instead of panicking in sql if empty function (#13691)
📖 Documentation
- Fix typo in deprecation message of with_row_count (#13793)
- Fix incorrect "coming from pandas" syntax (#13767)
- Improve streaming section of the user guide (#13750)
- improve n_unique and approx_n_unique docs (#13752)
- add missing Series.str.find reference (#13717)
- Be more explicit about behaviour in str.strip_chars/strip_chars_start/strip_chars_end docstrings (#13697)
- Add doc example for datetime_ranges (#13695)
- document %A and %B to get day name and month name (#13678)
🛠️ Other improvements
- Make pl.duration non-anonymous (#13762)
- Add test for describe on Object types (#13689)
- Only run bytecode parser CI workflow for Python 3.9/3.10 (#13664)
Thank you to all our contributors for making this release possible!
@29antonioac, @MarcoGorelli, @NedJWestern, @Wainberg, @alexander-beedie, @cgevans, @henryharbeck, @langestefan, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @stinodego and @universalmind303
Python Polars 0.20.4
⚠️ Deprecations
- Fix group keys in partition_by(as_dict=True)/GroupBy.__iter__ in some cases (#13646)
- Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
- Deprecate dt.datetime in favor of dt.replace_time_zone(None) (#13520)
- Rename with_row_count to with_row_index (#13494)
- Deprecate Expr.where in favor of filter (#13440)
- Allow drop with no inputs as a no-op (#13460)
🚀 Performance improvements
- elide parallelism restriction on generic rolling expressions (#13662)
- ensure time groups are parallelized (#13660)
- do not eagerly compute bitcount (#13562)
- optimise SQL engine string concat (#13499)
- Refactor expression parsing logic of predicates/constraints (#13468)
- Represent Enum categories as Series (#13434)
- remove lifetime requirement from CategoricalChunkedBuilder (#13319)
✨ Enhancements
- write parquet ColumnOrder (#13672)
- Impl contains for ArrayNameSpace (#13638)
- improve rolling() expression formatting (#13657)
- Implement is_between in Rust (#11945)
- Add base PolarsError and PolarsWarning class (#13615)
- typing overloads for Series operator methods ge, gt, ... (#13167)
- Expressify pattern of str.extract (#13607)
- Impl join for ArrayNameSpace (#13586)
- add SQL engine support for string cast to json (#13624)
- add SQL engine support for EXTRACT and DATE_PART (#13603)
- Allow drop with no inputs as a no-op (#13460)
- add SQL engine support for POSITION and STRPOS (#13585)
- additional multi-column support for pl.<function> entries (#13336)
- is_in support for array dtype (#13559)
- add new str.find expression, returning the index of a regex pattern or literal substring (#13561)
- Impl and dispatch arr.first/last to get (#13536)
- Implement from_dataframe natively (interchange protocol) (#10701)
- add SQL engine support for LIKE and ILIKE pattern matching (#13522)
- improve hive partition pruning (#13358) (#13426)
- Add compact syntax for int_range starting from 0 (#13530)
- don't rechunk by default in lazy scans (#13518)
- Add cum_count expression function (#13478)
- add SQL engine support for IF control flow function (#13491)
- add SQL engine support for MOD function (#13502)
- return datetime for datetime mean & median (#13417)
- add SQL engine support for CONCAT_WS string function (#13483)
- Allow map_batches to auto-convert output NumPy arrays to Series (#13277)
- add SQL engine support for RIGHT and REVERSE string functions (#13461)
- implement BinaryView and Utf8View in polars-arrow (#13243)
- add SQL engine support for variadic string CONCAT function (#13428)
- add support for AND in SQL join-clause context (#13242)
- Impl ordering ops for array namespace (#13414)
- add SQL engine support for REPLACE string function (#13431)
- add SQL engine support for SIGN function (#13429)
- add SQL engine support for IFNULL function (#13432)
- additional SQL support for bytes, bit, and hex literals (#13389)
🐞 Bug fixes
- gather.get schema (#13679)
- Fix group keys in partition_by(as_dict=True)/GroupBy.__iter__ in some cases (#13646)
- ensure we hit proper cache in nested rolling expressions (#13666)
- Allow av_buffer cast numeric record to temporal type (#13661)
- streaming cross join if swapped is hit (#13656)
- Make sure rolling key is projected when process projection (#13622)
- fix schema inference for json (#13637)
- Improve parsing of inputs for Expr dunders (#13635)
- Empty series of AggregatedList should also have list dtype (#13620)
- Series.eq_missing should return an Expr when the input is an Expr (#13628)
- fallback to cast kernel if inline_cast AnyValue raise (#13595)
- Fix formatting in describe for precise quantiles (#13593)
- fix reverse variable row decoding (#13587)
- Fix scatter for null values (#13578)
- Fix cum_count with regards to start value / null values (#13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (#13548)
- Treat Python None as null value for Object dtype (#13564)
- Fix scatter to allow single temporal inputs (#13577)
- Fix interchange protocol data buffer dtype (#10787)
- Expr.replace to single value did not replace NULLs (#13551)
- improve hive partition pruning (#13358) (#13426)
- fix projection pushdown for new outer join schema (#13527)
- don't raise when partial function is passed to map_elements (#13524)
- improve reading of mixed string/other dtype column data from spreadsheets with openpyxl and pyxlsb engines (#13495)
- ensure size-hint of TrueIdxIter is correct (#13508)
- correct 'outer_coalesce' logic in case of duplicate names (#13501)
- raise for out-of-range datetimes in to_datetime/strptime (#13403)
- Fix Series equality for List/Array types (#13477)
- Keep logical type when getting values from list (#13456)
- Handle duplicate/ambiguous inputs for replace (#13217)
- Handle empty inputs to Enum constructor (#13446)
- Fix group_by iteration when grouping by certain selectors (#13437)
- Fix to_pandas for 0x0 dataframe (#13420)
- Fix offsets for numeric types in from_buffer (#13398)
📖 Documentation
- Clarify documentation for the agg_list argument in Expr.map_batches (#13625)
- fix linking to feature flags in user guide (#13644)
- bring sink_ndjson docstring in line with other sink docstrings (#13636)
- Update then and otherwise docstrings with "strings are parsed as column names" (#13630)
- Add sink_ndjson to API reference. (#13627)
- Improve documentation on broadcasting (#13394)
- Add note about toolchain issue under native Windows (#13590)
- Hint about ruff setting in VSCode (#13421)
- Clarify examples for .transpose() (#13581)
- Add additional Series docstring examples (#13558)
- Doc example for read_csv (#13161) (#13545)
- Add more doc examples on how to create an index column (#13532)
- update SQL section of the README (#13529)
- Add note to int_range docs for creating an index column (#13516)
- add a note to the read_database_uri docstring about escaping special characters in the connection string (#13514)
- update polars-business > polars-xdt link (#13509)
- Fix various typos, grammar and formatting in docstrings and user guide (#13506)
- Doc examples for threadpool_size and get_index_type (#13496)
- Add missing datetime examples to docs (#13487)
- add polars-distance to plugins page (#13454)
- define file-like object in read_parquet docstring (#13463)
- Move Series.struct.json_encode to methods in Sphinx autosummary (#13443)
- Add missing examples of series/list.py (#13423)
- show datetime.date import in code block (#13419)
- clarify documentation for rle and rle_id (#13397)
- use named series in Series.plot example (#13407)
- fix alphabetical order of documentation entries (#13396)
🛠️ Other improvements
- Auto-add 'needs triage' label to bugs (#13671)
- make rolling index column visible to optimizer (#13658)
- Enable new error message lint to improve stack trace display (#13596)
- Add Documentation/Build system sections to the changelog (#13594)
- Filter unhelpful messages in make build (#13579)
- Remove extra line break between checkboxes in GitHub bug report issues (#13576)
- Narrow type hint for get_index_type util (#13556)
- Fix some test failures/slowdowns (#13504)
- pandas 2.2 compat (#13467)
- Increase timeout for gevent async test (#13448)
- Do not end docstrings with a blank line (#13193)
Thank you to all our contributors for making this release possible!
@Bromeon, @MarcNuebel, @MarcoGorelli, @ShivMunagala, @Wainberg, @aaarrti, @alexander-beedie, @bchalk101, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @hamishs, @henryharbeck, @ion-elgreco, @jcrozum, @mcrumiller, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @tim-stephenson and @wjandrea
Rust Polars 0.36.2
🏆 Highlights
- Add new Enum categorical data type which allows a fixed set of categories (#11822)
💥 Breaking changes
- Rename Utf8 data type to String (#13224)
- Rename set_at_idx to scatter (#12687)
- Preserve left and right join keys in outer joins (#12963)
- Implement dtype parameter for int_range on Rust side (#12940)
- Update Expr.count to ignore null values by default (#12934)
- Change value_counts resulting column name from counts to count (#12506)
- Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
- Smaller integer data types for datetime components (#12070)
- Fix NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
- Rename frame_equal/series_equal to equals (#12663)
- Rename not_ expression to not on the Rust side (#12587)
- Rename str.json_extract to str.json_decode (#12586)
- Rename DataFrame column index methods (#12542)
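The NaN ordering change above (#12721) describes a total order in which NaN compares greater than every other float and equal to itself. The following is only a plain-Python sketch of that ordering rule, not the Rust implementation; the helper name total_order_key is hypothetical.

```python
import math


def total_order_key(x: float) -> tuple[bool, float]:
    """Sort key under which NaN sorts after every other float and all NaNs tie."""
    is_nan = math.isnan(x)
    # Replace NaN with a fixed placeholder so that two NaN keys compare equal.
    return (is_nan, 0.0 if is_nan else x)


values = [3.0, float("nan"), -1.0, float("inf"), 0.5]
ordered = sorted(values, key=total_order_key)
print(ordered)
```

Under this key even positive infinity sorts before NaN, which matches the "greater than any other float" wording of the release note.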
🚀 Performance improvements
- optimize set bit count (#13317)
- speed up .dt.truncate for large numbers of years (#13310)
- don't eagerly evaluate error branches (#13311)
- don't needlessly allocate validity in concat/rechunk (#13288)
- add fast path to count_bits_set_by_offsets (#13253)
- make .dt.truncate('*mo') more than 3x faster (#13192)
- ensure single expression evaluation for replace (#13147)
- Elide allocation in outer join materialization (#12992)
- Ensure we reduce for any/all_horizontal (#12976)
- Add fast paths for UTC in truncate (#12965)
- Improve rolling_median algorithm (#12704)
- Use fast path for non-null data in new SQL-like null matching (#12874)
- improve merge_local_rhs_categorical traversal (#12660)
- make values_size estimate correct for sliced arrays (#12658)
- improve parquet utf8 validation (#12655)
- parquet pre-allocate buffer in binary plain encode (#12652)
- optimize dict binary decoding in parquet (#12648)
- ensure we only check the values within bounds (#12633)
- parquet; elide recursion in hot path (#12625)
- improve cov/corr algorithm (#12590)
- apply left side predicate pushdown also to right side on semi join (#12565)
- ensure streaming parquet download remains concurrent ~7x (#12552)
- speed up parquet download of streaming engine (#12544)
✨ Enhancements
- support negative indices in gather in group_by context (#13373)
- support negative indexing in gather (select context) (#13343)
- support min_periods for temporal rolling aggregations (#13342)
- support REGEXP and RLIKE pattern matching in SQL engine (#13359)
- gracefully handle panics in plugins (#13329)
- Implement unique/n_unique/unique_counts/is_unique/is_duplicated for Null series (#13307)
- support common variant spelling STDEV in the SQL engine (in addition to STDDEV) (#13303)
- change doc links to new url docs.pola.rs (#13290)
- support horizontal concatenation of LazyFrames (#13139)
- Impl serde for array dtype (#13168)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add offset parameter to gather_every (#13156)
- Support Array dtype AnyValue Series construction (#12817)
- Allow step parameter in int_ranges to take an expression (#13148)
- Implement count for DataFrame/LazyFrame (#13153)
- Move from GA to more privacy friendly framework (#13155)
- Rename set_at_idx to scatter (#12687)
- prune all/any_horizontals with single inputs (#13146)
- ensure we get cleaner logical plans with any/all_horizontal (#13144)
- Add str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
- Auto-infer credentials from .aws folder (#13062)
- Support private cloud S3 storage in scan_parquet (#13060)
- Allow order operators (<,>,>=,<=) on Enum types (#12982)
- Reimplement replace expression on the Rust side (#13002)
- Use tokio semaphore for concurrency handling (#13026)
- Improve and expressify hist (#13014)
- Preserve left and right join keys in outer joins (#12963)
- Allow end before start in date/time_range (#12964)
- Implement group-tuples for Null dtype (#12975)
- Implement dtype parameter for int_range on Rust side (#12940)
- Cast to an enum from int (#12954)
- Move categorical ordering into dtype (#12911)
- Update Expr.count to ignore null values by default (#12934)
- Enable partial predicate pushdown past window expressions (#12710)
- Add str.reverse (#12878)
- Change value_counts resulting column name from counts to count (#12506)
- Implement std and var for Duration columns (#12865)
- Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
- Preserve base dtype when raising to UInt power (#10446)
- Smaller integer data types for datetime components (#12070)
- Support SQL subqueries for JOIN and FROM (#12819)
- parquet support required deltabyte encoding (#12836)
- Add new Enum categorical data type which allows a fixed set of categories (#11822)
- support nested null in vstack/append/extend/concat (#12771)
- Improve error messages on attempted Arrow conversions involving incompatible/unknown dtypes (#12421)
- determine mode parallelism depending on current tasks (#12764)
- enable slice push down past with_columns (#12742)
- implement From<LazyGroupBy> for LazyFrame (#12562)
- Rename frame_equal/series_equal to equals (#12663)
- Join operations on local categoricals (#12657)
- use RLE_DICTIONARY for integers in parquet (#12647)
- Add configuration option for where Polars spills to disk (#12595)
- implement RLE_DICT encoding for utf8/binary columns (reduced parquet file size) (#12623)
- implement 'DeltaByteArray' decoding for parquet (#12602)
- warn if by column is not sorted in rolling aggregations (as opposed to raising), add warn_if_unsorted argument (#12398)
- struct -> json encoding expression (#12583)
- Implement support for multi-character comments in read_csv (#12519)
- Implement LazyFrame.sink_ndjson (#10786)
- improve concurrency parameters (#12567)
- Adds sink_ipc_cloud (#12556)
- Adds sink_ipc_cloud (#11008)
- In explain(), rename PIPELINE to STREAMING so it's clearer what it means (#12547)
🐞 Bug fixes
- range/ranges output name should follow lhs rule (#13369)
- updated Display trait for enum categoricals (#13331)
- nested dtypes: export logical type in plugins (#13325)
- fix invalid dtype setting in array (#13327)
- fix csv parser error when commented-out rows precede the header row (#13318)
- invalid schema outer join after projection pd (#13315)
- invalid predicate optimization (#13313)
- Account for null values in categorical unique/n_unique (#13308)
- fix schema when subtracting (#13309)
- broadcasting of unit LHS in string operations (#12737)
- casting list/arr to arr/list shouldn't convert chunks to logical type (#13259)
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- enable and fix SIMD in polars-compute (#13251)
- match_chunks shouldn't change the dtype (#13222)
- sink_csv deadlock (#13239)
- is_in operator for categoricals (#13205)
- Better handle mismatched dtypes in replace (#13213)
- Fix replace fast path by casting old input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- slice for NullChunked no longer force single chunk (#13174)
- don't cast to unknown dtypes (#13197)
- Allow casting nullable list to array (#13196)
- maintain old join behavior in window expression (#13179)
- Fix comparison of categoricals (#13137)
- Use the name of the leftmost expression in horizontal operations (#13143)
- any_value should support cast to boolean (#13125)
- Update offsets of null value correctly for all from_iter_xxx_trusted_len (#13132)
- fix neq for series cmp str (#13128)
- fix category list builder append series with multiple chunks (#13116)
- repeat_by should not raise if by contains nulls (#13105)
- [csv] raise on single quote char (#13104)
- Raise if scan zstd compressed csv file (#13102)
- Don't check map length if input is literal (#13098)
- use FunctionExpr's scalar return type for is_in (#13091)
- rolling_quantile can get incorrect state (#13088)
- Fix off-by-one error in quantile(method="nearest") (#13058)
- Fix incorrect schema inference on nested columns (#13057)
- Don't raise for datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
- add cast safety to literals (#12983)
- Parse json_decode per max buffer length (#13029)
- Parse 00:00 time zone as UTC (#13034)
- Fix timeout errors in concurrent downloads (#13023)
- Fix SQL substring indexing (#13016)
- Allow broadcasting in ranges (#11900)
- Prevent deadlock in sink_csv (#12991)
- Don't get mutable if buffer is sliced (#12979)
- Dataframes with Decimal columns cannot be pickled (#12955)
- Fix truncate when truncating by multiple weeks (#12948)
- Fix segfault / memory corruption after plugins return Err result (#12953)
- Don't panic when ambiguous parameter is not Utf8 (#12913)
- don't panic on empty df in merge_sort (#12923)
- Patch rolling_var/rolling_std numerical stability (#12909)
- Fix incorrect Int16 min/max due to incorrect SIMD mask construction (#12908)
- Fix OOB error in list set operations on empty frame (#12845)
- Fix repr of Expr.gather (which was still showing deprecated take) (#12864)
- Fix nan_min/max incorrectly aggregating chunks with addition (#12848)
- write only one dict page per row rowgroup (#12831)
- incorrect values from parquet RLE decoding (#12818)
- Handle aggregation for all-NaN groups in group_by (#12304)
- Use total float ordering ...
Python Polars 0.20.3
🏆 Highlights
- add plot namespace (which defers to hvplot) (#13238)
🚀 Performance improvements
- optimize set bit count (#13317)
- speed up .dt.truncate for large numbers of years (#13310)
- don't eagerly evaluate error branches (#13311)
- don't trigger internal borrowing in numpy memmap (#13304)
- don't needlessly allocate validity in concat/rechunk (#13288)
- add fast path to count_bits_set_by_offsets (#13253)
- make .dt.truncate('*mo') more than 3x faster (#13192)
✨ Enhancements
- support negative indices in gather in group_by context (#13373)
- validate Enum categories (#13356)
- improve Series/DataFrame init from existing Series/DataFrame objects (#13344)
- support negative indexing in gather (select context) (#13343)
- support min_periods for temporal rolling aggregations (#13342)
- support REGEXP and RLIKE pattern matching in SQL engine (#13359)
- emit suggestion for how to replace map_elements sigmoid function with expressions (#13347)
- Support Enum types in interchange protocol (#13368)
- add plot namespace (which defers to hvplot) (#13238)
- gracefully handle panics in plugins (#13329)
- rework pl.exclude as a pure selector, allowing other selectors as input (#13301)
- Implement unique/n_unique/unique_counts/is_unique/is_duplicated for Null series (#13307)
- support common variant spelling STDEV in the SQL engine (in addition to STDDEV) (#13303)
- enhance expression-level filter syntax with support for multiple predicates and kwargs (#12689)
- change doc links to new url docs.pola.rs (#13290)
- support horizontal concatenation of LazyFrames (#13139)
- Rename Utf8 data type to String, keep Utf8 as alias (#13257)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add offset parameter to gather_every (#13156)
- Support Array dtype AnyValue Series construction (#12817)
- Allow step parameter in int_ranges to take an expression (#13148)
- make python map_batches safer (#13181)
- Implement count for DataFrame/LazyFrame (#13153)
🐞 Bug fixes
- don't lose track of
ones
andzeros
dtype, improve use withArray
, raise error if dtype invalid (#13326) - updated Display trait for enum categoricals (#13331)
- nested dtypes: export logical type in plugins (#13325)
- fix invalid dtype setting in array (#13327)
- fix
csv
parser error when commented-out rows precede the header row (#13318) - invalid schema outer join after projection pd (#13315)
- invalid predicate optimization (#13313)
- Account for null values in categorical
unique/n_unique
(#13308) - fix schema when subtracting (#13309)
- broadcasting of unit LHS in string operations (#12737)
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- fix return type hint of list series any/all (#13265)
- sink_csv deadlock (#13239)
- Correctly use `read_parquet` for all binary inputs (#13218)
- `is_in` operator for categoricals (#13205)
- Better handle mismatched dtypes in `replace` (#13213)
- Fix `replace` fast path by casting `old` input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- don't cast to unknown dtypes (#13197)
- maintain old join behavior in window expression (#13179)
🛠️ Other improvements
- reverse condition order in udfs _expr function (#13348)
- Update release workflow for new upload/download artifact versions (#13355)
- Allow construction of `Series` from memory buffers (#13323)
- add 'pipe littering' to 'coming from pandas' section (#13335)
- Refactor functionality related to Series buffers (#13291)
- Restore light/darkmode switch in API reference (#13312)
- Copy Makefile build commands to top level (#13293)
- Fix release flags (#13298)
- Re-enable consortium standard tests (#13296)
- Update CODEOWNERS (#13292)
- Add CPU compatibility check (#13134)
- Change base url of docs/guide to `docs.pola.rs` (#13281)
- Fix source link for dev docs (#13279)
- fix return type hint of list series any/all (#13265)
- Fix display of overloaded signatures (#13258)
- clean up bytecode parsing a bit (#13221)
- Add a couple of docstring examples to Series methods (#13244)
- remove unnecessary arg unpacking (#13241)
- update rustc (#13219)
- fix horizontal concatenation documentation (#13141)
- Replace blackdoc by ruff's new docstring formatter (#13182)
- Update ruff & ruff settings (#13126)
- Link to latest object_store docs in api doc (#13180)
- Fix failing test (#13171)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Python Polars 0.20.3-rc.2
🚀 Performance improvements
- don't needlessly allocate validity in concat/rechunk (#13288)
- add fast path to `count_bits_set_by_offsets` (#13253)
- make `.dt.truncate('*mo')` more than 3x faster (#13192)
✨ Enhancements
- change doc links to new url docs.pola.rs (#13290)
- support horizontal concatenation of LazyFrames (#13139)
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (#13257)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add `offset` parameter to `gather_every` (#13156)
- Support `Array` dtype AnyValue Series construction (#12817)
- Allow `step` parameter in `int_ranges` to take an expression (#13148)
- make python `map_batches` safer (#13181)
- Implement `count` for DataFrame/LazyFrame (#13153)
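As a rough illustration of the `offset` parameter added to `gather_every` in the list above, the plain-Python function below models the selection it is assumed to perform: start at index `offset`, then take every `n`-th value. The function shares the name only for readability; it is a hypothetical sketch built on list slicing, not the Polars implementation, and rejecting negative offsets is an assumption made here.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def gather_every(values: Sequence[T], n: int, offset: int = 0) -> list[T]:
    """Hypothetical model: take every n-th value, starting at index `offset`."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if offset < 0:
        # Assumption for this sketch: the starting index is non-negative.
        raise ValueError("offset must be non-negative")
    # Slicing with a start and a step selects offset, offset + n, offset + 2n, ...
    return list(values[offset::n])


values = list(range(10))

every_third = gather_every(values, 3)              # starts at index 0
shifted_third = gather_every(values, 3, offset=1)  # starts at index 1
```

With ten values and `n=3`, the default offset picks indices 0, 3, 6 and 9, while `offset=1` picks indices 1, 4 and 7.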
🐞 Bug fixes
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- fix return type hint of list series any/all (#13265)
- sink_csv deadlock (#13239)
- Correctly use `read_parquet` for all binary inputs (#13218)
- `is_in` operator for categoricals (#13205)
- Better handle mismatched dtypes in `replace` (#13213)
- Fix `replace` fast path by casting `old` input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- don't cast to unknown dtypes (#13197)
- maintain old join behavior in window expression (#13179)
🛠️ Other improvements
- Copy Makefile build commands to top level (#13293)
- Fix release flags (#13298)
- Re-enable consortium standard tests (#13296)
- Update CODEOWNERS (#13292)
- Add CPU compatibility check (#13134)
- Change base url of docs/guide to `docs.pola.rs` (#13281)
- Fix source link for dev docs (#13279)
- fix return type hint of list series any/all (#13265)
- Fix display of overloaded signatures (#13258)
- clean up bytecode parsing a bit (#13221)
- Add a couple of docstring examples to Series methods (#13244)
- remove unnecessary arg unpacking (#13241)
- update rustc (#13219)
- fix horizontal concatenation documentation (#13141)
- Replace blackdoc by ruff's new docstring formatter (#13182)
- Update ruff & ruff settings (#13126)
- Link to latest object_store docs in api doc (#13180)
- Fix failing test (#13171)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Python Polars 0.20.3-rc.1
🚀 Performance improvements
- add fast path to `count_bits_set_by_offsets` (#13253)
- make `.dt.truncate('*mo')` more than 3x faster (#13192)
✨ Enhancements
- Rename `Utf8` data type to `String`, keep `Utf8` as alias (#13257)
- dispatch strict_cast via cast (#13255)
- Impl any/all for array type (#13250)
- add cancellable queries (#13178)
- add `offset` parameter to `gather_every` (#13156)
- Support `Array` dtype AnyValue Series construction (#12817)
- Allow `step` parameter in `int_ranges` to take an expression (#13148)
- make python `map_batches` safer (#13181)
- Implement `count` for DataFrame/LazyFrame (#13153)
🐞 Bug fixes
- sorting categorical lexically bugs on null values (#13271)
- improve replace on categoricals (#13223)
- round trip to JSON and back should preserve Enum type (#13267)
- fix return type hint of list series any/all (#13265)
- sink_csv deadlock (#13239)
- Correctly use `read_parquet` for all binary inputs (#13218)
- `is_in` operator for categoricals (#13205)
- Better handle mismatched dtypes in `replace` (#13213)
- Fix `replace` fast path by casting `old` input to the right data type (#13176)
- ndjson nested null schema inference (#13206)
- don't cast to unknown dtypes (#13197)
- maintain old join behavior in window expression (#13179)
🛠️ Other improvements
- Add CPU compatibility check (#13134)
- Change base url of docs/guide to `docs.pola.rs` (#13281)
- Fix source link for dev docs (#13279)
- fix return type hint of list series any/all (#13265)
- Fix display of overloaded signatures (#13258)
- clean up bytecode parsing a bit (#13221)
- Add a couple of docstring examples to Series methods (#13244)
- remove unnecessary arg unpacking (#13241)
- update rustc (#13219)
- fix horizontal concatenation documentation (#13141)
- Replace blackdoc by ruff's new docstring formatter (#13182)
- Update ruff & ruff settings (#13126)
- Link to latest object_store docs in api doc (#13180)
- Fix failing test (#13171)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @adamreeve, @alexander-beedie, @c-peters, @cjfuller, @dependabot, @dependabot[bot], @mcrumiller, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @robvanmieghem and @stinodego
Python Polars 0.20.2
🚀 Performance improvements
- ensure single expression evaluation for replace (#13147)
- drop the pyarrow conversion path in `iter_rows`; we can now do fully native conversion ~2-3x faster (#13122)
✨ Enhancements
- Move from GA to more privacy friendly framework (#13155)
- prune all/any_horizontals with single inputs (#13146)
- ensure we get cleaner logical plans with `any/all_horizontal` (#13144)
🐞 Bug fixes
- Fix comparison of categoricals (#13137)
- Use the name of the leftmost expression in horizontal operations (#13143)
- any_value should support cast to boolean (#13125)
- Update offsets of null value correctly for all `from_iter_xxx_trusted_len` (#13132)
- fix neq for series cmp str (#13128)
- Fix off-by-one error in `lit` dtype determination for integers (#13129)
- fix category list builder append series with multiple chunks (#13116)
🛠️ Other improvements
- Fix release LTS CPU step (#13160)
- Use the name of the leftmost expression in horizontal operations (#13143)
- ensure we get cleaner logical plans with `any/all_horizontal` (#13144)
- Minor cleanup of PyO3 bindings (#13067)
- Update `auto_explode` param name to `returns_scalar` (#13119)
- Mark whether the current package is the LTS-CPU version (#13068)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @c-peters, @orlp, @reswqa, @ritchie46 and @stinodego
Python Polars 0.20.1
🐞 Bug fixes
- repeat_by should not raise if by contains nulls (#13105)
- [csv] raise on single quote char (#13104)
- Raise if scan zstd compressed csv file (#13102)
- allow timeunit-less dtype in `pl.lit` creation (#12997)
- Don't check map length if input is literal (#13098)
- rolling_quantile can get incorrect state (#13088)
🛠️ Other improvements
- Fix column name in `contains_any` example (#13090)
- update user-defined-functions for 0.19.x (#13071)
- Fix some links, and make `map_batches` warning more evident (#13081)
- Linting updates (#13069)
- take pl.concat out of StringCache context manager in "mismatched string cache" error message (#13076)
- add Enum to dtype list (#13080)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @mcrumiller, @reswqa, @ritchie46 and @stinodego