Releases: pola-rs/polars
Python Polars 0.17.0
⚠️ Breaking changes
- rename some function arguments (#8017)
- don't create duplicate pivot names (#8002)
- Remove deprecated behaviour (#7978)
- rename `toggle_string_cache` to `enable_string_cache` (#7970)
- change top_k(descending) -> bottom_k (#7969)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (#7957)
- Use RowsError instead of RowsException as recommended … (#6009)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (#7910)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (#7308)
- swap arguments of `shift_and_fill` and add default… (#7192)
- set maintain_order=False for df/lf.unique (#7468)
- Rename pipe arg `func` to `function` (#7139)
- Set some args for `Series`/`Expr` methods to keyword-only (#7860)
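The length check on `descending` described in #7957 can be sketched in plain Python. This is only an illustration of the rule, not Polars' implementation: `normalize_descending` is a hypothetical helper, and raising `ValueError` is an assumption about the error type.

```python
from __future__ import annotations

from typing import Sequence


def normalize_descending(
    by: Sequence[str], descending: bool | Sequence[bool]
) -> list[bool]:
    """Return one descending flag per sort column (hypothetical helper).

    A single bool is broadcast to every column. A sequence must have
    exactly one entry per column, otherwise an error is raised.
    """
    if isinstance(descending, bool):
        return [descending] * len(by)
    flags = list(descending)
    if len(flags) != len(by):
        # Assumption: ValueError is used here purely for illustration.
        raise ValueError(
            f"got {len(flags)} 'descending' flags for {len(by)} sort columns"
        )
    return flags
```

Under this rule, code that previously passed a mismatched sequence needs to supply either a single bool or one flag per sort column.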
🚀 Performance improvements
- FromParallelIterator<Option<str>> for Utf8Chunked ~1.9x (#8058)
- speed up from_par_iter Option<bool> ~2.5x (#8057)
- parallelize numeric ChunkedArray materialization ~2x (#8053)
- parallelize `into_groups` materialization ~-25% (#8036)
- use a trusted anyvalue builder (#8001)
- numeric grouptuples with nulls hash in single pass ~25% (#7980)
- ensure primitives are parsed first in anyvalue conversion (#7955)
- use perfect hash table for categoricals (#7951)
✨ Enhancements
- multiple sql contexts & optional sql highlighting in cli (#8072)
- implement arg_sort for struct dtype (#8051)
- Support `DataFrame` init from pyarrow `RecordBatch` objects, and improve init from `Array` (#8011)
- allow `write_ipc` to take `file=None` (returning `BytesIO`) (#7997)
- Add __array__ method to DataFrame (#7979)
- support struct in df.unique (#7976)
- change top_k(descending) -> bottom_k (#7969)
- basic sanity-checks for some `Config` methods, reference POLARS_MAX_THREADS in `threadpool_size` docstring (#7965)
- optimize away nested unions in lp (#7861)
- Use RowsError instead of RowsException as recommended … (#6009)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (#7308)
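As a rough illustration of the `top_k`/`bottom_k` split (#7969), the plain-Python sketch below uses `heapq` from the standard library. The two functions are hypothetical stand-ins that operate on lists; they only show the intended semantics (largest k versus smallest k) and say nothing about the algorithm Polars uses.

```python
import heapq
from typing import Iterable


def top_k(values: Iterable[float], k: int) -> list:
    """Return the k largest values, largest first (illustrative only)."""
    return heapq.nlargest(k, values)


def bottom_k(values: Iterable[float], k: int) -> list:
    """Return the k smallest values, smallest first (illustrative only)."""
    return heapq.nsmallest(k, values)


data = [3, 1, 4, 1, 5]
largest_two = top_k(data, 2)      # [5, 4]
smallest_two = bottom_k(data, 2)  # [1, 1]
```

Having two separate entry points avoids a boolean flag that flips the meaning of a method named `top_k`.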
🐞 Bug fixes
- check element count in multi-column explode (#8050)
- set lower limit for chunk_size (#8048)
- impl to_static for struct (#8037)
- create Series with list of only None with Float32 dtype (#8015)
- version gate pyarrow version for `to_pandas=(use_pyarrow… (#8026)
- Only allow correct type for get_column and to_series arg… (#7983)
- Output correct dtype for values of remapping dict in map… (#8013)
- all/any empty sets (#8012)
- struct null_count, cast string, transpose and describe (#8009)
- fix pivot and transpose of struct data (#8005)
- don't create duplicate pivot names (#8002)
- Fix test_literal_group_agg_chunked_7968 test (#7991)
- fix chunked literals in expression engine (#7973)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (#7957)
- pandas 2.0 compat (#7962)
- concat object types (#7958)
- fix decimal conversion alignment (#7954)
🛠️ Other improvements
- Fix Expr.apply docstring for return_dtype parameter (#8069)
- rename some function arguments (#8017)
- Remove deprecated behaviour (#7978)
- Add docstring examples for top_k and bottom_k (#7987)
- rename `toggle_string_cache` to `enable_string_cache` (#7970)
- add remaining operator-equivalent method docstrings and a related html/docs entry (#7953)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (#7910)
- swap arguments of `shift_and_fill` and add default… (#7192)
- set maintain_order=False for df/lf.unique (#7468)
- Rename pipe arg `func` to `function` (#7139)
- Set some args for `Series`/`Expr` methods to keyword-only (#7860)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @StefanBRas, @alexander-beedie, @ghuls, @rben01, @ritchie46, @stinodego and @universalmind303
Python Polars 0.16.18
🚀 Performance improvements
- improve group_tuples of high cardinality data ~10% (#7938)
✨ Enhancements
- Add seed argument to rank for random (#7913)
- Support Numpy ufunc with more than one expression (#7924)
🐞 Bug fixes
🛠️ Other improvements
- Rename argument `f` to `function` in reduce docstring (#7925)
- improve docstrings for numeric/math operator-equivalent methods (#7942)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @alonme, @ankane, @dependabot, @dependabot[bot], @lorentzenchr, @rben01, @ritchie46 and @zundertj
Python Polars 0.16.17
🚀 Performance improvements
- use streaming instead of partitioned groupby (#7907)
- don't auto-stream groupby (#7906)
- rechunk before aggs (#7903)
- don't re-allocate groups in sorted to_dummies (#7897)
- fix hashing regression (#7833)
- rechunk dataframe before unique computation (#7814)
- improve hash quality (#7813)
- always take sorted fast path group_tuples (#7787)
✨ Enhancements
- auto-infer detecting time-zone-awareness of fmt argument in strptime; deprecate tz_aware argument (#7886)
- Add `Series.pow()` (#7898)
- deal with null values in cut/qcut (#7878)
- allow list/tuple `lit` values (#7879)
- Support writing dynamic/live formula columns via `write_excel` (#7871)
- support datetime/date subclasses (e.g. FreezeGun) (#7819)
- support mode for floats and categoricals (#7827)
- allow Series init with `Unknown` dtype to proceed as if dtype is `None`, to allow inference (#7830)
- support sort by 'struct' type (#7822)
- add `to_repr` methods to DataFrame and Series (#7802)
- thousand separators in shape of repr `DataFrame` (#7775)
- Improve automatic output dtype setting for `map_dict`. (#7797)
- new utility `from_repr` function that reconstructs a DataFrame from its table repr (#7781)
- deprecate default value of `aggregation_function` being `'first'` in `pivot`. In a future version, it will default to `None` (#7784)
🐞 Bug fixes
- fix lit agg (#7904)
- disable ooc groupby (#7901)
- Use `check_exact` for temporal types in `assert_series_equal` (#7896)
- fix abs logical type (#7895)
- fix boolean min/max output type and null handling (#7894)
- Cast compound types to their simple string representation on export to Excel (#7887)
- ensure `_repr_html_` escapes column names in addition to data/body elements (#7877)
- validate groupby_dynamic inputs (#7876)
- correct for chunks in arg_where (#7873)
- fix nested logical/physical list (#7872)
- fix arbitrary nested logical types (#7869)
- Relax type hints for when/then (#7857)
- don't use fxhash in sink_sorted fast path (#7849)
- parquet stats & all kernel (#7846)
- Add missing type hint for `is_between` (#7835)
- fill null list (#7836)
- fix explode list[null] (#7832)
- fix unicode lower/uppercase (#7826)
- raise error on invalid series concat strategy (#7823)
- don't use naive name in partitioned agg (#7810)
- Ensure CsvReader always respects the n_rows parameter (#7789)
🛠️ Other improvements
- Fix read_csv docstring formatting (#7875)
- update concat docstring for how parameter (#7834)
- don't run hash stability test on arm64 (#7825)
- Improve pl.when documentation (#7793)
- add description of ddof (#7811)
- Rename `venv` folder to `.venv` (#7790)
- add a `make requirements` option to install/refresh dependencies without having to recreate the `venv` (#7792)
- fixup stacklevels (#7796)
- Drop `ruff` target version (#7791)
Thank you to all our contributors for making this release possible!
@LdRoW, @MarcoGorelli, @Newtoniano, @advoet, @alexander-beedie, @duskmoon314, @foxcroftjn, @ghuls, @jonashaag, @ritchie46, @stinodego and @zundertj
Python Polars 0.16.16
🐞 Bug fixes
- ensure k is lower than height (#7779)
- raise error on invalid categorical cast (#7686)
- raise error on attempt to set invalid `Datetime` or `Duration` dtype timeunit (#7768)
🛠️ Other improvements
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46 and @universalmind303
Python Polars 0.16.15
🚀 Performance improvements
- change top_k algorithm (#7718)
- runtime SIMD target detection for `min/max/sum` and impl SIMD `mean` ~2-5x (#7702)
- implement top-k optimization (#7678)
- ooc-sort dump in thread local if IO-thread is full. (#7668)
- use perfect hash table for ooc partitioning (#7653)
✨ Enhancements
- add dt.datetime, dt.date, dt.time (#7735)
- new "row_totals" parameter for `write_excel` that adds a row-wise total column using structured references (#7751)
- More ergonomic args for `min/max` (#7742)
- More ergonomic args for `concat_list` (#7745)
- add `Series.hist` (#7727)
- add `qcut` (#7724)
- add `maintain_order` option to `Series.cut` (#7723)
- create series with only none list with specific dtype (#7722)
- add `maintain_order` in `arr.unique` (#7721)
- `DataFrame.top_k`/`LazyFrame.top_k` (#7720)
- clearer error message when replace_time_zone encounters ambiguous or non-existent datetimes (#7685)
- include `set_fmt_float` value in `Config` load/save state (#7696)
- raise on descending date_range arguments (#7671)
- include `add` operator-equivalent expression (#7667)
- add expression method equivalents for existing math/logical operators (#7660)
- add `is_leap_year` to temporal expressions (#7618)
- full out-of core support for streaming groupby (#7630)
- clearer error message when creating duration string without integer (#7648)
- allow `scan_csv` to take a list of column names in a `new_columns` param (#7642)
- out-of-core `groupby/unique` of groupby on integer keys (#7604)
- allow set and/or frozenset as input to `is_in` expressions (#7613)
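For readers unfamiliar with the rule behind `is_leap_year` (#7618), the sketch below shows the Gregorian leap-year test using only the Python standard library. The function here is a hypothetical scalar helper for illustration; it is not the Polars expression and makes no claim about how Polars evaluates it.

```python
import calendar
from datetime import date


def is_leap_year(d: date) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return calendar.isleap(d.year)


# One flag per input date, mirroring an element-wise temporal expression.
dates = [date(1900, 1, 1), date(2000, 6, 15), date(2023, 3, 1), date(2024, 2, 29)]
flags = [is_leap_year(d) for d in dates]  # [False, True, False, True]
```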
🐞 Bug fixes
- make zip_with_same_type obligatory (#7761)
- fix melt projection pushdown node (#7752)
- fix predicate pushdown for 'unique' first/last (#7749)
- fix null propagation (#7748)
- fix init from pandas Series that has no dtype and is empty (or contains only null values) (#7716)
- avoid ambiguous time error when passing python Datetime to DataFrame constructor (#7711)
- Fix inferring CSV schema when skip_rows_after_heade… (#7701)
- fix race condition in null handling of window fast… (#7695)
- address `Series` init regression from list of `np.arange` objects (#7692)
- improve error message if unavailable lazy module is queried for `__version__` attribute (#7680)
- fix reversed non-existent file error msg (#7657) (#7673)
- respect time zone in groupby_rolling with negative offset (#7664)
- fix empty case str.replace (#7662)
- allow for list of datetimes with timezone(timedelta!=0) in Series constructor (#7645)
- respect time zone in rolling_* functions (#7643)
- fix schema of decimal type reads (#7652)
- detect deltalake version in show_versions (#7622)
- respect time zone in offset_by (#7626)
- fix boolean `Series` init with integer 1/0 values (#7619)
- respect time zone in dt.round (#7611)
🛠️ Other improvements
- Display full argument names in __repr__ for Datetime a… (#7736)
- add `Expr.pipe` API docs link (#7734)
- Add sort_by example taking one row per group (#7712)
- Clean up a few type hints/imports (#7687)
- Move `wrap_x` utils to `utils` module (#7672)
- Reduce number of polars.internals imports (#7628)
- Remove duplicate column from Expr.sort example (#7684)
- Move `expr` parsing to utils (#7661)
- Eliminate function re-exports through `internals` (#7650)
- Move last functionality out of `internals` (#7649)
- More internals cleanup (#7638)
- Update lockfile (#7637)
- fix and improve type hints and function names (#7609)
- remove additional logic from scan delta (#7605)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @chitralverma, @didriksg, @ghuls, @jakob-keller, @minimav, @ritchie46, @stinodego, @universalmind303 and @zundertj
Python Polars 0.16.14
🚀 Performance improvements
- optimize string kernels, (elide redundant allocs) (#7602)
- even faster polars module import (~15%) (#7584)
- optimize `str_replace` for same length replacements ~2x (#7580)
- reinstate fast module import and optimise `DataFrame` init by implementing dynamic `singledispatch` registration (#7559)
- improve perf of `str.replace_n` and add `n` argument ~10x (#7575)
- speedup `replace_literal_all` of single byte replacements ~15x. (#7565)
- set sorted flags (#7558)
- extend ultrafast constant-value frame init to temporal types (over 1,000x speedup) (#7527)
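The `n` argument mentioned in #7575 limits how many matches are replaced. The sketch below illustrates that idea with Python's built-in `str.replace`, whose third argument is a replacement count. It assumes literal (non-regex) patterns and left-to-right matching; `replace_n` is a hypothetical helper, not the Polars function.

```python
def replace_n(text: str, pattern: str, value: str, n: int = 1) -> str:
    """Replace at most the first n literal occurrences of pattern."""
    return text.replace(pattern, value, n)


first_only = replace_n("a-b-c-d", "-", "+")       # "a+b-c-d"
first_two = replace_n("a-b-c-d", "-", "+", n=2)   # "a+b+c-d"
```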
✨ Enhancements
- slightly more space-efficient table output (use ellipsis char, not three periods) (#7599)
- implement decimal -> dtype cast (#7600)
- use head on pyarrow datasets (#7570)
- overwrite streaming chunk size (#7543)
🐞 Bug fixes
- remove index columns in pandas to_sql() (#7596)
- add decimal chunk_lengths (#7589)
- fix ooc sort. the fast path was invalid (#7588)
- Fix regression throwing AmbiguousTimeError in groupby_dynamic (#7454)
- activate dtype-duration for polars-ops (#7582)
- distinct project whole schema if not a subset (#7581)
- reinstate fast module import and optimise `DataFrame` init by implementing dynamic `singledispatch` registration (#7559)
- sql window functions (#7458)
- respect time zone in upsample (#7563)
- fix rolling windows for windows that shrink from lhs (#7556)
- remove pyarrow from construction and dispatch to rust (#7551)
- fix negative indexing for `head`/`tail` (#7554)
- Remove `BatchedCsvReader` from public API (#7546)
- fix logical/list getitem (#7545)
- pushdown key in merge sorted projection pd (#7542)
- don't upcast column to string in 'is_in' operation (#7538)
🛠️ Other improvements
- Move more code out of `internals` (#7597)
- add a performance hint about use of `lru_cache` to the `apply` docstrings (#7593)
- Avoid `pli` in type hints (part 2) (#7587)
- Avoid `pli` in type hints (part 1) (#7586)
- Move core objects to top level (#7576)
- Bump ruff (#7567)
- Rename namespace Array -> List in docs (#7541)
- Move `fmt` tests to `test_fmt` (#7555)
- Rename `sep` arg to `separator` (#7533)
- Minor Series cleanup (#7531)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Vincenthays, @alexander-beedie, @ritchie46, @stinodego, @universalmind303 and @vincev
Python Polars 0.16.13
🚀 Performance improvements
- use atoi in favor of lexical in strptime -25% (#7501)
- [csv] faster utf8 validation ~20% (#7500)
- [csv] SIMD accelerate SplitFields -40% (#7498)
- (csv) don't use memchr for splitfields -~0.15% (#7494)
- csv-file use fast-float for csv float parsing (#7492)
✨ Enhancements
- literal support for binary (#7519)
🐞 Bug fixes
- respect time zone in date_range (#7503)
- transparently integrate externally-registered Excel formats (#7520)
- use physical types in sort-by args (#7518)
- keep series name in arithmetic (#7513)
- Initialize with `Decimal` dtype (#7511)
- fix projection pushdown of asof_joins (#7487)
🛠️ Other improvements
- update `show_versions` with xlsxwriter (and add as optional dependency) (#7507)
- Use new `LazyFrame` init in docs (#7508)
- Bump some linting versions (#7505)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @ecashin, @ritchie46 and @stinodego
Python Polars 0.16.12
🚀 Performance improvements
- speed up comparison of sorted arrays ~3.85x. (#7478)
- improve performance for datetime parsing with %Z (#7369)
✨ Enhancements
- slice pushdown in `LazyFrame.unique` (#7470)
- streaming `LazyFrame.unique` (#7466)
- automatically infer iso8601-like dates (#7457)
- push down temporal predicates to pyarrow scanner (#7421)
- slice pushdown in scan_arrow_ds (#7449)
- convert decimal 256 to 128 on entry (#7448)
- provide option to set individual `row_heights` on Excel export (#7447)
- optimise `Excel` export when all data in a multi-column conditional format is contiguous (#7427)
- dynamically change chunk_size in streaming `explo… (#7415)
- support setting multiple conditional formats on the same `Excel` table column/range (#7411)
- add unary +,-,! to sql (#7399)
- disallow converting key values to null in map_dict due … (#7393)
- use IO backed reader when `low_memory=True`. (#7394)
- The big error revamp (#7362)
- parse year-month-day as Datetime in slow-path (#7373)
- support applying one conditional format to multiple columns on `Excel` export (allows for heatmaps) (#7379)
- Proper superclass for Decimal (#7384)
- tweak default Date and Time format strings for `Excel` export (#7380)
- make melt streamable (#7364)
- don't rechunk before writing to csv (#7365)
🐞 Bug fixes
- handle an unusual edge-case introspecting dataclass type hints (#7476)
- raise error on categorical by arguments if not fro… (#7464)
- fix and test df.corr (#7463)
- make `DataFrame` rendering compatible with quarto and pandoc (#7455)
- sql floor & ceil (#7456)
- fix `DataFrame` table rendering issue in some Jupyter environments (#7450)
- allow for hourly date_range to cross DST (#7430)
- respect lexical/physical in multi-column categoric… (#7417)
- fix null_dtype slice (#7414)
- sort_by logical types (#7412)
- parse single-digit months and dates when code would have gone down fastpath (#7391)
- creating empty struct series with some unit fields (#7383)
- minor `Excel` export improvements/fixes (#7363)
🛠️ Other improvements
- Rename `read_x` functions arg `file` to `source` (#7460)
- Refactor `utils` module (#7435)
- Rename functions that clash with builtins (#7424)
- Showcase new ergonomic syntax in README (#7419)
- Rename Decimal `prec` to `precision` (#7401)
- Remove `_base_type` util (#7410)
- Rename first arg of `from_x` to `data` (#7407)
- use exc as variable name for all captured exceptions (#7403)
- Remove redundant `schema` keyword description from `pl.… (#7400)
- Rename `cfg` module to `config` (#7385)
- Add test for groupby referencing the same column twice (#7340)
- Split up `datatypes` module (#7357)
- Clean up type checking lints (#7358)
Thank you to all our contributors for making this release possible!
@Hofer-Julian, @MarcoGorelli, @SauravMaheshkar, @aldanor, @alexander-beedie, @cjackal, @ghuls, @josh, @juba, @nrebena, @rben01, @ritchie46, @stinodego and @universalmind303
Python Polars 0.16.11
🚀 Performance improvements
- optimize str.replace_all (#7353)
- optimize str.replace ~2x improvement (#7347)
- ensure utf8 apply preallocates memory (#7345)
✨ Enhancements
- make `LazyFrame.explode` streamable. (#7341)
- allow import of dtype groups from the top-level to improve discovery (#7339)
🐞 Bug fixes
- make decimal types opt-in (#7348)
- fix chunk_sizes in threading apply (#7351)
- don't panic when writing `NullArray` values to python row tuple (#7346)
🛠️ Other improvements
- add `write_excel` API docs link (#7338)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ritchie46 and @s-banach
Python Polars 0.16.10
🏆 Highlights
- Excel export support via new `write_excel` IO method (#7251)
- out of core sort on multiple columns (#7244)
🚀 Performance improvements
- improve batched csv readers perf and memory perf (#7329)
- use inlined strings for field and schema (#7272)
- reuse groups in binary expressions (#7202)
✨ Enhancements
- support creation of sparklines when exporting `Excel` tables (#7333)
- support sqlalchemy/pandas backed `write_database` (#7322)
- add adbc database reader and writer (`DataFrame.write_database`) (#7318)
- make `expr.apply` streamable in selection context (#7316)
- More ergonomic `unnest` args (#7310)
- initial working version of Decimal Series (#7220)
- Support explicit Binary dtype in constructor (#7305)
- implement serde for literal datetime and series (#7301)
- improve error message if mmap fails in ipc (#7300)
- add multi-threaded apply (#7277)
- add support for serializing categoricals to json (#7276)
- Add Expr.arg_true (#7056)
- don't require pyarrow for initialising Series with Python datetimes (#7273)
- Excel export support via new `write_excel` IO method (#7251)
- deprecate `describe_(optimized)_plan` in favor of `explain` (#7264)
- enable min-max skipping for binary in parquet, enable min-max skipping for `is_in` exprs (#7169)
- out of core sort on multiple columns (#7244)
- support nulls_last for multi-column sort (#7242)
- allow optimizations flags in describe_plan (#7233)
- implement row encoding for boolean and binary (#7218)
- allow passing utc=True when parsing time-zone-naive date strings (#7203)
- Add `**named_exprs` input for `struct` (#7208)
- add sql "ARRAY_AGG" (#7204)
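To make the `nulls_last` behaviour for multi-column sorts (#7242) concrete, here is a small plain-Python sketch over a list of dicts. It assumes ascending order and that non-null values within a column are mutually comparable; `sort_rows_nulls_last` is a hypothetical helper and does not mirror how Polars sorts internally.

```python
from typing import Any


def sort_rows_nulls_last(rows: list[dict[str, Any]], by: list[str]) -> list[dict[str, Any]]:
    """Sort rows ascending by several columns, placing None after all values."""

    def sort_key(row: dict[str, Any]) -> tuple:
        # Each column contributes (is_null, value). False sorts before True,
        # so non-null values come first; the placeholder 0 is never compared
        # against a real value because the is_null flags already differ.
        return tuple(
            (row[col] is None, 0 if row[col] is None else row[col]) for col in by
        )

    return sorted(rows, key=sort_key)


rows = [
    {"a": 2, "b": None},
    {"a": None, "b": 1},
    {"a": 1, "b": 3},
    {"a": 2, "b": 0},
]
ordered = sort_rows_nulls_last(rows, ["a", "b"])
```

In this example the row with `a=None` ends up last, and within `a=2` the row with `b=None` follows the row with `b=0`.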
🐞 Bug fixes
- fix offset in threading apply (#7330)
- fix projection pushdown on join with unused join key (#7326)
- raise error on time -> datetime cast (#7325)
- raise error if output of 'apply' cannot be determined (#7317)
- make `pl.struct` mappable (#7299)
- err on duplicate with_column names (#7296)
- don't panic on `str.parse_int` (#7072)
- improve concat_list with empty list error message (#7236)
- fix groupby_dynamic's binning when index_column is time-zone-aware (#7278)
- fix preservation of microseconds when converting Python datetime (#7271)
- fix us precision of datetime to anyvalue conversion (#7268)
- no panic on empty cross join (#7266)
- raise error on ambiguous filter predicates (#7265)
- handle concat_list with first lit value (#7235)
- respect schema in DataFrame initialisation for time-zone-aware datetime (#7240)
- ensure `every` type is properly normalised (for `groupby_dynamic` and `groupby_rolling`) (#7238)
- add test of median function in lazy mode (#7224)
- don't lose precision in pl.date_range due to floating point arithmetic (#7229)
- Conversion of negative timedelta to polars duration (#7209)
- ensure parametric testing `cols=int` definition respects `allowed_dtypes` (#7213)
🛠️ Other improvements
- Fix `read/write_database` tests (#7327)
- Rename `scan_ds` to `scan_pyarrow_dataset` (#7320)
- don't run tests that write to disk by default (#7321)
- rename `read_sql` to `read_database` (#7315)
- Address `git2` vulnerability (#7309)
- Correctly deprecate `DataFrame.pearson_corr` (#7307)
- Skip `write_excel` doctests (#7306)
- Run `pytest-xdist` with worksteal (#7304)
- Rename pearson_corr & spearman_rank_corr (#7014)
- Split `io` module per type (#7295)
- Move `_html` module to dataframe module (#7256)
- Enable `strict` for ruff `TCH` lints (#7234)
- better select on map_dict dtype (#7217)
- add warning of mmap to ipc docstring (#7216)
- exit non-zero on fix from ruff (#7215)
- ensure that `DataFrame` and `LazyFrame` init params don't diverge (#7214)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @aldanor, @alexander-beedie, @coinflip112, @csko, @dependabot, @dependabot[bot], @ghuls, @josemasar, @josh, @mslapek, @nrebena, @ozgrakkurt, @papparapa, @ptiza, @rben01, @ritchie46, @sorhawell, @stinodego, @universalmind303, @xyning and @zundertj