Releases · pola-rs/polars

08 Apr 13:24

py-0.17.0

842fbec

Python Polars 0.17.0

⚠️ Breaking changes

rename some function arguments (#8017)
don't create duplicate pivot names (#8002)
Remove deprecated behaviour (#7978)
rename toggle_string_cache to enable_string_cache (#7970)
change top_k(descending) -> bottom_k (#7969)
in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
Use RowsError instead of RowsException as recommended … (#6009)
Use time_unit/time_zone instead of tu/tz (#7910)
More ergonomic args for struct, concat_str, and arg_sort_by (#7308)
swap arguments of shift_and_fill and add default… (#7192)
set maintain_order=False for df/lf.unique (#7468)
Rename pipe arg func to function (#7139)
Set some args for Series/Expr methods to keyword-only (#7860)
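
The `descending` length check (#7957) can be pictured with a small pure-Python sketch. This is not Polars code: `normalize_descending` and `sort_rows` are hypothetical helpers and rows are plain dicts. It only illustrates the rule that a sequence of flags needs one entry per sort column, while a single bool is broadcast to all columns.

```python
from typing import Dict, List, Sequence, Union


def normalize_descending(
    by: Sequence[str], descending: Union[bool, Sequence[bool]]
) -> List[bool]:
    """Return one descending flag per sort column (hypothetical helper)."""
    if isinstance(descending, bool):
        return [descending] * len(by)
    flags = list(descending)
    if len(flags) != len(by):
        raise ValueError(
            f"the length of `descending` ({len(flags)}) does not match "
            f"the number of columns to sort by ({len(by)})"
        )
    return flags


def sort_rows(
    rows: Sequence[Dict[str, int]],
    by: Sequence[str],
    descending: Union[bool, Sequence[bool]] = False,
) -> List[Dict[str, int]]:
    """Multi-key sort of dict rows with a per-column direction."""
    flags = normalize_descending(by, descending)
    result = list(rows)
    # Stable sorts applied from the last key to the first give a multi-key sort.
    for column, desc in reversed(list(zip(by, flags))):
        result.sort(key=lambda row: row[column], reverse=desc)
    return result
```

With this sketch, `sort_rows(rows, ["a", "b"], [False, True])` sorts by `a` ascending and `b` descending, whereas `sort_rows(rows, ["a", "b"], [True])` raises instead of silently guessing the missing flag.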

🚀 Performance improvements

FromParallelIterator<Option<str>> for Utf8Chunked ~1.9x (#8058)
speed up from_par_iter Option<bool> ~2.5x (#8057)
parallelize numeric ChunkedArray materialization ~2x. (#8053)
parallelize into_groups materialization ~-25% (#8036)
use a trusted anyvalue builder (#8001)
numeric grouptuples with nulls hash in single pass ~25% (#7980)
ensure primitives are parsed first in anyvalue conversion (#7955)
use perfect hash table for categoricals (#7951)
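
One way to read "perfect hash table for categoricals" (#7951): categorical values are backed by dense integer codes, so a code can index straight into an array with no collisions and no hashing. The sketch below is an assumption about the idea, not the Polars implementation, and `group_by_codes` is a hypothetical name.

```python
from typing import List


def group_by_codes(codes: List[int], n_categories: int) -> List[List[int]]:
    """Collect row indices per category code.

    Codes are assumed to be dense integers in [0, n_categories), so the
    code itself acts as a collision-free (perfect) hash into `slots`.
    """
    slots: List[List[int]] = [[] for _ in range(n_categories)]
    for row_index, code in enumerate(codes):
        slots[code].append(row_index)
    return slots


codes = [2, 0, 2, 1, 0]  # e.g. codes for the categories ["a", "b", "c"]
groups = group_by_codes(codes, 3)
```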

✨ Enhancements

multiple sql contexts & optional sql highlighting in cli (#8072)
implement arg_sort for struct dtype (#8051)
Support DataFrame init from pyarrow RecordBatch objects, and improve init from Array (#8011)
allow write_ipc to take file=None (returning BytesIO) (#7997)
Add __array__ method to DataFrame (#7979)
support struct in df.unique (#7976)
change top_k(descending) -> bottom_k (#7969)
basic sanity-checks for some Config methods, reference POLARS_MAX_THREADS in threadpool_size docstring (#7965)
optimize away nested unions in lp (#7861)
Use RowsError instead of RowsException as recommended … (#6009)
More ergonomic args for struct, concat_str, and arg_sort_by (#7308)

🐞 Bug fixes

check element count in multi-column explode (#8050)
set lower limit for chunk_size (#8048)
impl to_static for struct (#8037)
create Series with list of only None with Float32 dtype (#8015)
version gate pyarrow version for `to_pandas=(use_pyarrow… (#8026)
Only allow correct type for get_column and to_series arg… (#7983)
Output correct dtype for values of remapping dict in map… (#8013)
all/any empty sets (#8012)
struct null_count, cast string, transpose and describe (#8009)
fix pivot and transpose of struct data (#8005)
don't create duplicate pivot names (#8002)
Fix test_literal_group_agg_chunked_7968 test (#7991)
fix chunked literals in expression engine (#7973)
in sort, top_k, sort_by, and arg_sort_by, raise if descending is a sequence and its length doesn't match the number of columns to sort by (#7957)
pandas 2.0 compat (#7962)
concat object types (#7958)
fix decimal conversion alignment (#7954)

🛠️ Other improvements

Fix Expr.apply docstring for return_dtype parameter (#8069)
rename some function arguments (#8017)
Remove deprecated behaviour (#7978)
Add docstring examples for top_k and bottom_k (#7987)
rename toggle_string_cache to enable_string_cache (#7970)
add remaining operator-equivalent method docstrings and a related html/docs entry (#7953)
Use time_unit/time_zone instead of tu/tz (#7910)
swap arguments of shift_and_fill and add default… (#7192)
set maintain_order=False for df/lf.unique (#7468)
Rename pipe arg func to function (#7139)
Set some args for Series/Expr methods to keyword-only (#7860)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @StefanBRas, @alexander-beedie, @ghuls, @rben01, @ritchie46, @stinodego and @universalmind303

03 Apr 05:35

py-0.16.18

1a7103f

Python Polars 0.16.18

🚀 Performance improvements

improve group_tuples of high cardinality data ~10% (#7938)
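
`group_tuples` refers to the per-group row indices computed during a groupby. A rough pure-Python picture follows, assuming each group is represented by its first row index plus the list of all its row indices. The function below is illustrative only and is not a Polars API; with high-cardinality keys the dictionary holds almost one entry per row, which is why this step is sensitive to hashing cost.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def group_tuples(keys: Sequence[Hashable]) -> List[Tuple[int, List[int]]]:
    """Return (first_index, all_indices) per distinct key, in first-seen order."""
    index_of_group: Dict[Hashable, int] = {}
    groups: List[Tuple[int, List[int]]] = []
    for row_index, key in enumerate(keys):
        position = index_of_group.get(key)
        if position is None:
            # First time this key is seen: open a new group.
            index_of_group[key] = len(groups)
            groups.append((row_index, [row_index]))
        else:
            groups[position][1].append(row_index)
    return groups
```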

✨ Enhancements

Add seed argument to rank for random (#7913)
Support Numpy ufunc with more than one expression (#7924)

🐞 Bug fixes

Fix lazy encode schema (#7912)
respect skip_nulls in apply for temporal types (#7908)

🛠️ Other improvements

Rename argument f to function in reduce docstring (#7925)
improve docstrings for numeric/math operator-equivalent methods (#7942)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @alonme, @ankane, @dependabot, @dependabot[bot], @lorentzenchr, @rben01, @ritchie46 and @zundertj

31 Mar 16:41

py-0.16.17

69516a7

Python Polars 0.16.17

🚀 Performance improvements

use streaming instead of partitioned groupby (#7907)
don't auto-stream groupby (#7906)
rechunk before aggs (#7903)
don't re-allocate groups in sorted to_dummies (#7897)
fix hashing regression (#7833)
rechunk dataframe before unique computation (#7814)
improve hash quality (#7813)
always take sorted fast path group_tuples (#7787)

✨ Enhancements

auto-infer time-zone-awareness from the fmt argument in strptime; deprecate tz_aware argument (#7886)
Add Series.pow() (#7898)
deal with null values in cut/qcut (#7878)
allow list/tuple lit values (#7879)
Support writing dynamic/live formula columns via write_excel (#7871)
support datetime/date subclasses (e.g. FreezeGun) (#7819)
support mode for floats and categoricals (#7827)
allow Series init with Unknown dtype to proceed as if dtype is None, to allow inference (#7830)
support sort by 'struct' type (#7822)
add to_repr methods to DataFrame and Series (#7802)
thousand separators in shape of repr DataFrame (#7775)
Improve automatic output dtype setting for map_dict. (#7797)
new utility from_repr function that reconstructs a DataFrame from its table repr (#7781)
deprecate default value of aggregation_function being 'first' in pivot. In a future version, it will default to None (#7784)
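
A minimal sketch of inferring time-zone-awareness from a format string (#7886), using the standard library rather than Polars. The format is treated as time-zone-aware when it contains a UTC-offset directive. Which directives Polars actually checks is an assumption here; the sketch looks only for `%z`, and `is_tz_aware_format` is a hypothetical helper.

```python
from datetime import datetime


def is_tz_aware_format(fmt: str) -> bool:
    """Guess whether a strptime format yields time-zone-aware datetimes.

    Assumption: only the `%z` (UTC offset) directive is considered, and a
    literal percent sign written as `%%` is skipped so `%%z` does not count.
    """
    i = 0
    while i < len(fmt) - 1:
        if fmt[i] == "%":
            if fmt[i + 1] == "z":
                return True
            i += 2  # skip the directive (or an escaped "%%")
        else:
            i += 1
    return False


naive = datetime.strptime("2023-03-31 16:41", "%Y-%m-%d %H:%M")
aware = datetime.strptime("2023-03-31 16:41 +0200", "%Y-%m-%d %H:%M %z")
```

Here `naive.tzinfo` is `None` while `aware` carries a fixed UTC offset, matching what the helper reports for the two format strings.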

🐞 Bug fixes

fix lit agg (#7904)
disable ooc groupby (#7901)
Use check_exact for temporal types in assert_series_equal (#7896)
fix abs logical type (#7895)
fix boolean min/max output type and null handling (#7894)
Cast compound types to their simple string representation on export to Excel (#7887)
ensure _repr_html_ escapes column names in addition to data/body elements (#7877)
validate groupby_dynamic inputs (#7876)
correct for chunks in arg_where (#7873)
fix nested logical/physical list (#7872)
fix arbitrary nested logical types (#7869)
Relax type hints for when/then (#7857)
don't use fxhash in sink_sorted fast path (#7849)
parquet stats & all kernel (#7846)
Add missing type hint for is_between (#7835)
fill null list (#7836)
fix explode list[null] (#7832)
fix unicode lower/uppercase (#7826)
raise error on invalid series concat strategy (#7823)
don't use naive name in partitioned agg (#7810)
Ensure CsvReader always respects the n_rows parameter (#7789)

🛠️ Other improvements

Fix read_csv docstring formatting (#7875)
update concat docstring for how parameter (#7834)
don't run hash stability test on arm64 (#7825)
Improve pl.when documentation (#7793)
add description of ddof (#7811)
Rename venv folder to .venv (#7790)
add a make requirements option to install/refresh dependencies without having to recreate the venv (#7792)
fixup stacklevels (#7796)
Drop ruff target version (#7791)

Thank you to all our contributors for making this release possible!
@LdRoW, @MarcoGorelli, @Newtoniano, @advoet, @alexander-beedie, @duskmoon314, @foxcroftjn, @ghuls, @jonashaag, @ritchie46, @stinodego and @zundertj

25 Mar 12:39

py-0.16.16

c6db488

Python Polars 0.16.16

🐞 Bug fixes

ensure k is lower than height (#7779)
raise error on invalid categorical cast (#7686)
raise error on attempt to set invalid Datetime or Duration dtype timeunit (#7768)

🛠️ Other improvements

Add "typos" as spell checking lint (#7759)
fix typos (#7756)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46 and @universalmind303

24 Mar 14:45

py-0.16.15

d0a91e6

Python Polars 0.16.15

🚀 Performance improvements

change top_k algorithm (#7718)
runtime SIMD target detection for min/max/sum and impl SIMD mean ~2-5x (#7702)
implement top-k optimization (#7678)
ooc-sort dump in thread local if IO-thread is full. (#7668)
use perfect hash table for ooc partitioning (#7653)
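
The top-k entries above (#7678, #7718) are about avoiding a full sort when only the k largest values are needed. The sketch below shows that general idea with the standard library's `heapq`. It is an illustration under that assumption, not the algorithm Polars uses internally, and the two function names are only chosen to mirror the changelog.

```python
import heapq
from typing import List, Sequence


def top_k(values: Sequence[float], k: int) -> List[float]:
    """k largest values, largest first, without sorting the full input."""
    k = min(k, len(values))  # never ask for more rows than exist
    return heapq.nlargest(k, values)


def bottom_k(values: Sequence[float], k: int) -> List[float]:
    """k smallest values, smallest first."""
    k = min(k, len(values))
    return heapq.nsmallest(k, values)
```

For small k this does work roughly proportional to n log k instead of n log n for a full sort followed by a slice.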

✨ Enhancements

add dt.datetime, dt.date, dt.time (#7735)
new "row_totals" parameter for write_excel that adds a row-wise total column using structured references (#7751)
More ergonomic args for min/max (#7742)
More ergonomic args for concat_list (#7745)
add Series.hist (#7727)
add qcut (#7724)
add maintain_order option to Series.cut (#7723)
create series with only none list with specific dtype (#7722)
add maintain_order in arr.unique (#7721)
DataFrame.top_k/ LazyFrame.top_k (#7720)
clearer error message when replace_time_zone encounters ambiguous or non-existent datetimes (#7685)
include set_fmt_float value in Config load/save state (#7696)
raise on descending date_range arguments (#7671)
include add operator-equivalent expression (#7667)
add expression method equivalents for existing math/logical operators (#7660)
add is_leap_year to temporal expressions (#7618)
full out-of core support for streaming groupby (#7630)
clearer error message when creating duration string without integer (#7648)
allow scan_csv to take a list of column names in a new_columns param (#7642)
out-of-core groupby/unique of groupby on integer keys (#7604)
allow set and/or frozenset as input to is_in expressions (#7613)

🐞 Bug fixes

make zip_with_same_type obligatory (#7761)
fix melt projection pushdown node (#7752)
fix predicate pushdown for 'unique' first/last (#7749)
fix null propagation (#7748)
fix init from pandas Series that has no dtype and is empty (or contains only null values) (#7716)
avoid ambiguous time error when passing python Datetime to DataFrame constructor (#7711)
Fix inferring CSV schema when skip_rows_after_heade… (#7701)
fix race condition in null handling of window fast… (#7695)
address Series init regression from list of np.arange objects (#7692)
improve error message if unavailable lazy module is queried for __version__ attribute (#7680)
fix reversed non-existent file error msg (#7657) (#7673)
respect time zone in groupby_rolling with negative offset (#7664)
fix empty case str.replace (#7662)
allow for list of datetimes with timezone(timedelta!=0) in Series constructor (#7645)
respect time zone in rolling_* functions (#7643)
fix schema of decimal type reads (#7652)
detect deltalake version in show_versions (#7622)
respect time zone in offset_by (#7626)
fix boolean Series init with integer 1/0 values (#7619)
respect time zone in dt.round (#7611)

🛠️ Other improvements

Display full argument names in __repr__ for Datetime a… (#7736)
add Expr.pipe API docs link (#7734)
Add sort_by example taking one row per group (#7712)
Clean up a few type hints/imports (#7687)
Move wrap_x utils to utils module (#7672)
Reduce number of polars.internals imports (#7628)
Remove duplicate column from Expr.sort example (#7684)
Move expr parsing to utils (#7661)
Eliminate function re-exports through internals (#7650)
Move last functionality out of internals (#7649)
More internals cleanup (#7638)
Update lockfile (#7637)
fix and improve type hints and function names (#7609)
remove additional logic from scan delta (#7605)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @chitralverma, @didriksg, @ghuls, @jakob-keller, @minimav, @ritchie46, @stinodego, @universalmind303 and @zundertj

17 Mar 09:56

py-0.16.14

c5b5e02

Python Polars 0.16.14

🚀 Performance improvements

optimize string kernels, (elide redundant allocs) (#7602)
even faster polars module import (~15%) (#7584)
optimize str_replace for same length replacements ~2x (#7580)
reinstate fast module import and optimise DataFrame init by implementing dynamic singledispatch registration (#7559)
improve perf of str.replace_n and add n argument ~10x (#7575)
speedup replace_literal_all of single byte replacements ~15x. (#7565)
set sorted flags (#7558)
extend ultrafast constant-value frame init to temporal types (over 1,000x speedup) (#7527)
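
Why same-length replacements (#7580) can be cheaper: when the pattern and its replacement have the same byte length, the output is exactly as large as the input, so bytes can be overwritten in one buffer instead of rebuilding the string piece by piece. The plain-Python sketch below illustrates that idea. It is an assumption about the optimization, not the Rust kernel, and `replace_same_length` is a hypothetical helper.

```python
def replace_same_length(text: str, pattern: str, replacement: str) -> str:
    """Replace the first occurrence of `pattern`, overwriting bytes in place."""
    src = text.encode("utf-8")
    pat = pattern.encode("utf-8")
    rep = replacement.encode("utf-8")
    if len(pat) != len(rep):
        raise ValueError("pattern and replacement must have the same byte length")
    start = src.find(pat)
    if start == -1:
        return text  # nothing to replace
    buffer = bytearray(src)  # one buffer of the final size
    buffer[start:start + len(rep)] = rep  # overwrite, no resizing needed
    return buffer.decode("utf-8")
```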

✨ Enhancements

slightly more space-efficient table output (use ellipsis char, not three periods) (#7599)
implement decimal -> dtype cast (#7600)
use head on pyarrow datasets (#7570)
overwrite streaming chunk size (#7543)

🐞 Bug fixes

remove index columns in pandas to_sql() (#7596)
add decimal chunk_lengths (#7589)
fix ooc sort. the fast path was invalid (#7588)
Fix regression throwing AmbiguousTimeError in groupby_dynamic (#7454)
activate dtype-duration for polars-ops (#7582)
distinct project whole schema if not a subset (#7581)
reinstate fast module import and optimise DataFrame init by implementing dynamic singledispatch registration (#7559)
sql window functions (#7458)
respect time zone in upsample (#7563)
fix rolling windows for windows that shrink from lhs (#7556)
remove pyarrow from construction and dispatch to rust (#7551)
fix negative indexing for head/tail (#7554)
Remove BatchedCsvReader from public API (#7546)
fix logical/list getitem (#7545)
pushdown key in merge sorted projection pd (#7542)
don't upcast column to string in 'is_in' operation (#7538)

🛠️ Other improvements

Move more code out of internals (#7597)
add a performance hint about use of lru_cache to the apply docstrings (#7593)
Avoid pli in type hints (part 2) (#7587)
Avoid pli in type hints (part 1) (#7586)
Move core objects to top level (#7576)
Bump ruff (#7567)
Rename namespace Array -> List in docs (#7541)
Move fmt tests to test_fmt (#7555)
Rename sep arg to separator (#7533)
Minor Series cleanup (#7531)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @Vincenthays, @alexander-beedie, @ritchie46, @stinodego, @universalmind303 and @vincev

13 Mar 11:15

py-0.16.13

42f503a

Python Polars 0.16.13

🚀 Performance improvements

use atoi in favor of lexical in strptime -25% (#7501)
[csv] faster utf8 validation ~20% (#7500)
[csv] SIMD accelerate SplitFields -40% (#7498)
[csv] don't use memchr for splitfields -~0.15% (#7494)
[csv] use fast-float for csv float parsing (#7492)

✨ Enhancements

literal support for binary (#7519)

🐞 Bug fixes

respect time zone in date_range (#7503)
transparently integrate externally-registered Excel formats (#7520)
use physical types in sort-by args (#7518)
keep series name in arithmetic (#7513)
Initialize with Decimal dtype (#7511)
fix projection pushdown of asof_joins (#7487)

🛠️ Other improvements

update show_versions with xlsxwriter (and add as optional dependency) (#7507)
Use new LazyFrame init in docs (#7508)
Bump some linting versions (#7505)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @ecashin, @ritchie46 and @stinodego

10 Mar 16:29

py-0.16.12

e5fcf28

Python Polars 0.16.12

🚀 Performance improvements

speed up comparison of sorted arrays ~3.85x. (#7478)
improve performance for datetime parsing with %Z (#7369)
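
For data known to be sorted, a comparison such as `values > x` does not need to test every element: a binary search finds the boundary, and the result mask is constant on each side of it. The sketch assumes ascending order and is only an illustration of that idea, not necessarily what #7478 implements. `gt_sorted` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List, Sequence


def gt_sorted(values: Sequence[int], threshold: int) -> List[bool]:
    """Element-wise `values > threshold` for ascending-sorted input."""
    # Index of the first element strictly greater than the threshold.
    boundary = bisect_right(values, threshold)
    return [False] * boundary + [True] * (len(values) - boundary)
```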

✨ Enhancements

slice pushdown in LazyFrame.unique (#7470)
streaming LazyFrame.unique (#7466)
automatically infer iso8601-like dates (#7457)
push down temporal predicates to pyarrow scanner (#7421)
slice pushdown in scan_arrow_ds (#7449)
convert decimal 256 to 128 on entry (#7448)
provide option to set individual row_heights on Excel export (#7447)
optimise Excel export when all data in a multi-column conditional format is contiguous (#7427)
dynamically change chunk_size in streaming `explo… (#7415)
support setting multiple conditional formats on the same Excel table column/range (#7411)
add unary +,-,! to sql (#7399)
disallow converting key values to null in map_dict due … (#7393)
use IO backed reader when low_memory=True. (#7394)
The big error revamp (#7362)
parse year-month-day as Datetime in slow-path (#7373)
support applying one conditional format to multiple columns on Excel export (allows for heatmaps) (#7379)
Proper superclass for Decimal (#7384)
tweak default Date and Time format strings for Excel export (#7380)
make melt streamable (#7364)
don't rechunk before writing to csv (#7365)
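
A rough picture of inferring ISO 8601-like dates (#7457): try a short list of candidate formats and keep the first one that parses a sample value. The candidate list and the `infer_format` helper are hypothetical, and the list is much smaller than what a real parser would support.

```python
from datetime import datetime
from typing import Optional

# Hypothetical candidates, ordered from most to least specific.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def infer_format(sample: str) -> Optional[str]:
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None
```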

🐞 Bug fixes

handle an unusual edge-case introspecting dataclass type hints (#7476)
raise error on categorical by arguments if not fro… (#7464)
fix and test df.corr (#7463)
make DataFrame rendering compatible with quarto and pandoc (#7455)
sql floor & ceil (#7456)
fix DataFrame table rendering issue in some Jupyter environments (#7450)
allow for hourly date_range to cross DST (#7430)
respect lexical/physical in multi-column categoric… (#7417)
fix null_dtype slice (#7414)
sort_by logical types (#7412)
parse single-digit months and dates when code would have gone down fastpath (#7391)
creating empty struct series with some unit fields (#7383)
minor Excel export improvements/fixes (#7363)

🛠️ Other improvements

Rename read_x functions arg file to source (#7460)
Refactor utils module (#7435)
Rename functions that clash with builtins (#7424)
Showcase new ergonomic syntax in README (#7419)
Rename Decimal prec to precision (#7401)
Remove _base_type util (#7410)
Rename first arg of from_x to data (#7407)
use exc as variable name for all captured exceptions (#7403)
Remove redundant schema keyword description from `pl.… (#7400)
Rename cfg module to config (#7385)
Add test for groupby referencing the same column twice (#7340)
Split up datatypes module (#7357)
Clean up type checking lints (#7358)

Thank you to all our contributors for making this release possible!
@Hofer-Julian, @MarcoGorelli, @SauravMaheshkar, @aldanor, @alexander-beedie, @cjackal, @ghuls, @josh, @juba, @nrebena, @rben01, @ritchie46, @stinodego and @universalmind303

05 Mar 17:49

py-0.16.11

e558689

Python Polars 0.16.11

🚀 Performance improvements

optimize str.replace_all (#7353)
optimize str.replace ~2x improvement (#7347)
ensure utf8 apply preallocates memory (#7345)

✨ Enhancements

make LazyFrame.explode streamable. (#7341)
allow import of dtype groups from the top-level to improve discovery (#7339)

🐞 Bug fixes

make decimal types opt-in (#7348)
fix chunk_sizes in threading apply (#7351)
don't panic when writing NullArray values to python row tuple (#7346)

🛠️ Other improvements

add write_excel API docs link (#7338)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ritchie46 and @s-banach

03 Mar 20:13

py-0.16.10

d5ca27e

Python Polars 0.16.10

🏆 Highlights

Excel export support via new write_excel IO method (#7251)
out of core sort on multiple columns (#7244)

🚀 Performance improvements

improve batched csv readers perf and memory perf (#7329)
use inlined strings for field and schema (#7272)
reuse groups in binary expressions (#7202)

✨ Enhancements

support creation of sparklines when exporting Excel tables (#7333)
support sqlalchemy/pandas backed write_database (#7322)
add adbc database reader and writer (DataFrame.write_database) (#7318)
make expr.apply streamable in selection context (#7316)
More ergonomic unnest args (#7310)
initial working version of Decimal Series (#7220)
Support explicit Binary dtype in constructor (#7305)
implement serde for literal datetime and series (#7301)
improve error message if mmap fails in ipc (#7300)
add multi-threaded apply (#7277)
add support for serializing categoricals to json (#7276)
Add Expr.arg_true (#7056)
don't require pyarrow for initialising Series with Python datetimes (#7273)
Excel export support via new write_excel IO method (#7251)
deprecate describe_(optimized)_plan in favor of explain (#7264)
enable min-max skipping for binary in parquet, enable min-max skipping for is_in exprs (#7169)
out of core sort on multiple columns (#7244)
support nulls_last for multi-column sort (#7242)
allow optimizations flags in describe_plan (#7233)
implement row encoding for boolean and binary (#7218)
allow passing utc=True when parsing time-zone-naive date strings (#7203)
Add **named_exprs input for struct (#7208)
add sql "ARRAY_AGG" (#7204)
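
What `nulls_last` means for a multi-column sort (#7242) can be shown on plain tuples. In this sketch each column contributes a (null rank, value) pair to the sort key, so nulls are pushed to the chosen end within every column. The row type and the `sort_with_nulls` helper are hypothetical and this is not how Polars implements it.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], Optional[str]]


def sort_with_nulls(rows: Sequence[Row], nulls_last: bool = False) -> List[Row]:
    """Sort rows by all columns, placing None first or last per column."""

    def key(row: Row):
        parts = []
        for value in row:
            is_null = value is None
            # False sorts before True, so this rank decides where nulls go.
            null_rank = is_null if nulls_last else not is_null
            # The placeholder 0 is only ever compared with another placeholder,
            # because the rank already separates nulls from real values.
            parts.append((null_rank, 0 if is_null else value))
        return tuple(parts)

    return sorted(rows, key=key)
```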

🐞 Bug fixes

fix offset in threading apply (#7330)
fix projection pushdown on join with unused join key (#7326)
raise error on time -> datetime cast (#7325)
raise error if output of 'apply' cannot be determined (#7317)
make pl.struct mappable (#7299)
err on duplicate with_column names (#7296)
don't panic on str.parse_int (#7072)
improve concat_list with empty list error message (#7236)
fix groupby_dynamic's binning when index_column is time-zone-aware (#7278)
fix preservation of microseconds when converting Python datetime (#7271)
fix us precision of datetime to anyvalue conversion (#7268)
no panic on empty cross join (#7266)
raise error on ambiguous filter predicates (#7265)
handle concat_list with first lit value (#7235)
respect schema in DataFrame initialisation for time-zone-aware datetime (#7240)
ensure every type is properly normalised (for groupby_dynamic and groupby_rolling) (#7238)
add test of median function in lazy mode (#7224)
don't lose precision in pl.date_range due to floating point arithmetic (#7229)
Conversion of negative timedelta to polars duration (#7209)
ensure parametric testing cols=int definition respects allowed_dtypes (#7213)
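
Several fixes in this list concern precision and sign when converting Python temporal values (#7271, #7229, #7209). A general way to avoid both float rounding and sign mistakes is to stay in integer arithmetic. The sketch below uses only the standard library; `timedelta_to_us` is a hypothetical helper and this is not the Polars conversion code.

```python
from datetime import timedelta


def timedelta_to_us(delta: timedelta) -> int:
    """Exact number of microseconds in `delta`, including negative values."""
    # Floor division by a 1-microsecond timedelta returns an exact int;
    # going through total_seconds() would pass through a float instead.
    return delta // timedelta(microseconds=1)
```

Python stores a negative `timedelta` with a negative `days` field and non-negative `seconds` and `microseconds`, so combining the fields by hand is easy to get wrong; the division handles that normalisation for you.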

🛠️ Other improvements

Fix read/write_database tests (#7327)
Rename scan_ds to scan_pyarrow_dataset (#7320)
don't run tests that write to disk by default (#7321)
rename read_sql to read_database (#7315)
Address git2 vulnerability (#7309)
Correctly deprecate DataFrame.pearson_corr (#7307)
Skip write_excel doctests (#7306)
Run pytest-xdist with worksteal (#7304)
Rename pearson_corr & spearman_rank_corr (#7014)
Split io module per type (#7295)
Move _html module to dataframe module (#7256)
Enable strict for ruff TCH lints (#7234)
better select on map_dict dtype (#7217)
add warning of mmap to ipc docstring (#7216)
exit non-zero on fix from ruff (#7215)
ensure that DataFrame and LazyFrame init params don't diverge (#7214)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @aldanor, @alexander-beedie, @coinflip112, @csko, @dependabot, @dependabot[bot], @ghuls, @josemasar, @josh, @mslapek, @nrebena, @ozgrakkurt, @papparapa, @ptiza, @rben01, @ritchie46, @sorhawell, @stinodego, @universalmind303, @xyning and @zundertj
