Releases: pola-rs/polars

Python Polars 0.18.13

07 Aug 15:46
9c194a2

⚠️ Deprecations

  • Rename LazyFrame.read/write_json to de/serialize (#10238)
  • Add categorical_as_str parameter to testing utils (#10350)

🚀 Performance improvements

  • don't parallelize literal expressions (#10321)

✨ Enhancements

  • support selectors in additional frame methods (#10255)
  • Add Series.cat.uses_lexical_ordering (#10325)
  • utility to get buffers and pointers (#10331)
  • improve datetime parsing error message (#10332)
  • add ptr for small integer types (#10330)
  • add offsets utility (#10328)
  • allow sequential runners in select/with_columns (#10322)
  • warn about inefficient apply json.loads if json is local import (#10310)
  • improve err msg parsing time, date, datetime (#10298)
  • Add categorical_as_str parameter to testing utils

🐞 Bug fixes

  • fix oob in 'last' (#10329)
  • show inefficient apply warning in ipython (#10312)
  • add cse to no_optimization in profile (#10317)
  • fix categorical lexical sort (#10318)
  • Fix join validation (#10257)
  • Set correct dtype for .extract_groups() (#10306)
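
As an illustration of what join validation checks, here is a minimal pure-Python sketch. It is not the Polars implementation. The helper name `check_join_cardinality` is hypothetical, and the mode strings ("m:m", "m:1", "1:m", "1:1") are assumed here for illustration. The idea is that a "1" side of the join must not contain duplicate keys.

```python
from collections import Counter


def check_join_cardinality(left_keys, right_keys, validate="m:m"):
    """Raise ValueError if the join keys violate the requested cardinality.

    Hypothetical helper for illustration only; not part of any library.
    """
    if validate not in {"m:m", "m:1", "1:m", "1:1"}:
        raise ValueError(f"unknown validate mode: {validate!r}")
    left_side, right_side = validate.split(":")

    def has_duplicates(keys):
        # A key that occurs more than once makes that side "many".
        return any(count > 1 for count in Counter(keys).values())

    if left_side == "1" and has_duplicates(left_keys):
        raise ValueError("left keys are not unique")
    if right_side == "1" and has_duplicates(right_keys):
        raise ValueError("right keys are not unique")


# Passes silently: both sides have unique keys.
check_join_cardinality([1, 2, 3], [3, 2, 1], validate="1:1")
```

Calling `check_join_cardinality([1, 1], [1], validate="1:m")` raises, because the left side was declared unique but repeats a key.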

Thank you to all our contributors for making this release possible!
@CanglongCl, @JulianCologne, @MarcoGorelli, @alexander-beedie, @cmdlineluser, @eltociear, @orlp, @ritchie46 and @stinodego

Python Polars 0.18.12

04 Aug 15:40
bfabdd5

⚠️ Deprecations

  • Rename approx_unique to approx_n_unique (#10290)
  • Rename first qcut parameter to quantiles (#10253)
  • Deprecate avg alias for mean (#10236)

🚀 Performance improvements

  • fix O(n^2) in sorted check during append (#10241)

✨ Enhancements

  • Add str.extract_groups (#10179)
  • raise TypeError for all LazyFrame comparison operators (#10275)
  • support bytecode translation to map_dict where the lookup key is an expression (#10265)
  • add entry point to the Consortium DataFrame API (#10244)
  • Extend datetime expression function with time zone/time unit parameters (#10235)
  • add "batch_size" to scan_pyarrow_dataset parameters (#10249)
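
To give a feel for what extracting all capture groups at once means, here is a rough sketch using only Python's standard `re` module. It does not reproduce the Polars `str.extract_groups` API or its return type. The function name, the dict result, and the convention of naming unnamed groups by their 1-based index are assumptions made for this illustration.

```python
import re


def extract_groups(text, pattern):
    """Return all capture groups of the first match as a dict.

    Illustration only: named groups keep their name, unnamed groups are
    keyed by their 1-based index as a string. All values are None when
    the pattern does not match.
    """
    compiled = re.compile(pattern)
    index_to_name = {idx: name for name, idx in compiled.groupindex.items()}
    names = [index_to_name.get(i, str(i)) for i in range(1, compiled.groups + 1)]
    match = compiled.search(text)
    if match is None:
        return {name: None for name in names}
    return {name: match.group(i) for i, name in enumerate(names, start=1)}


url = "http://vote.com/ballon_dor?candidate=messi&ref=python"
print(extract_groups(url, r"candidate=(?P<candidate>\w+)&ref=(?P<ref>\w+)"))
```

A single call returns every group together, which avoids running one extraction per group over the same string.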

🐞 Bug fixes

  • clear window cache and run windows on proper runners (#10303)
  • fix sorted fast path in streaming groupby wrt nulls (#10289)
  • Fix interchange protocol allowing copy even when allow_copy was set to False (#10262)
  • fix nan aggregation in groupby (#10287)
  • don't panic on cse if function hasn't implemented __eq__ (#10286)
  • fix empty streaming parquet file (#10252)
  • fix logical columns of streaming multi-column sort (#10250)
  • fix date/datetime parsing for short inputs with exact=False (#10231)
  • don't panic in wildcard apply (#10240)
  • fix cse profile (#10239)

🛠️ Other improvements

  • Update CODEOWNERS (#10261)
  • add note about pyarrow partitioning (#10297)
  • Do not keep history in gh-pages branch (#10282)
  • make an explicit note in read_parquet and scan_parquet about hive-style partitioning (point to scan_pyarrow_dataset instead) (#10277)
  • Fix typo in error message (#10281)
  • Replace "question" issues with link to Stack Overflow (#10230)
  • Use Sphinx's maximum_signature_line_length (#10228)
  • add warning about parallel eval of .then(..) branches (#10229)
  • Update Sphinx to 7.1.1 and bump related dependencies (#10221)
  • Update dependabot config (#10222)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @TLouf, @alexander-beedie, @cmdlineluser, @dependabot, @dependabot[bot], @duvenagep, @mcrumiller, @orlp, @reswqa, @ritchie46 and @stinodego

Python Polars 0.18.11

01 Aug 06:03
78460fe

🐞 Bug fixes

  • correct struct null counts (#10142)
  • no cse in groupby until fixed (#10216)
  • avoid false positives from multiple RETURN_VALUE ops when checking apply lambdas/functions (#10211)

🛠️ Other improvements

  • Improve deprecation utils (#10167)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @magarick, @ritchie46, @stinodego and @varunmittal91

Python Polars 0.18.10

31 Jul 11:41
459379b

✨ Enhancements

  • raise a better error message from read_database if not passed a string URI (#10191)
  • Add pyarrow write_to_dataset to write_parquet function (#9835)

🐞 Bug fixes

  • fix is_in on empty series (#10195)
  • fix cse windows (#10197)
  • block predicate pushdown is_in and null producing … (#10194)
  • prevent re-ordering of dict keys inside .apply (#10172)
  • initialize fixed null values (#10192)
  • Don't pickle _scan_impl (#10175)
  • ensure window function run partitioned when cse is hit (#10170)

🛠️ Other improvements

  • prepend set_ to set operations on lists (#10182)
  • Track version in deprecation utils (#10147)
  • Add a simple util issue_deprecation_warning (#10146)
  • more precise checks for inefficient apply warnings (#10135)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @cjackal, @cmdlineluser, @potzenhotz, @ritchie46 and @stinodego

Python Polars 0.18.9

28 Jul 13:04
c1f5dc2

🏆 Highlights

  • common subexpression elimination (#9632)
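
The general idea behind this optimization can be sketched in a few lines of pure Python. This is a toy model, not how the Polars optimizer is implemented. Expressions are assumed to be nested tuples of the form `(op, lhs, rhs)`, strings are column references, and a cache keyed by the subexpression ensures each distinct subexpression is computed once.

```python
import operator

# Toy operator table; only binary "add" and "mul" are modelled.
OPS = {"add": operator.add, "mul": operator.mul}


def evaluate(expr, row, cache, stats):
    """Evaluate a nested-tuple expression, reusing results of repeated subexpressions."""
    if isinstance(expr, str):        # column reference
        return row[expr]
    if not isinstance(expr, tuple):  # literal value
        return expr
    if expr in cache:                # subexpression already computed
        return cache[expr]
    op, lhs, rhs = expr
    stats["evaluations"] += 1
    result = OPS[op](evaluate(lhs, row, cache, stats), evaluate(rhs, row, cache, stats))
    cache[expr] = result
    return result


shared = ("add", "a", "b")  # appears three times across the two expressions
exprs = [("mul", shared, 2), ("mul", shared, shared)]
row = {"a": 1, "b": 2}
cache, stats = {}, {"evaluations": 0}
results = [evaluate(e, row, cache, stats) for e in exprs]
print(results, stats)  # 3 operator applications instead of 5 without the cache
```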

⚠️ Deprecations

  • Deprecate parsing string inputs as literals for when-then-otherwise (#10122)
  • deprecate "connection_uri" → "connection" param in read/write database methods (#10134)
  • remove/deprecate cache and its logic (#10066)
  • Add date_ranges/time_ranges expression functions (#10005)

🚀 Performance improvements

  • speedup mode on sorted data (#10084)
  • speedup boolean apply (#10073)
  • shrink alp/lp ~2.5x (#10039)

✨ Enhancements

  • suggest map_dict instead of lambda x: DICT[x] (#10123)
  • enable "inefficient apply" warnings from Series (#10104)
  • support writing duration type in json (#10112)
  • BytecodeParser can now handle mixed/nested and/or control flow (#10085)
  • inline lit(Series).cast(..) to -> lit(Series.cast(..)) (#10092)
  • Add ArcTan2 to SQLContext (#9571)
  • cse in groupby's (#10062)
  • Adds sql CASE statement expressions (#10065)
  • Add date_ranges/time_ranges expression functions (#10005)
  • comm_subexpr_elim in streaming 'select/with_columns' (#10050)
  • add dataframe.flags property (#10037)
  • common subexpression elimination (#9632)
  • detect and warn about usage of str/int/float python-based casts with apply (#10026)
  • detect and warn about usage of json.loads in conjunction with apply (#10023)
  • detect and warn about bare numpy functions passed to apply (#10021)
  • support bytecode identification/mapping of python string-case functions in UDFs (#10007)
  • support bytecode identification of numpy functions in UDFs that we can map to native expressions (#10003)

🐞 Bug fixes

  • adjust for null values in str.replace fast path (#10132)
  • clear bit settings in list iteration (#10131)
  • use row-encoded for struct::is_sorted (#10129)
  • don't run file-caching in streaming mode (#10117)
  • Allow initializing pl.Array in DataFrame using schema alone (#10100)
  • silence Series.apply inefficient apply warning when calling Expr.apply (#10116)
  • don't panic if masked out values are invalid in temporal kernels (#10114)
  • Fix struct get field by index out of bounds error. (#10097)
  • fix ub in simd-json (#10093)
  • fix invalid access when groupby rolling produces empty sets (#10109)
  • respect null_on_oob=False in list.take when pa… (#10105)
  • undo regression in scan_parquet from s3 (#10098)
  • fix is_sorted for structs (#10099)
  • add file path to io error in scan_csv (#10076)
  • fix false positive in parquet stats evaluation (#10087)
  • Address .col(regex).exclude() operations not executing. (#10025)
  • address an inadvertently shallow-copy issue on underlying PySeries (#10086)
  • fix Boolean::isin(null values) (#10074)
  • predicate pushdown #10058 (#10071)
  • map 'postgres' URI prefix to ADBC 'postgresql' module (#10018)
  • Fix weighted quantile for 0 weights (#10051)
  • eager time_range/date_range dimensions fix (#9996)

🛠️ Other improvements

  • get test_udfs running on all python versions again (#10136)
  • temporarily turn off fail-fast so that ubuntu tests run (#10133)
  • clarify "clones data" in to_numpy (#10095)
  • Refactor when/then/otherwise internals (#9922)
  • Properly format Returns sections of docstrings (#10064)
  • much-improved Instruction matching for BytecodeParser (#10040)
  • add pure-python tests and CI for bytecodeparser (#10027)
  • split-out expression translation and instruction-rewrite logic from BytecodeParser (#10012)
  • cleans api sections in docs (#10004)
  • Bump some dependencies (#9997)
  • Add patchelf extra to maturin (#9995)
  • restructure all UDF parsing/translation methods into a new BytecodeParser class (#9993)
  • Clean up date_range/time_range (#9985)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @cmdlineluser, @jonashaag, @magarick, @mcrumiller, @rikkaka, @ritchie46 and @stinodego

Python Polars 0.18.8

20 Jul 17:09
05fa344

⚠️ Deprecations

  • Add Series.extend (#9901)
  • Deprecate functions series input (#9878)

🚀 Performance improvements

  • Rolling min/max for partially sorted data (#9819)
  • Use pyo3::intern to avoid needlessly recreating PyString (#9853)

✨ Enhancements

  • Name transpose from column (#9846)
  • adds SQRT, CBRT, PI functions to SQLContext (#9936)
  • Let qcut create evenly spaced probabilities (#9960)
  • add freeze_panes option to write_excel (#9974)
  • initial support for parsing the set of jump bytecode instructions required to reconstruct and/or logic (#9972)
  • suggest more efficient expression if user passes simple lambda to Expr.apply or DataFrame.apply (#9918)
  • sorted flag on singletons (#9933)
  • maintain sorted flag after partition_by (#9944)
  • keep sorted flag in streaming left join (#9932)
  • Add cloudpickle for serializing python UDFs (#9921)
  • Optional three-valued logic for any/all (#9848)
  • Add Series.extend (#9901)
  • pass through unknown schema in unnest (#9896)
  • convenience support for parsing a list of SQL strings with sql_expr (#9881)
  • respect and allow more options in eager json parsing (#9882)
  • allow set_sorted in streaming (#9876)
  • Expr.cat.get_categories expression (#9869)
  • add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
  • polars_warn! macro (#9868)
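
For readers unfamiliar with three-valued (Kleene) logic as it applies to any/all, the following pure-Python sketch shows the usual semantics, with `None` standing in for a null. The function names are hypothetical and this does not show the Polars parameter that enables the behaviour.

```python
def kleene_any(values):
    """True if any value is True; None if none is True but a null is present; else False."""
    saw_null = False
    for value in values:
        if value is True:
            return True
        if value is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values):
    """False if any value is False; None if none is False but a null is present; else True."""
    saw_null = False
    for value in values:
        if value is False:
            return False
        if value is None:
            saw_null = True
    return None if saw_null else True


print(kleene_any([False, None]))  # unknown: the null could have been True
print(kleene_all([True, None]))   # unknown: the null could have been False
```

The contrast is with two-valued logic that simply drops nulls, where the first call would return False and the second True.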

🐞 Bug fixes

  • fix incorrect state in projection pushdown with joins (#9987)
  • don't pass predicates referring to renamed literal… (#9965)
  • fix regression in regex expansion (#9952)
  • potential SO in csv infer schema (#9950)
  • raise on unsupported transpose and object types (#9946)
  • Fix as-of join when by groups are interleaved (#9938)
  • Handle DataFrame.extend extending by itself (#9897)
  • don't SO on align_frames (#9911)
  • respect original series dtype when constructing LitIter (#9886)
  • Handle DataFrame.vstack stacking itself (#9895)
  • sum aggregation empty set is 0, not null (#9894)
  • preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
  • fmt unknown dtype (#9872)

🛠️ Other improvements

  • Update autolabeler again (#9984)
  • use param_name more in udfs for greater defensiveness (#9969)
  • fix or/and docstrings to say bitwise, not logical (#9964)
  • minor fix for apply docstring example text (#9953)
  • add note that collect_all returns result frames in the same order as input (#9951)
  • Improve docstrings for renaming operations (#9942)
  • Move sink_* methods to IO chapter (#9939)
  • Add 'nearest' in Expr.interpolation docstring with an example (#9935)
  • fix hyperlinks to pandas (#9937)
  • Address ignored Ruff doc rules (#9919)
  • improve weekday, day, ordinal_day examples (#9926)
  • deprecate bins argument and rename to breaks in Series.cut (#9913)
  • Use Pathlib everywhere (#9914)
  • Add various unit tests (#9903)
  • add big warnings about using apply (#9906)
  • Update autolabeler (#9885)
  • Workaround for PyCharm deprecation warning (#9907)
  • Mention func_horizontal on deprecated func docstrings (#9863)
  • note ordering guarantee for groupby (#9879)
  • add logo link entry to sphinx conf and factor-out website root paths (#9864)

Thank you to all our contributors for making this release possible!
@0xbe7a, @JulianCologne, @MarcoGorelli, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @alexander-beedie, @c-peters, @fsimkovic, @ion-elgreco, @magarick, @mcrumiller, @messense, @ritchie46, @sorhawell, @stinodego, @thomasaarholt and @zundertj

Rust Polars 0.31.1

15 Jul 09:51
6729224

🚀 Performance improvements

  • Rolling min/max for partially sorted data (#9819)
  • use hash set in drop_many (#9807)
  • Faster is_sorted when no flag set (#9777)
  • optimize n_unique for integers (#9568)
  • remove sort columns on multiple-key OOC sort (#9545)
  • don't needlessly trigger bitcount (#9561)
  • don't initialize memory before row-encoding (#9435)
  • reduce page faults in q1 ~-30% (#9423)
  • reduce rayon/idle time in streaming (#9416)
  • use row format in streaming join ~15% (#9379)
  • row encode buffer reuse (#9371)
  • bytes row format for streaming groupby/unique keys >3.5x (#9346)
  • push slices down map functions (#9350)
  • increase streaming groupby spill size from 256 to 10_000 (#9312)
  • Improve rolling min and max for data without nulls (#9277)
  • slightly improve n_unique performance (#9286)
  • speed up write_csv for time-zone-aware columns (#9093)
  • parallelize rolling_window group materialization (#9095)

✨ Enhancements

  • pass through unknown schema in unnest (#9896)
  • access OptState in LazyFrame to unit-test optimization toggle methods. (#9883)
  • respect and allow more options in eager json parsing (#9882)
  • allow set_sorted in streaming (#9876)
  • Expr.cat.get_categories expression (#9869)
  • add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
  • polars_warn! macro (#9868)
  • Add Run-length Encoding functions (#9826)
  • add include_key parameter to partition_by (#9750)
  • add LEFT string function for SQL (#9836)
  • add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • add drop_many_amortized (#9814)
  • Dedicated horizontal aggregation functions (#9752)
  • implement with_row_count as private function (#9810)
  • add support for SQL SUBSTR function (#9803)
  • add SQL support for binary data and expand recognised SQL dtype strings (#9802)
  • reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
  • allow qcut in window expressions (#9745)
  • Improve cut and allow use in expressions (#9580)
  • clearer message when stringcache-related errors occur (#9715)
  • improve expression formatting (#9704)
  • set string cache in window functions (#9705)
  • raise on both sides of datetime/str comparison (#9692)
  • support deserializing struct json into df (#9688)
  • add tree formatter for expressions (#9684)
  • add .list.any() and .list.all() (#9573)
  • extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
  • add polars::VERSION (#9660)
  • add symmetric difference to list set operations (#9655)
  • add dt.base_utc_offset (#9636)
  • add dt.dst_offset feature (#9629)
  • allow to specify index order in to_numpy (#9592)
  • accept expressions in repeat (#9614)
  • set operations for list (#9599)
  • add drop_first parameter for to_dummies (issue #8246) (#9143)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • add infer schema len to json_extract (#9478)
  • Adds (Most) Remaining Trig Functions to SQLContext (#9453)
  • update error handling msg for sql functions (#9474)
  • add str.titlecase (#9457)
  • raise if period is negative in groupby_rolling (#9445)
  • add SQL round support (#9330)
  • don't error for time-zone-aware parsing if time zone is UTC (#9414)
  • support all numeric dtypes in serde (#9393)
  • ensure part of the plan is streaming if aggregati… (#9387)
  • add relaxed concatenation (#9382)
  • add sql DROP TABLE (#9355)
  • support ternary expressions in streaming (#9343)
  • add decoding support for row format (#9339)
  • add SQL support for null-aware equality checks (#9332)
  • add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
  • support // integer floordiv operator in the SQL engine (#9324)
  • serde for 'to_physical' expr (#9294)
  • add join cardinality validation (#9278)
  • keep sorted flag after Expr::truncate (#9275)
  • add "sql_expr" function (#9248)
  • rewrite correlation functions to expression architecture (#9258)
  • keep sorted flag on offset_by (#9253)
  • add intersection primitive for selector API (#9240)
  • building blocks for expression expansion sets (#9231)
  • Add ddof option to rolling_var and rolling_std (#8957)
  • immediately flatten nested unions (#9220)
  • support float expression on integers (#9210)
  • add binary to list<u8> cast (#9161)
  • add arr.unique expression (#9159)
  • implement explode for DataType::Array (#9157)
  • Decimal type: sum, min, max aggregations in select and agg context. (#9135)
  • Decimal arithmetic (#9123)
  • support decimals as cast types in csv parser (#9121)
  • Improve error handling for repeat (#9117)
  • conversion from Utf8 to Decimal. (#9090)

🐞 Bug fixes

  • respect original series dtype when constructing LitIter (#9886)
  • sum aggregation empty set is 0, not null (#9894)
  • Allow None as exponent (#9880)
  • preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
  • fmt unknown dtype (#9872)
  • fix row-encode of 32 byte payloads (#9843)
  • shrink_type on all-null columns (#9811)
  • don't go into streaming engine when groupby by list (#9834)
  • fix regex + exclude (#9827)
  • potential integer overflow in drop_many_amortized (#9829)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • fix array concat and Series::fill_null (#9825)
  • don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
  • Remove stray arr.eval references (#9821)
  • fix row-encode of null data (#9813)
  • allow +00:00 when loading from arrow (#9747)
  • fix row-count schema (#9797)
  • fix supertype detection (#9787)
  • merge rev-maps when building list arrays of categoricals. (#9742)
  • Loosen restrictions on cut expressions and add docs (#9730)
  • Fix list symmetric difference (#9732)
  • Fix list intersection (#9735)
  • don't clear rev_map when categorical series is cle… (#9720)
  • improve glob pattern testing (#9721)
  • don't run hstack checks when using cached names (#9709)
  • fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
  • increment seed between samples (#9694)
  • fix cse_plan invalid projection removal (#9700)
  • fix ne_missing for booleans vs lit (#9693)
  • raise if to_datetime would have parsed input incorrectly (#9675)
  • respect time_zone in lazy date_range (#8591)
  • redo weighted rolling var (#9609)
  • Correct weighted rolling quantile definition (#9608)
  • clear hashes buffer in generic streaming joins (#9612)
  • stable list namespace output when all elements are … (#9610)
  • validate time zone in cast and from_arrow operations (#9598)
  • make json feature depend on "dtype-struct" feature (#9589)
  • fix join suffix collision (#9579)
  • fix sum consistency (#9576)
  • fix take of array dtype (#9575)
  • fix predicate pushdown case before sort (#9574)
  • fix lazy schema of temporal_range functions when no alias is provided (#9543)
  • change the path parameter from to (#9531)
  • fix join validation when swapped (#9534)
  • fix race condition in out-of-core sort (#9521)
  • unset sortedness for local date and local datetime (#9515)
  • maintain sortedness flags on append/extend (#9496)
  • fix serde for small integer dtypes (#9495)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • groupby rolling with negative offset (#9428)
  • date_range with unit microseconds was producing incorrect results (#9413)
  • read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
  • Compute Spearman rank correlations using average ra… (#9415)
  • Fix rolling min/max when window is empty (#9406)
  • fix compilation of other rustc versions (#9392)
  • list zip with (#9367)
  • parquet + categorical (#9363)
  • respect startby in groupby_dynamic when every is greater than 1d (#9362)
  • raise groupby apply on empty frame (#9360)
  • raise more informative error on string arguments (#9352)
  • correct assertion (#9320)
  • fix rolling weighted mean (#9292)
  • raise on invalid sort_by (#9262)
  • correct ne/e_missing schema (#9257)
  • fix cached reproject offsets (#9254)
  • delay opening files in streaming engine (#9251)
  • ensure agg(F(lit)) == lit (#9222)
  • don't SO on concat(expressions) (#9214)
  • clip window_size to length in rolling_apply (#9209)
  • rolling_apply window_size == len (#9181)
  • respect time zone in strptime/to_datetime when exact=False (#9171)
  • make null chunking behavior equal to other dtypes (#9176)
  • return single numpy array in Array dtype -> numpy (#9164)
  • fix regression in boolean nulls comparison (#9142)
  • fix struct null_count if fields are null arrays (#9151)
  • categorical construction from null values (#9145)
  • let apply caller determine if length needs to be checked. (#9140)
  • struct is_in should upcast numeric types (#9110)
  • json_extract on empty series (#9126)
  • bubble up dtype when converting from arrow (#9120)
  • rolling groupby was returning incorrect results when offset was positive (#9082)

🛠️ Other improvements

  • Rolling quantile and median use DynArgs (#9867)
  • Clean up workspace definition (#9861)
  • Fix all clippy warnings in the test suite (#9839)
  • Refactor failing test (#9823)
  • Remove stray arr.eval references (#9821)
  • fix cut features (#9808)
  • cluster file scans in one node (#9799)
  • Remove old cut/qcut (#9763)
  • Small updates to issue templates (#9789)
  • unswap from_tz and to_tz in replace_timezone (#9768)
  • More cleanup around arange (#9769)
  • More cleanup for arange (#9681)
  • Fix small typo (#9714)
  • refactor arange and add int_range/int_ranges (#9666)
  • clean up inconsistencies in duration string language (#9551)
  • ensure date-range integration test runs in CI (#9554)
  • remove some redundancies in sort (#9541)
  • Fix some doc examples (#9405)
  • Remove outda...

Python Polars 0.18.7

12 Jul 19:03
96b0e07

🚀 Performance improvements

  • speed up python object to AnyValue construction (#9840)
  • use hash set in drop_many (#9807)
  • speed up in series 10x (#9794)
  • Faster is_sorted when no flag set (#9777)

✨ Enhancements

  • Add Run-length Encoding functions (#9826)
  • add include_key parameter to partition_by (#9750)
  • add LEFT string function for SQL (#9836)
  • add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • Dedicated horizontal aggregation functions (#9752)
  • support numpy datetime64 units (from 'ns' to 'D') in polars.from_numpy (#9783)
  • implement with_row_count as private function (#9810)
  • add support for SQL SUBSTR function (#9803)
  • add SQL support for binary data and expand recognised SQL dtype strings (#9802)
  • add new duration selector and improve selector typing (#9772)
  • reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
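
Run-length encoding, mentioned in the list above, replaces each run of equal consecutive values with a (length, value) pair. The sketch below uses only the standard library and is meant to explain the concept. The function names and the list-of-tuples output are assumptions for illustration and do not mirror the signatures or return types of the Polars functions.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (run_length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (run_length, value) pairs back into the original sequence."""
    return [value for length, value in pairs for _ in range(length)]


data = [1, 1, 2, 1, 3, 3, 3]
encoded = rle_encode(data)
print(encoded)
print(rle_decode(encoded) == data)
```

Note that the value 1 appears in two separate pairs: runs are defined by adjacency, not by distinct values.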

🐞 Bug fixes

  • fix row-encode of 32 byte payloads (#9843)
  • shrink_type on all-null columns (#9811)
  • don't go into streaming engine when groupby by list (#9834)
  • fix regex + exclude (#9827)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • fix array concat and Series::fill_null (#9825)
  • don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
  • Remove stray arr.eval references (#9821)
  • fix row-encode of null data (#9813)
  • allow +00:00 when loading from arrow (#9747)
  • improve/fix write_database handling of db schema and quoted table names (#9788)
  • fix row-count schema (#9797)
  • fix supertype detection (#9787)
  • fix import error when writing parquet with pyarrow (#9760)

🛠️ Other improvements

  • Refactor failing test (#9823)
  • Remove stray arr.eval references (#9821)
  • Remove old cut/qcut (#9763)
  • improve note about the behaviour when converting from ns-precision temporal values to python-native types (#9798)
  • Small updates to issue templates (#9789)
  • More cleanup around arange (#9769)
  • add missing last entry (#9782)
  • Add rows_by_key docs (#9766)

Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @jonashaag, @magarick, @mcrumiller, @ritchie46 and @stinodego

Python Polars 0.18.6

06 Jul 16:19
71106c3

✨ Enhancements

  • allow qcut in window expressions (#9745)

🐞 Bug fixes

  • merge rev-maps when building list arrays of categoricals. (#9742)
  • Loosen restrictions on cut expressions and add docs (#9730)
  • Fix list symmetric difference (#9732)
  • Fix list intersection (#9735)

Thank you to all our contributors for making this release possible!
@magarick and @ritchie46

Python Polars 0.18.5

05 Jul 14:00
d6898f7

🏆 Highlights

  • drop Python 3.7 support (#9679)

🚀 Performance improvements

  • optimize n_unique for integers (#9568)
  • remove sort columns on multiple-key OOC sort (#9545)
  • don't needlessly trigger bitcount (#9561)
  • optimize _datetime_to_pl_timestamp (#9533)

✨ Enhancements

  • Improve cut and allow use in expressions (#9580)
  • clearer message when stringcache-related errors occur (#9715)
  • improve expression formatting (#9704)
  • set string cache in window functions (#9705)
  • raise on both sides of datetime/str comparison (#9692)
  • support deserializing struct json into df (#9688)
  • add tree formatter for expressions (#9684)
  • streamline adbc connectivity, adding snowflake support (#9600)
  • improve selector utility functions with better docstrings/examples (#9683)
  • add .list.any() and .list.all() (#9573)
  • extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
  • add symmetric difference to list set operations (#9655)
  • Pass through stdin/stderr buffer in to_csv (#9624)
  • add dt.base_utc_offset (#9636)
  • add dt.dst_offset feature (#9629)
  • allow to specify index order in to_numpy (#9592)
  • accept expressions in repeat (#9614)
  • set operations for list (#9599)
  • make LazyFrame.map pickle (#9597)
  • add a new rows_by_key method, returning a keyed-dictionary of row data (#9567)
  • implement apply object -> struct (#9578)

🐞 Bug fixes

  • don't clear rev_map when categorical series is cle… (#9720)
  • improve glob pattern testing (#9721)
  • don't run hstack checks when using cached names (#9709)
  • fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
  • increment seed between samples (#9694)
  • fix cse_plan invalid projection removal (#9700)
  • fix ne_missing for booleans vs lit (#9693)
  • raise if to_datetime would have parsed input incorrectly (#9675)
  • respect time_zone in lazy date_range (#8591)
  • Align dependency versions (#9661)
  • redo weighted rolling var (#9609)
  • Correct weighted rolling quantile definition (#9608)
  • clear hashes buffer in generic streaming joins (#9612)
  • stable list namespace output when all elements are … (#9610)
  • address schema edge-case with scalar-expanded data that resolves to an empty frame (#9593)
  • handle dictionary init with unsized iterators that also hits the scalar-expansion fast path (#9594)
  • validate time zone in cast and from_arrow operations (#9598)
  • ensure from_dicts drops columns explicitly omitted from schema (#9581)
  • fix join suffix collision (#9579)
  • fix sum consistency (#9576)
  • fix take of array dtype (#9575)
  • fix predicate pushdown case before sort (#9574)
  • fix lazy schema of temporal_range functions when no alias is provided (#9543)
  • fix join validation when swapped (#9534)

🛠️ Other improvements

  • More cleanup for arange (#9681)
  • Fix some more type hints (#9716)
  • Added trivial examples for the aggregation of columns in groupby (#9708)
  • Fix some type hints (#9695)
  • additional ADBC examples and docstring information for read_database (inc snowflake) (#9686)
  • drop Python 3.7 support (#9679)
  • improve selector utility functions with better docstrings/examples (#9683)
  • refactor arange and add int_range/int_ranges (#9666)
  • Clarify DataFrame.corr operates on columns (#9678)
  • remove false "eager=True" from date_range tests (#9663)
  • Add examples to .merge_sorted (#9664)
  • bump maturin from 1.0.1 to 1.1.0 in /py-polars (#9646)
  • remove deprecation warning of already-enforced valid timezones change (#9639)
  • fix failing ci test (#9638)
  • fix inconsistency in .list.difference() example (#9615)
  • Clean up doctests for rolling (#9626)
  • fix faulty test of to_numpy (#9619)
  • examples for .list.union(), .list.difference(), .list.intersection() (#9602)
  • fix see also broken links (#9607)
  • clarify sortedness condition of groupby_dynamic and groupby_rolling (#9606)
  • clean up inconsistencies in duration string language (#9551)
  • Adding examples to binary functions (#9553)
  • Minor cleanup of arange (#9544)
  • Remove outdated badges from README (#9532)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @datapythonista, @dependabot, @dependabot[bot], @eitsupi, @guanqun, @jeroenjanssens, @jorisSchaller, @kljensen, @magarick, @mcrumiller, @messense, @mishpat, @moritzwilksch, @ritchie46, @stinodego, @ttencate, @universalmind303 and @zundertj