v0.19.1
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- fixed_point + cudf::binary_operation API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- Fix returned column type when extracting from an empty list column (#8031) @jlowe
- Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
- Fix a NameError in meta dispatch API (#7996) @galipremsagar
- Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of DataFrame.argsort (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix __repr__ for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support Series.__setitem__ with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add isin examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for unique groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds list.unique API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add lists.sort_values API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Adds explode API (#7607) @isVoid
- Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
- Implement cudf::label_bins() (#7554) @vyasr
- Add Python bindings for lists::contains (#7547) @skirui-source
- cudf::row_bit_count() support. (#7534) @nvdbaranec
- Implement drop_list_duplicates (#7528) @ttnghia
- Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
- Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Enable type conversion from float to decimal type (#7450) @ChrisJar
- Add cython for converting strings/fixed-point functions (#7429) @davidwendt
- Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
- Implement groupby collect_set (#7420) @ttnghia
- Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
- Refactor strings column factories (#7397) @harrism
- Add groupby scan operations (sort groupby) (#7387) @karthikeyann
- Add cudf::explode_position (#7376) @hyperbolic2346
- Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
- Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
- Add Series.drop api (#7304) @isVoid
- get_json_object() implementation (#7286) @nvdbaranec
- Python API for ListMethods.len() (#7283) @isVoid
- Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
- Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
- Fix inplace update of data and add Series.update (#7201) @galipremsagar
- Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
- Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source
🛠️ Improvements
- fix GDS include path for version 0.95 (#7877) @rongou
- Update dask + distributed to 2021.4.0 (#7858) @jakirkham
- Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
- Add USE_GDS as an option in build script (#7833) @pxLi
- add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
- Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
- Revert dask versioning of concat dispatch (#7823) @galipremsagar
- add copy methods in Java memory buffer (#7791) @rongou
- Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Turn on NVTX by default in java build (#7761) @tgravescs
- Add Java bindings to join gather map APIs (#7751) @jlowe
- Add replacements column support for Java replaceNulls (#7750) @jlowe
- Add Java bindings for row_bit_count (#7749) @jlowe
- Remove unused JVM array creation (#7748) @jlowe
- Added JNI support for new is_integer (#7739) @revans2
- Create and promote library aliases in libcudf installations (#7734) @trxcllnt
- Support groupby operations for decimal dtypes (#7731) @vyasr
- Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
- Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
- Use stream in groupby calls (#7705) @karthikeyann
- Update codeowners file (#7701) @ajschmidt8
- Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
- Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
- Misc Python/Cython optimizations (#7686) @shwina
- Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
- Add column_device_view to orc writer (#7676) @kaatish
- cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
- Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
- Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
- Feature/optimize accessor copy (#7660) @vyasr
- Fix find_package(cudf) (#7658) @trxcllnt
- Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
- Add in JNI support for count_elements (#7651) @revans2
- Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
- Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
- Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
- Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
- Add in JNI support for table partition (#7637) @revans2
- Add explicit fixed_point merge test (#7635) @codereport
- Add JNI support for IDENTITY hash partitioning (#7626) @revans2
- Java support on explode_outer (#7625) @sperlingxx
- Java support of casting string from/to decimal (#7623) @sperlingxx
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
- Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
- Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
- Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
- Add gbenchmarks for string substrings functions (#7603) @davidwendt
- Refactor string conversion check (#7599) @ttnghia
- JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
- Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
- ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
- Fix auto-detecting GPU architectures (#7593) @trxcllnt
- Reduce cudf library size (#7583) @robertmaynard
- Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
- Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
- Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
- Add gbenchmark for strings::concatenate (#7560) @davidwendt
- Update Changelog Link (#7550) @ajschmidt8
- Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
- Add __repr__ for Column and ColumnAccessor (#7531) @shwina
- Support Decimal DIV changes in cudf (#7527) @razajafri
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
- Add gbenchmarks for strings extract function (#7522) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Reduce compile time/size for scan.cu (#7516) @davidwendt
- Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
- Removed unneeded includes from traits.hpp (#7509) @davidwendt
- FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
- xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
- JNI bit cast (#7493) @revans2
- Combine rolling window function tests (#7480) @mythrocks
- Prepare Changelog for Automation (#7477) @ajschmidt8
- Java support for explode position (#7471) @sperlingxx
- Update 0.18 changelog entry (#7463) @ajschmidt8
- JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
- Join APIs that return gathermaps (#7454) @shwina
- Remove dependence on managed memory for multimap test (#7451) @jrhemstad
- Use cuFile for Parquet IO when available (#7444) @vuule
- Statistics cleanup (#7439) @kaatish
- Add gbenchmarks for strings filter functions (#7438) @davidwendt
- fixed_point + cudf::binary_operation API Changes (#7435) @codereport
- Improve string gather performance (#7433) @jlowe
- Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
- Detail APIs for datetime functions (#7430) @magnatelee
- Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
- Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
- Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Simplify type dispatch with device_storage_dispatch (#7419) @codereport
- Java support for casting of nested child columns (#7417) @razajafri
- Improve scalar string replace performance for long strings (#7415) @jlowe
- Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
- bitmask_or implementation with bitmask refactor (#7406) @rwlee
- Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
- Clean up included headers in device_operators.cuh (#7401) @codereport
- Move nullable index iterator to indexalator factory (#7399) @davidwendt
- ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
- upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
- Add gbenchmark for strings find/contains functions (#7392) @davidwendt
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
- Added in JNI support for out of core sort algorithm (#7381) @revans2
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
- jitify 2 support (#7372) @cwharris
- compile_udf: Cache PTX for similar functions (#7371) @gmarkall
- Add string scalar replace benchmark (#7369) @jlowe
- Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
- Update orc reader and writer fuzz tests (#7357) @galipremsagar
- Improve url_decode performance for long strings (#7353) @jlowe
- cudf::ast Small Refactorings (#7352) @codereport
- Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
- Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
- Change block size parameter from a global to a template param. (#7333) @nvdbaranec
- Partial clean up of ORC writer (#7324) @vuule
- Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
- Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
- Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
- Use string literals in fixed_point release_asserts (#7303) @codereport
- Fix merge conflicts for #7295 (#7297) @ajschmidt8
- Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
- Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
- Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
- Refactor dictionary support for reductions any/all (#7242) @davidwendt
- Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
- Interval index and interval_range (#7182) @marlenezw
- avro reader integration tests (#7156) @cwharris
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
- Adding Interval Dtype (#6984) @marlenezw
- Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport