Compute and use the initial string offset when building `nested` large string cols with chunked parquet reader #17702

mhaseeb123 · 2025-01-09T05:07:56Z

Description

This PR enables computing the str_offset required to correctly compute the offsets columns for nested large strings columns with chunked Parquet reader when chunk_read_limit is small resulting in multiple output table chunks per subpass.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-01-09T05:08:01Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mhaseeb123 · 2025-01-14T00:36:40Z

CC: @etseidl would love your review here as well if possible!

cpp/include/cudf/detail/sizes_to_offsets_iterator.cuh

cpp/tests/large_strings/merge_tests.cpp

mhaseeb123 · 2025-01-15T20:25:26Z

cpp/src/io/parquet/page_string_utils.cuh

+ * @brief Converts string sizes to offsets if this is not a large string column. Otherwise,
+ * atomically update the initial string offset to be used during large string column construction
+ */
+template <int block_size>


has_lists can't be a tparam anymore as it is not known at compile time when called from page_delta_decode.cu. Also, we are only using it minimally at L120

etseidl

Looks like a reasonable solution. As we discussed offline, it would be nice to figure out a way to do this the same way as for small strings, but it's not worth holding a fix up for. Thanks @mhaseeb123.

cpp/src/io/parquet/page_string_utils.cuh

cpp/src/io/parquet/parquet_gpu.hpp

…rings

mhaseeb123 · 2025-01-16T22:33:12Z

Benchmark Results

TLDR; Virtually no difference in performance observed before or after this PR.

Benchmark name

export LIBCUDF_LARGE_STRINGS_THRESHOLD=1 # Set to a small number
./PARQUET_READER_NVBENCH --devices 0 --benchmark parquet_read_long_strings  --timeout 10

Setup

RMM memory resource = pool
CUIO host memory resource = pinned_pool
# Devices

## [0] `NVIDIA RTX 5880 Ada Generation`
* SM Version: 890 (PTX Version: 860)
* Number of SMs: 110
* SM Default Clock Rate: 18446744071874 MHz
* Global Memory: 11669 MiB Free / 48632 MiB Total
* Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz)
* Max Shared Memory: 100 KiB/SM, 48 KiB/Block
* L2 Cache Size: 98304 KiB
* Maximum Active Blocks: 24/SM
* Maximum Active Threads: 1536/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

Numbers

|    io_type    | card | run_len | avg_str_len | half_width_str_len | CPU Time  | CPU Time  | GPU Time  | GPU Time  | bytes_per_sec |bytes_per_sec | peak_mem_use |
|---------------|------|---------|-------------|--------------------|-----------|-----------|-----------|-----------|---------------|--------------|--------------|
| DEVICE_BUFFER |    0 |       1 |          16 |                 16 | 22.411 ms | 22.300 ms | 22.407 ms | 22.296 ms |   23960106331 |  24079783553 |    1.747 GiB |
| DEVICE_BUFFER | 1000 |       1 |          16 |                 16 |  6.285 ms |  6.257 ms |  6.281 ms |  6.254 ms |   85471579784 |  85847870159 |  813.778 MiB |
| DEVICE_BUFFER |    0 |      32 |          16 |                 16 | 22.411 ms | 22.298 ms | 22.406 ms | 22.294 ms |   23960615809 |  24081539542 |    1.747 GiB |
| DEVICE_BUFFER | 1000 |      32 |          16 |                 16 |  5.793 ms |  5.798 ms |  5.789 ms |  5.795 ms |   92739345133 |  92650408801 |  766.991 MiB |
| DEVICE_BUFFER |    0 |       1 |          48 |                 16 | 15.501 ms | 15.444 ms | 15.497 ms | 15.440 ms |   34642524169 |  34771608791 |    1.588 GiB |
| DEVICE_BUFFER | 1000 |       1 |          48 |                 16 |  4.878 ms |  4.858 ms |  4.874 ms |  4.853 ms |  110156722203 | 110616404710 |  634.026 MiB |
| DEVICE_BUFFER |    0 |      32 |          48 |                 16 | 15.527 ms | 15.462 ms | 15.523 ms | 15.458 ms |   34586216123 |  34730863251 |    1.588 GiB |
| DEVICE_BUFFER | 1000 |      32 |          48 |                 16 |  4.603 ms |  4.594 ms |  4.599 ms |  4.590 ms |  116743572126 | 116955312690 |  610.050 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 16 | 11.399 ms | 11.394 ms | 11.395 ms | 11.390 ms |   47113803944 |  47135793174 |    1.545 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 16 |  5.032 ms |  5.022 ms |  5.027 ms |  5.018 ms |  106786967427 | 106996868092 |  582.177 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 16 | 11.384 ms | 11.369 ms | 11.381 ms | 11.365 ms |   47174601420 |  47237372399 |    1.545 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 16 |  4.488 ms |  4.484 ms |  4.484 ms |  4.480 ms |  119717585351 | 119833053632 |  568.538 MiB |
| DEVICE_BUFFER |    0 |       1 |          48 |                 32 | 15.306 ms | 15.262 ms | 15.302 ms | 15.258 ms |   35085028167 |  35185790249 |    1.587 GiB |
| DEVICE_BUFFER | 1000 |       1 |          48 |                 32 |  4.874 ms |  4.863 ms |  4.870 ms |  4.859 ms |  110243053035 | 110498657016 |  633.421 MiB |
| DEVICE_BUFFER |    0 |      32 |          48 |                 32 | 15.306 ms | 15.298 ms | 15.302 ms | 15.294 ms |   35084783465 |  35104370889 |    1.587 GiB |
| DEVICE_BUFFER | 1000 |      32 |          48 |                 32 |  4.607 ms |  4.606 ms |  4.603 ms |  4.602 ms |  116641335041 | 116672711936 |  609.449 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 32 |  7.522 ms |  7.515 ms |  7.518 ms |  7.511 ms |   71410632094 |  71481641317 |    1.044 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 32 |  4.993 ms |  5.003 ms |  4.989 ms |  4.999 ms |  107611092338 | 107400268495 |  581.852 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 32 |  7.522 ms |  7.515 ms |  7.518 ms |  7.511 ms |   71408148016 |  71478916872 |    1.044 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 32 |  4.473 ms |  4.484 ms |  4.469 ms |  4.480 ms |  120124714056 | 119836246375 |  568.165 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 64 |  7.515 ms |  7.530 ms |  7.511 ms |  7.526 ms |   71476190692 |  71339816000 |    1.041 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 64 |  4.991 ms |  4.995 ms |  4.987 ms |  4.991 ms |  107660435719 | 107573894386 |  581.221 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 64 |  7.520 ms |  7.525 ms |  7.516 ms |  7.521 ms |   71430902778 |  71385333236 |    1.041 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 64 |  4.465 ms |  4.466 ms |  4.461 ms |  4.462 ms |  120347921787 | 120319374388 |  567.460 MiB |

cpp/include/cudf/detail/sizes_to_offsets_iterator.cuh

cpp/tests/large_strings/parquet_tests.cpp

cpp/src/io/parquet/page_string_utils.cuh

cpp/src/io/parquet/page_delta_decode.cu

cpp/src/io/parquet/decode_fixed.cu

cpp/src/io/parquet/reader_impl.cpp

….com/mhaseeb123/cudf into fix/str_offset-nested-large-str-cols

copy-pr-bot · 2025-01-22T01:37:51Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

davidwendt · 2025-01-22T15:23:48Z

/ok to test

cpp/src/io/parquet/reader_impl.cpp

Compute and use str_offset for large nested string cols.

1ce570c

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 9, 2025

github-actions bot assigned mhaseeb123 Jan 9, 2025

mhaseeb123 added DO NOT MERGE Hold off on merging; see PR for details 2 - In Progress Currently a work in progress bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Jan 9, 2025

mhaseeb123 added 4 commits January 10, 2025 01:12

Clean up, add docstrings

e73432d

Fix copyright year

a146cd6

Fix comment

9f4ede3

Revert comment

edaff09

mhaseeb123 requested a review from nvdbaranec January 10, 2025 03:07

Merge branch 'branch-25.02' into fix/str_offset-nested-large-str-cols

2a279de

mhaseeb123 removed the DO NOT MERGE Hold off on merging; see PR for details label Jan 10, 2025

Minor optimization. Sync stream only for str_offsets vector

0d2317a

mhaseeb123 changed the title ~~🚧 Compute and use the str_offset when reading nested large string cols with chunked Parquet reader.~~ Compute and use the str_offset when reading nested large string cols with chunked Parquet reader. Jan 14, 2025

mhaseeb123 requested a review from vuule January 14, 2025 00:35

mhaseeb123 marked this pull request as ready for review January 14, 2025 00:36

mhaseeb123 requested a review from a team as a code owner January 14, 2025 00:36

Remove leftover cout

9622620

mhaseeb123 requested a review from davidwendt January 14, 2025 00:38

mhaseeb123 commented Jan 14, 2025

View reviewed changes

cpp/include/cudf/detail/sizes_to_offsets_iterator.cuh Outdated Show resolved Hide resolved

mhaseeb123 requested a review from PointKernel January 14, 2025 00:41

mhaseeb123 commented Jan 14, 2025

View reviewed changes

cpp/tests/large_strings/merge_tests.cpp Outdated Show resolved Hide resolved

mhaseeb123 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 14, 2025

Remove code duplication with utility function

38652c0

mhaseeb123 commented Jan 15, 2025

View reviewed changes

mhaseeb123 requested a review from davidwendt January 15, 2025 20:44

etseidl approved these changes Jan 16, 2025

View reviewed changes

cpp/src/io/parquet/page_string_utils.cuh Outdated Show resolved Hide resolved

cpp/src/io/parquet/parquet_gpu.hpp Outdated Show resolved Hide resolved

Trivial refactoring. Separate offset functions for small and large st…

fbe52d6

…rings

mhaseeb123 added 4 - Needs Review Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Jan 16, 2025

mhaseeb123 added 3 commits January 16, 2025 22:12

Revert unsetenv in merge_tests

399d8b0

Revert copyrights year

b7564d8

Fix copyrights year

c9ed1fc

Merge branch 'branch-25.02' into fix/str_offset-nested-large-str-cols

b5ff95f

vuule reviewed Jan 21, 2025

View reviewed changes

mhaseeb123 added 2 commits January 22, 2025 01:37

Address review comments

16b9843

Merge branch 'fix/str_offset-nested-large-str-cols' of https://github…

e838e72

….com/mhaseeb123/cudf into fix/str_offset-nested-large-str-cols

Remove unnecessary stream sync for h_initial_str_offsets

1a34807

mhaseeb123 requested a review from vuule January 22, 2025 02:14

Merge branch 'branch-25.02' into fix/str_offset-nested-large-str-cols

7990f7c

vuule reviewed Jan 24, 2025

View reviewed changes

cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved

vuule reviewed Jan 24, 2025

View reviewed changes

cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved

Address review comments

d8231ca

mhaseeb123 requested a review from vuule January 24, 2025 22:54

Merge branch 'branch-25.02' into fix/str_offset-nested-large-str-cols

c668fc6

vuule reviewed Jan 24, 2025

View reviewed changes

cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved

mhaseeb123 added 2 commits January 24, 2025 23:06

Fill host vector with std instead of thrust

218782d

Fix the sanity check for offsets

3cfbe24

vuule approved these changes Jan 24, 2025

View reviewed changes

mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond labels Jan 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compute and use the initial string offset when building `nested` large string cols with chunked parquet reader #17702

Compute and use the initial string offset when building `nested` large string cols with chunked parquet reader #17702

mhaseeb123 commented Jan 9, 2025 •

edited

Loading

copy-pr-bot bot commented Jan 9, 2025

mhaseeb123 commented Jan 14, 2025

mhaseeb123 Jan 15, 2025

etseidl left a comment

mhaseeb123 commented Jan 16, 2025

copy-pr-bot bot commented Jan 22, 2025

davidwendt commented Jan 22, 2025

Compute and use the initial string offset when building nested large string cols with chunked parquet reader #17702

Are you sure you want to change the base?

Compute and use the initial string offset when building nested large string cols with chunked parquet reader #17702

Conversation

mhaseeb123 commented Jan 9, 2025 • edited Loading

Description

Checklist

copy-pr-bot bot commented Jan 9, 2025

mhaseeb123 commented Jan 14, 2025

mhaseeb123 Jan 15, 2025

Choose a reason for hiding this comment

etseidl left a comment

Choose a reason for hiding this comment

mhaseeb123 commented Jan 16, 2025

Benchmark Results

Benchmark name

Setup

Numbers

copy-pr-bot bot commented Jan 22, 2025

davidwendt commented Jan 22, 2025

Compute and use the initial string offset when building `nested` large string cols with chunked parquet reader #17702

Compute and use the initial string offset when building `nested` large string cols with chunked parquet reader #17702

mhaseeb123 commented Jan 9, 2025 •

edited

Loading