Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute and use the initial string offset when building nested large string cols with chunked parquet reader #17702

Open
wants to merge 35 commits into
base: branch-25.02
Choose a base branch
from

Conversation

mhaseeb123
Copy link
Member

@mhaseeb123 mhaseeb123 commented Jan 9, 2025

Description

Closes #17692.

This PR enables computing the str_offset required to correctly compute the offsets columns for nested large strings columns with chunked Parquet reader when chunk_read_limit is small resulting in multiple output table chunks per subpass.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Jan 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 9, 2025
@mhaseeb123 mhaseeb123 added DO NOT MERGE Hold off on merging; see PR for details 2 - In Progress Currently a work in progress bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Jan 9, 2025
@mhaseeb123 mhaseeb123 requested a review from nvdbaranec January 10, 2025 03:07
@mhaseeb123 mhaseeb123 removed the DO NOT MERGE Hold off on merging; see PR for details label Jan 10, 2025
@mhaseeb123 mhaseeb123 changed the title 🚧 Compute and use the str_offset when reading nested large string cols with chunked Parquet reader. Compute and use the str_offset when reading nested large string cols with chunked Parquet reader. Jan 14, 2025
@mhaseeb123 mhaseeb123 requested a review from vuule January 14, 2025 00:35
@mhaseeb123 mhaseeb123 marked this pull request as ready for review January 14, 2025 00:36
@mhaseeb123 mhaseeb123 requested a review from a team as a code owner January 14, 2025 00:36
@mhaseeb123
Copy link
Member Author

CC: @etseidl would love your review here as well if possible!

@mhaseeb123 mhaseeb123 requested a review from davidwendt January 14, 2025 00:38
@mhaseeb123 mhaseeb123 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 14, 2025
* @brief Converts string sizes to offsets if this is not a large string column. Otherwise,
* atomically update the initial string offset to be used during large string column construction
*/
template <int block_size>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_lists can't be a tparam anymore as it is not known at compile time when called from page_delta_decode.cu. Also, we are only using it minimally at L120

@mhaseeb123 mhaseeb123 requested a review from davidwendt January 15, 2025 20:44
Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a reasonable solution. As we discussed offline, it would be nice to figure out a way to do this the same way as for small strings, but it's not worth holding a fix up for. Thanks @mhaseeb123.

cpp/src/io/parquet/page_string_utils.cuh Outdated Show resolved Hide resolved
cpp/src/io/parquet/parquet_gpu.hpp Outdated Show resolved Hide resolved
@mhaseeb123 mhaseeb123 added 4 - Needs Review Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Jan 16, 2025
@mhaseeb123
Copy link
Member Author

Benchmark Results

TLDR; Virtually no difference in performance observed before or after this PR.

Benchmark name

export LIBCUDF_LARGE_STRINGS_THRESHOLD=1 # Set to a small number
./PARQUET_READER_NVBENCH --devices 0 --benchmark parquet_read_long_strings  --timeout 10

Setup

RMM memory resource = pool
CUIO host memory resource = pinned_pool
# Devices

## [0] `NVIDIA RTX 5880 Ada Generation`
* SM Version: 890 (PTX Version: 860)
* Number of SMs: 110
* SM Default Clock Rate: 18446744071874 MHz
* Global Memory: 11669 MiB Free / 48632 MiB Total
* Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz)
* Max Shared Memory: 100 KiB/SM, 48 KiB/Block
* L2 Cache Size: 98304 KiB
* Maximum Active Blocks: 24/SM
* Maximum Active Threads: 1536/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

Numbers

|    io_type    | card | run_len | avg_str_len | half_width_str_len | CPU Time  | CPU Time  | GPU Time  | GPU Time  | bytes_per_sec |bytes_per_sec | peak_mem_use |
|---------------|------|---------|-------------|--------------------|-----------|-----------|-----------|-----------|---------------|--------------|--------------|
| DEVICE_BUFFER |    0 |       1 |          16 |                 16 | 22.411 ms | 22.300 ms | 22.407 ms | 22.296 ms |   23960106331 |  24079783553 |    1.747 GiB |
| DEVICE_BUFFER | 1000 |       1 |          16 |                 16 |  6.285 ms |  6.257 ms |  6.281 ms |  6.254 ms |   85471579784 |  85847870159 |  813.778 MiB |
| DEVICE_BUFFER |    0 |      32 |          16 |                 16 | 22.411 ms | 22.298 ms | 22.406 ms | 22.294 ms |   23960615809 |  24081539542 |    1.747 GiB |
| DEVICE_BUFFER | 1000 |      32 |          16 |                 16 |  5.793 ms |  5.798 ms |  5.789 ms |  5.795 ms |   92739345133 |  92650408801 |  766.991 MiB |
| DEVICE_BUFFER |    0 |       1 |          48 |                 16 | 15.501 ms | 15.444 ms | 15.497 ms | 15.440 ms |   34642524169 |  34771608791 |    1.588 GiB |
| DEVICE_BUFFER | 1000 |       1 |          48 |                 16 |  4.878 ms |  4.858 ms |  4.874 ms |  4.853 ms |  110156722203 | 110616404710 |  634.026 MiB |
| DEVICE_BUFFER |    0 |      32 |          48 |                 16 | 15.527 ms | 15.462 ms | 15.523 ms | 15.458 ms |   34586216123 |  34730863251 |    1.588 GiB |
| DEVICE_BUFFER | 1000 |      32 |          48 |                 16 |  4.603 ms |  4.594 ms |  4.599 ms |  4.590 ms |  116743572126 | 116955312690 |  610.050 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 16 | 11.399 ms | 11.394 ms | 11.395 ms | 11.390 ms |   47113803944 |  47135793174 |    1.545 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 16 |  5.032 ms |  5.022 ms |  5.027 ms |  5.018 ms |  106786967427 | 106996868092 |  582.177 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 16 | 11.384 ms | 11.369 ms | 11.381 ms | 11.365 ms |   47174601420 |  47237372399 |    1.545 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 16 |  4.488 ms |  4.484 ms |  4.484 ms |  4.480 ms |  119717585351 | 119833053632 |  568.538 MiB |
| DEVICE_BUFFER |    0 |       1 |          48 |                 32 | 15.306 ms | 15.262 ms | 15.302 ms | 15.258 ms |   35085028167 |  35185790249 |    1.587 GiB |
| DEVICE_BUFFER | 1000 |       1 |          48 |                 32 |  4.874 ms |  4.863 ms |  4.870 ms |  4.859 ms |  110243053035 | 110498657016 |  633.421 MiB |
| DEVICE_BUFFER |    0 |      32 |          48 |                 32 | 15.306 ms | 15.298 ms | 15.302 ms | 15.294 ms |   35084783465 |  35104370889 |    1.587 GiB |
| DEVICE_BUFFER | 1000 |      32 |          48 |                 32 |  4.607 ms |  4.606 ms |  4.603 ms |  4.602 ms |  116641335041 | 116672711936 |  609.449 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 32 |  7.522 ms |  7.515 ms |  7.518 ms |  7.511 ms |   71410632094 |  71481641317 |    1.044 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 32 |  4.993 ms |  5.003 ms |  4.989 ms |  4.999 ms |  107611092338 | 107400268495 |  581.852 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 32 |  7.522 ms |  7.515 ms |  7.518 ms |  7.511 ms |   71408148016 |  71478916872 |    1.044 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 32 |  4.473 ms |  4.484 ms |  4.469 ms |  4.480 ms |  120124714056 | 119836246375 |  568.165 MiB |
| DEVICE_BUFFER |    0 |       1 |          96 |                 64 |  7.515 ms |  7.530 ms |  7.511 ms |  7.526 ms |   71476190692 |  71339816000 |    1.041 GiB |
| DEVICE_BUFFER | 1000 |       1 |          96 |                 64 |  4.991 ms |  4.995 ms |  4.987 ms |  4.991 ms |  107660435719 | 107573894386 |  581.221 MiB |
| DEVICE_BUFFER |    0 |      32 |          96 |                 64 |  7.520 ms |  7.525 ms |  7.516 ms |  7.521 ms |   71430902778 |  71385333236 |    1.041 GiB |
| DEVICE_BUFFER | 1000 |      32 |          96 |                 64 |  4.465 ms |  4.466 ms |  4.461 ms |  4.462 ms |  120347921787 | 120319374388 |  567.460 MiB |

cpp/include/cudf/detail/sizes_to_offsets_iterator.cuh Outdated Show resolved Hide resolved
cpp/tests/large_strings/parquet_tests.cpp Show resolved Hide resolved
cpp/tests/large_strings/parquet_tests.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_string_utils.cuh Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_string_utils.cuh Outdated Show resolved Hide resolved
cpp/src/io/parquet/page_delta_decode.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/decode_fixed.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved
cpp/src/io/parquet/reader_impl.cpp Outdated Show resolved Hide resolved
Copy link

copy-pr-bot bot commented Jan 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mhaseeb123 mhaseeb123 requested a review from vuule January 22, 2025 02:14
@davidwendt
Copy link
Contributor

/ok to test

@mhaseeb123 mhaseeb123 requested a review from vuule January 24, 2025 22:54
@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond labels Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: Burndown
Development

Successfully merging this pull request may close these issues.

[BUG] Chunked parquet reader with small chunk_read_limit does not correctly read nested large string columns.
5 participants