
Fix OOM issue #242 #252

Merged — 1 commit merged into main from issue/242-partitionfile-outofmemoryerror on Feb 29, 2024
Conversation

@pravinbhat (Collaborator) commented on Feb 28, 2024

What this PR does:
Fixes an OOM exception that occurs when a partition file is present and numParts has a high value

Which issue(s) this PR fixes:
Fixes #242

Checklist:

  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

Commit b00c735: Job fails with OOM exception when a partition file is present and numParts has a high value
@pravinbhat requested a review from a team as a code owner on February 28, 2024 at 22:21
@pravinbhat changed the title from "Fixed issue #242" to "Fix OOM issue #242" on Feb 28, 2024
@faizalrub-datastax (Collaborator) left a comment:


Praveen, I see that MAX_NUM_PARTS_FOR_PARTITION_FILE = 10. Does this mean numParts will always be 10 when reading partition ranges from the CSV file? There could be partition ranges that require more than 10 parts for large datasets.

@pravinbhat merged commit b00c735 into main on Feb 29, 2024
10 checks passed
@msmygit deleted the issue/242-partitionfile-outofmemoryerror branch on February 29, 2024 at 15:39
@pravinbhat (Collaborator, Author) replied:

> Praveen, I see that MAX_NUM_PARTS_FOR_PARTITION_FILE = 10. Does this mean numParts will always be 10 when reading partition ranges from the CSV file? There could be partition ranges that require more than 10 parts for large datasets.

Good question @faizalrub-datastax. No, the num-parts value further splits each partition range listed in the partition CSV file. For example, if the partition file has 100 min/max range lines and num-parts is set to 10, that effectively expands to 100 * 10 = 1000 partitions.

What was happening before this fix: when a default job runs with 10K num-parts and a few partitions fail, say 100, rerunning the job retries those 100 failed ranges and splits each of them 10K ways, i.e. 100 * 10K = 1M partitions, which causes the OOM. Note that the MAX_NUM_PARTS_FOR_PARTITION_FILE property applies only when running with a partition file that lists the ranges. Ideally, in such scenarios num-parts should be set to 1 unless the failure was caused by a large partition range, and even then we should never need to split each range more than 10 ways. Hope this clarifies.
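
To make the arithmetic above concrete, here is a minimal Java sketch of the capping idea under a simplified model. The `PartitionSplitSketch` class, `Range` record, and `split` method are hypothetical illustrations, not CDM's actual code; only the constant name `MAX_NUM_PARTS_FOR_PARTITION_FILE` and the value 10 come from this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSplitSketch {

    // Cap from the PR: when ranges come from a partition file, never split
    // each range more than this many ways.
    static final int MAX_NUM_PARTS_FOR_PARTITION_FILE = 10;

    // Hypothetical token-range representation.
    record Range(long min, long max) {}

    // Each range read from the partition file is split effectiveParts ways,
    // so the total work-item count is ranges.size() * effectiveParts.
    static List<Range> split(List<Range> ranges, int numParts, boolean fromPartitionFile) {
        int effectiveParts = fromPartitionFile
                ? Math.min(numParts, MAX_NUM_PARTS_FOR_PARTITION_FILE)
                : numParts;

        List<Range> out = new ArrayList<>();
        for (Range r : ranges) {
            long step = Math.max(1, (r.max() - r.min()) / effectiveParts);
            for (long lo = r.min(); lo < r.max(); lo += step) {
                out.add(new Range(lo, Math.min(lo + step, r.max())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 100 failed ranges listed in a partition file after a rerun.
        List<Range> failed = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            failed.add(new Range(i * 1_000_000L, (i + 1) * 1_000_000L));
        }
        // Uncapped: 100 * 10_000 = 1,000,000 sub-ranges -> OOM risk.
        // Capped:   100 * 10     = 1,000 sub-ranges.
        System.out.println(split(failed, 10_000, true).size()); // prints 1000
    }
}
```

In this model the cap bounds the work-item count linearly in the number of lines in the partition file rather than in num-parts, which matches the behavior described above.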

Development

Successfully merging this pull request may close these issues.

java.lang.OutOfMemoryError: Java heap space when running validation CDM job with 4.1.11