Pull requests: NVIDIA/NeMo-Curator

Pull requests list

Clean up Pandas, cuDF, Dask, and Dask-cuDF DocumentDataset type logic (label: gpuci)
#494 opened Jan 23, 2025 by sarahyurick
chore: Add license file
#493 opened Jan 21, 2025 by ko3n1g
3 tasks
Standardize text_field and id_field terminology (label: gpuci)
#485 opened Jan 17, 2025 by sarahyurick
Minor CrossFit improvements (label: gpuci)
#483 opened Jan 16, 2025 by sarahyurick
Add nemo-toolkit dependency to gpuCI (label: gpuci)
#480 opened Jan 10, 2025 by sarahyurick
Enable ADD ID to work with CPU/GPU both (label: gpuci)
#479 opened Jan 10, 2025 by VibhuJawa
Support dask_expr migration into dask.dataframe
#477 opened Jan 9, 2025 by rjzamora
3 tasks
[WIP] Efficient Removal Duplicate Code
#472 opened Jan 7, 2025 by praateekmahajan (Draft)
3 tasks
[pre-commit.ci] pre-commit suggestions
#470 opened Jan 7, 2025 by pre-commit-ci bot
[WIP] Add RAPIDS Nightly to GPU CI (label: gpuci)
#436 opened Dec 17, 2024 by praateekmahajan (Draft)
3 tasks
Updating the Quick Example
#432 opened Dec 16, 2024 by stsfaroz
Add TrafilaturaExtractor class
#431 opened Dec 13, 2024 by sarahyurick
Bump nltk from 3.8.1 to 3.9 in /tutorials/dapt-curation/code (label: dependencies)
#429 opened Dec 13, 2024 by dependabot bot
Create notebook tutorials for distributed data classifiers (label: documentation)
#415 opened Dec 6, 2024 by sarahyurick
3 tasks done
Version bump to 0.6.0rc1.dev0
#396 opened Nov 27, 2024 by github-actions bot
Fix GPU error messages for fuzzy deduplication (label: gpuci)
#387 opened Nov 22, 2024 by sarahyurick
2 tasks done
Fuzzy Dedup: Make skipping the False positive check the default (labels: enhancement, gpuci)
#386 opened Nov 21, 2024 by ayushdg
2 of 3 tasks
Remove max_text_bytes_per_part (label: gpuci)
#385 opened Nov 20, 2024 by sarahyurick
Create Cache class for exact, fuzzy, and semantic deduplication (label: gpuci)
#384 opened Nov 19, 2024 by sarahyurick (Draft)
2 of 4 tasks
ci: Add copyright-check workflow
#369 opened Nov 14, 2024 by ko3n1g
3 tasks