The transforms are delivered as a standard Python library available on PyPI and can be installed using pip:
python -m pip install data-prep-toolkit-transforms[all]
or
python -m pip install data-prep-toolkit-transforms[ray,all]
or
python -m pip install data-prep-toolkit-transforms[language]
Installing the Python transforms will also install data-prep-toolkit.
Installing the Ray transforms will also install data-prep-toolkit[ray].
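To confirm what pip actually installed, the distribution names from the commands above can be looked up with the standard library. This is a minimal sketch: the helper name `installed_versions` is hypothetical and is not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as they appear in the install commands above.
    names = ["data-prep-toolkit-transforms", "data-prep-toolkit"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Both names should report a version after any of the install commands, since the transforms package pulls in data-prep-toolkit.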
Note: This list includes the transforms that have been part of the release since data-prep-toolkit-transforms:0.2.1. It may not always be up to date. Users are encouraged to raise an issue in git when they discover missing components, or packages that are listed below but are not in the release they get from PyPI.
- code
- language
- universal
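Because the list above may lag behind a release, it can help to ask the installed package which extras it declares. The sketch below reads the `Provides-Extra` metadata field with the standard library. It assumes the package publishes its extras in that field, and the helper name `declared_extras` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(distribution):
    """Return the sorted extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return []
    # get_all returns None when the field is missing, so fall back to an empty list.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    for extra in declared_extras("data-prep-toolkit-transforms"):
        print(extra)
```

Any extra printed here but missing from the list above, or the reverse, is worth reporting in an issue.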
Added Gneissweb transforms
fdedup fix for windows
PR #979 (code_profiler)
Added Profiler
Added Resize
Added PII Redactor
Relaxed fasttext requirement to >= 0.9.2
Added missing ray implementation for lang_id, doc_quality, tokenization and filter
Added ray notebooks for lang_id, doc_quality, tokenization and filter
Added code_profiler
Relax dependencies on pandas (use latest or whatever is installed by application)
Relax dependencies on requests (use latest or whatever is installed by application)