Introduce data loading utility for reading from local cache or downloading from external URL #3282
This pull request was exported from Phabricator. Differential Revision: D68790695
…ading from external URL (facebook#3282)

Summary:
Pull Request resolved: facebook#3282

## Context
Our preprocessed and compressed derivatives of open-source benchmarking datasets (e.g., LCBench) are currently hosted in Manifold blob storage, which limits their accessibility in our open-source software (OSS). To address this, we need to remove the dependency on Manifold.

## Changes
This diff introduces a data download utility that loads Pandas DataFrames (stored in a compressed parquet format) from local disk, downloading them from an external URL if they are not found locally. The key changes include:
- Introduced AbstractParquetDataLoader class, providing a way to load parquet data from a cache on local disk or download from an external URL.
- Implemented methods for:
  * Getting the cache path
  * Checking if the data is cached
  * Reading the data from the cache
  * Downloading from an external URL and caching the data
- Added abstract properties for getting the directory name and URL of the cached file, allowing easy specialization for other benchmark datasets.

With these changes, we can now make our LCBench surrogate benchmark problems accessible in OSS and move from `ax.fb` to `ax`.

Differential Revision: D68790695
…ading from external URL (facebook#3282)

Summary:
Pull Request resolved: facebook#3282

## Context
Our preprocessed and compressed derivatives of open-source benchmarking datasets (e.g., LCBench) are currently hosted in Manifold blob storage, which limits their accessibility in our open-source software (OSS). To address this, we need to remove the dependency on Manifold.

## Changes
This diff introduces a data download utility that loads Pandas DataFrames (stored in a compressed parquet format) from local disk, downloading them from an external URL if they are not found locally. The key changes include:
- Introduced AbstractParquetDataLoader class, providing a way to load parquet data from a cache on local disk or download from an external URL.
- Implemented methods for:
  * Getting the cache path
  * Checking if the data is cached
  * Reading the data from the cache
  * Downloading from an external URL and caching the data
- Added abstract properties for getting the directory name and URL of the cached file, allowing easy specialization for other benchmark datasets.

With these changes, we can now make our LCBench surrogate benchmark problems accessible in OSS and move from `ax.fb` to `ax`.

## WIP/TODO
1. Add new unit tests
2. Address OSS coverage requirements

Differential Revision: D68790695
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3282      +/-   ##
==========================================
- Coverage   96.04%   95.74%   -0.30%
==========================================
  Files         518      525       +7
  Lines       52162    52468     +306
==========================================
+ Hits        50098    50238     +140
- Misses       2064     2230     +166
…ading from external URL (facebook#3282)

Summary:
Pull Request resolved: facebook#3282

## Context
Our preprocessed and compressed derivatives of open-source benchmarking datasets (e.g., LCBench) are currently hosted in Manifold blob storage, which limits their accessibility in our open-source software (OSS). To address this, we need to remove the dependency on Manifold.

## Changes
This diff introduces a data download utility that loads Pandas DataFrames (stored in a compressed parquet format) from local disk, downloading them from an external URL if they are not found locally. The key changes include:
- Introduced AbstractParquetDataLoader class, providing a way to load parquet data from a cache on local disk or download from an external URL.
- Implemented methods for:
  * Getting the cache path
  * Checking if the data is cached
  * Reading the data from the cache
  * Downloading from an external URL and caching the data
- Added abstract properties for getting the directory name and URL of the cached file, allowing easy specialization for other benchmark datasets.

With these changes, we can now make our LCBench surrogate benchmark problems accessible in OSS and move from `ax.fb` to `ax`.

## WIP/TODO
1. Add new unit tests
2. Address OSS coverage requirements

Reviewed By: esantorella

Differential Revision: D68790695
This pull request has been merged in 38916d1.
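Summary:

## Context
Our preprocessed and compressed derivatives of open-source benchmarking datasets (e.g., LCBench) are currently hosted in Manifold blob storage, which limits their accessibility in our open-source software (OSS). To address this, we need to remove the dependency on Manifold.

## Changes
This diff introduces a data download utility that loads Pandas DataFrames (stored in a compressed parquet format) from local disk, downloading them from an external URL if they are not found locally. The key changes include:
- Introduced AbstractParquetDataLoader class, providing a way to load parquet data from a cache on local disk or download from an external URL.
- Implemented methods for:
  * Getting the cache path
  * Checking if the data is cached
  * Reading the data from the cache
  * Downloading from an external URL and caching the data
- Added abstract properties for getting the directory name and URL of the cached file, allowing easy specialization for other benchmark datasets.

With these changes, we can now make our LCBench surrogate benchmark problems accessible in OSS and move from `ax.fb` to `ax`.

Differential Revision: D68790695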
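
For readers who want a concrete picture of the cache-or-download pattern described in the summary, here is a minimal sketch. It is not the code from this PR: the class name `ParquetLoaderSketch`, the constructor arguments, the method names, and the example URL are all hypothetical, chosen only to mirror the four responsibilities listed (cache path, cache check, cached read, download and cache) plus the two abstract properties. It assumes pandas with a parquet engine such as pyarrow is installed for the final read.

```python
from __future__ import annotations

import urllib.request
from abc import ABC, abstractmethod
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas as pd


class ParquetLoaderSketch(ABC):
    """Hypothetical illustration of a cache-or-download loader (not the Ax API)."""

    def __init__(self, cache_root: Path, filename: str) -> None:
        self.cache_root = Path(cache_root)
        self.filename = filename

    @property
    @abstractmethod
    def dirname(self) -> str:
        """Directory name under the cache root for this dataset."""

    @property
    @abstractmethod
    def url(self) -> str:
        """External URL of the parquet file."""

    @property
    def cache_path(self) -> Path:
        # Where the file lives on local disk once cached.
        return self.cache_root / self.dirname / self.filename

    def is_cached(self) -> bool:
        return self.cache_path.is_file()

    def download_to_cache(self) -> Path:
        # Fetch the raw bytes from the URL and store them at the cache path.
        self.cache_path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(self.url) as response:
            self.cache_path.write_bytes(response.read())
        return self.cache_path

    def load(self) -> pd.DataFrame:
        # pandas is imported lazily so the caching logic works without it.
        import pandas as pd

        if not self.is_cached():
            self.download_to_cache()
        return pd.read_parquet(self.cache_path)


class ExampleDatasetLoader(ParquetLoaderSketch):
    # Placeholder values; a real specialization would point at an actual dataset.
    dirname = "example_dataset"
    url = "https://example.com/example_dataset.parquet"
```

Under these assumptions, specializing the loader for another benchmark dataset only requires supplying `dirname` and `url`, and calling `load()` touches the network only when the file is not already in the cache.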