
[FEAT]: Reduce API throttling by reading all env variables per env at once. #2526

Open
1 task done
DavidWHarvey opened this issue Dec 30, 2024 · 1 comment
Labels: Status: Triage (This is being looked at and prioritized), Type: Feature (New feature or request)

Comments

@DavidWHarvey

Describe the need

When running plans against 64 repos, we hit API throttling roughly an order of magnitude sooner because env variables are read one at a time rather than with a single API call that fetches all of them (the data source can read them all in one call). With respect to the number of resources per repo, env variables dominate, since there may be 10 to 50 per environment.

We are re-planning repos and Azure resources (using a matrix per repo, so one plan per repo in parallel). The process takes about 8 minutes across 64 repos and consumes about 60% of the rate-limit budget, i.e., we cannot run it twice in the same hour.

I have identified three possible solutions:

  • On a read, fetch all values for the environment and cache them in memory. This is simple, but I'm unclear how it might affect Terraform.
  • Create a new resource that holds the env-variable map for the entire environment, similar to the existing data source.
  • Work around this with a provisioner, using the data source to trigger the provisioner to run. The provisioner would sync a per-environment map with the contents of the environment.
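The first option (read all, cache in memory) could look something like the following sketch. This is not the provider's actual code; `EnvVariableCache` and `fetch_all` are hypothetical names, and the real implementation would sit inside the provider's read path, calling the bulk "list environment variables" API endpoint once per (repo, environment) pair.

```python
from typing import Callable, Dict, Tuple

class EnvVariableCache:
    """Hypothetical in-memory cache: one bulk API call per (repo, env),
    then individual variable reads are served from memory."""

    def __init__(self, fetch_all: Callable[[str, str], Dict[str, str]]):
        # fetch_all performs the single API call that returns every
        # variable for the given repo/environment pair.
        self._fetch_all = fetch_all
        self._cache: Dict[Tuple[str, str], Dict[str, str]] = {}

    def get(self, repo: str, env: str, name: str) -> str:
        key = (repo, env)
        if key not in self._cache:
            # First read for this environment triggers the one bulk call.
            self._cache[key] = self._fetch_all(repo, env)
        return self._cache[key][name]
```

With 10 to 50 variables per environment, this collapses 10 to 50 API calls into one, which is where the order-of-magnitude improvement would come from. The open question noted above (cache staleness within a Terraform run) is untouched by this sketch.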

SDK Version

No response

API Version

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@DavidWHarvey DavidWHarvey added Status: Triage This is being looked at and prioritized Type: Feature New feature or request labels Dec 30, 2024
@DavidWHarvey
Author

We implemented the provisioner workaround with a single terraform_data. The "used" metric is now <400, versus ~9000 before this change, for one run on 137 repos. I don't fully understand the metric, but the limit is 15,000.

The provisioner calls a simple Python script that is passed the old and new maps for an environment and syncs them by calling the gh command. The tedious parts (after remembering to pass GH_TOKEN, but declaring it nonsensitive since it is not logged) involve the data source for reading env vars not tolerating a non-existent repo or environment. You cannot use other data members to filter the keys used for a for_each. We had to make an API call before running Terraform to generate the list of environments for each repo, to avoid errors when enumerating env vars.
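The sync script itself is not shown in this thread; a minimal sketch of the diff-and-sync logic it describes might look like this. Function and variable names are hypothetical, and the `gh variable set`/`gh variable delete` invocations assume the gh CLI's variable subcommands with `--repo` and `--env` flags.

```python
import subprocess
from typing import Dict, List, Tuple

def plan_sync(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Diff the old and new env-variable maps into (action, name) steps."""
    steps: List[Tuple[str, str]] = []
    for name, value in new.items():
        # Create missing variables and update changed ones.
        if name not in old or old[name] != value:
            steps.append(("set", name))
    for name in old:
        # Remove variables that are no longer desired.
        if name not in new:
            steps.append(("delete", name))
    return steps

def apply_sync(repo: str, env: str, old: Dict[str, str], new: Dict[str, str]) -> None:
    """Apply the planned steps via the gh CLI (requires GH_TOKEN in the env)."""
    for action, name in plan_sync(old, new):
        if action == "set":
            subprocess.run(
                ["gh", "variable", "set", name,
                 "--repo", repo, "--env", env, "--body", new[name]],
                check=True,
            )
        else:
            subprocess.run(
                ["gh", "variable", "delete", name, "--repo", repo, "--env", env],
                check=True,
            )
```

Separating `plan_sync` from `apply_sync` keeps the diff logic testable without touching the API, which matters here given how tight the rate budget is.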

Because Terraform discourages acting on prior state, drift detection was also tedious. We want to trigger replacement if current != desired, to handle the case where current changed but desired didn't. But we need some record of the prior desired state to avoid needing a second plan/apply to clear the fact that we triggered due to a mismatch. Adding a variable LAST_CHANGED, set to a timestamp, provided a means to force a trigger to replace (resync) while recording state that would be correct on the next run.
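The terraform_data arrangement described above might be sketched roughly as the following config fragment. This is not the author's actual configuration; the variable and resource names are illustrative, and it only shows the shape of the trigger/provisioner wiring.

```hcl
# Hypothetical sketch of the workaround, not the author's code.
variable "desired_env_vars" {
  type = map(string)
}

variable "LAST_CHANGED" {
  # Timestamp bumped to force a resync while recording state
  # that will be correct on the next run.
  type = string
}

resource "terraform_data" "env_sync" {
  # Replace (and thus resync) when the desired map or the marker changes.
  triggers_replace = {
    desired      = sha1(jsonencode(var.desired_env_vars))
    last_changed = var.LAST_CHANGED
  }

  provisioner "local-exec" {
    command = "python sync_env_vars.py"
    environment = {
      GH_TOKEN = nonsensitive(var.gh_token)
    }
  }
}
```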
