Automates the deployment of GPU-enabled servers on Google Cloud Platform (GCP) by dynamically identifying available zones for both CPU and GPU resources.
This script streamlines the deployment of GCP Compute Engine instances equipped with NVIDIA Tesla T4 GPUs. It dynamically detects zones where both the required machine type (`n1-standard-8`) and GPU (`nvidia-tesla-t4`) are available, automates Terraform execution, and moves on to the next zone when a deployment fails.
- Google Cloud CLI (gcloud) installed and configured
- Amazon Web Services CLI (aws) installed and configured
- Terraform installed (>= 1.10)
- Make utility installed
- Valid GCP credentials with compute permissions
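
A pre-flight check along the following lines could catch a missing tool or an outdated Terraform before any deployment starts. This is only a sketch and is not taken from the script: `require_tools` and `version_ge` are hypothetical helper names, and `sort -V` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of create_server_with_dynamic_zones.sh.

# Print every tool that is not on PATH; return non-zero if any is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort), so 1.10.0 ranks above 1.9.8.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

A typical call would be `require_tools gcloud aws terraform make || exit 1`, followed by `version_ge` on the version string reported by `terraform version`.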
- Zone Discovery:
  - The script compares the available zones for both the CPU (`n1-standard-8`) and GPU (`nvidia-tesla-t4`) using the `comm` and `gcloud` commands.
  - Overlapping zones where both resources are available are stored in an array.
- Dynamic Deployment:
  - Iterates over the discovered zones.
  - Sets the `TF_VAR_region` and `TF_VAR_zone` environment variables for Terraform.
  - Executes `make` to trigger the Terraform workflow.
- Error Handling:
  - If `make` fails, the script waits for 30 seconds and runs `make clean`.
  - Continues to the next zone if deployment fails.
  - Exits successfully upon the first successful deployment.
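
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function names and the `DEPLOY_CMD`, `CLEAN_CMD`, and `RETRY_WAIT` overrides are hypothetical, and deriving the region by stripping the zone suffix is an assumption about how `TF_VAR_region` is set.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical, not taken from the script.

# Print the zones that appear in both newline-separated lists.
# comm requires sorted input, so each list is sorted (and de-duplicated) first.
intersect_zones() {
  comm -12 <(printf '%s\n' "$1" | sort -u) <(printf '%s\n' "$2" | sort -u)
}

# Try each zone in turn and stop at the first successful deployment.
# DEPLOY_CMD, CLEAN_CMD and RETRY_WAIT can be overridden for a dry run.
deploy_first_available() {
  local zone
  for zone in "$@"; do
    export TF_VAR_zone="$zone"
    export TF_VAR_region="${zone%-*}"   # e.g. us-central1-a -> us-central1
    if ${DEPLOY_CMD:-make}; then
      echo "deployed in $zone"
      return 0
    fi
    # Deployment failed: wait, clean up, then move on to the next zone.
    sleep "${RETRY_WAIT:-30}"
    ${CLEAN_CMD:-make clean}
  done
  return 1
}
```

With real data, the two lists could come from `gcloud compute machine-types list --filter="name=n1-standard-8" --format="value(zone.basename())"` and the matching `gcloud compute accelerator-types list` call with `name=nvidia-tesla-t4`. Whether the script uses these exact flags is an assumption.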
.
├── create_server_with_dynamic_zones.sh  # This script
├── terraform.prod.tfvars                # Dockerhub credentials
├── credentials.json                     # GCP credentials
├── Makefile                             # Terraform commands
├── README.md                            # Project documentation
├── .env                                 # Environment variables
└── src
    ├── main.tf                          # Terraform main config
    ├── provider.tf                      # Terraform provider config
    ├── storage.tf                       # Google Cloud Storage config
    ├── output.tf                        # Google Cloud Storage config output
    ├── modules                          # Terraform modules
    │   ├── vpc
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   └── worker
    │       ├── main.tf
    │       └── variables.tf
    └── variables.tf
- Configure GCP Authentication:
  - `gcloud auth login`
  - `gcloud config set project [YOUR_PROJECT_ID]`
- Run the Script:
  - `bash create_server_with_dynamic_zones.sh`
- Monitor Deployment:
  - The script will attempt deployment in available zones.
  - If successful, the script exits.
  - On failure, it retries in the next available zone.
- Change GPU Type: Modify the `TF_VAR_gpu_type` environment variable in the script: `export TF_VAR_gpu_type="nvidia-tesla-t4"`. Replace the value with your desired GPU type (e.g., `nvidia-tesla-v100`).
- Change GPU Count: Modify the `TF_VAR_gpu_count` environment variable in the script: `export TF_VAR_gpu_count=1`. Replace the value with the desired GPU count (e.g., `2`).
- Change CPU Type: Modify the `TF_VAR_machine_type` environment variable in the script: `export TF_VAR_machine_type="n1-standard-8"`. Replace the value with the desired machine type (e.g., `n1-standard-16`). T4 and V100 GPUs can only be attached to N1 machine types, so stay within the N1 family.
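
If editing the script for every change is inconvenient, one option is to let values already present in the environment take precedence over the defaults. The sketch below assumes the script could be restructured this way. The function name is hypothetical, and the script as described sets these variables with plain `export` statements.

```shell
#!/usr/bin/env bash
# Sketch: keep a value already set in the environment, otherwise use the default.
# set_deploy_defaults is a hypothetical name, not part of the script.
set_deploy_defaults() {
  export TF_VAR_machine_type="${TF_VAR_machine_type:-n1-standard-8}"
  export TF_VAR_gpu_type="${TF_VAR_gpu_type:-nvidia-tesla-t4}"
  export TF_VAR_gpu_count="${TF_VAR_gpu_count:-1}"
}
```

With this pattern, a one-off change such as two GPUs could be requested by setting `TF_VAR_gpu_count=2` in the environment before running the script, leaving the file untouched.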
For any issues or inquiries, please contact @falconlee236.
This project is licensed under the Apache 2.0 License.