Bump CUDA from 11.2 to 11.8 #505

Merged
merged 5 commits into from
Dec 18, 2023

weiji14 (Member) commented Dec 12, 2023

The CUDA 11.8 migration across conda-forge is practically complete (see https://conda-forge.org/status/#cuda118), so we can start updating to a newer version of CUDA!

This should add support for the newer NVIDIA Ada Lovelace and Hopper generation GPUs, which require compute capability 8.9 and 9.0 respectively (see https://docs.nvidia.com/cuda/archive/11.8.0/hopper-compatibility-guide/index.html#verifying-hopper-compatibility-using-cuda-11-8).
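
A rough way to check whether a given build can run on a particular GPU is to compare the device's compute capability with the architectures the build was compiled for. The sketch below is a hypothetical helper (`build_supports`, not part of PyTorch or CUDA) that applies NVIDIA's documented rules: an `sm_XY` binary runs on GPUs with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for the same or newer GPUs. It only checks whether the build can load, not whether it has kernels tuned for the newer architecture, and the arch lists in the example are illustrative rather than taken from any real build.

```python
def build_supports(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """Rough check of whether a CUDA build can load on a GPU (hypothetical helper).

    capability: (major, minor) compute capability of the GPU, e.g. (8, 9).
    arch_list: targets the build was compiled for, e.g. ["sm_80", "sm_86", "compute_86"],
        where "sm_XY" is a native binary (cubin) and "compute_XY" is embedded PTX.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip entries this sketch doesn't model, e.g. arch-specific "sm_90a"
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            # A cubin runs on GPUs with the same major and an equal or higher minor version.
            return True
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            # PTX can be JIT-compiled by the driver for the same or newer GPUs.
            return True
    return False


# Illustrative arch lists (not taken from any real build):
print(build_supports((9, 0), ["sm_80", "sm_86"]))           # False
print(build_supports((9, 0), ["sm_80", "sm_86", "sm_90"]))  # True
```

With PyTorch installed on a GPU machine, the inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`, e.g. `build_supports(torch.cuda.get_device_capability(), torch.cuda.get_arch_list())`.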

Note:

  • There are actually CUDA 12.0 builds on conda-forge already, but it's probably good to have some Docker images with CUDA 11.8 first and transition to CUDA 12 later.
  • CUDA 11.x has good forward and backward compatibility (see https://docs.nvidia.com/deploy/cuda-compatibility/index.html#cuda-intro), so as long as folks are using CUDA driver 450.36.06+, it should be ok.
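
To check the driver requirement above on a given machine, one could compare the installed driver version against 450.36.06. This is a minimal sketch with hypothetical helper names; it compares version numbers field by field and does not capture every caveat in NVIDIA's compatibility guide. The `nvidia-smi` query flags are standard, but `installed_driver_versions` assumes `nvidia-smi` is on the PATH and is not called in the example.

```python
import subprocess

MIN_DRIVER = "450.36.06"  # minimum driver version cited in this PR for CUDA 11.x


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '535.129.03' into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_ok(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """True if the installed driver is at least `minimum` (numeric, field by field)."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_versions() -> list[str]:
    """Ask nvidia-smi for each visible GPU's driver version (needs NVIDIA drivers)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(driver_ok("535.129.03"))  # True
print(driver_ok("440.33.01"))   # False
```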

Changes in this PR:

  • Update PyTorch, Torchvision and TensorFlow to use CUDA 11.8 builds
  • Update the minimum pin on TensorFlow from >=2.9.1 to >=2.14.0, because versions below 2.13.1 only have CUDA 11.2 builds on conda-forge.

weiji14 self-assigned this Dec 12, 2023

pangeo-bot (Collaborator) commented:

/condalock
Automatically locking new conda environment, building, and testing images...

A contributor commented:

Binder 👈 Try on Mybinder.org!

weiji14 marked this pull request as ready for review December 13, 2023 01:23
weiji14 (Member, Author) commented Dec 18, 2023

/condalock

@@ -7,6 +7,6 @@ channels:
 dependencies:
 - jupyterlab-nvdashboard
 - gpytorch
-- pytorch>=2.0.0=*cuda112*
-- torchvision>=0.15.1=*cuda112*
+- pytorch>=2.0.0=*cuda118*
A member commented:

Apparently we've been on 11.8 for some time, so perhaps this * syntax isn't really pinning anything?

cuda-version==11.8
cudatoolkit==11.8.0

weiji14 (Member, Author) commented Dec 18, 2023:

That's cudatoolkit. CUDA 11.2 and 11.8 do have forward/backward compatibility, but if you look at the conda-lock.yml file, the PyTorch build is actually the CUDA 11.2 one:

url: https://conda.anaconda.org/conda-forge/linux-64/pytorch-2.1.0-cuda112py311ha0492fd_300.conda

Compared to PyTorch compiled with CUDA 11.2, the build compiled with CUDA 11.8 enables compute capability 8.9, as seen at https://github.com/conda-forge/pytorch-cpu-feedstock/blob/7c7a57b7515eaeda67d3879b56b68466f38f0b0d/recipe/build_pytorch.sh#L144-L153.
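
On the question of whether the `*` syntax pins anything: the third field of a match spec like `pytorch>=2.0.0=*cuda118*` is a glob matched against the package's build string, which is the last hyphen-separated part of the file name in the lock file's `url:` entries. The sketch below uses hypothetical helpers that scan `url:` lines with a regex instead of parsing the YAML, and it approximates conda's glob matching with Python's `fnmatch`, so treat it as a quick inspection aid rather than conda's own resolver logic.

```python
import fnmatch
import re


def split_conda_filename(filename: str) -> tuple[str, str, str]:
    """Split 'name-version-build.conda' (or '.tar.bz2') into (name, version, build).

    Conda versions and build strings cannot contain hyphens, so splitting from
    the right works even when the package name itself contains hyphens.
    """
    stem = re.sub(r"\.(conda|tar\.bz2)$", "", filename)
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


def locked_builds(lock_text: str, package: str) -> list[str]:
    """Return the build strings of `package` found in the url: lines of a lock file."""
    builds = []
    for url in re.findall(r"url:\s*(\S+)", lock_text):
        filename = url.rsplit("/", 1)[-1]
        if not filename.endswith((".conda", ".tar.bz2")):
            continue  # skip pip wheels/sdists, which use a different naming scheme
        name, _version, build = split_conda_filename(filename)
        if name == package:
            builds.append(build)
    return builds


def matches_build_glob(build: str, pattern: str) -> bool:
    """Approximate conda's build-string glob (e.g. '*cuda118*') with fnmatch."""
    return fnmatch.fnmatchcase(build, pattern)


lock_text = (
    "url: https://conda.anaconda.org/conda-forge/linux-64/"
    "pytorch-2.1.0-cuda112py311ha0492fd_300.conda\n"
)
for build in locked_builds(lock_text, "pytorch"):
    print(build, matches_build_glob(build, "*cuda112*"), matches_build_glob(build, "*cuda118*"))
# prints: cuda112py311ha0492fd_300 True False
```

Applied to the URL quoted above, the locked build string is `cuda112py311ha0492fd_300`, which matches `*cuda112*` but not `*cuda118*`.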

scottyhq (Member) commented:

Looks like the TensorFlow image size dropped a bit, but PyTorch keeps inflating :) Would be nice to get these below 10GB one of these days, if possible.

pangeo/ml-notebook: 10.8GB -> 10GB
pangeo/pytorch-notebook: 13.7GB -> 13.9GB

weiji14 (Member, Author) commented Dec 18, 2023

Yes, things should get smaller once we move to CUDA 12! conda-forge has removed the need for the large cudatoolkit package in CUDA 12 (see conda-forge/conda-forge.github.io#1963) by breaking it into smaller components that are installed only as needed. So hopefully that can cut down a few hundred megabytes 🤞

weiji14 merged commit 51db3df into master Dec 18, 2023
4 checks passed
weiji14 deleted the cuda-11.8 branch December 18, 2023 22:51
weiji14 mentioned this pull request Feb 16, 2024