cudatoolkit_11_7: init at 11.7.0 #179912

Merged (5 commits) on Aug 5, 2022


dguibert (Member) commented Jul 2, 2022

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

ofborg bot added the labels "10.rebuild-darwin: 0" and "10.rebuild-linux: 0" on Jul 2, 2022
dguibert added the labels "8.has: package (update)" and "6.topic: cuda" on Jul 2, 2022
ajs124 requested review from FRidh, samuela and SomeoneSerge, and removed the request for FRidh, on August 1, 2022 21:45
samuela changed the title from "update cudatoolkit to 11.7.0" to "cudatoolkit_11_7: init at 11.7.0" on Aug 2, 2022
samuela (Member) left a comment

Hi @dguibert, thanks for contributing this!

Review threads:
  • pkgs/test/cuda/cuda-samples/extension.nix (resolved)
  • pkgs/top-level/all-packages.nix (resolved)
  • pkgs/development/compilers/cudatoolkit/versions.toml (outdated, resolved)
dguibert force-pushed the dg/cudatoolkit_11_7_0 branch from c2dd11b to a43d296 on August 2, 2022 14:20
ofborg bot added the labels "8.has: package (new)", "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100", and removed the labels "10.rebuild-darwin: 0" and "10.rebuild-linux: 0", on Aug 2, 2022
samuela (Member) commented Aug 5, 2022

Result of nixpkgs-review pr 179912 run on x86_64-linux

2 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • truecrack-cuda
5 packages failed to build:
  • cudaPackages.cuda-samples
  • ethminer (ethminer-cuda)
  • mathematica-cuda
  • python310Packages.cupy
  • python39Packages.cupy
63 packages built:
  • colmapWithCuda
  • cudaPackages.cuda_cccl
  • cudaPackages.cuda_cudart
  • cudaPackages.cuda_cuobjdump
  • cudaPackages.cuda_cupti
  • cudaPackages.cuda_cuxxfilt
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb
  • cudaPackages.cuda_memcheck
  • cudaPackages.cuda_nsight
  • cudaPackages.cuda_nvcc
  • cudaPackages.cuda_nvdisasm
  • cudaPackages.cuda_nvml_dev
  • cudaPackages.cuda_nvprof
  • cudaPackages.cuda_nvprune
  • cudaPackages.cuda_nvrtc
  • cudaPackages.cuda_nvtx
  • cudaPackages.cuda_nvvp
  • cudaPackages.cuda_sanitizer_api
  • cudatoolkit (cudaPackages.cudatoolkit, cudatoolkit_11)
  • cudaPackages.cudnn (cudaPackages.cudnn_8_4_0)
  • cudaPackages.cudnn_8_3_2
  • cudaPackages.cutensor
  • cudaPackages.fabricmanager
  • cudaPackages.libcublas
  • cudaPackages.libcufft
  • cudaPackages.libcufile
  • cudaPackages.libcurand
  • cudaPackages.libcusolver
  • cudaPackages.libcusparse
  • cudaPackages.libnpp
  • cudaPackages.libnvidia_nscq
  • cudaPackages.libnvjpeg
  • cudaPackages.nccl
  • cudaPackages.nsight_compute
  • cudaPackages.nsight_systems
  • cudaPackages.nvidia_fs
  • forge
  • gpu-burn
  • gromacsCudaMpi
  • gwe
  • katagoWithCuda
  • librealsenseWithCuda
  • magma
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pytorchWithCuda
  • python310Packages.tensorflowWithCuda
  • python39Packages.TheanoWithCuda
  • python39Packages.numbaWithCuda
  • python39Packages.pycuda
  • python39Packages.pynvml
  • python39Packages.pyrealsense2WithCuda
  • python39Packages.pytorchWithCuda
  • python39Packages.tensorflowWithCuda
  • xgboostWithCuda
  • xpraWithNvenc

samuela (Member) commented Aug 5, 2022

  • cudaPackages.cuda-samples is known broken based on the comment in this PR
  • mathematica-cuda is an existing failure due to missing the installer download

That leaves ethminer-cuda and cupy, which appear to be broken by this 11.7 upgrade. @dguibert could you add the appropriate overrides to keep these two packages on 11.6 (or whatever the appropriate versions are)?

samuela (Member) commented Aug 5, 2022

AFAICT cupy error looks like

cupy_backends/cuda/libs/cutensor.cpp: In function ‘uint64_t __pyx_f_13cupy_backends_4cuda_4libs_8cutensor_contractionGetWorkspaceSize(__pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_Handle*, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_ContractionDescriptor*, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_ContractionFind*, int, int)’:
cupy_backends/cuda/libs/cutensor.cpp:5717:17: error: ‘CUTENSOR_VERSION’ was not declared in this scope; did you mean ‘CUTENSOR_OP_SIN’?
 5717 |   __pyx_t_1 = ((CUTENSOR_VERSION < 0x2904) != 0);
      |                 ^~~~~~~~~~~~~~~~
      |                 CUTENSOR_OP_SIN
cupy_backends/cuda/libs/cutensor.cpp: In function ‘uint64_t __pyx_f_13cupy_backends_4cuda_4libs_8cutensor_reductionGetWorkspaceSize(__pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_Handle*, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, intptr_t, __pyx_obj_13cupy_backends_4cuda_4libs_8cutensor_TensorDescriptor*, intptr_t, int, int, int)’:
cupy_backends/cuda/libs/cutensor.cpp:6584:17: error: ‘CUTENSOR_VERSION’ was not declared in this scope; did you mean ‘CUTENSOR_OP_SIN’?
 6584 |   __pyx_t_1 = ((CUTENSOR_VERSION < 0x2904) != 0);
      |                 ^~~~~~~~~~~~~~~~
      |                 CUTENSOR_OP_SIN

SomeoneSerge (Contributor) commented

cupy has been broken on master for a while

samuela (Member) commented Aug 5, 2022

ah gotcha, looks like the same is true for ethminer... in that case I guess we're good to merge

samuela merged commit a53c277 into NixOS:master on Aug 5, 2022
samuela mentioned this pull request on Aug 5, 2022
13 tasks
zowoq (Contributor) commented Aug 5, 2022

https://gist.github.com/GrahamcOfBorg/45ac7f5bc9e02a74cb1e4264f365417f

Seems this PR broke eval on master.

winterqt (Member) commented Aug 5, 2022

11.7.0 would need to be added here, assuming it's compatible. (cc @aidalgol)

samuela (Member) commented Aug 5, 2022

@zowoq uh oh, sorry about that! I'll revert. Looks like we'll need to rebase onto latest master and try again

thanks for the heads up!

aidalgol (Contributor) commented Aug 5, 2022

> 11.7.0 would need to be added here, assuming it's compatible. (cc @aidalgol)

@samuela
That version is not. The list of supported versions in that file is exactly as listed on the nvidia download page. There is a newer version of TensorRT that supports CUDA 11.7, but it requires cuDNN 8.4, which is not yet in nixpkgs.

samuela (Member) commented Aug 5, 2022

Hmm interesting... so we should be marking TensorRT as broken in that case? at least we can't break evaluation haha

long term solution of course is to package cuDNN 8.4...

aidalgol (Contributor) commented Aug 5, 2022

> Hmm interesting... so we should be marking TensorRT as broken in that case? at least we can't break evaluation haha

That is already done automatically in the TensorRT derivations (see here).

samuela (Member) commented Aug 5, 2022

Mm, I see what you mean. It appears that merging this did break eval however. I haven't dug into all the details just yet, but the logs suggest it has something to do with TensorRT...

aidalgol (Contributor) commented Aug 5, 2022

It appears that the case where the CUDA version is not in tensorRTDefaultVersion is not handled, which is why adding CUDA 11.7 broke eval. I'm not sure how best to handle this case. Perhaps define a default version for CUDA 11.7, and let the logic in generic.nix mark it as broken, because 11.7 is not in the list of supported CUDA versions, but it will at least evaluate.
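The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the contents of `tensorRTDefaultVersion`, the `supportedCudaVersions` list, and the `describe` helper are hypothetical stand-ins, not the actual code in nixpkgs or its generic.nix.

```nix
let
  # Hypothetical stand-ins: the real mapping and supported-version list in
  # nixpkgs may hold different keys and values.
  tensorRTDefaultVersion = {
    "11.6" = "8.4.0";
    "11.7" = "8.4.0"; # placeholder entry so the lookup below finds a key
  };
  supportedCudaVersions = [ "11.6" ];

  # Resolve the default version, and compute the broken flag separately from
  # the list of supported CUDA versions.
  describe = cudaVersion: {
    version = tensorRTDefaultVersion.${cudaVersion};
    broken = !(builtins.elem cudaVersion supportedCudaVersions);
  };
in
{
  cuda_11_6 = describe "11.6"; # broken = false
  cuda_11_7 = describe "11.7"; # broken = true, but evaluation succeeds
}
```

Saved to a file, this can be checked with `nix-instantiate --eval --strict`. The point of the sketch is that an unsupported CUDA version ends up flagged as broken rather than aborting evaluation on a missing attribute.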

samuela (Member) commented Aug 6, 2022

@zowoq What was the exact command that was failing? I just want to make sure that I'm testing correctly

winterqt (Member) commented Aug 6, 2022

See OfBorg's README for the command it runs.

dguibert (Member, Author) commented Aug 6, 2022

> long term solution of course is to package cuDNN 8.4...

introduced by a43d296 within this PR.

dguibert (Member, Author) commented Aug 6, 2022

> Mm, I see what you mean. It appears that merging this did break eval however. I haven't dug into all the details just yet, but the logs suggest it has something to do with TensorRT...

Adding a line: "11.7" = "8.4.0"; to tensorRTDefaultVersion should be enough.
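As a rough illustration of why one extra line would be enough, here is a minimal sketch. The `before` mapping is a hypothetical stand-in for the existing `tensorRTDefaultVersion`, and only the `"11.7" = "8.4.0";` entry is taken from the suggestion above.

```nix
let
  # Hypothetical stand-in for the mapping before the change.
  before = { "11.6" = "8.4.0"; };

  # The same mapping with the suggested entry added.
  after = before // { "11.7" = "8.4.0"; };

  cudaVersion = "11.7";
in
{
  # `?` tests for a key without forcing a lookup that would abort evaluation.
  missingBefore = !(before ? ${cudaVersion}); # true
  presentAfter = after ? ${cudaVersion};      # true
  resolved = after.${cudaVersion};            # "8.4.0"
}
```

Once the key exists the lookup resolves. Whether the resulting package is usable is a separate question, which is what the broken marking discussed earlier in the thread is for.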

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/how-to-install-a-specific-version-of-cuda-and-cudnn/21725/4

Labels
  • 6.topic: cuda
  • 8.has: package (new)
  • 8.has: package (update)
  • 10.rebuild-darwin: 11-100
  • 10.rebuild-linux: 11-100
7 participants