FIX: Build succeeds and features succeed without lean server crash #106

Adarsh321123 · 2024-08-07T05:27:12Z

Previously on Ubuntu, lake build failed. Now, lake build succeeds, and features like premise selection and tactic suggestion also succeed rather than cause the Lean server to crash.

Peiyang-Song · 2024-08-14T03:32:33Z

Thanks @Adarsh321123 for the PR! Just adding a note in this PR: I think the Ubuntu we are talking about is one with CUDA-enabled GPU.

In the PR two groups of flags are deleted. I gave it a try and indeed it worked on Ubuntu with GPU, despite being fairly slow.

I also found that the first line of change seems not necessary -- the build succeeds with or without cutting two of the four flags on that line, i.e. from let mut flags := #["-DBUILD_CLI=OFF", "-DOPENMP_RUNTIME=NONE", "-DWITH_DNNL=OFF", "-DWITH_MKL=OFF"] to let mut flags := #["-DOPENMP_RUNTIME=NONE", "-DWITH_MKL=OFF"]. Any other reasons why deleting these two flags are needed?

So then the part of change that really functions here is the second, which is to delete the entire

  if ← useCUDA then
    flags := flags ++ #["-DWITH_CUDA=ON", "-DWITH_CUDNN=ON"]
  else
    flags := flags ++ #["-DWITH_CUDA=OFF", "-DWITH_CUDNN=OFF"]

It seems to me that this completely disables the use of CUDA on a machine with a CUDA-enabled GPU, instead of choosing whether to use CUDA based on the machine's specs. The fact that this works indicates that there might be issues with how the program is being built when WITH_CUDA is turned on. Yet at the same time, it seems a bit brutal that we just do not use CUDA anyways.

Peiyang-Song · 2024-08-14T03:56:33Z

The first change seems functionally unnecessary so far, i will add it back if there is a reason for that.

The second change seems suboptimal but does work, despite being slow. I think it indicates that we should look into how CUDA in used during the build. For now, I will incorporate the second part of change (deleting the CUDA-related flags) while we look more into actual reasons behind that.

Adarsh321123 · 2024-08-14T05:02:16Z

Hi Peiyang! Thank you for your comments. I think the "Ubuntu" I mentioned may not necessarily be one with a CUDA-enabled GPU. This is because the same build and run errors exist on my M1 Mac, which does not have a CUDA-enabled GPU. As such, the bug may be related to CUDA rather than being because of CUDA.

I agree that the first set of flags is unnecessary on Ubuntu. However, it is necessary for a successful build on my M1 Mac. Please let me know if you notice the same on your end.

I agree that the second change is suboptimal. However, given that we have found an alternative for a while, I agree that it is acceptable to incorporate this change. Also, although this removes CUDA capabilities, I noticed that the 3 features are very fast on an Ubuntu desktop with a 4090, but very slow on a DGX server with 4 A100s. This again suggests that CUDA may not be the direct issue, but may be related.

Peiyang-Song · 2024-08-14T05:05:43Z

I see, I said it's probably related to CUDA because I was able to build everything on a Ubuntu machine without CUDA-enabled GPUs. Anyways, I think we can incorporate the temporary suboptimal fix for now.

Peiyang-Song · 2024-08-14T05:06:17Z

For the M1 Mac, that's interesting. So you were unable to successfully lake build on your M1 Mac without changing the first line?

Adarsh321123 · 2024-08-14T05:10:08Z

It's interesting that you were able to build on an Ubuntu machine without CUDA-enabled GPUs.

Peiyang-Song · 2024-08-14T05:10:21Z

I actually don't have a M1 Mac myself, so if you could confirm that the first line is necessary for M1 Mac, and if I validate that the change in the first line does not make any other architecture not work (I'll handle that testing), we can just incorporate the change.

Adarsh321123 · 2024-08-14T05:10:43Z

Yes, I was unable to successfully run lake build on my M1 Mac without changing the first line. I will check this one more time!

Peiyang-Song · 2024-08-14T05:11:47Z

Sounds great! Let me know of the result after you double check that. Then I'll merge. Thanks!

Adarsh321123 · 2024-08-14T05:29:30Z

I can confirm that I cannot lake build on my M1 Mac without the first change.

Peiyang-Song · 2024-08-14T05:29:57Z

Sounds great, thanks for the contribution!

fix build and nd run issues on ubuntu

71af41a

yangky11 requested a review from Peiyang-Song August 7, 2024 13:29

Peiyang-Song changed the base branch from main to peiyang August 14, 2024 04:00

Peiyang-Song merged commit 063e036 into lean-dojo:peiyang Aug 14, 2024
1 of 2 checks passed

Peiyang-Song mentioned this pull request Aug 14, 2024

Fix CMake flags #109

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: Build succeeds and features succeed without lean server crash #106

FIX: Build succeeds and features succeed without lean server crash #106

Adarsh321123 commented Aug 7, 2024

Peiyang-Song commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

FIX: Build succeeds and features succeed without lean server crash #106

FIX: Build succeeds and features succeed without lean server crash #106

Conversation

Adarsh321123 commented Aug 7, 2024

Peiyang-Song commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024

Adarsh321123 commented Aug 14, 2024

Peiyang-Song commented Aug 14, 2024