
Revert "Temporarily skip CUDA 11 wheel CI" #601

Open
wants to merge 1 commit into base: branch-25.02
Conversation

bdice
Contributor

@bdice bdice commented Jan 22, 2025

Reverts #599 now that rapidsai/raft#2548 has landed.

@bdice bdice requested a review from a team as a code owner January 22, 2025 11:34
@bdice bdice requested a review from jameslamb January 22, 2025 11:34
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.30%. Comparing base (9b7bb97) to head (0702f89).

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-25.02     #601   +/-   ##
=============================================
  Coverage         72.30%   72.30%           
=============================================
  Files                14       14           
  Lines                65       65           
=============================================
  Hits                 47       47           
  Misses               18       18           

☔ View full report in Codecov by Sentry.

@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 22, 2025
@jameslamb
Member

good news: the wheel tests that had been failing because of the cuBLAS issues are passing!

bad news: 1 wheel test is failing:

=========================== short test summary info ============================
FAILED python/cuvs/cuvs/test/test_distance.py::test_distance[float16-F-True-euclidean-50-100] - assert False
 +  where False = <function allclose at 0xfffee65add70>(array([[0.        , 2.94198351, 2.11872091, ..., 2.73895706, 2.80186958,\n        2.62724569],\n       [2.94198351, 0.  ...272 ],\n       [2.62724569, 2.84470779, 2.48090272, ..., 2.65241563, 2.7694272 ,\n        0.        ]], shape=(100, 100)), array([[0.       , 2.939494 , 2.1176343, ..., 2.738613 , 2.8034577,\n        2.625    ],\n       [2.939494 , 0.       , ...     [2.625    , 2.8449516, 2.4811792, ..., 2.6516504, 2.769815 ,\n        0.       ]], shape=(100, 100), dtype=float32), atol=0.1, rtol=0.1)
 +    where <function allclose at 0xfffee65add70> = np.allclose
====== 1 failed, 1917 passed, 116 skipped, 2 xfailed in 105.19s (0:01:45) ======

(build link)

That looks like a numerical-precision thing (which can sometimes show up as a flaky test), but I observed it on consecutive runs.
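For context, the failing assertion in the log compares distances computed from float16 inputs against a float32 NumPy reference via `np.allclose(..., atol=0.1, rtol=0.1)`. The sketch below reproduces that comparison pattern in plain NumPy, with the cuVS call replaced by a float16-accumulating NumPy computation so it is self-contained; the shapes follow the failing parametrization `test_distance[float16-F-True-euclidean-50-100]`, but the random data and the float16 accumulation path are assumptions for illustration, not the actual kernel.

```python
# Hypothetical sketch of the check the failing test performs: pairwise
# euclidean distances from float16 inputs vs. a float32 reference,
# compared with loose tolerances (atol=0.1, rtol=0.1) as in the CI log.
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((100, 50)).astype(np.float16)  # 100 points, 50 dims

# Reference distances computed in float32.
x32 = x.astype(np.float32)
sq = (x32[:, None, :] - x32[None, :, :]) ** 2
expected = np.sqrt(sq.sum(axis=-1))

# A lower-precision path: accumulate the squared differences in float16,
# roughly mimicking what a half-precision kernel might do.
sq16 = (x[:, None, :] - x[None, :, :]) ** 2
got = np.sqrt(sq16.sum(axis=-1, dtype=np.float16)).astype(np.float32)

# The test's assertion; with float16 accumulation this sits close to the
# tolerance boundary, which is the kind of borderline result the log shows.
print(np.allclose(got, expected, atol=0.1, rtol=0.1))
```

Because the error sits near the tolerance boundary rather than fluctuating randomly, a failure like this tends to reproduce on every run, consistent with it failing on consecutive reruns.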

@bdice
Contributor Author

bdice commented Jan 22, 2025

#596 looks like it could be related to the precision error. @rhdong Can you confirm if your PR is expected to fix this failure?

@rhdong
Member

rhdong commented Jan 22, 2025

> good news: the wheel tests that had been failing because of the cuBLAS issues are passing!
>
> bad news: 1 wheel test is failing:
>
> =========================== short test summary info ============================
> FAILED python/cuvs/cuvs/test/test_distance.py::test_distance[float16-F-True-euclidean-50-100] - assert False
>  +  where False = <function allclose at 0xfffee65add70>(array([[0.        , 2.94198351, 2.11872091, ..., 2.73895706, 2.80186958,\n        2.62724569],\n       [2.94198351, 0.  ...272 ],\n       [2.62724569, 2.84470779, 2.48090272, ..., 2.65241563, 2.7694272 ,\n        0.        ]], shape=(100, 100)), array([[0.       , 2.939494 , 2.1176343, ..., 2.738613 , 2.8034577,\n        2.625    ],\n       [2.939494 , 0.       , ...     [2.625    , 2.8449516, 2.4811792, ..., 2.6516504, 2.769815 ,\n        0.       ]], shape=(100, 100), dtype=float32), atol=0.1, rtol=0.1)
>  +    where <function allclose at 0xfffee65add70> = np.allclose
> ====== 1 failed, 1917 passed, 116 skipped, 2 xfailed in 105.19s (0:01:45) ======
>
> (build link)
>
> That looks like a numerical-precision thing (which can sometimes show up as a flaky test), but I observed it on consecutive runs.

Hi @jameslamb, that PR should resolve this issue. Please rerun your tests and ignore the failure for now.

@vyasr
Contributor

vyasr commented Jan 22, 2025

How many times should we try a rerun? Looks like it's failed three times now.

@cjnolet
Member

cjnolet commented Jan 22, 2025

@vyasr @jameslamb cuVS CI started failing when the script that runs the Python tests was fixed. I'm not sure exactly which tests were or weren't running before that, because I had verified myself that some Python tests were running in CI prior to the fix. However, I suspect these particular tests hadn't been running since around October, and that's why we're only now seeing the failures.

One failure seems related to cuBLAS; the other seems related to precision, or to a bug in a distance function/computation.
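As a minimal illustration of the precision hypothesis (this is not the cuVS kernel, just an assumed float16 accumulation for comparison): half precision carries roughly three decimal digits, so squaring and summing coordinates in float16 accumulates rounding error that float32 or float64 arithmetic largely avoids.

```python
# Illustrative only: compare machine epsilon across dtypes, then show the
# relative error of a float16 squared-distance accumulation over 50 dims
# against a float64 reference. Data is random, not from the failing test.
import numpy as np

# Machine epsilon: the relative rounding error of a single operation.
print(np.finfo(np.float16).eps)   # 2**-10 ~= 9.77e-4
print(np.finfo(np.float32).eps)   # 2**-23 ~= 1.19e-7

rng = np.random.default_rng(0)
d = rng.random(50)                # one pair's coordinate differences
ref = np.sqrt(np.sum(d.astype(np.float64) ** 2))               # reference
half = np.sqrt(np.sum(d.astype(np.float16) ** 2, dtype=np.float16))
print(abs(float(half) - ref) / ref)  # relative error of the float16 path
```

A per-operation error of ~1e-3 compounded over a 50-term sum can plausibly land near a 10% combined `atol`/`rtol` boundary for small distances, which would explain a deterministic rather than flaky failure.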

@jameslamb
Member

Oh wow! Thanks for that context.

> One failure seems related to cuBLAS,

Take a look at "cuVS CI failures" in rapidsai/build-planning#137. If what you're referring to is the same as those logs, then that issue is now fixed.

> another seems related to precision or a bug in a distance function/computation

Ok yep, that's the one we're running into here, I think: #601 (comment)

@rhdong
Member

rhdong commented Jan 22, 2025

> How many times should we try a rerun? Looks like it's failed three times now.

Well... it's like drawing consecutive Aces in a poker game. I just reran it; let's see. Also, #596 is close to passing all CI tests, so at worst we can count on merging that one first.

@vyasr
Contributor

vyasr commented Jan 22, 2025

Ha, yes. At this point I think we'll probably wind up waiting for #596 to finish CI, but since the wheel tests are fast there's no harm in attempting a rerun and seeing what happens.
