-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR][CI] Add first version of UR workflow #16827
base: sycl
Are you sure you want to change the base?
Conversation
f61518c
to
7762529
Compare
# Date: Wed Jan 22 11:06:11 2025 +0100 | ||
# [common] Bump UMF to early 0.11 version, from main | ||
# It includes i.a. MacOS fix for compiler. | ||
set(UNIFIED_RUNTIME_TAG d193046de592482c47d87fdfaf92c7b8c59c9b66) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tag update necessary for the workflow changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a dummy change to trigger sycl
changes (in detect_changes job) - replaced now with dummy changes for unified-runtime
dir - this commit will be removed before merging, of course
7762529
to
f9cecd5
Compare
f9cecd5
to
9a36d44
Compare
ba4ea20
to
d85dd23
Compare
d85dd23
to
1a68aa0
Compare
During the meeting we think that we should merge this before the PR which pulls in the UR source code once we have the jobs working correctly. Then, in the time between merging this and pulling in the UR sources we disable the workflow to avoid overloading the runners. |
So, FYI, the current status is that the CUDA runner went wild, not sure what happened (I'm betting a bad |
1a68aa0
to
c45824f
Compare
c45824f
to
4ac5046
Compare
there's new CUDA runner enabled, we should be good now. |
4ac5046
to
e0d7e97
Compare
@intel/dpcpp-devops-reviewers please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first pass!
.github/workflows/ur-build-hw.yml
Outdated
@@ -0,0 +1,132 @@ | |||
name: UR - Build adapters on HW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a nit, but what does it mean to build an adapter on hardware?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it means run the tests on hardware, these jobs don't currently have a build/test separation. The name can be updated:
name: UR - Build adapters on HW | |
name: UR - Build adapters, test on HW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
- name: Download DPC++ | ||
run: | | ||
wget -O ${{github.workspace}}/dpcpp_compiler.tar.gz https://github.com/intel/llvm/releases/download/nightly-2024-12-12/sycl_linux.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to use the same dpcpp version every time or do we want to use the latest nightly
also we could use a docker image that has it preinstalled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving forwards we should definitely use the nightly. We had issues in the past which is why these jobs use a fixed version for now.
I think, for the same of ensuring as smooth a transition as possible, we should leave this fixed version and work on use the nightly post merge.
The work on enable use of docker images on these runners is ongoing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the TODO list, for after the move.
-DUR_DPCXX=${{github.workspace}}/dpcpp_compiler/bin/clang++ | ||
-DUR_SYCL_LIBRARY_DIR=${{github.workspace}}/dpcpp_compiler/lib | ||
-DCMAKE_INSTALL_PREFIX=${{github.workspace}}/install | ||
${{ matrix.adapter.name == 'HIP' && '-DUR_CONFORMANCE_AMD_ARCH=gfx1030' || '' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure i understand why we need to set DUR_CONFORMANCE_AMD_ARCH
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a hang over, I think we autodetect the AMD GPU arch for compiling test programs for the device now.
- name: Build | ||
# This is so that device binaries can find the sycl runtime library | ||
run: cmake --build ${{github.workspace}}/build -j $(nproc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use ninja instead of make as the cmake generator at the configure step (-GNinja
) you shouldnt need this, and i think ninja is just better anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree Ninja generator is better, we have had issues with Unix Makefiles becomeing broken due to not being tested so the CI is ensuring that didn't make its way downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
soo... I'm not sure, should we leave it as-is and add extra Ninja job...?
.github/workflows/ur-build-hw.yml
Outdated
|
||
- name: Test adapter specific | ||
working-directory: ${{github.workspace}}/build | ||
run: ctest -C ${{matrix.build_type}} --output-on-failure -L "adapter-specific" -E "memcheck" --timeout 600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to set the wd or can we just specify the path when running the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we could add the --test-dir ${{github.workspace}}/build
option.
We do intend to replae ctest with lit after the repo move so this will be temporary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.github/workflows/ur-build-hw.yml
Outdated
- name: Get information about platform | ||
working-directory: ${{github.workspace}}/unified-runtime | ||
if: ${{ always() }} | ||
run: .github/scripts/get_system_info.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just for debugging output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been very helpful for diagnosing job failures, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sarnex - thoughts? should I remove it?
.github/workflows/ur-build-hw.yml
Outdated
- name: Get information about platform | ||
working-directory: ${{github.workspace}}/unified-runtime | ||
if: ${{ always() }} | ||
run: .github/scripts/get_system_info.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think this file exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the UR repo https://github.com/oneapi-src/unified-runtime/blob/main/.github/scripts/get_system_info.sh, its found because of the working-directory
line pointing to the UR clone which exists in the same location as it will post repo move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed wd
# - test_job should be removed | ||
# | ||
test_job: | ||
name: UR test job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we instead say Detect UR changes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, this is a test_job
which is here just to show how the if
should be used. As stated in the TODO comment above - after moving UR here it should be removed. Added comment.
.github/workflows/ur-precommit.yml
Outdated
adapter_name: L0_V2 | ||
runner_name: UR_L0 | ||
|
||
level_zero_static: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use a matrix for all of the adapters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used this approach to separate adapters jobs from each other. Meaning, if any adapter failed, all other could still be tested out. I think it was added like that some time ago, because some adapters were less stable than the others.
I think we can change it to matrix now. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strategy:
fail-fast: false
matrix:
include:
? That's what we use in E2E tasks.
Expand-Archive -Path "$WorkingDir\doxygen.zip" | ||
Add-Content $env:GITHUB_PATH "$WorkingDir\doxygen" | ||
- name: "[Lin] Install hwloc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo we should split the win/lin steps into seperate workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm willing to have them in a single file for now. The benefits for SYCL were:
- Ability to cancel individual Linux/Windows if some CI job isn't necessary
- Job dispatch control via "Environment" for when we had zombie processes post testing making the runner unusable for subsequent tasks.
If UR CI won't be affected by these (I'd assume it won't), then there will be little value in splitting the OSes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah my comment was mainly about confusion reading the results because people are used to how it's done for sycl, but its not a big deal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to the TODO list
name: UR - Build adapters on HW | ||
|
||
on: | ||
workflow_call: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer helper workflows like this one to have an additional workflow_dispatch trigger for testing/experimenting, e.g.
workflow_dispatch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.github/workflows/ur-build-hw.yml
Outdated
# Also, the step of downloading DPC++ should be integrated somehow; | ||
# either compile it or use cached binaries (if accessible)...? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nightly "release" maybe? We don't try re-use post-commit binaries (although artifacts might actually be available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nightly would make sense, yeah. A related comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
- name: Configure CMake | ||
working-directory: ${{github.workspace}}/unified-runtime | ||
run: > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run: > | |
run: | |
or a comment why different from other steps/workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented
- name: Build | ||
# This is so that device binaries can find the sycl runtime library | ||
run: cmake --build ${{github.workspace}}/build -j $(nproc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use ninja
instead of make
as generator and then
run: cmake --build ${{github.workspace}}/build -j $(nproc) | |
run: cmake --build ${{github.workspace}}/build |
.github/workflows/ur-precommit.yml
Outdated
level_zero: | ||
name: Level Zero | ||
needs: [source_checks] | ||
uses: ./.github/workflows/ur-build-hw.yml | ||
with: | ||
adapter_name: L0 | ||
runner_name: UR_L0 | ||
|
||
level_zero_v2: | ||
name: Level Zero V2 | ||
needs: [source_checks] | ||
uses: ./.github/workflows/ur-build-hw.yml | ||
with: | ||
adapter_name: L0_V2 | ||
runner_name: UR_L0 | ||
|
||
level_zero_static: | ||
name: Level Zero static | ||
needs: [source_checks] | ||
uses: ./.github/workflows/ur-build-hw.yml | ||
with: | ||
adapter_name: L0 | ||
runner_name: UR_L0 | ||
static_loader: ON | ||
static_adapter: ON | ||
|
||
opencl: | ||
name: OpenCL | ||
needs: [source_checks] | ||
uses: ./.github/workflows/ur-build-hw.yml | ||
with: | ||
adapter_name: OPENCL | ||
runner_name: UR_OPENCL | ||
platform: "Intel(R) OpenCL" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this a matrix
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Use custom runner names - UR_*
e0d7e97
to
b28c442
Compare
Last commit included just to verify iftest_job
will be executed.With extra dummy changes (now missing in PR) we just didn't skip the
test_job
, e.g. here: https://github.com/intel/llvm/actions/runs/13040807649/job/36382037543