Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UR][CI] Add first version of UR workflow #16827

Open
wants to merge 1 commit into
base: sycl
Choose a base branch
from

Conversation

lukaszstolarczuk
Copy link
Contributor

@lukaszstolarczuk lukaszstolarczuk commented Jan 29, 2025

Last commit included just to verify if test_job will be executed.

With extra dummy changes (now missing in PR) we just didn't skip the test_job, e.g. here: https://github.com/intel/llvm/actions/runs/13040807649/job/36382037543

.github/workflows/sycl-linux-precommit.yml Outdated Show resolved Hide resolved
.github/workflows/ur-build-hw.yml Outdated Show resolved Hide resolved
# Date: Wed Jan 22 11:06:11 2025 +0100
# [common] Bump UMF to early 0.11 version, from main
# It includes i.a. MacOS fix for compiler.
set(UNIFIED_RUNTIME_TAG d193046de592482c47d87fdfaf92c7b8c59c9b66)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this tag update necessary for the workflow changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a dummy change to trigger sycl changes (in detect_changes job) - replaced now with dummy changes for unified-runtime dir - this commit will be removed before merging, of course

.github/workflows/unified-runtime.yml Outdated Show resolved Hide resolved
@kbenzie
Copy link
Contributor

kbenzie commented Feb 4, 2025

During the meeting we think that we should merge this before the PR which pulls in the UR source code once we have the jobs working correctly. Then, in the time between merging this and pulling in the UR sources we disable the workflow to avoid overloading the runners.

@lukaszstolarczuk
Copy link
Contributor Author

So, FYI, the current status is that the CUDA runner went wild, not sure what happened (I'm betting a bad apt update). Working on reviving the machine, if nothing succeed I'll replace it with a different runner...

@lukaszstolarczuk
Copy link
Contributor Author

there's new CUDA runner enabled, we should be good now.

@kbenzie
Copy link
Contributor

kbenzie commented Feb 13, 2025

@intel/dpcpp-devops-reviewers please review

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass!

@@ -0,0 +1,132 @@
name: UR - Build adapters on HW
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a nit, but what does it mean to build an adapter on hardware?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it means run the tests on hardware, these jobs don't currently have a build/test separation. The name can be updated:

Suggested change
name: UR - Build adapters on HW
name: UR - Build adapters, test on HW

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


- name: Download DPC++
run: |
wget -O ${{github.workspace}}/dpcpp_compiler.tar.gz https://github.com/intel/llvm/releases/download/nightly-2024-12-12/sycl_linux.tar.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use the same dpcpp version every time or do we want to use the latest nightly

also we could use a docker image that has it preinstalled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving forwards we should definitely use the nightly. We had issues in the past which is why these jobs use a fixed version for now.

I think, for the same of ensuring as smooth a transition as possible, we should leave this fixed version and work on use the nightly post merge.

The work on enable use of docker images on these runners is ongoing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the TODO list, for after the move.

-DUR_DPCXX=${{github.workspace}}/dpcpp_compiler/bin/clang++
-DUR_SYCL_LIBRARY_DIR=${{github.workspace}}/dpcpp_compiler/lib
-DCMAKE_INSTALL_PREFIX=${{github.workspace}}/install
${{ matrix.adapter.name == 'HIP' && '-DUR_CONFORMANCE_AMD_ARCH=gfx1030' || '' }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure i understand why we need to set DUR_CONFORMANCE_AMD_ARCH here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a hang over, I think we autodetect the AMD GPU arch for compiling test programs for the device now.

- name: Build
# This is so that device binaries can find the sycl runtime library
run: cmake --build ${{github.workspace}}/build -j $(nproc)
Copy link
Contributor

@sarnex sarnex Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use ninja instead of make as the cmake generator at the configure step (-GNinja) you shouldnt need this, and i think ninja is just better anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree Ninja generator is better, we have had issues with Unix Makefiles becomeing broken due to not being tested so the CI is ensuring that didn't make its way downstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soo... I'm not sure, should we leave it as-is and add extra Ninja job...?


- name: Test adapter specific
working-directory: ${{github.workspace}}/build
run: ctest -C ${{matrix.build_type}} --output-on-failure -L "adapter-specific" -E "memcheck" --timeout 600
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to set the wd or can we just specify the path when running the tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we could add the --test-dir ${{github.workspace}}/build option.

We do intend to replae ctest with lit after the repo move so this will be temporary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- name: Get information about platform
working-directory: ${{github.workspace}}/unified-runtime
if: ${{ always() }}
run: .github/scripts/get_system_info.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just for debugging output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been very helpful for diagnosing job failures, yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarnex - thoughts? should I remove it?

- name: Get information about platform
working-directory: ${{github.workspace}}/unified-runtime
if: ${{ always() }}
run: .github/scripts/get_system_info.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think this file exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the UR repo https://github.com/oneapi-src/unified-runtime/blob/main/.github/scripts/get_system_info.sh, its found because of the working-directory line pointing to the UR clone which exists in the same location as it will post repo move.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed wd

# - test_job should be removed
#
test_job:
name: UR test job
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we instead say Detect UR changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, this is a test_job which is here just to show how the if should be used. As stated in the TODO comment above - after moving UR here it should be removed. Added comment.

adapter_name: L0_V2
runner_name: UR_L0

level_zero_static:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use a matrix for all of the adapters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used this approach to separate adapters jobs from each other. Meaning, if any adapter failed, all other could still be tested out. I think it was added like that some time ago, because some adapters were less stable than the others.
I think we can change it to matrix now. Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    strategy:
      fail-fast: false
      matrix:
        include:

? That's what we use in E2E tasks.

Expand-Archive -Path "$WorkingDir\doxygen.zip"
Add-Content $env:GITHUB_PATH "$WorkingDir\doxygen"
- name: "[Lin] Install hwloc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo we should split the win/lin steps into seperate workflows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to have them in a single file for now. The benefits for SYCL were:

  1. Ability to cancel individual Linux/Windows if some CI job isn't necessary
  2. Job dispatch control via "Environment" for when we had zombie processes post testing making the runner unusable for subsequent tasks.

If UR CI won't be affected by these (I'd assume it won't), then there will be little value in splitting the OSes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah my comment was mainly about confusion reading the results because people are used to how it's done for sycl, but its not a big deal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to the TODO list

name: UR - Build adapters on HW

on:
workflow_call:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer helper workflows like this one to have an additional workflow_dispatch trigger for testing/experimenting, e.g.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 66 to 67
# Also, the step of downloading DPC++ should be integrated somehow;
# either compile it or use cached binaries (if accessible)...?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly "release" maybe? We don't try re-use post-commit binaries (although artifacts might actually be available).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly would make sense, yeah. A related comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- name: Configure CMake
working-directory: ${{github.workspace}}/unified-runtime
run: >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: >
run: |

or a comment why different from other steps/workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented

- name: Build
# This is so that device binaries can find the sycl runtime library
run: cmake --build ${{github.workspace}}/build -j $(nproc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use ninja instead of make as generator and then

Suggested change
run: cmake --build ${{github.workspace}}/build -j $(nproc)
run: cmake --build ${{github.workspace}}/build

Comment on lines 65 to 98
level_zero:
name: Level Zero
needs: [source_checks]
uses: ./.github/workflows/ur-build-hw.yml
with:
adapter_name: L0
runner_name: UR_L0

level_zero_v2:
name: Level Zero V2
needs: [source_checks]
uses: ./.github/workflows/ur-build-hw.yml
with:
adapter_name: L0_V2
runner_name: UR_L0

level_zero_static:
name: Level Zero static
needs: [source_checks]
uses: ./.github/workflows/ur-build-hw.yml
with:
adapter_name: L0
runner_name: UR_L0
static_loader: ON
static_adapter: ON

opencl:
name: OpenCL
needs: [source_checks]
uses: ./.github/workflows/ur-build-hw.yml
with:
adapter_name: OPENCL
runner_name: UR_OPENCL
platform: "Intel(R) OpenCL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this a matrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Use custom runner names - UR_*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants