[SYCL][SYCLLowerWGLocalMemoryPass] Remove implicit dependency on AlwaysInlinerPass and move to PipelineStart #16356

wenju-he · 2024-12-13T01:08:02Z

Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined.

The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved around in the pass pipeline.

Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimization than the function call does.

We can't assume that the backend compiler lowers the global variable after AlwaysInlinerPass.
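To illustrate why each call site matters, here is a minimal standalone sketch. It does not use the LLVM API: function bodies are modeled as lists of callee names, and the `lowerAllocCalls` helper and the `WGLocalMem.N` naming are hypothetical, chosen only for this example. It shows a toy lowering that creates one global per call site of the allocation function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One string per call site in a function body (toy stand-in for IR).
using CallSites = std::vector<std::string>;

// Toy lowering: each call site of AllocName is replaced by a reference to its
// own global. The "WGLocalMem.N" names are illustrative only.
std::vector<std::string> lowerAllocCalls(CallSites &Body,
                                         const std::string &AllocName) {
  std::vector<std::string> Globals;
  for (std::string &Site : Body) {
    if (Site != AllocName)
      continue; // leave unrelated calls untouched
    std::string G = "WGLocalMem." + std::to_string(Globals.size());
    Site = "@" + G; // the call becomes an access to the new global
    Globals.push_back(G);
  }
  return Globals;
}
```

In this model, a wrapper that has not been inlined contains a single call site, so every caller of the wrapper would end up sharing one global. Once the wrapper is inlined, a kernel that requests local memory twice has two call sites and gets two distinct globals.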

…ysInlinerPass and move to PipelineStart

In addition, the Intel GPU compiler has a pass that transforms global variables in addrspace(3) to alloca, and that pass runs after the pipeline's basic simplification. Therefore, we should run SYCLLowerWGLocalMemoryPass earlier.

clang/lib/CodeGen/BackendUtil.cpp

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

jsji · 2024-12-13T01:19:57Z

Looks like you need to rebase to pick up the new changes in this pass first.

wenju-he · 2024-12-13T02:17:39Z

> Looks like you need to rebase to pick up the new changes in this pass first.

done

jsji

LGTM. Thanks!

bader · 2024-12-13T21:37:53Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

+      continue;
+    }
+    std::string FName = llvm::demangle(Caller->getName());
+    if (FName.find("sycl::_V1::ext::oneapi::group_local_memory") ==


Hardcoding the current function name from the DPC++ library is unfortunate. The code in the DPC++ header files can change at any time.

To make it more robust, I thought we could walk up the call stack to the kernel function, ignoring all functions in the sycl:: namespace. This will require the SYCL kernel to be inlined into the kernel function wrapper.

@Naghasan, do you have any thoughts on that?

Naghasan · 2024-12-16

I agree it is unfortunate, especially w.r.t. upstreaming. I don't know what the plans are for this one, but if it is seen as important, we might want to improve this.

> This will require the SYCL kernel to be inlined into the kernel function wrapper.

I don't think this is an issue TBH; I don't see any benefit in not inlining the SYCL kernel into the wrapper, even in SPIR-V.

I think relying on an attribute is probably the most flexible option: it makes the compiler agnostic to header refactoring and API changes. It is also cheap to add.

I also just realized that syclcompat::local_mem uses it. That isn't technically a valid usage w.r.t. the extension, but it is something the attribute would allow us to handle correctly.

cc @elizabethandrews @joeatodd

wenju-he · 2024-12-17

> I think relying on an attribute is probably the most flexible option: it makes the compiler agnostic to header refactoring and API changes. It is also cheap to add.

A new attribute "sycl_forceinline" is added in a4fe915.
Please review.

asudarsa · 2025-02-11

It is not clear to me why using a SYCL-specific attribute 'sycl_forceinline' is better than just relying on the name of the function. Both of these options seem equally vulnerable to changes in the header code. Maybe I am missing something.

AlexeySachkov · 2025-02-12

> It is not clear to me why using a SYCL-specific attribute 'sycl_forceinline' is better than just relying on the name of the function. Both of these options seem equally vulnerable to changes in the header code. Maybe I am missing something.

The attribute is our own internal interface, and we can say that it is dictated by the compiler and the headers have to follow.

The function name is a user-visible public API dictated by the extension spec. Tomorrow we might add a new method that requires the same handling, or rename or remove an existing one; there are plenty of possible reasons for the name to change. And we cannot say that a user-visible, public, high-level SYCL API is dictated by the compiler.

So, from that point of view, an attribute is the safer option.
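The trade-off discussed above can be sketched in a few lines. This is a toy model, not the code from the patch: `ToyFunction`, `matchesByName`, and `matchesByAttr` are hypothetical names, and a function is reduced to a demangled name plus a set of string attributes. Only the name prefix and the "sycl_forceinline" spelling come from the thread.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for an IR function: a demangled name plus string attributes.
struct ToyFunction {
  std::string Name;
  std::set<std::string> Attrs;
};

// Name-based check: tied to the spelling of the public API in the headers.
bool matchesByName(const ToyFunction &F) {
  return F.Name.find("sycl::_V1::ext::oneapi::group_local_memory") !=
         std::string::npos;
}

// Attribute-based check: tied only to a marker that the compiler defines and
// the headers attach to whichever functions need this handling.
bool matchesByAttr(const ToyFunction &F) {
  return F.Attrs.count("sycl_forceinline") != 0;
}
```

Under this model, a wrapper outside the matched namespace, such as syclcompat::local_mem, is missed by the name check but is picked up by the attribute check as soon as the header marks it.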

Naghasan · 2024-12-17T10:14:37Z

llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp

+    return false;
+
+  bool Changed = false;
+  for (auto *U : ALMFunc->users()) {


We need to use a work list here rather than the simple loop.

This function https://github.com/intel/llvm/blob/sycl/sycl/include/syclcompat/memory.hpp#L71 needs to be updated as well, and the function under review won't be able to handle the nesting. The CI is currently green because no test requests two distinct local memory objects using that function in the same kernel.
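A minimal sketch of the work-list idea, assuming a toy call graph rather than the real LLVM data structures: `ToyFunction`, `ToyModule`, `inlineIntoCallers`, and `inlineMarkedWrappers` are hypothetical names, the `ForceInline` flag stands in for the marker on wrapper functions, and the call graph is assumed to be acyclic.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

struct ToyFunction {
  bool ForceInline;               // stand-in for the wrapper marker
  std::vector<std::string> Calls; // callee names, one entry per call site
};
using ToyModule = std::map<std::string, ToyFunction>;

// Replace every call to Callee with a copy of Callee's own call list.
// Returns the names of the functions that were changed.
std::vector<std::string> inlineIntoCallers(ToyModule &M,
                                           const std::string &Callee) {
  const std::vector<std::string> Body = M.at(Callee).Calls; // copy on purpose
  std::vector<std::string> Changed;
  for (auto &Entry : M) {
    if (Entry.first == Callee)
      continue; // recursion is not modeled
    std::vector<std::string> NewCalls;
    bool Touched = false;
    for (const std::string &C : Entry.second.Calls) {
      if (C == Callee) {
        NewCalls.insert(NewCalls.end(), Body.begin(), Body.end());
        Touched = true;
      } else {
        NewCalls.push_back(C);
      }
    }
    if (Touched) {
      Entry.second.Calls = NewCalls;
      Changed.push_back(Entry.first);
    }
  }
  return Changed;
}

// Work-list driver: start from marked functions that call Alloc directly and
// keep going while inlining exposes Alloc in further marked callers.
bool inlineMarkedWrappers(ToyModule &M, const std::string &Alloc) {
  std::deque<std::string> WorkList;
  for (const auto &Entry : M) {
    const std::vector<std::string> &Calls = Entry.second.Calls;
    if (Entry.second.ForceInline &&
        std::find(Calls.begin(), Calls.end(), Alloc) != Calls.end())
      WorkList.push_back(Entry.first);
  }
  bool Changed = false;
  while (!WorkList.empty()) {
    const std::string Wrapper = WorkList.front();
    WorkList.pop_front();
    for (const std::string &Caller : inlineIntoCallers(M, Wrapper)) {
      Changed = true;
      if (M.at(Caller).ForceInline)
        WorkList.push_back(Caller); // nested wrapper now calls Alloc itself
    }
  }
  return Changed;
}
```

With a kernel that calls an outer wrapper twice, and an outer wrapper that calls an inner wrapper around the allocation function, a single loop over the direct users of the allocation function would stop after inlining the inner wrapper. The work list also revisits the outer wrapper, so the kernel ends up with two separate allocation call sites.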

wenju-he · 2024-12-18

Done, thank you for the suggestion. Now I understand what you meant by syclcompat::local_mem.
I also added a new test, sycl/test/check_device_code/syclcompat_local_mem.cpp, that has two calls to syclcompat::local_mem in a kernel.

jsji · 2025-01-31T14:57:51Z

We still need the approvals from @intel/dpcpp-tools-reviewers and/or @intel/syclcompat-lib-reviewers:

Thanks @dm-vodopyanov . @intel/dpcpp-tools-reviewers and @intel/syclcompat-lib-reviewers Please approve or leave further comments. Thanks!

JackAKirk · 2025-01-31T15:00:35Z

The syclcompat changes look good to me, though I'll appreciate if @joeatodd also has a quick look to confirm, if possible.

@joeatodd could you take a look at this?

joeatodd

SYCLcompat changes LGTM

MrSidims

The reasoning makes sense to me, but I can't approve on behalf of the dpcpp-tools team. The reason is that @asudarsa and/or @maksimsab might have insights on how this patch (can) affect upstreaming efforts.

jsji · 2025-01-31T15:17:49Z

Thanks @MrSidims. @asudarsa @maksimsab, can you have a look and comment? Thanks.

asudarsa

The DPC++ Tools related changes are in SYCLLowerIR. The logic that relies on an attribute introduced in the SYCL headers to selectively inline functions seems sound and should not impact upstreaming efforts. Approving these changes from the DPC++ Tools side.

Thanks
P.S.: My e-mail notifications have been a bit broken of late, so I am starting to track the PRs here manually. Sorry for the delay in responding.

jsji · 2025-02-11T02:40:22Z

Retriggered the CI since the last successful run was weeks ago.

…roup_local_memory.cpp Since intel#16356, local memory is lowered to a global variable before AddressSanitizerPass. The local memory access is then optimized out before AddressSanitizerPass runs. This PR updates the test so that the access is not optimized out.

wenju-he · 2025-02-11T09:44:55Z

Failed Tests (1):
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp

This fail illustrates that this PR enables early optimizations to local memory. The fail will be fixed by #16959

This fixes "error: undef deprecator failed" in pre-commit test of intel#16356

aelovikov-intel · 2025-02-11T15:48:11Z

> Failed Tests (1):
>   SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
> This fail illustrates that this PR enables early optimizations to local memory. The fail will be fixed by #16959

Why is it a separate PR and not part of this one?!

bader · 2025-02-11T15:49:45Z

> > Failed Tests (1):
> >   SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
> > This fail illustrates that this PR enables early optimizations to local memory. The fail will be fixed by #16959
>
> Why is it a separate PR and not part of this one?!

It should be separate because the issue is not caused by changes of this patch.

aelovikov-intel · 2025-02-11T15:53:20Z

> This fail illustrates that this PR enables ...

> the issue is not caused by changes of this patch

I can't match these two together...

bader · 2025-02-11T15:56:17Z

> > This fail illustrates that this PR enables ...
>
> > the issue is not caused by changes of this patch
>
> I can't match these two together...

The bug is already in the code; it just doesn't expose itself without this patch.

aelovikov-intel · 2025-02-11T16:02:29Z

> > > This fail illustrates that this PR enables ...
> >
> > > the issue is not caused by changes of this patch
> >
> > I can't match these two together...
>
> The bug is already in the code; it just doesn't expose itself without this patch.

Thanks. Just to clarify, we'll merge the other PR with the fix first, and this PR will only go after that, right?

bader · 2025-02-11T16:06:18Z

> > > > This fail illustrates that this PR enables ...
> > >
> > > > the issue is not caused by changes of this patch
> > >
> > > I can't match these two together...
> >
> > The bug is already in the code; it just doesn't expose itself without this patch.
>
> Thanks. Just to clarify, we'll merge the other PR with the fix first, and this PR will only go after that, right?

That's my expectation. We must get clean pre-commit before merging this change.

…pp (#16959) Disable the test temporarily due to two issues that appear after PR #16356:
* Local memory in the test is optimized out before AddressSanitizerPass.
* #16979

wenju-he requested review from a team as code owners December 13, 2024 01:08

wenju-he requested review from bader and jsji December 13, 2024 01:10

wenju-he mentioned this pull request Dec 13, 2024

[SYCL] Move SYCLLowerWGLocalMemoryPass to OptimizerEarlyEPCallback #16347

Closed

bader requested a review from Naghasan December 13, 2024 01:12

jsji reviewed Dec 13, 2024

wenju-he added 2 commits December 13, 2024 09:48

Merge branch 'sycl' into SYCLLowerWGLocalMemoryPass-inline-PipelineSt…

51b1ec9

…artEPCallback

fix inlineGroupLocalMemoryFunc

a4f8382

wenju-he requested a review from jsji December 13, 2024 02:24

inlineGroupLocalMemoryFunc: return false -> continue

b42fd22

jsji approved these changes Dec 13, 2024

bader reviewed Dec 13, 2024

check device code

44db66a

wenju-he requested a review from a team as a code owner December 16, 2024 03:28

wenju-he requested a review from uditagarwal97 December 16, 2024 03:28

clang-format

97add86

add ir attribute sycl_forceinline to group_local_memory

a4fe915

Naghasan requested changes Dec 17, 2024

joeatodd approved these changes Jan 31, 2025

MrSidims reviewed Jan 31, 2025

jsji requested a review from MrSidims January 31, 2025 15:16

asudarsa approved these changes Feb 11, 2025

jsji closed this Feb 11, 2025

jsji reopened this Feb 11, 2025

wenju-he mentioned this pull request Feb 11, 2025

[SYCL][E2E] Disable device AddressSanitizer test group_local_memory.cpp #16959

Merged

wenju-he added a commit to wenju-he/llvm that referenced this pull request Feb 11, 2025

[SYCL][SYCLLowerWGLocalMemoryPass] replace undef with poison

3cb9baf

This fixes "error: undef deprecator failed" in pre-commit test of intel#16356

wenju-he mentioned this pull request Feb 11, 2025

[SYCL][SYCLLowerWGLocalMemoryPass] replace undef with poison #16960

Merged

wenju-he mentioned this pull request Feb 12, 2025

Update device sanitizer instrument of static workgroup local memory per change in https://github.com/intel/llvm/pull/16356 #16979

Open

aelovikov-intel pushed a commit that referenced this pull request Feb 12, 2025

[SYCL][SYCLLowerWGLocalMemoryPass] replace undef with poison (#16960)

84d1236

This fixes "error: undef deprecator failed" in pre-commit test of #16356

wenju-he added 2 commits February 12, 2025 17:21

Merge branch 'sycl' into SYCLLowerWGLocalMemoryPass-inline-PipelineSt…

afaa941

…artEPCallback

update test: undef -> poison

661529e
