[SYCL][SYCLLowerWGLocalMemoryPass] Remove implicit dependency on AlwaysInlinerPass and move to PipelineStart #16356

Status: Open (wants to merge 15 commits into the sycl base branch)


@wenju-he (Contributor) commented Dec 13, 2024

Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined; otherwise, every call to the wrapper would reach the same __sycl_allocateLocalMemory call site and share one allocation.
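To illustrate why the wrapper has to be inlined, here is a toy model in standard C++. It is a sketch, not LLVM's API: `Module`, `reachableAllocSites` and `inlineInto` are hypothetical names, and each function is reduced to the list of functions it calls. It assumes, as described above, that every remaining `__sycl_allocateLocalMemory` call site becomes its own buffer. A function body exists only once, so two calls to a non-inlined `group_local_memory` reach the same call site.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR (not LLVM's API): each function maps to the ordered
// list of functions it calls.
using Module = std::map<std::string, std::vector<std::string>>;

const std::string kAllocBuiltin = "__sycl_allocateLocalMemory";

// Counts allocation call sites reachable from F. Each function body is
// visited once, so a non-inlined wrapper contributes a single call site no
// matter how many times it is called: all its callers would share one buffer.
std::size_t reachableAllocSites(const Module &M, const std::string &F,
                                std::set<std::string> &Visited) {
  if (!Visited.insert(F).second)
    return 0;
  auto It = M.find(F);
  if (It == M.end())
    return 0;
  std::size_t N = 0;
  for (const auto &Callee : It->second)
    N += Callee == kAllocBuiltin ? 1 : reachableAllocSites(M, Callee, Visited);
  return N;
}

std::size_t reachableAllocSites(const Module &M, const std::string &F) {
  std::set<std::string> Visited;
  return reachableAllocSites(M, F, Visited);
}

// Replaces each call to Callee inside Caller with a copy of Callee's body.
void inlineInto(Module &M, const std::string &Caller,
                const std::string &Callee) {
  const std::vector<std::string> Body = M.at(Callee);
  std::vector<std::string> NewBody;
  for (const auto &C : M.at(Caller)) {
    if (C == Callee)
      NewBody.insert(NewBody.end(), Body.begin(), Body.end());
    else
      NewBody.push_back(C);
  }
  M.at(Caller) = std::move(NewBody);
}
```

For a kernel that calls `group_local_memory` twice, the model reports one allocation site before `inlineInto(M, "kernel", "group_local_memory")` and two after it.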

The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved elsewhere in the pass pipeline.

Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimizations than the opaque function call allows.

We can't assume that the backend compiler lowers the global variable only after AlwaysInlinerPass has run.

In addition, the Intel GPU compiler has a pass that transforms global variables
in addrspace(3) into allocas, and it runs after the basic simplification pipeline.
Therefore, SYCLLowerWGLocalMemoryPass should run earlier.
Review threads:
clang/lib/CodeGen/BackendUtil.cpp (outdated, resolved)
llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp (outdated, resolved)
llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp (resolved)
llvm/lib/SYCLLowerIR/LowerWGLocalMemory.cpp (outdated, resolved)
@jsji (Contributor) commented Dec 13, 2024

Looks like you need to rebase to pick up the new changes in this pass first.

@wenju-he (Contributor Author) commented

> Looks like you need to rebase to pick up the new changes in this pass first.

Done.

@wenju-he requested a review from jsji on December 13, 2024 02:24
@jsji (Contributor) left a comment

LGTM. Thanks!

```cpp
continue;
}
std::string FName = llvm::demangle(Caller->getName());
if (FName.find("sycl::_V1::ext::oneapi::group_local_memory") ==
```
A contributor commented

Hardcoding the current function name from the DPC++ library is unfortunate: the code in the DPC++ header files can change at any time.

To make this more robust, I thought we could walk up the call stack to the kernel function, skipping all functions in the sycl:: namespace. This would require the SYCL kernel to be inlined into the kernel function wrapper.
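The proposed upward walk can be sketched as follows. This is an illustrative model only: `CallerMap`, `inSyclNamespace` and `collectSyclCallersToInline` are hypothetical names, and it assumes a reverse call graph keyed by simplified demangled names. Real demangled template names also carry return types and parameters, so a plain prefix test would need more care.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reverse call graph: callee name -> names of its callers.
using CallerMap = std::map<std::string, std::vector<std::string>>;

// Simplified namespace test on a demangled name.
bool inSyclNamespace(const std::string &Name) {
  return Name.rfind("sycl::", 0) == 0;
}

// Walks up from Builtin through callers in the sycl:: namespace and returns
// them as inlining candidates. The walk stops at the first caller outside
// sycl:: (user code or the kernel), where each allocation should end up.
std::set<std::string> collectSyclCallersToInline(const CallerMap &Callers,
                                                 const std::string &Builtin) {
  std::set<std::string> ToInline;
  std::vector<std::string> Stack{Builtin};
  while (!Stack.empty()) {
    const std::string F = Stack.back();
    Stack.pop_back();
    auto It = Callers.find(F);
    if (It == Callers.end())
      continue;
    for (const auto &Caller : It->second)
      if (inSyclNamespace(Caller) && ToInline.insert(Caller).second)
        Stack.push_back(Caller);
  }
  return ToInline;
}
```

In this model `syclcompat::local_mem` does not start with `sycl::`, so a pure namespace test stops before it, even though it is also a library wrapper around the allocation.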

@Naghasan, do you have any thoughts on that?

A contributor commented

I agree it is unfortunate, especially w.r.t. upstreaming. I don't know what the plans are for this one, but if it is seen as important, we might want to improve it.

> This would require the SYCL kernel to be inlined into the kernel function wrapper.

I don't think this is an issue, TBH. I don't see any benefit in not inlining the SYCL kernel into the wrapper, even in SPIR-V.

I think relying on an attribute is probably the most flexible: this makes the compiler agnostic to header refactor and changes in API. It is also cheap to add.

I also just realized that syclcompat::local_mem uses it. That isn't technically a valid usage w.r.t. the extension, but the attribute would allow it to be handled correctly.

cc @elizabethandrews @joeatodd

@wenju-he (Contributor Author) commented

> I think relying on an attribute is probably the most flexible: this makes the compiler agnostic to header refactor and changes in API. It is also cheap to add.

A new attribute, "sycl_forceinline", has been added in a4fe915.
Please review.

A contributor commented

It is not clear to me why using a SYCL-specific attribute 'sycl_forceinline' is better than just relying on the name of the function. Both options seem equally vulnerable to changes in the header code. Maybe I am missing something.

A contributor commented

> It is not clear to me why using a SYCL-specific attribute 'sycl_forceinline' is better than just relying on the name of the function. Both options seem equally vulnerable to changes in the header code. Maybe I am missing something.

The attribute is our own internal interface, so we can say that it is dictated by the compiler and the headers have to follow it.

The function name is a user-visible public API dictated by the extension spec. Tomorrow we may add a new method that requires the same handling, or rename or remove an existing one; there are plenty of possible reasons for a name change. And we cannot say that the user-visible, public, high-level SYCL API is dictated by the compiler.

So, from that point of view an attribute is a safer option.
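A toy comparison of the two detection strategies discussed above. The `Function` record and both predicates are hypothetical; only the spelling `sycl_forceinline` comes from this thread, and how the attribute is actually represented in IR is not modeled. The name check is a substring match in the spirit of the `FName.find(...)` test under review.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified view of an IR function: its demangled name plus
// a set of string attributes.
struct Function {
  std::string Name;
  std::set<std::string> Attrs;
};

// Name-based detection: tied to the public API spelling in the headers.
bool shouldInlineByName(const Function &F) {
  return F.Name.find("sycl::_V1::ext::oneapi::group_local_memory") !=
         std::string::npos;
}

// Attribute-based detection: tied only to an internal compiler/header
// contract, independent of what the function is called.
bool shouldInlineByAttr(const Function &F) {
  return F.Attrs.count("sycl_forceinline") != 0;
}
```

Any wrapper whose name differs from the hardcoded string (a renamed method, a new entry point, or `syclcompat::local_mem`) is missed by the name check, while the attribute check still fires as long as the header tags the function.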

@wenju-he requested a review from a team as a code owner on December 16, 2024 03:28
```cpp
return false;

bool Changed = false;
for (auto *U : ALMFunc->users()) {
```
A contributor commented

We need to use a worklist here rather than a simple loop.

The function at https://github.com/intel/llvm/blob/sycl/sycl/include/syclcompat/memory.hpp#L71 needs to be updated as well, and this loop won't be able to handle the nesting. The CI is currently green only because no test requests two distinct local memory objects through that function in the same kernel.
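A worklist handles the nesting because inlining a wrapper into its caller can turn that caller into a new direct user of `__sycl_allocateLocalMemory`; if the caller is itself a force-inline wrapper, it has to be processed too. The sketch below models this in standard C++ with hypothetical types and names (`Function`, `Module`, `inlineForceInlineWrappers`). It is not the actual pass, and it assumes wrappers are not recursive. Passing `FollowNesting = false` reproduces the single-pass loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR, not LLVM's API.
struct Function {
  bool ForceInline = false;       // stand-in for the "sycl_forceinline" marker
  std::vector<std::string> Calls; // callee names, in call order
};
using Module = std::map<std::string, Function>;

const std::string kBuiltin = "__sycl_allocateLocalMemory";

bool callsFunction(const Function &F, const std::string &Callee) {
  for (const auto &C : F.Calls)
    if (C == Callee)
      return true;
  return false;
}

std::size_t directBuiltinCalls(const Function &F) {
  std::size_t N = 0;
  for (const auto &C : F.Calls)
    if (C == kBuiltin)
      ++N;
  return N;
}

// Replaces every call to CalleeName in Caller with a copy of Callee's body.
void inlineCalls(Function &Caller, const Function &Callee,
                 const std::string &CalleeName) {
  std::vector<std::string> NewCalls;
  for (const auto &C : Caller.Calls) {
    if (C == CalleeName)
      NewCalls.insert(NewCalls.end(), Callee.Calls.begin(), Callee.Calls.end());
    else
      NewCalls.push_back(C);
  }
  Caller.Calls = std::move(NewCalls);
}

// Seeds the worklist with force-inline wrappers that call the builtin
// directly and inlines each into its callers. With FollowNesting, a caller
// that is itself a force-inline wrapper is queued as well, since it now
// contains the inlined builtin calls.
void inlineForceInlineWrappers(Module &M, bool FollowNesting) {
  std::deque<std::string> Worklist;
  for (const auto &[Name, F] : M)
    if (F.ForceInline && callsFunction(F, kBuiltin))
      Worklist.push_back(Name);

  while (!Worklist.empty()) {
    const std::string Wrapper = Worklist.front();
    Worklist.pop_front();
    const Function Body = M.at(Wrapper); // copy: callers are mutated below
    for (auto &[Name, Caller] : M) {
      if (Name == Wrapper || !callsFunction(Caller, Wrapper))
        continue;
      inlineCalls(Caller, Body, Wrapper);
      if (FollowNesting && Caller.ForceInline)
        Worklist.push_back(Name);
    }
  }
}
```

With a kernel that calls `syclcompat::local_mem` twice, where `local_mem` wraps `group_local_memory`, the single pass leaves the kernel calling `local_mem` (one shared allocation site). The worklist inlines both levels and leaves two distinct builtin call sites in the kernel.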

@wenju-he (Contributor Author) commented

Done, thank you for the suggestion. Now I understand what you meant by syclcompat::local_mem.
I also added a new test, sycl/test/check_device_code/syclcompat_local_mem.cpp, which has two calls to syclcompat::local_mem in a kernel.

@jsji (Contributor) commented Jan 31, 2025

> We still need the approvals from @intel/dpcpp-tools-reviewers and/or @intel/syclcompat-lib-reviewers:

Thanks @dm-vodopyanov. @intel/dpcpp-tools-reviewers and @intel/syclcompat-lib-reviewers, please approve or leave further comments. Thanks!

@JackAKirk (Contributor) commented

The syclcompat changes look good to me, though I'd appreciate it if @joeatodd could also take a quick look to confirm, if possible.

@joeatodd, could you take a look at this?

@joeatodd (Contributor) left a comment

SYCLcompat changes LGTM

@MrSidims (Contributor) left a comment

The reasoning makes sense to me, but I can't approve on behalf of the dpcpp-tools team, because @asudarsa and/or @maksimsab might have insights into how this patch affects, or could affect, upstreaming efforts.

@jsji requested a review from MrSidims on January 31, 2025 15:16
@jsji (Contributor) commented Jan 31, 2025

Thanks @MrSidims. @asudarsa @maksimsab, could you have a look and comment? Thanks.

@asudarsa (Contributor) left a comment

The DPC++ Tools-related changes are in SYCLLowerIR. The logic that relies on an attribute introduced in the SYCL headers to selectively inline functions seems sound and should not impact upstreaming efforts. Approving these changes from the DPC++ Tools side.

Thanks
P.S. My e-mail notifications have been a bit broken lately, so I have started tracking the PRs here manually. Sorry for the delay in responding.

@jsji closed this Feb 11, 2025
@jsji reopened this Feb 11, 2025
@jsji (Contributor) commented Feb 11, 2025

Retriggered the CI since the last successful run was weeks ago.

wenju-he added a commit to wenju-he/llvm that referenced this pull request Feb 11, 2025
…roup_local_memory.cpp

Since intel#16356, local memory is lowered to a global variable
before AddressSanitizerPass, so the local memory access is optimized
out before AddressSanitizerPass runs.
This PR updates the test so that the access won't be optimized out.
@wenju-he (Contributor Author) commented

Failed Tests (1):
  SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp

This fail illustrates that this PR enables early optimizations to local memory. The fail will be fixed by #16959

wenju-he added a commit to wenju-he/llvm that referenced this pull request Feb 11, 2025
This fixes "error: undef deprecator failed" in pre-commit test of intel#16356
@aelovikov-intel (Contributor) commented

> Failed Tests (1):
>   SYCL :: AddressSanitizer/out-of-bounds/local/group_local_memory.cpp
>
> This fail illustrates that this PR enables early optimizations to local memory. The fail will be fixed by #16959

Why is it a separate PR and not part of this one?!

@bader (Contributor) commented Feb 11, 2025

> Why is it a separate PR and not part of this one?!

It should be separate because the issue is not caused by changes of this patch.

@aelovikov-intel (Contributor) commented

> This fail illustrates that this PR enables ...
>
> the issue is not caused by changes of this patch

I can't match these two together...

@bader (Contributor) commented Feb 11, 2025

> I can't match these two together...

The bug is already in the code; it just doesn't expose itself without this patch.

@aelovikov-intel (Contributor) commented

> The bug is already in the code; it just doesn't expose itself without this patch.

Thanks. Just to clarify, we'll merge the other PR with the fix first, and this PR will only go after that, right?

@bader (Contributor) commented Feb 11, 2025

> Thanks. Just to clarify, we'll merge the other PR with the fix first, and this PR will only go after that, right?

That's my expectation. We must get a clean pre-commit run before merging this change.

uditagarwal97 pushed a commit that referenced this pull request Feb 12, 2025
…pp (#16959)

Disable the test temporarily due to two issues that appear after PR
#16356:
* Local memory in the test is optimized out before AddressSanitizerPass.
* #16979
aelovikov-intel pushed a commit that referenced this pull request Feb 12, 2025
This fixes "error: undef deprecator failed" in pre-commit test of #16356