[SYCL][SYCLLowerWGLocalMemoryPass] Remove implicit dependency on AlwaysInlinerPass and move to PipelineStart #16356
…ysInlinerPass and move to PipelineStart
Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined. The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved around in the pass pipeline. Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimizations than the function call allows. In addition, the Intel GPU compiler has a pass that transforms global variables in addrspace(3) into allocas, and it runs after the pipeline's basic simplification. Therefore, SYCLLowerWGLocalMemoryPass should run earlier.
Looks like you need to rebase to pick up the new changes in this pass first.
Done.
LGTM. Thanks!
continue;
}
std::string FName = llvm::demangle(Caller->getName());
if (FName.find("sycl::_V1::ext::oneapi::group_local_memory") ==
Hardcoding the current function name from the DPC++ library is unfortunate. The code in the DPC++ header files can change at any time.
To make it more robust, I thought we could walk up the call stack to the kernel function, ignoring all functions in the sycl:: namespace. This would require the SYCL kernel to be inlined into the kernel function wrapper.
@Naghasan, do you have any thoughts on that?
I agree it is unfortunate, especially w.r.t. upstreaming. I don't know what the plans are for this one, but if it is seen as important, we might want to improve this.
> This will require SYCL kernel to be inlined into kernel function wrapper.
I don't think this is an issue TBH; I don't see any benefit in not inlining the SYCL kernel into the wrapper, even in SPIR-V.
I think relying on an attribute is probably the most flexible option: it makes the compiler agnostic to header refactoring and API changes. It is also cheap to add.
I also just realized that syclcompat::local_mem uses it. That isn't technically a valid usage w.r.t. the extension, but it is something the attribute would allow us to handle correctly.
> I think relying on an attribute is probably the most flexible: this makes the compiler agnostic to header refactor and changes in API. It is also cheap to add.
A new attribute, "sycl_forceinline", was added in a4fe915. Please review.
It is not clear to me why using a SYCL-specific attribute, 'sycl_forceinline', is better than just relying on the name of the function. Both options seem equally vulnerable to changes in the header code. Maybe I am missing something.
> It is not clear to me why using a SYCL specific attribute 'sycl_forceinline' is better than just relying on the name of the function. Both these options seem equally vulnerable to changes to header code. May be I am missing something.
The attribute is our own internal interface, and we can say that it is dictated by the compiler and the headers have to follow.
The function name is a user-visible public API dictated by the extension spec. Tomorrow we might add a new method that requires the same handling, or rename or remove an existing method; there are plenty of possible reasons for a name to change. And we cannot say that the user-visible, public, high-level SYCL API is dictated by the compiler.
So, from that point of view, an attribute is the safer option.
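To make the trade-off above concrete, here is a minimal, self-contained sketch of the two detection strategies. It does not use the LLVM API: `ToyFunction`, `isWrapperByName`, and `isWrapperByAttr` are hypothetical names invented for illustration, the demangled names are made up, and the assumption that `syclcompat::local_mem` carries the attribute is only for the sake of the example.

```cpp
#include <set>
#include <string>

// Toy stand-in for a function in the module (NOT llvm::Function).
struct ToyFunction {
  std::string DemangledName;    // e.g. the result of demangling the symbol
  std::set<std::string> Attrs;  // string attributes attached by the headers
};

// Strategy 1: match on the public API name. Any wrapper that lives under a
// different name is missed until the compiler is updated as well.
bool isWrapperByName(const ToyFunction &F) {
  static const std::string Needle =
      "sycl::_V1::ext::oneapi::group_local_memory";
  return F.DemangledName.find(Needle) != std::string::npos;
}

// Strategy 2: match on an internal marker attribute. The headers opt in, so
// renaming or adding wrappers needs no compiler change.
bool isWrapperByAttr(const ToyFunction &F) {
  return F.Attrs.count("sycl_forceinline") != 0;
}

// Two illustrative wrappers, both assumed to be marked by the headers.
ToyFunction makeOneapiWrapper() {
  ToyFunction F;
  F.DemangledName = "sycl::_V1::ext::oneapi::group_local_memory<int>(...)";
  F.Attrs.insert("sycl_forceinline");
  return F;
}

ToyFunction makeCompatWrapper() {
  ToyFunction F;
  F.DemangledName = "syclcompat::local_mem<int>()";
  F.Attrs.insert("sycl_forceinline");
  return F;
}
```

In this toy setup the name check recognizes only the oneapi wrapper, while the attribute check recognizes both, which mirrors the argument that the attribute decouples the compiler from the public API names.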
return false;

bool Changed = false;
for (auto *U : ALMFunc->users()) {
We need to use a work list here rather than a simple loop.
The function at https://github.com/intel/llvm/blob/sycl/sycl/include/syclcompat/memory.hpp#L71 needs to be updated as well, and this loop won't be able to handle the nesting. The CI is currently green only because no test requests two distinct local memory objects through that function in the same kernel.
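A rough sketch of the work-list idea, under stated assumptions: the call graph is modelled as a plain map rather than LLVM IR, `ToyCallGraph`, `collectWrappers`, and `makeNestedExample` are hypothetical names, and the nesting (a syclcompat wrapper calling a group_local_memory_for_overwrite wrapper) is assumed for illustration. The point is only that wrappers of wrappers are discovered transitively, which a single pass over the direct users would miss.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: for each callee, the list of functions that call it.
struct ToyCallGraph {
  std::map<std::string, std::vector<std::string>> Callers;
  std::set<std::string> ForceInline;  // functions marked as wrappers
};

// Collect every marked wrapper that reaches Root through a chain of marked
// wrappers, innermost first. Unmarked callers (e.g. kernels) end the walk.
std::vector<std::string> collectWrappers(const ToyCallGraph &G,
                                         const std::string &Root) {
  std::vector<std::string> WorkList;
  WorkList.push_back(Root);
  std::set<std::string> Visited;
  Visited.insert(Root);
  std::vector<std::string> Result;
  while (!WorkList.empty()) {
    std::string Callee = WorkList.back();
    WorkList.pop_back();
    auto It = G.Callers.find(Callee);
    if (It == G.Callers.end())
      continue;
    for (const std::string &Caller : It->second) {
      if (G.ForceInline.count(Caller) == 0)
        continue;  // not a wrapper: do not look through it
      if (!Visited.insert(Caller).second)
        continue;  // already queued once
      Result.push_back(Caller);
      WorkList.push_back(Caller);  // its own callers may be wrappers too
    }
  }
  return Result;
}

// Assumed nesting: kernel -> local_mem -> group_local_memory_for_overwrite
// -> __sycl_allocateLocalMemory.
ToyCallGraph makeNestedExample() {
  ToyCallGraph G;
  G.Callers["__sycl_allocateLocalMemory"].push_back(
      "group_local_memory_for_overwrite");
  G.Callers["group_local_memory_for_overwrite"].push_back("local_mem");
  G.Callers["local_mem"].push_back("kernel");
  G.ForceInline.insert("group_local_memory_for_overwrite");
  G.ForceInline.insert("local_mem");
  return G;
}
```

With the nested example, a loop over only the direct users of `__sycl_allocateLocalMemory` would see `group_local_memory_for_overwrite` and stop, while the work list also reaches `local_mem`.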
Done, thank you for the suggestion. Now I understand what you mean by syclcompat::local_mem.
I also added a new test, sycl/test/check_device_code/syclcompat_local_mem.cpp, that has two calls to syclcompat::local_mem in a kernel.
Thanks @dm-vodopyanov. @intel/dpcpp-tools-reviewers and @intel/syclcompat-lib-reviewers, please approve or leave further comments. Thanks!
SYCLcompat changes LGTM
The reasoning makes sense to me, but I can't approve on behalf of the dpcpp-tools team. The reason is that @asudarsa and/or @maksimsab might have insights into how this patch could affect upstreaming efforts.
Thanks @MrSidims. @asudarsa @maksimsab, could you take a look and comment? Thanks.
DPC++ Tools related changes are in SYCLLowerIR. The logic that relies on an attribute introduced in the SYCL headers to selectively inline functions seems sound and should not impact upstreaming efforts. Approving these changes from DPC++ Tools side.
Thanks
P.S. My e-mail notifications have been a bit broken of late, so I am starting to track the PRs here manually. Sorry for the delay in responding.
Retriggered the CI since the last successful run was weeks ago.
…roup_local_memory.cpp Since intel#16356, local memory is lowered to a global variable before AddressSanitizerPass, so the local memory access is optimized out before AddressSanitizerPass runs. This PR updates the test so that the access is not optimized out.
This failure illustrates that this PR enables early optimizations of local memory. The failure will be fixed by #16959.
This fixes "error: undef deprecator failed" in pre-commit test of intel#16356
Why is it a separate PR and not part of this one?!
It should be separate because the issue is not caused by the changes in this patch.
I can't match these two together...
The bug is already in the code; it just doesn't expose itself without this patch.
Thanks. Just to clarify, we'll merge the other PR with the fix first, and this PR will only go in after that, right?
That's my expectation. We must get a clean pre-commit run before merging this change.
Currently, SYCLLowerWGLocalMemoryPass must run after AlwaysInlinerPass because, in the SYCL headers, the __sycl_allocateLocalMemory call is wrapped in the group_local_memory/group_local_memory_for_overwrite functions. Each call to __sycl_allocateLocalMemory represents a unique local memory allocation, so group_local_memory/group_local_memory_for_overwrite must be inlined.
The dependency is implicit and prevents SYCLLowerWGLocalMemoryPass from being moved around in the pass pipeline.
Since the pass transforms each __sycl_allocateLocalMemory call into an access to the global variable @WGLocalMem, moving the pass to the beginning of the pipeline could enable more optimizations than the function call allows.
We can't assume that the backend compiler lowers the global variable after AlwaysInlinerPass.
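The inlining requirement described above can be illustrated with a toy model, offered as a sketch rather than as the pass's actual logic. It assumes each direct call site of __sycl_allocateLocalMemory becomes one distinct local memory object; `ToyModule`, `countCallSites`, `inlineWrapper`, and `makeTwoAllocationKernel` are hypothetical names, and function bodies are reduced to lists of callee names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy module: function name -> ordered list of callee names in its body.
using ToyModule = std::map<std::string, std::vector<std::string>>;

// Count direct call sites of Callee across the module. In this model, one
// call site corresponds to one lowered local-memory variable.
std::size_t countCallSites(const ToyModule &M, const std::string &Callee) {
  std::size_t N = 0;
  for (const auto &Entry : M)
    for (const std::string &C : Entry.second)
      if (C == Callee)
        ++N;
  return N;
}

// Replace every call to Wrapper with a copy of the wrapper's body, then
// drop the now-unused wrapper.
ToyModule inlineWrapper(ToyModule M, const std::string &Wrapper) {
  const std::vector<std::string> WrapperBody = M.at(Wrapper);
  for (auto &Entry : M) {
    if (Entry.first == Wrapper)
      continue;
    std::vector<std::string> NewBody;
    for (const std::string &C : Entry.second) {
      if (C == Wrapper)
        NewBody.insert(NewBody.end(), WrapperBody.begin(), WrapperBody.end());
      else
        NewBody.push_back(C);
    }
    Entry.second = std::move(NewBody);
  }
  M.erase(Wrapper);
  return M;
}

// A kernel that requests two local memory objects through the same wrapper.
ToyModule makeTwoAllocationKernel() {
  ToyModule M;
  M["group_local_memory"].push_back("__sycl_allocateLocalMemory");
  M["kernel"].push_back("group_local_memory");
  M["kernel"].push_back("group_local_memory");
  return M;
}
```

Before inlining, the module contains a single call site of `__sycl_allocateLocalMemory` (inside the wrapper), so lowering at that point would give both requests the same variable. After `inlineWrapper`, the kernel holds two call sites, one per requested object.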