-
Notifications
You must be signed in to change notification settings - Fork 754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Driver] Support bfloat16 devicelib selection when multiple AOT targets specified #16494
base: sycl
Are you sure you want to change the base?
Conversation
…support native bfloat16 conversion
clang/lib/Driver/ToolChains/SYCL.cpp
Outdated
// 1). clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device pvc,...," | ||
// 2). clang++ -fsycl -fsycl-targets=intel_gpu_pvc,... | ||
// We assume that users will only apply either 1) or 2) and won't mix the | ||
// 2 ways in their compiling command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this assumption, can we emit a diagnostic when this s combination occurs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @mdtoguchi
The assumption is listed here because compiler has already emitted error message about such mix use. For example, if we mix using like:
clang++ -fsycl -fsycl-targets=spir64_gen,intel_gpu_dg1 -Xs "-device pvc" test.cpp
Compilation will abort and emit:
clang++: error: unsupported option '-device' for target 'intel_gpu_dg1'
Thanks very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That error isn't guaranteed. For instance:
clang++ -fsycl -fsycl-targets=spir64_gen,intel_gpu_dg1 -Xsycl-target-backend=spir64_gen "-device pvc" test.cpp
This will compile, and create 2 different device binaries. The use of -Xs <arg>
is a more general usage where <arg>
is passed to all device compilations, while you can steer specific args using -Xsycl-target-backend=target <arg>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @mdtoguchi
Thanks for pointing out this, we need to take care of the scenario you mentioned, native bfloat16 devicelib can only be used when all GPU AOT devices support native bfloat16 conversion.
clang/lib/Driver/ToolChains/SYCL.cpp
Outdated
// 1). clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device pvc,...," | ||
// 2). clang++ -fsycl -fsycl-targets=intel_gpu_pvc,... | ||
// We assume that users will only apply either 1) or 2) and won't mix the | ||
// 2 ways in their compiling command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the case where we can specify Intel GPUs via --offload-arch
option?
clang++ --offload-new-driver -fsycl --offload-arch=bdw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @srividya-sundaram
Thanks for pointing out this, this PR didn't consider new offload driver, I will update it.
Thanks very much.
Please add some description to the PR in addition to the PR title. |
Co-authored-by: Michael Toguchi <[email protected]>
Co-authored-by: Michael Toguchi <[email protected]>
Co-authored-by: Michael Toguchi <[email protected]>
Co-authored-by: Michael Toguchi <[email protected]>
Signed-off-by: jinge90 <[email protected]>
clang/lib/Driver/ToolChains/SYCL.cpp
Outdated
static llvm::SmallSet<StringRef, 8> GPUArchsWithNBF16{ | ||
"intel_gpu_pvc", "intel_gpu_acm_g10", "intel_gpu_acm_g11", | ||
"intel_gpu_acm_g12", "intel_gpu_bmg_g21"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as a short-term fix, but longer-term we should be thinking of a different way to identify architectures with bfloat16 support that doesn't require maintaining a list of devices. For example, is there a way to query the properties of the target device(s) to determine if bfloat16 is supported?
Note that Lunar Lake GPUs also support bfloat16 (I think), and it's not in this list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @bashbaug
The bf16 support can be queried in execution time because we can only know the real target the program is running during execution time. In compilation time for AOT mode, compiler driver can only decide according to target platform specified and we have to maintain the bf16 support information in compiler driver source code, otherwise compiler driver won't know whether a target platform supports bf16. The platform we are building the program may be different from the platform the program is going to run on.
Thanks very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we call into ocloc
to do the query? For example, if we're AOT compiling for DG2, we could do something like (this example uses the ocloc
command-line interface, but the same is supported for the library interface):
$ ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2
<snip> cl_intel_bfloat16_conversions:1.0.0 <snip>
Because the cl_intel_bfloat16_conversions
extension is supported, we can know that DG2 supports SPIR-V bfloat16 conversion instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @bashbaug
I just tried the ocloc command but found after running "ocloc query CL_DEVICE_EXTENSIONS_WITH_VERSION -device dg2" but found a file named with "CL_DEVICE_EXTENSIONS_WITH_VERSION" is generated in local directory, it seems to be ocloc's behavior, is there any way to get rid of this?
Thanks very much.
if (Arg *SYCLTarget = Args.getLastArg(options::OPT_fsycl_targets_EQ)) { | ||
for (auto TargetsV : SYCLTarget->getValues()) { | ||
if (!checkSpirvJIT(StringRef(TargetsV)) && | ||
!StringRef(TargetsV).starts_with("spir64_gen") && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment states 'filter out CPU and FPGA' targets, but GPU is checked here as well, which would seamingly negate the -device
check above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @mdtoguchi
If we use following command:
clang++ -fsycl -fsycl-targets=spir64,spir64_fpga,spir64_gen,spir64_x86_64 -Xsycl-target-backend=spir64_gen "-device pvc" assert-debug.cpp
Although only GPU toolchain is checked here, the sycl target value got from "-fsycl-targets=" is a list including "spir64", "spir64_fpga", "spir64_gen", "spir64_x86_64" and we only need to care about "spir64_gen" and corresponding "-device XXX" argument, so I filtered "spir64", "spir64_x86_64", "spir64_fpga".
Thanks very much.
// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK | ||
// RUN: %clangxx -fsycl -fsycl-targets=intel_gpu_pvc,spirv64-unknown-unknown,intel_gpu_dg1 \ | ||
// RUN: --sysroot=%S/Inputs/SYCL %s -### 2>&1 \ | ||
// RUN: | FileCheck %s -check-prefix=BFLOAT16-FALLBACK-NONE-FALLBACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a check for the -fsycl-targets=intel_gpu_pvc,spir64_gen -Xsycl-target-backend=spir64_gen "-device dg1"
type of scenario if we are allowing that combination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @mdtoguchi
Already added the test. For this scenario (mix use of -fsycl-targets=intel_gpu_xxx,spir64_gen ... -Xsycl-target-backend=spir64_gen "-device YYY"), do we have plan to continue to support it in incoming new offload driver mode? Or do we have plan to unify all the ways to use offloading AOT compilation?
Currently, such combined use will create 2 device binaries and call device link 2 times, selectBfloat functions will be called in each device link process but I can't figure out a reliable way to get to know what the target device architecture is in each device link phase, so have to limit the use of native bfloat16 devicelibs only when all device architectures support native bfloat16 conversion, do you have any good idea to solve this?
Thanks very much.
Co-authored-by: Michael Toguchi <[email protected]>
Signed-off-by: jinge90 <[email protected]>
Hi, @mdtoguchi Thanks very much. |
Signed-off-by: jinge90 <[email protected]>
User can specify multiple AOT targets when building sycl program in followings ways:
1). via -fsycl-targets=intel_gpu_pvc,intel_gpu_acm_g10,....
2). via -fsycl-targets=spir64_gen ... -Xs "-device pvc,dg2...."
3). via -fsycl-targets=spir64_gen..., -Xsycl-target-backend=spir64_gen "-device pvc"
We should select native bfloat16 devicelib when all AOT targets specified support native bfloat16 conversion. Currently, pvc, dg2, bmg devices support native bfloat16.
If user specifies JIT target together with AOT targets which all support native bfloat16 conversion, we still select native bfloat16 devicelib since bfloat16 devicelib is skipped in linking step for JIT target.