Fix sycl-web lit tests #16996

Closed
wants to merge 1,571 commits into from

Chenyang-L
Contributor

Fix conflicts and update failing lit tests.

chandlerc and others added 30 commits February 4, 2025 18:04
… tables

This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.

This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
…and info

This moves the main builtins and several targets to use nice generated
string tables and info structures rather than X-macros. Even without
obvious prefixes factored out, the resulting tables are significantly
smaller and much cheaper to compile without all the X-macro overhead.

This leaves the X-macros in place for atomic builtins which have a wide
range of uses that don't seem reasonable to fold into TableGen.

As future work, these should move to their own file (whether as X-macros
or just generated patterns) so the AST headers don't have to include all
the data for other builtins.
This requires adding support to the general builtins emission for
producing prefixed builtin infos separately from un-prefixed which is
a bit crufty. But we don't currently have any good way of having a more
refined model than a single hard-coded prefix string per TableGen
emission. Something more powerful and/or elegant is possible, but this
is a fairly minimal first step that at least allows factoring out the
builtin prefix for something like X86.
This target's builtins have an especially long prefix and so we get over
2x reduction in string table size required with this change.
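
To make the size argument concrete, here is a minimal sketch of what factoring a common prefix out of a string table saves. It is not the TableGen emitter; the helper names (`table_size`, `factor_prefix`) and the four sample builtin names are illustrative assumptions, and the table is modeled as plain NUL-terminated strings.

```python
def table_size(names):
    """Bytes needed to store the names as NUL-terminated strings."""
    return sum(len(name) + 1 for name in names)


def factor_prefix(names, prefix):
    """Split names into (suffixes of prefixed names, names lacking the prefix)."""
    prefixed = [n[len(prefix):] for n in names if n.startswith(prefix)]
    unprefixed = [n for n in names if not n.startswith(prefix)]
    return prefixed, unprefixed


# Illustrative sample only; real targets have thousands of entries.
names = [
    "__builtin_ia32_addps",
    "__builtin_ia32_subps",
    "__builtin_ia32_mulps",
    "_mm_getcsr",
]
prefix = "__builtin_ia32_"

prefixed, unprefixed = factor_prefix(names, prefix)
before = table_size(names)
# The prefix is stored once; each prefixed entry keeps only its suffix.
after = len(prefix) + 1 + table_size(prefixed) + table_size(unprefixed)
print(f"before={before} bytes, after={after} bytes")
```

The longer the shared prefix relative to the suffixes, the larger the saving, which is consistent with the reduction reported for a long-prefix target.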
… (#125564)

Add Rocdl support for the following GFX950 instructions:

CVT_SCALE_PK_FP8_F32
CVT_SCALE_PK_BF8_F32
CVT_SCALE_SR_FP8_F32
CVT_SCALE_SR_BF8_F32
CVT_SCALE_PK_F32_FP8
CVT_SCALE_PK_F32_BF8
CVT_SCALE_F32_FP8
CVT_SCALE_F32_BF8
… reduction to scalar (#125288)

This generalizes handleVectorReduceIntrinsic to allow intrinsics where
the return type is not the same as the fields. This patch then applies
the generalized handleVectorReduceIntrinsic to support the following Arm
NEON add reduction to scalar intrinsics: llvm.aarch64.neon.{faddv,
saddv, uaddv}.

Updates the tests from llvm/llvm-project#125271
This adds handleVectorReduceWithStarterIntrinsic() (similar to
handleVectorReduceIntrinsic but for intrinsics with an additional
starting parameter) and uses it to handle
Intrinsic::vector_reduce_f{add,mul}.

Updates the tests from llvm/llvm-project#125597
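
As a rough illustration of the two intrinsic shapes mentioned above, the sketch below models only the arithmetic: a plain horizontal add, and an ordered reduction that takes an extra starting value. It does not model the handler functions named in the commits; the Python function names are hypothetical.

```python
from functools import reduce
import operator


def vector_reduce_add(lanes):
    """Horizontal add: collapse all lanes of a vector into one scalar."""
    return reduce(operator.add, lanes)


def vector_reduce_with_start(op, start, lanes):
    """Ordered reduction that folds the lanes, left to right, into a start value."""
    return reduce(op, lanes, start)


print(vector_reduce_add([1, 2, 3, 4]))
print(vector_reduce_with_start(operator.add, 10.0, [1.0, 2.0, 3.0]))
print(vector_reduce_with_start(operator.mul, 2.0, [3.0, 4.0]))
```

The second form is why a separate handler is needed: the result depends on one more operand than the vector itself.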
This patch fixes:

  clang/lib/Frontend/CompilerInvocation.cpp:3854:16: error:
  enumeration value 'Ver20' not handled in switch [-Werror,-Wswitch]
Now that we store the command in the CommandReturnObject (#125132) we
can check the command in the print callback.
…571)

An LValueToRValue cast shouldn't be ignored, so bail out of the visitor
if we encounter one.
Teach InterleavedAccessPass to recognize the following patterns:
  - vp.store an interleaved scalable vector
  - Deinterleaving a scalable vector loaded from vp.load

Upon recognizing these patterns, IA will collect the interleaved /
deinterleaved operands and delegate them over to their respective
newly-added TLI hooks.

For RISC-V, these patterns are lowered into segmented loads/stores.

Right now we only recognize power-of-two (de)interleave cases, in which
(de)interleave4/8 are synthesized from a tree of (de)interleave2.
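
The tree construction can be sketched on plain Python lists. This is a model of the data movement only, under the assumption that (de)interleave2 splits or merges even and odd lanes; it is not the pass or the TLI hooks, and the function names are illustrative.

```python
def deinterleave2(v):
    """Split a vector into its even-indexed and odd-indexed lanes."""
    return v[0::2], v[1::2]


def interleave2(a, b):
    """Merge two equal-length vectors lane by lane: a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def deinterleave4(v):
    """Recover four fields using a two-level tree of deinterleave2."""
    even, odd = deinterleave2(v)   # even holds fields 0 and 2, odd holds 1 and 3
    f0, f2 = deinterleave2(even)
    f1, f3 = deinterleave2(odd)
    return f0, f1, f2, f3


def interleave4(f0, f1, f2, f3):
    """Inverse tree: pair up fields (0, 2) and (1, 3), then merge the pairs."""
    return interleave2(interleave2(f0, f2), interleave2(f1, f3))


v = list(range(8))          # two records of four fields each
print(deinterleave4(v))
print(interleave4(*deinterleave4(v)) == v)
```

Note that the leaves of the tree come out in the order 0, 2, 1, 3, so a matcher has to reorder them to obtain the fields in their natural order.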

---------

Co-authored-by: Nikolay Panchenko <[email protected]>
This PR adds support for UB constant materialization (i.e., generating
`ub::PoisonOp`) to `VectorDialect::materializeConstant`. This was the
reason why the vector folders generating poison didn't work.

Resource keys have the problem that you can’t parse them from MLIR
assembly if they have special or non-printable characters, but nothing
prevents you from specifying such a key when you create e.g. a
DenseResourceElementsAttr, and it works fine in other ways, including
bytecode emission and parsing.

This PR solves the parsing by quoting and escaping keys with special or
non-printable characters in MLIR assembly, in the same way as symbols,
e.g.:
```
module attributes {
  fst = dense_resource<resource_fst> : tensor<2xf16>,
  snd = dense_resource<"resource\09snd"> : tensor<2xf16>
} {}

{-#
  dialect_resources: {
    builtin: {
      resource_fst: "0x0200000001000200",
      "resource\09snd": "0x0200000008000900"
    }
  }
#-}
```

By not quoting keys without special or non-printable characters, the
change is effectively backwards compatible.

The change is tested by:
1. adding a test with a dense resource handle key with special
characters to `dense-resource-elements-attr.mlir`
2. adding special and unprintable characters to some resource keys in
the existing lit tests `pretty-resources-print.mlir` and
`mlir/test/Bytecode/resources.mlir`
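
The quoting rule described above can be sketched as follows. This is not the MLIR printer: the bare-key pattern is an assumption about which keys need no quoting, and the two-digit hex escape is inferred from the `\09` in the example. Both function names are hypothetical.

```python
import re

# Assumption: keys matching this pattern can be printed without quotes.
BARE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_$.]*")


def escape_key(key: str) -> str:
    """Escape backslashes, and hex-escape quotes and non-printable bytes."""
    out = []
    for byte in key.encode("utf-8"):
        ch = chr(byte)
        if ch == "\\":
            out.append("\\\\")
        elif 0x20 <= byte < 0x7F and ch != '"':
            out.append(ch)               # printable ASCII passes through
        else:
            out.append("\\%02X" % byte)  # e.g. tab becomes \09
    return "".join(out)


def print_key(key: str) -> str:
    """Print a key bare when possible, otherwise quoted and escaped."""
    if BARE_KEY.fullmatch(key):
        return key
    return '"' + escape_key(key) + '"'


print(print_key("resource_fst"))
print(print_key("resource\tsnd"))
```

Because plain keys take the first branch unchanged, output for existing files stays the same, which is the backwards-compatibility property the commit message relies on.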
… constructs (#125750)

Previous patch was too restrictive and didn't take into account cuf
kernels and openacc compute constructs as being device context.
If a variant part has a 128-bit discriminator, then
DwarfUnit::constructTypeDIE will assert.  This patch fixes the problem
by allowing any size of integer to be used here.  This is largely
accomplished by moving part of DwarfUnit::addConstantValue to a new
method.

Fixes #119655
The functions now use VPBuilder to insert recipes and the VPBB argument
is unused. Clean it up.
…san (#125763)

Previously this test was entirely disabled under asan, but not
hwasan.  Instead of disabling the test, make the test compatible
with both asan and hwasan by disabling sanitizers only on the
subroutine that does the stack-smashing.
Adds codegen support for fence.acquire and fence.release, a script and
generated tests for all possible legal fences, and cleans up some
tablegen rules.
After the changes in 89001d1, the
container pushes failed, because it was attempting to push the same
container twice. This fixes the sed expression used to push the :latest
alias for each container.
…#125729)

Currently handled (suboptimally) by handleUnknownInstruction:
- llvm.aarch64.neon.fmaxv (Floating-point Maximum (vector))
- llvm.aarch64.neon.fminv
- llvm.aarch64.neon.fmaxnmv (Floating-point Maximum Number across
Vector)
- llvm.aarch64.neon.fminnmv
(not to be confused with llvm.aarch64.neon.f{max,min}, which are
correctly handled by `maybeHandleSimpleNomemIntrinsic`)

Forked from llvm/test/CodeGen/AArch64/arm64-fminv.ll
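
For readers unfamiliar with the two families of across-vector maximum, the sketch below shows the difference in NaN handling. It is a simplified model: it assumes quiet NaNs only and ignores signaling NaNs and signed zeros, and the Python function names merely mirror the intrinsic names.

```python
import math
from functools import reduce


def fmax(a, b):
    """Maximum that propagates NaN: any NaN operand yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def fmaxnm(a, b):
    """'Maximum number': a quiet NaN loses to a numeric operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def fmaxv(lanes):
    """Across-vector maximum using the NaN-propagating rule."""
    return reduce(fmax, lanes)


def fmaxnmv(lanes):
    """Across-vector maximum using the 'maximum number' rule."""
    return reduce(fmaxnm, lanes)


lanes = [1.0, math.nan, 3.0]
print(fmaxv(lanes), fmaxnmv(lanes))
```

The minimum variants follow the same pattern with `min` in place of `max`.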
…12792)

New proposed function `clang-format-vc-diff`.

It is the same as calling `clang-format-region` on all diffs between
the content of a buffer-file and the content of the file at git
revision HEAD. If the current buffer is saved, this is essentially the
same thing as:
    `git-clang-format -f {filename}`

The motivation is that many projects (LLVM included) both have code that
is non-compliant with their clang-format style and disallow unrelated
format diffs in PRs. This means users can't just run
`clang-format-buffer` on the buffer they are working on, and need to
go through all the regions by hand to get them
formatted. This is both an error-prone and annoying workflow.
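
The core idea, formatting only the regions that differ from HEAD, can be sketched in Python. The actual function is Emacs Lisp; this sketch swaps in Python's `difflib` to find the changed line ranges, and `changed_regions` is a hypothetical name. Obtaining the HEAD text (for example from git) is left out.

```python
import difflib


def changed_regions(head_text, buffer_text):
    """Return 1-based inclusive (start, end) line ranges of buffer_text
    that were added or modified relative to head_text."""
    head = head_text.splitlines()
    buf = buffer_text.splitlines()
    matcher = difflib.SequenceMatcher(a=head, b=buf, autojunk=False)
    regions = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Pure deletions leave nothing in the buffer to format.
        if tag in ("replace", "insert") and j2 > j1:
            regions.append((j1 + 1, j2))
    return regions


head = "int a;\nint b;\nint c;\n"
buffer = "int a;\nint  b ;\nint c;\nint d;\n"
print(changed_regions(head, buffer))
```

Each resulting range could then be handed to a region formatter, for instance via the `--lines=<start>:<end>` option of the `clang-format` command-line tool, so untouched lines are never reformatted.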
Currently handled (suboptimally) by handleUnknownInstruction:
- llvm.aarch64.neon.saddlv
- llvm.aarch64.neon.uaddlv

Forked from llvm/test/CodeGen/AArch64/arm64-vaddlv.ll
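
As a small illustration of what the two "add long across vector" intrinsics compute, the sketch below sums the lanes into a wider result, treating each lane as signed or unsigned. The default 8-bit lane width and the helper names are assumptions for the example; the handling of these intrinsics in the patch itself is not modeled.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def uaddlv(lanes, bits=8):
    """Sum the lanes as unsigned integers; the result is not truncated."""
    mask = (1 << bits) - 1
    return sum(lane & mask for lane in lanes)


def saddlv(lanes, bits=8):
    """Sum the lanes as signed integers; the result is not truncated."""
    mask = (1 << bits) - 1
    return sum(to_signed(lane & mask, bits) for lane in lanes)


lanes = [0xFF, 0xFF, 0x01]
print(uaddlv(lanes), saddlv(lanes))
```

The same bit patterns give different sums, which is why the signed and unsigned forms are distinct intrinsics.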
I thought I had added tests together with
llvm/llvm-project#125276,
but they are still in my sandbox. These are the tests that were meant
for this PR.
This patch implements the instruction cost for the vp.splice intrinsic.

To support type-based queries for LV, this adds a constant index when
querying `getShuffleCost()`. We get the same cost no matter what
`index` is, because it only changes the cost from that of `vslide.vx`
to that of `vslide.vi`, and the cost of `vslide.vx` is the same as
`vslide.vi` in the current RISCV implementation.
Better to use TTI::getScalarizationOverhead instead of
TTI::getVectorInstrCost to correctly calculate the costs of
buildvectors/extracts.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm/llvm-project#125725
iclsrc and others added 6 commits February 12, 2025 08:38
  CONFLICT (content): Merge conflict in llvm/include/llvm/IR/Intrinsics.h
  CONFLICT (content): Merge conflict in llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
…19730)

Even in cases where handles are supported, references are still
preferable for performance. This is because a ref uses one
less register and can avoid the handle-creating code associated with
taking the address of a tex/surf/sampler.
@Chenyang-L Chenyang-L requested review from a team and bader as code owners February 12, 2025 22:53
@Chenyang-L Chenyang-L marked this pull request as draft February 12, 2025 22:54
@Chenyang-L Chenyang-L closed this Feb 12, 2025
@Chenyang-L Chenyang-L deleted the sycl-fix-lit branch February 12, 2025 23:00