Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suspected defect in TLS linker optimizations #1633

Open
Shai-Aviv opened this issue Aug 17, 2020 · 10 comments
Open

Suspected defect in TLS linker optimizations #1633

Shai-Aviv opened this issue Aug 17, 2020 · 10 comments

Comments

@Shai-Aviv
Copy link

Shai-Aviv commented Aug 17, 2020

My team has recently upgraded to AT12.0, and we've noticed a linker behavior which changes our register operands.

For example, this C code:

#include <stdint.h>
__thread int myvar;
int main() {
  register uint64_t base __asm__("r4") = 0x1000;
  register uint32_t result __asm__("r3");
  asm volatile("addis %0, %1, myvar@tprel@ha;" : "=r"(result) : "r"(base) );
  asm volatile("addi %0, %1, myvar@tprel@l; " : "=r"(result) : "r"(result) );
  return (int)result;
}

produces the following binary on AT7.0 (disassembled):

li r4,4096
addis r3,r4,0
addi r3,r3,-28672
extsw r3,r3
blr

but produces a different binary on AT12.0 (disassembled, see in Compiler Explorer):

li r4,4096
addis r3,r4,0
addi r3,r13,-28672
extsw r3,r3
blr

Note the change in the second operand of the addi instruction (r3 vs. r13).

We saw the different TLS code sequences that may be optimized by the linker in the following link: http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_61284.html
However, our code sequence does not seem to fall under any of the mentioned cases.
Is it intentional? Can this behavior be locally disabled?

@segher
Copy link

segher commented Aug 17, 2020

The source code uses @tprel for something that doesn't have anything
to do with thread local storage. How could that work?

@Shai-Aviv
Copy link
Author

Shai-Aviv commented Aug 17, 2020

This is just a example, but in this instance, I want to calculate:
0x1000 + <offset of "myvar" relative to thread pointer>
without actually reading the thread pointer (r13).
I am under the impression that the @tprel relocation is fit for this purpose.

@mscastanho
Copy link
Contributor

mscastanho commented Aug 19, 2020

@ShayAviv gcc -c generates two relocations for this code: R_PPC64_TPREL16_HA and R_PPC64_TPREL16_LO:

$ /opt/at12.0/bin/gcc -c -g /tmp/test.c
$ /opt/at12.0/bin/objdump -rS a.out

[...]
  asm volatile("addis %0, %1, myvar@tprel@ha;" : "=r"(result) : "r"(base) );
  10:   3c 64 00 00     addis   r3,r4,0
                        12: R_PPC64_TPREL16_HA  myvar
  asm volatile("addi %0, %1, myvar@tprel@l; " : "=r"(result) : "r"(result) );
  14:   38 63 00 00     addi    r3,r3,0
                        16: R_PPC64_TPREL16_LO  myvar

Starting with binutils 2.30 (more specifically commit 9a23f96e upstream), there was a change in the way these relocations are processed by the linker. Note that AT 12 was the first to include these changes, since AT 11 follows binutils 2.29 and AT 12 follows 2.31.

Quoting part of the ChangeLog for that commit:

(ppc64_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO and TPREL16_LO_DS relocs to use r13 when
addis would add zero.

I'm not particularly familiar with the inner workings of the linker and also why this changed, but looks like in this case the linker replaces r3 with r13 because the first addis is adding 0.

There may be a different way to accomplish what you need, but I'm not aware of it.

@segher
Copy link

segher commented Aug 19, 2020

What happens here looks exactly like what the comment says, except that the addis isn't nopped?

@Shai-Aviv
Copy link
Author

Shai-Aviv commented Aug 20, 2020

To add on what @segher said, the following linker TLS optimization would have made perfect sense, becuase it preserves the result:

addis r3,r13,0
addi r3,r3,-28672

replaced with:

nop
addi r3,r13,-28672

but I argue that in my case, this transformation does not apply. Instead, it should have been something like this:

nop
addi r3,r4,-28672

@amodra
Copy link

amodra commented Aug 22, 2020

You're correct, this is a defect. The linker warns, but it should disable the optimization.

@amodra
Copy link

amodra commented Aug 24, 2020

Fixed in FSF binutils with git commit 252dcdf432c6.
https://sourceware.org/pipermail/binutils/2020-August/112967.html

@er-1
Copy link
Contributor

er-1 commented Aug 26, 2020

@amodra Do you know if it's planned to backport the fix to the following binutils branches?

  • binutils-2_31-branch
  • binutils-2_34-branch
  • binutils-2_35-branch

@amodra
Copy link

amodra commented Sep 10, 2020

I've pushed the fix to 2.34 and 2.35. 2.31 is no longer an active branch (strictly speaking, neither is 2.34).

@er-1
Copy link
Contributor

er-1 commented Sep 10, 2020

I've pushed the fix to 2.34 and 2.35. 2.31 is no longer an active branch (strictly speaking, neither is 2.34).

@amodra thanks

This means the fix will be available in the AT 13.0-3 and AT 14.0-1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants