Suspected defect in TLS linker optimizations #1633

Shai-Aviv · 2020-08-17T13:03:41Z

My team has recently upgraded to AT12.0, and we've noticed a linker behavior which changes our register operands.

For example, this C code:

#include <stdint.h>
__thread int myvar;
int main() {
  register uint64_t base __asm__("r4") = 0x1000;
  register uint32_t result __asm__("r3");
  asm volatile("addis %0, %1, myvar@tprel@ha;" : "=r"(result) : "r"(base) );
  asm volatile("addi %0, %1, myvar@tprel@l; " : "=r"(result) : "r"(result) );
  return (int)result;
}

produces the following binary on AT7.0 (disassembled):

li r4,4096
addis r3,r4,0
addi r3,r3,-28672
extsw r3,r3
blr

but produces a different binary on AT12.0 (disassembled, see in Compiler Explorer):

li r4,4096
addis r3,r4,0
addi r3,r13,-28672
extsw r3,r3
blr

Note the change in the second operand of the addi instruction (r3 vs. r13).

We saw the different TLS code sequences that may be optimized by the linker in the following link: http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_61284.html
However, our code sequence does not seem to fall under any of the mentioned cases.
Is it intentional? Can this behavior be locally disabled?

The text was updated successfully, but these errors were encountered:

segher · 2020-08-17T14:01:31Z

The source code uses @tprel for something that doesn't have anything
to do with thread local storage. How could that work?

Shai-Aviv · 2020-08-17T14:37:58Z

This is just a example, but in this instance, I want to calculate:
0x1000 + <offset of "myvar" relative to thread pointer>
without actually reading the thread pointer (r13).
I am under the impression that the @tprel relocation is fit for this purpose.

mscastanho · 2020-08-19T19:44:36Z

@ShayAviv gcc -c generates two relocations for this code: R_PPC64_TPREL16_HA and R_PPC64_TPREL16_LO:

$ /opt/at12.0/bin/gcc -c -g /tmp/test.c
$ /opt/at12.0/bin/objdump -rS a.out

[...]
  asm volatile("addis %0, %1, myvar@tprel@ha;" : "=r"(result) : "r"(base) );
  10:   3c 64 00 00     addis   r3,r4,0
                        12: R_PPC64_TPREL16_HA  myvar
  asm volatile("addi %0, %1, myvar@tprel@l; " : "=r"(result) : "r"(result) );
  14:   38 63 00 00     addi    r3,r3,0
                        16: R_PPC64_TPREL16_LO  myvar

Starting with binutils 2.30 (more specifically commit 9a23f96e upstream), there was a change in the way these relocations are processed by the linker. Note that AT 12 was the first to include these changes, since AT 11 follows binutils 2.29 and AT 12 follows 2.31.

Quoting part of the ChangeLog for that commit:

(ppc64_elf_relocate_section): Nop addis on TPREL16_HA, and convert
insn on TPREL16_LO and TPREL16_LO_DS relocs to use r13 when
addis would add zero.

I'm not particularly familiar with the inner workings of the linker and also why this changed, but looks like in this case the linker replaces r3 with r13 because the first addis is adding 0.

There may be a different way to accomplish what you need, but I'm not aware of it.

segher · 2020-08-19T20:43:26Z

What happens here looks exactly like what the comment says, except that the addis isn't nopped?

Shai-Aviv · 2020-08-20T09:07:46Z

To add on what @segher said, the following linker TLS optimization would have made perfect sense, becuase it preserves the result:

addis r3,r13,0
addi r3,r3,-28672

replaced with:

nop
addi r3,r13,-28672

but I argue that in my case, this transformation does not apply. Instead, it should have been something like this:

nop
addi r3,r4,-28672

amodra · 2020-08-22T01:41:15Z

You're correct, this is a defect. The linker warns, but it should disable the optimization.

amodra · 2020-08-24T12:27:51Z

Fixed in FSF binutils with git commit 252dcdf432c6.
https://sourceware.org/pipermail/binutils/2020-August/112967.html

er-1 · 2020-08-26T10:33:07Z

@amodra Do you know if it's planned to backport the fix to the following binutils branches?

binutils-2_31-branch
binutils-2_34-branch
binutils-2_35-branch

amodra · 2020-09-10T07:16:33Z

I've pushed the fix to 2.34 and 2.35. 2.31 is no longer an active branch (strictly speaking, neither is 2.34).

er-1 · 2020-09-10T07:32:23Z

I've pushed the fix to 2.34 and 2.35. 2.31 is no longer an active branch (strictly speaking, neither is 2.34).

@amodra thanks

This means the fix will be available in the AT 13.0-3 and AT 14.0-1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suspected defect in TLS linker optimizations #1633

Suspected defect in TLS linker optimizations #1633

Shai-Aviv commented Aug 17, 2020 •

edited

Loading

segher commented Aug 17, 2020

Shai-Aviv commented Aug 17, 2020 •

edited

Loading

mscastanho commented Aug 19, 2020 •

edited

Loading

segher commented Aug 19, 2020

Shai-Aviv commented Aug 20, 2020 •

edited

Loading

amodra commented Aug 22, 2020

amodra commented Aug 24, 2020

er-1 commented Aug 26, 2020 •

edited

Loading

amodra commented Sep 10, 2020

er-1 commented Sep 10, 2020 •

edited

Loading

Suspected defect in TLS linker optimizations #1633

Suspected defect in TLS linker optimizations #1633

Comments

Shai-Aviv commented Aug 17, 2020 • edited Loading

segher commented Aug 17, 2020

Shai-Aviv commented Aug 17, 2020 • edited Loading

mscastanho commented Aug 19, 2020 • edited Loading

segher commented Aug 19, 2020

Shai-Aviv commented Aug 20, 2020 • edited Loading

amodra commented Aug 22, 2020

amodra commented Aug 24, 2020

er-1 commented Aug 26, 2020 • edited Loading

amodra commented Sep 10, 2020

er-1 commented Sep 10, 2020 • edited Loading

Shai-Aviv commented Aug 17, 2020 •

edited

Loading

Shai-Aviv commented Aug 17, 2020 •

edited

Loading

mscastanho commented Aug 19, 2020 •

edited

Loading

Shai-Aviv commented Aug 20, 2020 •

edited

Loading

er-1 commented Aug 26, 2020 •

edited

Loading

er-1 commented Sep 10, 2020 •

edited

Loading