Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch RELOC_REL32 cases where offset is > 2GiB #52

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Feb 10, 2018

Previously, if a DLL was loaded more than 2GiB away from the executable image in the address space, flexdll_relocate would fail if there were any RELOC_REL32 relocations as the offset couldn't be represented in a 32-bit number.

The solution at the time was to specify a base address of 0x10000 for the MSVC64 and Cygwin64 ports (for some reason, mingw64 got forgotten about). For MSVC64, this generally seemed to work. For Cygwin64, this broke Unix.fork. Recent changes to Cygwin64 also mean that Cygwin DLLs and executables are not supposed to occupy memory below 0x2:00000000 and 0x1:00000000 respectively.

This patch creates a spare executable section in the DLL which is used to create trampolines for relocations where the offset is over 2GiB. As a result, it's no longer necessary to specify the image base address, so this default has been removed and both the Cygwin64 fork issue and RELOC_REL32 are simultaneously solved.

This needs quite a lot of testing which I'm doing, but it would also be handy for the code to start being reviewed. The easiest option would have been to break backwards compatibility (format of symtbl would have changed and the number of parameters for flexdll_relocate). The version I give here allows mixing of DLLs and executables compiled with both this new version and older versions.

@alainfrisch
Copy link
Collaborator

Thanks for working on this!

(FTR, #2 has a similar proposal, but the discussion there died quickly. Contrary to the current PR, #2 generates the trampoline at link time (independently of the target distance) rather than at load time (the flexdll runtime does not need to be changed).)

Some comments/questions:

  1. The trampoline code destroys %rax. As far as I understand, this is in line with the Microsoft x64 ABI (volatile, and not used to pass arguments) for C code. But for instance, OCaml would use this register to pass an argument. I think it would be better to make flexdll less dependent on the ABI, even at the cost of an extra indirection. Looking at what gcc -O2generates for:
void (*f)(void);
void foo() { f(); }

=>

foo:
        .seh_endprologue
        rex.W jmp       *f(%rip)
        .seh_endproc

producing:

0000000000000000 <foo>:
   0:   48 ff 25 00 00 00 00    rex.W jmpq *0x0(%rip)        # 7 <foo+0x7>

(not sure if the prefix is required), which is what #2 was proposing to use.

  1. The criterion to determine if a trampoline can be used is to look if the target falls in an executable page. Of course, there can be non-code targets in such a section (I think even OCaml used to put jump tables in the code section). Should we harden the criterion by checking that the instruction just before the relocation is a jump or call? This would still be a heuristic, of course. Btw, have you ever seen a case where either cl or gcc produces relative relocations for non-code targets?

  2. Can you confirm the problem is only with C code? The OCaml backend should go through __caml_imp_XXX indirections for code jmp/call targets, and use movabsq (with a 64-bit literal) to load the address of symbols (=> absolute relocations).

@dra27
Copy link
Member Author

dra27 commented Feb 12, 2018

Hah - I hadn't realised that there was another PR which had worked on this! I'm still working through understanding this all properly - in particular, there's something weird happening with on mingw64 when the DLL is located far away where caml_reify_bytecode is raising an End_of_file exception!

I'm certainly only seeing the trampolines being generated for C primitives in the runtime, yes. AFAICT so far, the default -mcmodel=32 for gcc doesn't generate REL32 instructions for data symbols. The weak heuristic I'd added was when trying to break it by saying -mcmodel=small (which does break it). My reading so far suggests that mcmodel=medium would never generate REL32 instructions for data symbols, because that model doesn't permit the assumption that data is close enough - but I haven't yet found any documentation as to whether the default mcmodel=32 assumption uses this or not. And of course I don't know what MSVC's doing with all this (although it does seem to be working).

FWIW, I think patching the runtime is better - definitely on mingw64 and I think now on msvc64, the default load addresses for executables should be < 0xffffffff (I think that the issue seen with this in 2008 no longer exists). It seems unnecessary to be generating these jump tables when most of the time they're not needed. That said, on Cygwin64 they will always be needed!

This present implementation has a bug at the moment which becomes revealed if you load two DLLs (like ocamldoc, loading str and unix) - the cached trampolines for dllunix can be made to be too far away for dllstr, so I need to use a per-dll symbol cache which should actually make that code a bit neater anyway. It's also unnecessarily doing all this for 32-bit.

I won't have much time on this today, but I'll be back on it with a vengeance tomorrow!

@alainfrisch
Copy link
Collaborator

Note: the indirect jump instruction is also what is used for ___imp_XXX symbols, such as:

__declspec(dllimport)  void f(void);
void foo() { f(); }

==>

        rex.W jmp       *__imp_f(%rip)

Btw, adding back __declspec(dllimport) for all C functions in the OCaml runtime (that need to be accessed from .dll) would also solve the problem (in a less general way), right?

@dra27
Copy link
Member Author

dra27 commented Feb 14, 2018

That change still doesn't deal with the caml_reify_bytecode issue (I'm wondering if there's some dodgy pointer arithmetic somewhere...)

@dra27 dra27 force-pushed the fix-RELOC_REL32 branch 6 times, most recently from f47e73a to 5d79979 Compare February 15, 2018 11:55
@dra27
Copy link
Member Author

dra27 commented Feb 15, 2018

OK, I think this is now functionally complete. d0e07c2 includes building ocamldoc for the two OCaml tests which is a good idea in general, and should go in a separate PR. ocamldoc is a useful bytecode test for flexdll, because it uses both unix and str.

a84f9fb artificially places dllunix and dllcamlstr at base addresses which are both more than 2GiB from ocamlrun and also more than 2GiB from each other. I don't think we should commit with this: there are two better tests:

  1. Include a test in the flexdll testsuite which tests plugins on x64 which at distant base addresses and at close base addresses (for mingw64 and msvc64 - close base addresses are irrelevant on cygwin64, because they can't happen)
  2. Add an AppVeyor test to build OCaml on Cygwin64 ... but this would be easier if the Cygwin build also permitted bootstrapping flexdll (which I'm now working on)

dra27 added a commit to dra27/opam that referenced this pull request Jul 21, 2018
Until ocaml/flexdll#52 is resolved,
Cygwin64 is essentially a broken platform for OCaml.
dra27 added a commit to dra27/opam that referenced this pull request Jul 23, 2018
Until ocaml/flexdll#52 is resolved,
Cygwin64 is essentially a broken platform for OCaml.
rjbou pushed a commit to rjbou/opam that referenced this pull request Aug 9, 2018
Until ocaml/flexdll#52 is resolved,
Cygwin64 is essentially a broken platform for OCaml.
rjbou pushed a commit to rjbou/opam that referenced this pull request Aug 10, 2018
Until ocaml/flexdll#52 is resolved,
Cygwin64 is essentially a broken platform for OCaml.
rjbou pushed a commit to rjbou/opam that referenced this pull request Sep 25, 2018
Until ocaml/flexdll#52 is resolved,
Cygwin64 is essentially a broken platform for OCaml.
@dra27
Copy link
Member Author

dra27 commented Mar 2, 2021

I'm going to close, on the basis that I no longer think we should fix it. There are three reasons for this:

  • This PR pulls even more of the work of the loader into the FlexDLL runtime, when the linker already has a solution for this: __declspec(dllimport) tells the linker to ensure that this situation can't arise.
  • OCaml 4.12 now restores the use of __declspec(dllimport) to fix Cygwin64. The only change needed in flexlink was to stop setting an illegal base address for Cygwin64 DLLs.
  • These re-writing tricks simply can't be done for ARM64 (too many relocatable instructions), so the need for __declspec(dllimport) is definitely here to stay.

@dra27 dra27 closed this Mar 2, 2021
Previously:
  - -base applied to MSVC64
  - Cygwin64 had -base 0x10000 hard-coded until ocaml#89
  - mingw64 didn't specify -base at all

Now:
  - The default is not to change the base address at all
  - All 3 64-bit ports support -base
  - For now, still ignored for 32-bit ports
An additional executable section is added to each executable containing 16
bytes for each external symbol. This can then be used by
flexdll_relocate_v2 if a RELOC_REL32 refers to a function more than 2GiB
away.
@dra27 dra27 reopened this Dec 29, 2024
@dra27 dra27 linked an issue Dec 29, 2024 that may be closed by this pull request
dra27 added 11 commits December 30, 2024 13:57
relocate function now takes a pointer to the jmptbl page allocated in the
image being loaded. If a RELOC_REL32 relocation is encountered where the
offset is greater than 2GiB then 16 bytes of the page is used to generate
a trampoline to the function.

The new functionality is exposed as flexdll_relocate_v2 and passed in the
FLEXDLL_RELOCATE_V2 environment variable. The old flexdll_relocate pointer
is still passed in FLEXDLL_RELOCATE by flexdll_dlopen meaning that an
executable compiled with this new version can still load DLLs compiled
with older versions (these will simply fail if the relocations are too far
away, as before). Similarly, flexdll_init falls back to FLEXDLL_RELOCATE
if FLEXDLL_RELOCATE_V2 is not set meaning that a DLL compiled with this
new version can also be loaded by an executable compiled with older
versions.

The symbol table now has to be copied to be augmented with an extra
pointer for the trampoline, as the symtbl itself cannot be changed without
breaking this backwards compatibility.
Verify that the page containing the target symbol is marked executable
before creating a trampoline.
Each unit has enough jmptbl space to be able to have trampolines for all
of the symbols it imports. However, if two units load within 2GiB of each,
the previous trampolines can be reused. Turns out that this is simpler
than either resetting or storing multiple trampolines.
The wrong DLLs can end up being loaded while building OCaml :-/
Need some kind of check on the length of ptr->name
(cf. cannot_resolve_msg)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants