[Bug]: SWI-Prolog install from Termux repositories don't work on arm 32 bit devices: ABI mismatch #22737

Open
Rafael-Dev-21 opened this issue Jan 2, 2025 · 7 comments · May be fixed by #22742
Labels: arch-arm, bug report


Rafael-Dev-21 commented Jan 2, 2025

Problem description

The swi-prolog package, as installed from the Termux repositories, doesn't work on 32-bit ARM devices. For now I'm running Debian under proot-distro, but it's cumbersome and uses too much storage.

What steps will reproduce the bug?

$ apt update && apt upgrade -y
[omitted for brevity]
$ pkg install swi-prolog
[omitted for brevity]
$ swipl
FATAL: could not find SWI-Prolog home
  Tried source: environment $SWI_HOME_DIR or $SWIPL
  Tried source: using "swipl.home" from "/data/data/com.termux/files/usr/lib/swipl/bin/arm-android/swipl"
    Found /data/data/com.termux/files/usr/lib/swipl: ABI mismatch
  Tried source: compiled in
    Found /data/data/com.termux/files/usr/lib/swipl: ABI mismatch

What is the expected behavior?

swipl should open the interactive REPL, or consult a file passed on the command line.

System information

Termux Variables:
TERMUX_API_VERSION=0.50.1
TERMUX_APK_RELEASE=F_DROID
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=5853
TERMUX_IS_DEBUGGABLE_BUILD=0
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.1
TERMUX__USER_ID=0
Packages CPU architecture:
arm
Subscribed repositories:
# sources.list
deb https://mirrors.aliyun.com/termux/termux-main stable main
Updatable packages:
All packages up to date
termux-tools version:
1.44.6
Android version:
11
Kernel build information:
Linux localhost 4.19.127 #1 SMP PREEMPT Tue Apr 4 18:44:37 IST 2023 armv7l Android
Device manufacturer:
LGE
Device model:
LM-K410
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec.so
Installed termux plugins:
com.termux.api versionCode:51
com.termux.styling versionCode:1000
com.termux.widget versionCode:13
Rafael-Dev-21 added the bug report and untriaged labels Jan 2, 2025
@robertkirkman
Contributor

This is a pretty confusing problem, but I can reproduce it. I noticed that disabling the check that prints that message lets the software continue to run and, at the very least, provide some subset of its normal functionality.

I have written this patch, which converts the error into a warning, so that we can examine the problem at a slightly deeper level and possibly find the root cause, find any further errors that occur when using an swi-prolog patched this way on 32-bit ARM, or both.

To apply and test the patch, put it in the folder termux-packages/packages/swi-prolog/ and recompile swi-prolog in the Docker container with the command scripts/run-docker.sh ./build-package.sh -I -f -d -a arm swi-prolog. If you do not have access to an amd64 Docker environment but still want to test this type of change, let me know: I could open a draft PR containing this patch and let it build in CI here, so that you could download the artifact. You could then check whether the change lets you run the Prolog software you are trying to run, or whether further errors caused by the abnormal behavior of the swi-prolog package still prevent it from running successfully.

swi-prolog cannot currently continue in Termux on 32-bit ARM architecture without this patch.
the root cause of the problem is not yet fully understood.
--- a/src/pl-init.c
+++ b/src/pl-init.c
@@ -256,11 +256,21 @@ check_home(const char *dir)
   Ssnprintf(abi_file_name, sizeof(abi_file_name),
 	    "%s/ABI", dir);
   if ( (fd = Sopen_file(abi_file_name, "r")) )
-  { char *abi_string = Sfgets(abi_buf, sizeof(abi_buf), fd);
+  { char *build_time_abi_string = Sfgets(abi_buf, sizeof(abi_buf), fd);
     Sclose(fd);
-    if ( abi_string )
-    { remove_trailing_whitespace(abi_string);
-      return match_abi_version(abi_version(), abi_string);
+    if ( build_time_abi_string )
+    { remove_trailing_whitespace(build_time_abi_string);
+      char *run_time_abi_string = abi_version();
+      if (match_abi_version(run_time_abi_string, build_time_abi_string) == 1)
+      { return true;
+      } else
+      { printf("WARNING: ABI mismatch!\n");
+        printf("build time ABI found in %s: %s\n", abi_file_name, build_time_abi_string);
+        printf("run time ABI returned by abi_version(): %s\n", run_time_abi_string);
+        printf("attempting to continue for workaround purposes...\n");
+        // returning true to force the error into a warning
+        return true;
+      }
     } else
     { return BAD_HOME_BAD_ABI;
     }

On my 32-bit Termux device, doing that results in this:

~ $ swipl
WARNING: ABI mismatch!
build time ABI found in /data/data/com.termux/files/usr/lib/swipl/ABI: swipl-abi-2-68-130c67ba-6ed28fea
run time ABI returned by abi_version(): swipl-abi-2-68-75f769d1-6ed28fea
attempting to continue for workaround purposes...
Welcome to SWI-Prolog (threaded, 32 bits, version 9.3.17)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- 

If I have time, I will continue troubleshooting this to look for additional leads on a root cause. For example, I would like to test a similar release of SWI-Prolog on a 32-bit ARM GNU/Linux device, to check whether the problem is specific to 32-bit ARM on bionic libc systems and/or to the termux-packages build.sh of swi-prolog, or whether it also appears on GNU/Linux systems.

@Rafael-Dev-21
Author

Update:

Searching cryptic forums, I found a solution.

The ABI file of the swi-prolog package in the Termux repos for arm has the wrong ABI.

# correct
swipl-abi-2-68-5ae07055-6ed28fea
# on package
swipl-abi-2-68-130c67ba-6ed28fea

The correct ABI is the one printed by swipl --abi-version.

Replacing the ABI manually worked.

The real fix should be to correct the package.
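
The manual check described above can be sketched as a small shell script. This is only an illustration under stated assumptions: the ABI file is taken to live at $PREFIX/lib/swipl/ABI (the path shown in the error message), swipl --abi-version is assumed to print the bare string with no surrounding quotes, and the function name abi_file_matches is hypothetical. The script only compares; the overwrite is left commented out so that it is applied deliberately.

```sh
#!/bin/sh
# Hypothetical helper: compare the ABI string stored in FILE with RUNTIME_ABI.
# Usage: abi_file_matches FILE RUNTIME_ABI   (exit status 0 on match)
abi_file_matches() (
  [ -r "$1" ] || exit 1
  stored=
  # With the default IFS, read strips leading and trailing whitespace,
  # similar in spirit to remove_trailing_whitespace() in check_home().
  read -r stored < "$1" || [ -n "$stored" ] || exit 1
  [ "$stored" = "$2" ]
)

# Assumed location of the ABI file, taken from the error message in this issue.
abi_file="${PREFIX:-/data/data/com.termux/files/usr}/lib/swipl/ABI"

# Only touch a real installation when swipl is actually available.
if command -v swipl >/dev/null 2>&1; then
  runtime_abi=$(swipl --abi-version)
  if abi_file_matches "$abi_file" "$runtime_abi"; then
    echo "ABI file matches: $runtime_abi"
  else
    echo "ABI file differs from runtime value: $runtime_abi"
    # Manual workaround from this thread (back up the file first):
    # printf '%s\n' "$runtime_abi" > "$abi_file"
  fi
fi
```

The comparison is kept separate from the overwrite so the script can be run safely just to confirm whether a given installation is affected.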


@robertkirkman
Contributor

robertkirkman commented Jan 2, 2025

That is very interesting, since on my 32-bit ARM Termux device, the value printed by swipl --abi-version is swipl-abi-2-68-75f769d1-6ed28fea, not swipl-abi-2-68-5ae07055-6ed28fea. On my 64-bit ARM Termux devices, however, the value printed is swipl-abi-2-68-5ae07055-6ed28fea. So there seems to be some unidentified factor that can change the value even between two different 32-bit ARM devices, which makes it hard to say whether hardcoding this value into the ABI text file is a viable solution without understanding the problem at a deeper level.

Edit: I have noticed that the value printed by swipl --abi-version changes after recompiling swi-prolog, so if I use the same build of the package that you are using, I do see swipl-abi-2-68-5ae07055-6ed28fea. This means that a complete solution probably requires some way to accurately predict the exact value the command will print, without actually running the command, and then store that value in the "ABI" file inside the package .deb file, all within the Docker container before publishing the package.
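
To see which part of the string actually varies, the values quoted in this thread can be compared field by field. The following is a minimal sketch; the function name abi_diff_fields is hypothetical, and fields are simply counted from 1 across the '-' separators, so "swipl" is field 1 and "abi" is field 2.

```sh
#!/bin/sh
# Hypothetical helper: print the 1-based indexes of the '-'-separated
# fields that differ between two ABI strings (space-separated list).
abi_diff_fields() (
  a=$1 b=$2 i=1 out=
  while [ -n "$a" ] || [ -n "$b" ]; do
    fa=${a%%-*}                      # first remaining field of each string
    fb=${b%%-*}
    [ "$fa" = "$fb" ] || out="$out${out:+ }$i"
    case $a in *-*) a=${a#*-} ;; *) a= ;; esac   # drop the field just compared
    case $b in *-*) b=${b#*-} ;; *) b= ;; esac
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# Value in the packaged ABI file vs. value printed by a rebuilt 32-bit binary:
abi_diff_fields swipl-abi-2-68-130c67ba-6ed28fea swipl-abi-2-68-75f769d1-6ed28fea   # prints 5
```

For every pair of strings quoted here, only that one field differs; the fields on either side of it are identical.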

This implies that the issue could probably be categorized as cross-compilation-related. It might therefore help to find other examples of cross-compiled swi-prolog packages and check how others usually work around this problem, if it turns out that cross-compilation does trigger it.

It is true that at least a few packages, like pypy3, use a method involving bionic-host and qemu-user-static to run the resulting binaries during the build, inside the x86 Ubuntu Docker container. I think it is reasonable to call that the last-resort, brute-force "nuke from orbit" solution, to be used only if a shorter solution with fewer dependencies cannot easily be created.

truboxl added the arch-arm label and removed the untriaged label Jan 2, 2025
@Rafael-Dev-21
Author

TL;DR: MurmurHashAligned2 is broken on big endian machines, producing different hashes depending on alignment. But this could well be a red herring.

Well, this is all very strange.

Here are my findings. I don't know if they will help or if you already know them.

The ABI string is determined by 5 values, and 4 of them are practically constant. The exception is the penultimate field, which comes from a global variable, GD->foreign.signature. Searching the repo, it appears to be modified only inside a loop in pl-ext.c, line 330. The loop runs over every extension until there are no more predicates; then, if it's signonly and the extensions are not loaded, the signature is XOR-ed with the result of the call predicate_signature(f->predicate_name, f->arity, flags). predicate_signature is defined in the same file, on line 199. It prints the three parameters into a formatted string ("%s/%zu/0x%" followed by the constant PRIx64) and calls a hash function on it (MurmurHashAligned2) with the seed 0x1a3be34a. PRIx64 indicates a 64-bit hexadecimal value. MurmurHashAligned2 is defined in pl-hash.c; according to the documentation comment above it, it's broken on big endian machines, producing different hashes depending on alignment. I think I found something…
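
The XOR accumulation described above can be illustrated with a rough shell sketch. It is not the SWI-Prolog code: cksum (the POSIX CRC) stands in for MurmurHashAligned2, no seed is used, the shell printf formats %d and %x stand in for %zu and PRIx64, and the predicate names foo and bar are made up. It only shows the general property that the combined value does not depend on registration order but changes when the name, arity, or flags of any single predicate change.

```sh
#!/bin/sh
# Stand-in for predicate_signature(): format name/arity/flags, then hash.
# cksum replaces MurmurHashAligned2 here purely for illustration.
predicate_signature() {
  printf '%s/%d/0x%x' "$1" "$2" "$3" | cksum | cut -d' ' -f1
}

# XOR together the signatures of all "name arity flags" lines read from stdin
# and print the result as 8 hex digits, like the varying field of the ABI string.
foreign_signature() (
  sig=0
  while read -r name arity flags; do
    sig=$(( sig ^ $(predicate_signature "$name" "$arity" "$flags") ))
  done
  printf '%08x\n' "$sig"
)

printf 'foo 2 0\nbar 1 4\n' | foreign_signature
printf 'bar 1 4\nfoo 2 0\n' | foreign_signature   # same value: XOR ignores order
printf 'foo 2 1\nbar 1 4\n' | foreign_signature   # different value: one flag changed
```

Under these assumptions, a single differing flag value or a missing predicate is enough to change the whole field, which is consistent with the field differing between builds.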

@robertkirkman
Contributor

MurmurHashAligned2 is broken on big endian machines

In the context of UNIX-like operating systems, it can usually be assumed that big-endian byte order is not a factor on mainstream consumer devices. A developer needs to start thinking about "big endian compatibility" mainly when targeting, for example, a PowerPC-based or z/Architecture-based device, in order to write code that is fully portable between little-endian devices and devices that might be truly big-endian.

In this particular context, it can be assumed that all supported platforms are little-endian, because Android does not support PowerPC or z/Architecture and is explicitly defined as supporting only little-endian devices in the documentation here, which states "Android is always little-endian." For this reason, it is acceptable, from a practicality and scope-of-support perspective, to write code for Termux on the assumption that it will never run on a truly big-endian system (outside of full system emulators like QEMU). When troubleshooting bugs in Termux, it can reasonably be assumed that big-endian compatibility problems are highly unlikely to be the direct cause of bugs that do not involve full system emulation of big-endian devices.
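
For anyone who wants to confirm the byte order on their own device, here is a quick check that can be run in a Termux shell. It is a sketch that assumes printf, od and tr are available with their usual POSIX behavior; the function name host_endianness is hypothetical.

```sh
#!/bin/sh
# Report the byte order of the machine running this shell.
# od reads the two bytes 0x01 0x00 as one 16-bit word in host byte order:
# a little-endian host sees 0x0001, a big-endian host sees 0x0100.
host_endianness() {
  word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
  case $word in
    0001) echo little ;;
    0100) echo big ;;
    *)    echo unknown ;;
  esac
}

host_endianness
```

On any Android device this should report little, in line with the documentation quoted above.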

@Rafael-Dev-21
Author

Rafael-Dev-21 commented Jan 3, 2025

Yeah, makes sense. But that function is as far as I could narrow things down while exploring the source code. I'm not familiar enough with this area of development. EDIT: and it processes exactly the part of the ABI string that changes. That part is defined by a global variable instead of a constant, and I could find only one place where it could change.

robertkirkman added a commit to robertkirkman/termux-packages that referenced this issue Jan 3, 2025
one to obtain a working string for the "ABI" file, and the second one to produce artifacts for everything else that is not the "ABI" file.

fixes termux#22737
@robertkirkman
Contributor

I thought of another way to work around the problem, one that involves micromanaging the contents of the $PREFIX/lib/swipl/ABI file instead of patching the code to ignore that file, and I posted it in that PR. I'm not sure which way is objectively the better workaround, or whether there is an even better or less invasive way than either of these two.
