Linux ARM packages are not compiled #52

Open
SimonRit opened this issue Jan 13, 2023 · 8 comments

@SimonRit
Collaborator

ARM Linux modules fail (silently, which may be linked to #38). See the RTK test, e.g. here. The error message is:

Building wheels for aarch64 using manylinux_2_28
+ sudo ldconfig
/opt/rh/gcc-toolset-11/root/usr/bin/sudo: line 41: /usr/bin/sudo: No such file or directory
Cleaning up artifacts from module build

and in the "Publish Python package as GitHub Artifact" step:

Run actions/upload-artifact@v3
  with:
    name: LinuxWheel38
    path: dist/*.whl
    if-no-files-found: warn
Warning: No files were found with the provided path: dist/*.whl. No artifacts will be uploaded.

I probably did something wrong, but it's not obvious to me what...

@tbirdso
Collaborator

tbirdso commented Jan 13, 2023

Hi @SimonRit, it looks like the GitHub runner encountered a connection issue while trying to install sudo on the aarch64 image during the run:

Status: Downloaded newer image for quay.io/pypa/manylinux_2_28_aarch64:2022-11-19-1b19e81
WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64) and no specific platform was requested
AlmaLinux 8 - BaseOS                             54 kB/s | 296 kB     00:05    
Errors during downloading metadata for repository 'baseos':
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://westus2.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 20.83.88.250)
  - Status code: 404 for http://mirrors.cat.pdx.edu/alma/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 131.252.208.20)
  - Status code: 404 for http://eastus.azure.repo.almalinux.org/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 20.81.65.177)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz (IP: 74.205.112.120)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz (IP: 74.205.112.120)
  - Status code: 404 for http://cvo.almalinux.osuosl.org/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 140.211.166.134)
  - Status code: 404 for http://dfw.mirror.rackspace.com/almalinux/8.7/BaseOS/aarch64/os/repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz (IP: 74.205.112.120)
Error: Failed to download metadata for repo 'baseos': Yum repo downloading error: Downloading error(s): repodata/92157d88cc011d93db8ee5c5f73c3983de90180863e3111794ac0e7cacbb9785-primary.xml.gz - Cannot download, all mirrors were already tried without success; repodata/d43f6d3a153ef11cc11bc936e10ec1aee97c0b3272921b37305c608dac295780-filelists.xml.gz - Cannot download, all mirrors were already tried without success; repodata/39fb15d25936ba34e0318ec264407348094fd096a46d95df80772cd14c6264bc-updateinfo.xml.gz - Cannot download, all mirrors were already tried without success

Would you please try re-running the job and see if the build succeeds on retry?

I agree that a silent failure is not ideal. As part of #38 we should add a sanity check to at minimum verify that a wheel is generated in dist/.
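As a rough illustration of such a sanity check, here is a minimal Python sketch that reports failure when no wheel is present in dist/. The function names `find_wheels` and `check_wheels` are hypothetical, and this is not necessarily how the check would be implemented in the action itself.

```python
import glob
import os
import sys


def find_wheels(dist_dir="dist"):
    """Return the sorted list of wheel files found directly in dist_dir."""
    return sorted(glob.glob(os.path.join(dist_dir, "*.whl")))


def check_wheels(dist_dir="dist"):
    """Return 0 if at least one wheel exists in dist_dir, 1 otherwise."""
    wheels = find_wheels(dist_dir)
    if not wheels:
        # Report loudly instead of letting the upload step merely warn.
        print(f"error: no wheel found in {dist_dir}", file=sys.stderr)
        return 1
    for wheel in wheels:
        print(f"found {wheel}")
    return 0
```

Calling `sys.exit(check_wheels())` as the last command of the build step would make the job fail as soon as the wheel is missing, rather than surfacing only as a warning in the upload step.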

@SimonRit
Collaborator Author

Sure, I have launched it again...

@SimonRit
Collaborator Author

Same result (https://github.com/RTKConsortium/RTK/actions/runs/3910081483/jobs/6691993707), but for a different reason, it seems:

2023-01-14T01:34:44.1371139Z c++: fatal error: Killed signal terminated program lto1
2023-01-14T01:34:44.1371471Z compilation terminated.
2023-01-14T01:34:44.1382174Z lto-wrapper: fatal error: /opt/rh/gcc-toolset-11/root/usr/bin/c++ returned 1 exit status
2023-01-14T01:34:44.1386395Z compilation terminated.
2023-01-14T01:34:44.1386924Z /opt/rh/gcc-toolset-11/root/usr/bin/ld: error: lto-wrapper failed
2023-01-14T01:34:44.1389567Z collect2: error: ld returned 1 exit status

No clue why. I'll disable ARM for the time being and keep that for later...

@SimonRit
Collaborator Author

I have opened a new PR with the same result.

@tbirdso
Collaborator

tbirdso commented Jan 16, 2023

@SimonRit Thanks for re-running to reproduce the error. Unfortunately I am not familiar with the lto-wrapper issue. It seems that something is going wrong with Link Time Optimization (LTO), but the error message doesn't leave us much to go on. Perhaps @thewtex or @jcfr might have additional thoughts here?

It would be helpful if you could attempt the following:

  1. Try configuring with CMAKE_VERBOSE_MAKEFILE:BOOL=ON to see if you can get any more details on the failure;
  2. Try stepping through the build procedure in ITKRemoteModuleBuildTestPackageAction on your local system to reproduce;
  3. The GitHub Actions approach uses ARM emulation on an x64 machine, so try building on an ARM machine and see whether the error persists. If you don't have an ARM machine readily available I've had a good experience with ARM instances on AWS EC2.
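
For the first suggestion, here is a minimal Python sketch of assembling a configure command with verbose makefiles enabled. The helper `cmake_configure_command` and the `RTK` / `RTK-build` directories are hypothetical placeholders; how the option actually reaches the packaging scripts is not shown here.

```python
import shlex


def cmake_configure_command(source_dir, build_dir, verbose=True, extra_cache=None):
    """Assemble a `cmake` configure command as a list of arguments."""
    # CMAKE_VERBOSE_MAKEFILE makes the generated build print full compiler
    # and linker command lines.
    cache = {"CMAKE_VERBOSE_MAKEFILE:BOOL": "ON" if verbose else "OFF"}
    cache.update(extra_cache or {})
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    cmd += [f"-D{key}={value}" for key, value in cache.items()]
    return cmd


# Hypothetical source and build directories, for illustration only.
cmd = cmake_configure_command("RTK", "RTK-build")
print(shlex.join(cmd))
# prints: cmake -S RTK -B RTK-build -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON
```

With the full link command visible in the log, it should be easier to see which target and which flags were in use when the link step failed.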

Additional notes:

  1. It looks like the Python 3.8 ARM build timed out at the GitHub runner limit of 6 hours before it could reach the lto-wrapper failure. If RTK ARM builds are consistently approaching the timeout limit then it might be worthwhile to investigate self-hosting for a faster build, preferably on an ARM machine.
  2. I will follow up in #38 ("BUG: Linux internal build failure is silent") regarding the silent failure.

@SimonRit
Collaborator Author

Thanks a lot for the suggestions. I have pushed a commit for suggestion 1. I might try the rest later on, but this is not my highest priority... I also don't have an ARM machine and have never used AWS, so that might not be easy. To be continued...

@thewtex
Member

thewtex commented Jan 18, 2023

Link Time Optimization is memory heavy, so we may be running out of memory. LTO can be disabled, so that may be one option for some modules.
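
One way to spot this pattern in a build log is sketched below in Python. A "Killed signal terminated program" message means the compiler subprocess was killed by a signal, which is consistent with running out of memory but does not prove it. The helper name `killed_programs` is hypothetical, and the sample lines are copied from the log quoted earlier in this thread.

```python
import re

# Matches compiler-driver messages such as
# "c++: fatal error: Killed signal terminated program lto1".
KILLED_RE = re.compile(r"fatal error: Killed signal terminated program (\S+)")


def killed_programs(log_text):
    """Return the names of compiler subprocesses reported as killed."""
    return KILLED_RE.findall(log_text)


sample_log = """\
2023-01-14T01:34:44.1371139Z c++: fatal error: Killed signal terminated program lto1
2023-01-14T01:34:44.1371471Z compilation terminated.
"""
print(killed_programs(sample_log))
# prints: ['lto1']
```

Seeing `lto1` in the result points at the link-time optimization stage specifically, which is where disabling LTO for the affected module would be expected to help.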

@tbirdso
Collaborator

tbirdso commented Jan 19, 2023

@SimonRit Silent failures are now addressed in 1b946ca; please update the next time you push changes.
