Xvnc crashes with SIGBUS on cross-GPU DRI usage #1772

Open
CendioOssman opened this issue Jun 21, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@CendioOssman
Member

Describe the bug
If I start Xvnc with -renderNode set to my integrated AMD GPU, and then start an application using my discrete Nvidia GPU, then Xvnc will crash with SIGBUS:

(EE) 
(EE) Backtrace:
(EE) 0: Xvnc (xorg_backtrace+0x82) [0x557530197d42]
(EE) 1: Xvnc (0x55752ffe1000+0x1b7f4c) [0x557530198f4c]
(EE) 2: /lib64/libc.so.6 (0x7f475db30000+0x40710) [0x7f475db70710]
(EE) 3: /lib64/libpixman-1.so.0 (0x7f475e151000+0x8a2d0) [0x7f475e1db2d0]
(EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f475e15f8d1]
(EE) 5: Xvnc (vncDRI3SyncPixmapFromGPU+0x10e) [0x55753004303e]
(EE) 6: Xvnc (0x55752ffe1000+0x622c3) [0x5575300432c3]
(EE) 7: Xvnc (dri3_pixmap_from_fds+0xcf) [0x5575300cfdaf]
(EE) 8: Xvnc (0x55752ffe1000+0xf1309) [0x5575300d2309]
(EE) 9: Xvnc (Dispatch+0x426) [0x557530133f56]
(EE) 10: Xvnc (dix_main+0x46a) [0x557530142d4a]
(EE) 11: /lib64/libc.so.6 (0x7f475db30000+0x2a088) [0x7f475db5a088]
(EE) 12: /lib64/libc.so.6 (__libc_start_main+0x8b) [0x7f475db5a14b]
(EE) 13: Xvnc (_start+0x25) [0x55753003ed75]
(EE) 
(EE) Bus error at address 0x7f4753011000
(EE) 
Fatal server error:
(EE) Caught signal 7 (Bus error). Server aborting
(EE) 
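Several frames in the backtrace above are printed only as a load base plus an offset (for example frames 1, 3, 6 and 8). A minimal sketch for pulling those offsets out of the log is shown here; the function name `unresolved_frames` and the regular expression are illustrative assumptions, not part of TigerVNC or the X server.

```python
import re

# Matches lines such as:
#   (EE) 6: Xvnc (0x55752ffe1000+0x622c3) [0x5575300432c3]
#   (EE) 4: /lib64/libpixman-1.so.0 (pixman_blt+0x81) [0x7f475e15f8d1]
FRAME_RE = re.compile(
    r"\(EE\)\s+(\d+):\s+(\S+)\s+\((\w+)\+(0x[0-9a-f]+)\)\s+\[(0x[0-9a-f]+)\]"
)


def unresolved_frames(log_text):
    """Return (frame number, module, offset) for frames that have no symbol name.

    A frame counts as unresolved when the text before the '+' is a hex
    load base rather than a function name.
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frame, module, base_or_symbol, offset, _address = match.groups()
        if base_or_symbol.startswith("0x"):
            frames.append((int(frame), module, offset))
    return frames
```

With matching debug information installed, each offset can usually be handed to a tool such as `addr2line -e <module> <offset>` to recover a function name, assuming the printed base is the module's load address.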

To Reproduce
Steps to reproduce the behavior:

  1. Xvnc -renderNode /dev/dri/renderD128 :2 (assuming renderD128 is the AMD iGPU)
  2. DISPLAY=:2 vkcube --gpu_number 1 (assuming GPU 1 is the Nvidia dGPU)
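
Both steps assume you know which render node belongs to which GPU. A minimal sketch for checking this on Linux reads the PCI vendor ID of each render node from sysfs. The helper name `list_render_nodes` and its `sysfs_drm` parameter are illustrative assumptions; the vendor IDs are the standard PCI ones. Note that the Vulkan GPU index used by vkcube is a separate enumeration and is not derived from the render node number.

```python
from pathlib import Path

# Standard PCI vendor IDs for the GPU vendors mentioned in this issue.
PCI_VENDORS = {"0x1002": "AMD", "0x10de": "NVIDIA", "0x8086": "Intel"}


def list_render_nodes(sysfs_drm="/sys/class/drm"):
    """Return {"/dev/dri/renderDNNN": vendor} for each render node in sysfs.

    Unknown vendor IDs are returned as the raw hex string.
    """
    nodes = {}
    for entry in sorted(Path(sysfs_drm).glob("renderD*")):
        vendor_file = entry / "device" / "vendor"
        if not vendor_file.is_file():
            # Not a PCI device (or no vendor attribute); skip it.
            continue
        vendor_id = vendor_file.read_text().strip().lower()
        nodes[f"/dev/dri/{entry.name}"] = PCI_VENDORS.get(vendor_id, vendor_id)
    return nodes


if __name__ == "__main__":
    for node, vendor in list_render_nodes().items():
        print(node, vendor)
```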

Expected behavior
vkcube renders normally on the Xvnc display.

Client (please complete the following information):
No client needed.

Server (please complete the following information):

  • OS: Fedora 40
  • VNC server: TigerVNC
  • VNC server version: 1.14.0 beta
  • Server downloaded from: Built from contrib spec file
  • Server was started using: See above

Additional context
It also crashes with an Intel Arc discrete GPU in place of the Nvidia one.

It does not crash if Xvnc is started with the discrete GPU and the application uses the integrated GPU. Possibly a bug in the AMD driver?

@CendioOssman
Member Author

More details available in this thread:

https://lists.freedesktop.org/archives/mesa-dev/2024-June/226245.html

@CendioOssman added the bug label Jun 21, 2024
@CendioHalim
Contributor

A bug has been reported to the kernel: https://bugzilla.kernel.org/show_bug.cgi?id=218993

@CendioHalim
Contributor

@dcommander
Contributor

I observe a bus error when attempting to start a VMware virtual machine with 3D acceleration. VMware uses Vulkan, and the failure seems to occur at exactly the same place as the failure described in this issue. (The symptoms are identical when I start a VMware virtual machine with 3D acceleration vs. when I run vkcube --gpu_number 1.) Symptomatically, a pixmap is allocated from a file descriptor, and a buffer object is successfully imported. However, when attempting to synchronize the buffer object and the pixmap, the pointer obtained from gbm_bo_map() appears to be invalid, so the pixel copy crashes.
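
For readers unfamiliar with why an apparently valid mapped pointer can fault with SIGBUS rather than SIGSEGV, here is a generic illustration that does not involve GBM or any GPU driver. It assumes a POSIX system: a file is memory-mapped, its backing storage is then truncated away, and the first access to the still-mapped page kills the process with a bus error. This only demonstrates the general mechanism (a mapping whose backing pages cannot be provided) and is not a claim about what the AMD driver does in this issue.

```python
import signal
import subprocess
import sys

# The faulting code runs in a child process so the parent can observe the signal.
CHILD = r"""
import mmap, os, tempfile
with tempfile.TemporaryFile() as f:
    f.write(b"\0" * mmap.PAGESIZE)
    f.flush()
    m = mmap.mmap(f.fileno(), mmap.PAGESIZE)  # mapping succeeds
    os.ftruncate(f.fileno(), 0)               # backing storage goes away
    m[0]                                      # access faults: SIGBUS
"""


def run_child():
    """Run the child and return its exit status (negative signal number if killed)."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


if __name__ == "__main__":
    status = run_child()
    if status < 0:
        print("child killed by", signal.Signals(-status).name)
    else:
        print("child exited with status", status)
```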

@dcommander
Contributor

It does appear to be the same issue. If I set VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json to force VMware to use the AMD Vulkan driver, then all is well.

dcommander added a commit to TurboVNC/turbovnc that referenced this issue Jul 12, 2024
(based on the implementation in TigerVNC 1.14 beta)

- Synchronize pixels between DRI3 pixmaps and their corresponding GBM
  buffer objects on an as-needed basis, in response to specific X11
  operations rather than on a schedule.

- Implement the simpler DRI3 v1 interface rather than DRI3 v2.  This
  avoids the need to implement the get_formats(), get_modifiers(), and
  get_drawable_modifiers() methods.

- Use Pixman (which is SIMD-accelerated) to synchronize pixels.

- Hook the DestroyPixmap() screen method to clean up a pixmap's
  corresponding GBM buffer object if there are no more references to the
  pixmap.

- Hook the CloseScreen() screen method to clean up the GBM device and
  close the DRM render node.

To do:

- Synchronize only the pixels that have changed.

Known issues:

TigerVNC/tigervnc#1772