Unable to do inode lock with lk-owner*** on any subvolume while attempting WRITE on gfid: [Transport endpoint is not connected] #4422

Open
xyz5578 opened this issue Oct 14, 2024 · 0 comments

Description of problem:
1. This is a three-way replicated volume. On two of the three nodes I could not write to one specific file, even though all the bricks were working fine. Every other file can be read and written normally from all nodes.
2. The affected file is a qemu disk image. On the two nodes where I can't write to it, I can't even read its basic information with "qemu-img info": the command hangs for tens of minutes before it prints anything.
3. I am sure that the read/write failure started at the same time on both nodes.
The exact command to reproduce the issue:
I have not found a way to reproduce it.

The full output of the command that failed:

Expected results:
This qemu disk can be read and written on all three nodes.

Mandatory info:
- The output of the gluster volume info command:
All ok.
- The output of the gluster volume status command:
All ok.
- The output of the gluster volume heal command:
All ok.
- Provide logs present on following locations of client and server nodes -
/var/log/glusterfs/
Every time I try to write to this file, the volume log on both affected nodes reports this error:
W [MSGID: 108019] [afr-lk-common.c:262:afr_log_locks_failure] 0-engine-replicate-0: Unable to do inode lock with lk-owner:d876d81a6e550000 on any subvolume while attempting WRITE on gfid:749becc7-c6b8-4a0b-8aff-68819af15c85. [Transport endpoint is not connected]
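
For anyone who wants to look at the file behind this gfid directly on a brick, here is a minimal sketch of how the gfid maps to a path under the brick's .glusterfs directory (first two hex characters, then the next two, then the full gfid; for a regular file that entry is a hard link to the data file). The brick root "/bricks/engine" is a made-up placeholder, not a path from my setup, and the helper name is only for illustration.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Return the .glusterfs entry for a gfid under a brick root.

    brick_root is a hypothetical path; substitute the real brick directory.
    """
    # uuid.UUID validates the string and normalises it to lowercase.
    canonical = str(uuid.UUID(gfid))
    return (
        PurePosixPath(brick_root)
        / ".glusterfs"
        / canonical[:2]
        / canonical[2:4]
        / canonical
    )


# gfid taken from the warning above; the brick root is an assumption.
print(gfid_backend_path("/bricks/engine", "749becc7-c6b8-4a0b-8aff-68819af15c85"))
```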

After I stopped and started the volume ("gluster volume stop/start"), reads and writes to the file returned to normal on all three nodes.
During the volume restart, multiple "releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85" messages appeared on all three nodes:
[2024-10-10 07:16:21.929053 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758888, pid=92863 lk-owner=3895e9e7ba550000}
[2024-10-10 07:16:21.929063 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758888, pid=92863 lk-owner=3895e9e7ba550000}
[2024-10-10 07:16:21.930042 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758978, pid=43623 lk-owner=4839ce1a27560000}
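
To see which clients were still holding locks at restart, the cleanup messages can be grouped by client, pid and lk-owner. This is only a rough sketch: the regular expression is written against the three lines pasted above, and the function name is hypothetical. Other glusterfs versions may format the message differently.

```python
import re
from collections import Counter

# Pattern derived from the pl_inodelk_log_cleanup lines quoted above.
RELEASE_RE = re.compile(
    r"releasing lock on (?P<gfid>[0-9a-f-]{36}) held by "
    r"\{client=(?P<client>0x[0-9a-f]+), pid=(?P<pid>-?\d+) "
    r"lk-owner=(?P<owner>[0-9a-f]+)\}"
)


def count_released_locks(lines):
    """Count cleanup messages per (gfid, client, pid, lk-owner)."""
    counts = Counter()
    for line in lines:
        match = RELEASE_RE.search(line)
        if match:
            key = (match["gfid"], match["client"], match["pid"], match["owner"])
            counts[key] += 1
    return counts


sample = [
    "[2024-10-10 07:16:21.929053 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758888, pid=92863 lk-owner=3895e9e7ba550000}",
    "[2024-10-10 07:16:21.929063 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758888, pid=92863 lk-owner=3895e9e7ba550000}",
    "[2024-10-10 07:16:21.930042 +0000] W [inodelk.c:617:pl_inodelk_log_cleanup] 0-engine-server: releasing lock on 749becc7-c6b8-4a0b-8aff-68819af15c85 held by {client=0x55e4ad758978, pid=43623 lk-owner=4839ce1a27560000}",
]

for key, n in count_released_locks(sample).items():
    print(n, key)
```

On the three sample lines this gives two releases for pid 92863 and one for pid 43623, all on the same gfid.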

The following message appears only on the two affected nodes, not on the healthy node:
[2024-10-10 07:14:01.800654 +0000] E [inodelk.c:504:__inode_unlock_lock] 0-engine-locks: Matching lock not found for unlock 0-9223372036854775807, by 1870012b76550000 on 0x55e4ad7585b8 for gfid:749becc7-c6b8-4a0b-8aff-68819af15c85
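
The range in that message, 0-9223372036854775807, runs from offset 0 to 2^63 - 1 (the largest signed 64-bit value), which I read as an unlock request covering the whole file. A small sketch that pulls the fields out of the line and checks this; the pattern is written only against the line above, and the function name is hypothetical.

```python
import re

INT64_MAX = 2**63 - 1  # 9223372036854775807

# Pattern derived from the __inode_unlock_lock line quoted above.
UNLOCK_RE = re.compile(
    r"Matching lock not found for unlock (?P<start>\d+)-(?P<end>\d+), "
    r"by (?P<owner>[0-9a-f]+) on (?P<client>0x[0-9a-f]+) "
    r"for gfid:(?P<gfid>[0-9a-f-]{36})"
)


def parse_unlock_failure(line):
    """Extract range, lk-owner, client and gfid, or return None if no match."""
    match = UNLOCK_RE.search(line)
    if match is None:
        return None
    start, end = int(match["start"]), int(match["end"])
    return {
        "gfid": match["gfid"],
        "owner": match["owner"],
        "client": match["client"],
        "start": start,
        "end": end,
        # Assumption: 0..INT64_MAX is treated as a whole-file range.
        "whole_file": start == 0 and end == INT64_MAX,
    }


line = (
    "[2024-10-10 07:14:01.800654 +0000] E [inodelk.c:504:__inode_unlock_lock] "
    "0-engine-locks: Matching lock not found for unlock 0-9223372036854775807, "
    "by 1870012b76550000 on 0x55e4ad7585b8 for "
    "gfid:749becc7-c6b8-4a0b-8aff-68819af15c85"
)
print(parse_unlock_failure(line))
```

Note that the lk-owner in this message (1870012b76550000) is different from the lk-owners in the "releasing lock" messages above.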

- Is there any crash? Provide the backtrace and coredump
No

Additional info:

- The operating system / glusterfs version:
glusterfs 10.0

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration
