Truncated core file when COMP_COMPRESSION is set to "true" #165

Open
amikugup opened this issue Oct 16, 2024 · 11 comments

@amikugup

We are observing a strange issue with the IBM core dump handler: we get a truncated core file when the COMP_COMPRESSION flag is set to "true". gdb complains about the truncated file; the core file size is close to 900 MB, while gdb expects a core file size of 3 GB.

We didn't see any such issue when we turned off compression: we got a full core file and gdb was happy.
Is this a known issue with the compression flag?

@pereyra-m

Hi.

I'm having problems with big dumps too; they can't be read with gdb.
I'll try without compression.

@No9
Collaborator

No9 commented Nov 6, 2024

Let me know how you get on.
We use the zip crate and just use 'COMP_COMPRESSION' as a flag, so I'd say it's likely a bug in that crate.

zip::CompressionMethod::Deflated

Looks like a lot has been added to zip, as it's now on version 2.2.0, so a PR with a bump would be appreciated.
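For context, the compression path boils down to streaming the core through a zip::ZipWriter with the Deflated method. Here is a minimal sketch of that pattern (not the handler's actual code; it assumes the pre-2.x FileOptions API and uses hypothetical file names):

    use std::fs::File;
    use std::io::copy;
    use zip::write::FileOptions;
    use zip::{CompressionMethod, ZipWriter};

    fn compress_core(core_path: &str, zip_path: &str) -> zip::result::ZipResult<()> {
        let mut input = File::open(core_path)?;
        let mut writer = ZipWriter::new(File::create(zip_path)?);
        // large_file(true) enables ZIP64 so entries over 4 GiB are accepted.
        let options = FileOptions::default()
            .compression_method(CompressionMethod::Deflated)
            .large_file(true);
        writer.start_file("core", options)?;
        // Stream the core data through the deflate encoder.
        copy(&mut input, &mut writer)?;
        writer.finish()?;
        Ok(())
    }
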

Thanks

@pereyra-m

Hi again.

We were using version 8.6.0, and even when the flag was set to "false", the dumps were uploaded compressed and the big ones were corrupted.
The release notes show that this was solved in recent versions, so we upgraded to 8.10.0 and now it's working.

@amikugup
Author

amikugup commented Nov 7, 2024

We are already using v8.10.0, but that doesn't solve the problem; we still need to disable compression to make this work. @pereyra-m, it would be helpful if you could share more details about the core file sizes you have tried and the configuration values you are using.

@pereyra-m

We don't have any special configuration, and we noticed the corruption when the dumps were larger than roughly 1 GB.
The error was something like

Failed to read a valid object file image from memory.

Maybe in your case it's something else.

@amikugup
Author

amikugup commented Dec 2, 2024

We are seeing this issue without compression as well in certain scenarios.
Has this issue been fixed in the latest release? Any thoughts on this, IBM-CDH team?

@No9
Collaborator

No9 commented Dec 2, 2024

Can you set the composer log level to Debug?
See: https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/values.yaml#L28

logLevel: "Debug"

Once the issue arises, provide the output of cat /var/mnt/core-dump-handler/composer.log from an agent on a node that has collected a core dump.
N.B. The location of composer.log will depend on your mountpoint settings if you have overridden them.

Thanks

@connectrajeev

Hello IBM-CDH team,

This response is on behalf of @amikugup; here are the requested debug logs. We have deleted certain entries from the file, as we felt they contained our setup and proprietary details. Do let us know if we removed any relevant information from composer.log.

composer.log

During further debugging of this issue, we concluded that it might not be related to IBM-CDH. Instead, the problem seems to be related to the Linux pipe: the Linux kernel writes core-dump data very fast and the CDC is unable to consume it at the same speed, causing the pipe to overflow. As a result, the CDC misses a portion of the data.

We were unable to find a way to increase the Linux pipe size, since it appears to be a read-only parameter according to ulimit. If you know of any method to increase the Linux pipe size, please share it with us; we would like to try it and see whether it prevents the core file from being truncated.
Thanks.

@No9
Collaborator

No9 commented Dec 6, 2024

Hi @connectrajeev

I agree it's likely the issue is upstream, as I am not seeing an Error writing core file message in the composer log.

Can you confirm the following, please:
Host operating system with version number
Whether the file is consistently truncated at a certain size (e.g. 900 MB, as @amikugup originally stated)

I don't think it's the pipe size, as I would expect the OS to block until it's read, but I would need to read the kernel core dump code to confirm.
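That said, if you want to experiment with the pipe capacity: ulimit isn't the tunable here; the per-pipe ceiling for unprivileged processes is /proc/sys/fs/pipe-max-size (root can raise it with sysctl fs.pipe-max-size), and a reader can request a larger buffer on an individual pipe with fcntl(F_SETPIPE_SZ). A minimal sketch using the libc crate, assuming the core data arrives on stdin (fd 0); this is only an idea to try, not something the handler does today:

    use std::{fs, io};

    fn grow_stdin_pipe(requested: libc::c_int) -> io::Result<libc::c_int> {
        // Report the unprivileged ceiling; raising it needs root (sysctl fs.pipe-max-size).
        let max = fs::read_to_string("/proc/sys/fs/pipe-max-size")?;
        eprintln!("fs.pipe-max-size = {}", max.trim());

        // Ask the kernel to enlarge the pipe buffer behind stdin.
        let new_size = unsafe { libc::fcntl(0, libc::F_SETPIPE_SZ, requested) };
        if new_size < 0 {
            return Err(io::Error::last_os_error());
        }
        // The kernel may round the requested size up to a power of two.
        Ok(new_size)
    }
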

@connectrajeev

Hello @No9,

Thanks for your response!
The core file size of our application is somewhere between 5 GB and 10 GB, and we have noticed that the core file is truncated only under some specific test cases of our application; if we manually send a core-generating signal to the application, the resulting core file is not truncated.

Host OS: "Oracle Linux Server 9.3"
What are the possibilities for a core file to be truncated if there are pending signals for the application to process and, at the same time, the application receives a SIGSEGV?

@No9
Collaborator

No9 commented Dec 7, 2024

OK, this is progress. I think the core dump will take precedence over all signals.
Looking at the core dump code in the kernel
https://github.com/torvalds/linux/blob/18bf34080c4c3beb6699181986cc97dd712498fe/fs/coredump.c#L567
I would suggest using dmesg and looking for kernel warning messages to make sure the above assumption is true.

I've not tested this on Oracle Linux Server at all and don't have access to one with a k8s config, so I can only suggest ideas at this stage.
Does the host have systemd-coredump installed and running? systemctl list-units | grep core
