
"panic: runtime error: invalid memory address" in release 1.25.0 #12403

Closed
thomasmyn opened this issue Dec 15, 2022 · 7 comments · Fixed by #12494
Labels
bug unexpected problem or unintended behavior panic issue that results in panics from Telegraf upstream bug or issues that rely on dependency fixes

Comments

@thomasmyn

Relevant telegraf.conf

# Telegraf configuration

[global_tags]
    group = "mailcluster"

# Configuration for telegraf agent
[agent]
    interval = "60s"
    debug = false
    hostname = "XXX"
    round_interval = true
    flush_interval = "60s"
    flush_jitter = "0s"
    collection_jitter = "0s"
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    quiet = false
    logfile = ""
    omit_hostname = false

###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

[[outputs.influxdb]]
    urls = ["https://XXX:8086"]
    database = "mon"
    username = "xxx"
    password = "XXX"
    timeout = "10s"

###############################################################################
#                                  INPUTS                                     #
###############################################################################

[[inputs.cpu]]
    percpu = true
[[inputs.disk]]
    ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.io]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.system]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.netstat]]

Logs from Telegraf

Dec 15 11:17:02 mail1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 15 11:17:02 mail1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Dec 15 11:17:02 mail1 systemd[1]: Failed to start Telegraf.
Dec 15 11:17:02 mail1 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 1.
Dec 15 11:17:02 mail1 systemd[1]: Stopped Telegraf.
Dec 15 11:17:02 mail1 systemd[1]: Starting Telegraf...
Dec 15 11:17:02 mail1 telegraf[2451440]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 15 11:17:02 mail1 telegraf[2451440]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x98fadf]
Dec 15 11:17:02 mail1 telegraf[2451440]: goroutine 1 [running]:
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.Purge.func1(0xc00018f930)
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:23 +0x3f
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.Purge()
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:51 +0x25
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.Panic({0x53132e0, 0xc0000c78f0})
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:85 +0x25
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.NewBuffer(0x20)
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/buffer.go:73 +0x2d5
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.NewCoffer()
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/coffer.go:30 +0x34
Dec 15 11:17:02 mail1 telegraf[2451440]: github.com/awnumar/memguard/core.init.0()
Dec 15 11:17:02 mail1 telegraf[2451440]: /go/pkg/mod/github.com/awnumar/[email protected]/core/enclave.go:15 +0x2e

System info

Telegraf 1.25, running inside a systemd-nspawn container (Ubuntu 20.04)

Docker

No response

Steps to reproduce

  1. Update 1.24.1 -> 1.25 from influxdb repo
  2. telegraf.service: Failed with result 'exit-code'
  3. Downgrade to 1.24.4
  4. restart
  5. telegraf runs without problems

Expected behavior

Actual behavior

Additional info

No response

@thomasmyn thomasmyn added the bug unexpected problem or unintended behavior label Dec 15, 2022
@thomasmyn thomasmyn changed the title "panic: runtime error: invalid memory address" in release 1.25 "panic: runtime error: invalid memory address" in release 1.25.0 Dec 15, 2022
@powersj
Contributor

powersj commented Dec 15, 2022

inside systemd-nspawn container Ubuntu 20.04

Can you reproduce this outside the container?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x98fadf]
goroutine 1 [running]:
github.com/awnumar/memguard/core.Purge.func1(0xc00018f930)
/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:23 +0x3f

memguard is a new dependency that supports secret stores. Also, was that the full log? I'd like to see which line in telegraf calls into memguard and causes the panic.
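One way to read the trace: it bottoms out in `github.com/awnumar/memguard/core.init.0` (enclave.go:15), which is a package-level `init` function. Go runs the `init` functions of every imported package before `main` starts, so a panic there fires before any telegraf code executes, and no telegraf frame appears. The sketch below is a small stdlib-only illustration (not memguard code; `initFrame` is a hypothetical name) that records the name Go assigns to an `init` function.

```go
package main

import (
	"fmt"
	"runtime"
)

// initFrame records the function name Go assigns to this package's init().
var initFrame string

// init runs automatically during package initialization, before main.
// A panic here would abort the program before main is ever reached.
func init() {
	if pc, _, _, ok := runtime.Caller(0); ok {
		initFrame = runtime.FuncForPC(pc).Name()
	}
}

func main() {
	fmt.Println("package init ran as:", initFrame)
}
```

The recorded name ends in `.init.0`, the same suffix seen in `core.init.0` in the trace above.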

Thanks

@powersj powersj added waiting for response waiting for response from contributor panic issue that results in panics from Telegraf labels Dec 15, 2022
@thomasmyn
Author

telegraf --debug --config telegraf.conf --once

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x98fadf]

goroutine 1 [running]:
github.com/awnumar/memguard/core.Purge.func1(0xc000149930)
/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:23 +0x3f
github.com/awnumar/memguard/core.Purge()
/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:51 +0x25
github.com/awnumar/memguard/core.Panic({0x53132e0, 0xc0001198b0})
/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:85 +0x25
github.com/awnumar/memguard/core.NewBuffer(0x20)
/go/pkg/mod/github.com/awnumar/[email protected]/core/buffer.go:73 +0x2d5
github.com/awnumar/memguard/core.NewCoffer()
/go/pkg/mod/github.com/awnumar/[email protected]/core/coffer.go:30 +0x34
github.com/awnumar/memguard/core.init.0()
/go/pkg/mod/github.com/awnumar/[email protected]/core/enclave.go:15 +0x2e 

container systemd-nspawn service file

[Unit]
Description=Container mail1.example.com
Documentation=man:systemd-nspawn(1)
PartOf=machines.target
Before=machines.target
After=network.target

[Service]
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-macvlan=mvlan1 --network-macvlan=mvlan2 --machine=mail1.example.com -D /data1/machines/mail1.example.com
KillMode=mixed
Type=notify
RestartForceExitStatus=133
SuccessExitStatus=133
Delegate=yes

[Install]
WantedBy=machines.target

telegraf 1.25.0 works without problems on many other servers (no containers)

Thanks

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Dec 15, 2022
@powersj
Contributor

powersj commented Dec 15, 2022

I have filed upstream issue awnumar/memguard#144 about the panic when running in systemd-nspawn. I was able to reproduce it outside of telegraf with only an empty import. I do not know enough about systemd-nspawn containers to say whether there is a setting or additional config option that would avoid this for now. It would help if you could subscribe to that issue in case they ask for any sort of testing.

Thanks

@powersj powersj added the upstream bug or issues that rely on dependency fixes label Dec 15, 2022
@KaraRyougi

I also got a similar panic error for the conntrack plugin:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x137d4d3]

goroutine 42 [running]:
github.com/influxdata/telegraf/plugins/inputs/conntrack.(*Conntrack).Gather(0xc000e1e150, {0x6b7cce0, 0xc0008a3060})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/conntrack/conntrack.go:109 +0x133
github.com/influxdata/telegraf/agent.(*Agent).testRunInputs.func2(0xc000dd7590)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:419 +0x2f8
created by github.com/influxdata/telegraf/agent.(*Agent).testRunInputs
	/go/src/github.com/influxdata/telegraf/agent/agent.go:388 +0xcf

@eljef

eljef commented Dec 23, 2022

As I noted in the upstream memguard bug, adding --capability=CAP_IPC_LOCK to systemd-nspawn fixes the problem in my own builds. I tested running telegraf in a systemd-nspawn container with --capability=CAP_IPC_LOCK and did not run into the issue. I didn't do much testing beyond making sure it started up and my configuration worked.

I don't believe this should be classified as a workaround, but rather as the correct fix. IPC_LOCK controls memory locking and huge page allocation. The capability is denied by default in most systemd packages, and it is expected that you have to enable it if an application requires mlock, mlockall, mmap, shmctl, or memfd_create. Upstream memguard should consider updating its documentation to reflect this.
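To confirm from inside a container whether the process actually holds CAP_IPC_LOCK, one can read the effective capability mask from the `CapEff:` line of /proc/self/status (a hex bitmask); CAP_IPC_LOCK is bit 14. A minimal sketch, with hypothetical helper names `effectiveCaps` and `hasCap`:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capIPCLock is the bit number of CAP_IPC_LOCK in Linux capability masks.
const capIPCLock = 14

// effectiveCaps (hypothetical helper) returns the effective capability mask
// from the text of /proc/<pid>/status, where it appears as a hex "CapEff:" line.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("no CapEff line found")
}

// hasCap reports whether capability number bit is set in mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(uint64(1)<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	mask, err := effectiveCaps(string(data))
	if err != nil {
		fmt.Println("cannot parse capabilities:", err)
		return
	}
	fmt.Printf("CapEff=%016x CAP_IPC_LOCK=%t\n", mask, hasCap(mask, capIPCLock))
}
```

For example, a mask of 00000000a80425fb has bit 14 clear, so `hasCap` reports false for it, while a mask containing 0x4000 reports true.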

I suspect that this will soon apply outside of systemd-nspawn containers as well, extending to other container implementations (OCI-compatible runtimes, Docker, Podman, etc.) and to binaries running on strictly configured systems (very tightly configured AppArmor or SELinux setups). Telegraf might run into issues on these systems unless it specifically requests the IPC_LOCK capability.

@thomasmyn
Author

thanks @eljef

@powersj
Contributor

powersj commented Jan 9, 2023

Hi Folks,

Sorry for the delay, still catching up on notifications post-holiday.

While I would love for the library to not panic, it sounds like the solution here is to add the --capability=CAP_IPC_LOCK option when launching.

I am going to put up a PR and update the docs with this note.

@powersj powersj self-assigned this Jan 9, 2023
powersj added a commit to powersj/telegraf that referenced this issue Jan 11, 2023
Telegraf will now panic when launched in a systemd-nspawn container. This is
because of the memguard dependency. It requires the CAP_IPC_LOCK
capability to correctly lock and secure memory.

fixes: influxdata#12403
4 participants