eBPF (The Extended Berkeley Packet Filter) is an in-kernel virtual machine that allows code execution in the kernel space, in a restricted sandbox environment with access to a limited set of functions. There are numerous components shipped by Red Hat that utilize the eBPF virtual machine. Each component is in a different development phase, and thus not all components are currently fully supported. Those that are not supported are available as a Technology Preview and are not intended for production use.For information on Red Hat scope of support for Technology Preview features, see: Technology Preview Features Support Scope
In this unit, we will be focusing on the kernel tracing offering shipped as bcc-tools (BPF Compiler Collection Tools). There are just over 100 tools shipped in this package. A few things of note:
-
All of these tools live in
/usr/share/bcc/tools
. -
These tools must run as the root user as any eBPF program can read kernel data. As such, injecting eBPF bytecode as a regular user is not allowed in RHEL 8.
-
Each tool has a man page to view the man page, run
man <tool name>
. These man pages include descriptions of the tools, provide the options that can be called and have information on the expected overhead of the specific tool.
For these exercises, you will primarily be using the host node2
as user root
. However, these exercises do REQUIRE a second terminal session to run commands on other hosts. Please pay careful attention to what commands to run on different hosts.
💡
|
Now is a great time to use the multi session capability of tmux. Use Ctrl-b " to create another session with a split screen. Cycle the active session back and forth with CTRL-b n (next) and CTRL-b p (previous).
|
From host bastion
, ssh to node2
.
ssh node2
Use sudo
to elevate your priviledges.
sudo -i
Verify that you are on the right host for these exercises.
workshop-ebpf-checkhost.sh
You are now ready to proceed with these exercises.
Start by installing the BPF Compiler Collection (bcc-tools) and kernel-devel packages for the installed kernel:
yum install bcc-tools kernel-devel-$(uname -r) -y
Now, we have a lot of interesting tools installed in /usr/share/bcc/tools along with accompanying man pages.
The tcplife
tool monitors tcp connections on the system and measures the amount of time that they’ve been open, the local and remote ports involved in the connection and the amount of data transferred and received in kilobytes.
To run this tool, execute the following and give it a couple of seconds to get set up:
/usr/share/bcc/tools/tcplife
PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS
In your second terminal, establish yourself as described below and then run the following command:
-
host: bastion
-
user: root
ssh node2 "sudo workshop-ebpf-rootkit.sh"
This command should run for about 10 seconds and then exit.
In the tcplife
terminal, you should now see output similar to:
PID COMM LADDR LPORT RADDR RPORT TX_KB RX_KB MS 28478 sshd 10.0.0.12 22 10.0.0.10 33514 14 2 9091.95
Doing the math with my numbers above, my session was open for about 9 seconds and was initiated by IP 10.0.0.10, the bastion.
In the tcplife
terminal, issue a Ctrl-C and this will return you to a prompt.
The execsnoop
script monitors all calls to execve() and catches all processes that follow the fork→exec sequence, as well as processes that re-exec() themselves. Processes that fork() but do not exec() won’t be caught by this script.
Continuing on host node2
, to run execsnoop
as follows:
/usr/share/bcc/tools/execsnoop
PCOMM PID PPID RET ARGS
In your second terminal, establish yourself as described below and then run the following command:
-
host: bastion
-
user: root
ssh node2 "sudo workshop-ebpf-rootkit.sh"
In the execsnoop
terminal, you should see output similar to:
PCOMM PID PPID RET ARGS sshd 28512 749 0 /usr/sbin/sshd -D [email protected],[email protected],aes256-ctr,aes256-cbc,[email protected],aes128-ctr,aes128-cb [email protected],[email protected],[email protected],[email protected],hmac-sha2- -oGSSAPIKexAlgorithms=gss-gex-sha1-,gss-group14-sha1- [email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-excha -oHostKeyAlgorithms=rsa-sha2-256,ecdsa-sha2-nistp256,[email protected],ecdsa-sha2-nistp384,ecdsa-sha2-nis -oPubkeyAcceptedKeyTypes=rsa-sha2-256,ecdsa-sha2-nistp256,[email protected],ecdsa-sha2-nistp384,ecdsa-sha -R unix_chkpwd 28514 28512 0 /usr/sbin/unix_chkpwd root chkexpiry bash 28516 28515 0 /bin/bash -c workshop-ebpf-rootkit.sh grepconf.sh 28517 28516 0 /usr/libexec/grepconf.sh -c grep 28518 28517 0 /usr/bin/grep -qsi ^COLOR.*none /etc/GREP_COLORS grepconf.sh 28519 28516 0 /usr/libexec/grepconf.sh -c grep 28520 28519 0 /usr/bin/grep -qsi ^COLOR.*none /etc/GREP_COLORS grepconf.sh 28521 28516 0 /usr/libexec/grepconf.sh -c grep 28522 28521 0 /usr/bin/grep -qsi ^COLOR.*none /etc/GREP_COLORS sed 28524 28523 0 /usr/bin/sed -r -e s/^[[:blank:]]*([[:upper:]_]+)=([[:print:][:digit:]\._-]+|"[[:print:][:digit:]\._-]+")/export \1=\2/;t;d /etc/locale.conf uname 28525 28516 0 /usr/bin/uname -a sleep 28526 28516 0 /usr/bin/sleep 1 who 28527 28516 0 /usr/bin/who sleep 28528 28516 0 /usr/bin/sleep 1 grep 28530 28516 0 /usr/bin/grep root /etc/passwd sleep 28531 28516 0 /usr/bin/sleep 1 grep 28532 28516 0 /usr/bin/grep root /etc/shadow sleep 28533 28516 0 /usr/bin/sleep 1 cat 28534 28516 0 /usr/bin/cat /etc/fstab sleep 28535 28516 0 /usr/bin/sleep 1 ps 28536 28516 0 /usr/bin/ps -ef sleep 28537 28516 0 /usr/bin/sleep 1 netstat 28538 28516 0 /usr/bin/netstat -tulpn sleep 28539 28516 0 /usr/bin/sleep 1 getenforce 28540 28516 0 /usr/sbin/getenforce sleep 28541 28516 0 /usr/bin/sleep 1 firewall-cmd 28542 28516 0 /usr/bin/firewall-cmd --state
This shows you all the processes that ran exec() during that ssh login, their PID, their parent PID, their return code, and the arguments that were sent to the process. You could keep monitoring this for quite some time to catch potential bad actors on the system.
In the execsnoop
terminal, issue a Ctrl-C and this will return you to a prompt.
Similar in nature to execsnoop
, mountsnoop
traces the mount() and umount() syscalls which show processes that are attempting to mount (or unmount) filesystems.
To run this tool, execute the following and give it a couple of seconds to get set up:
/usr/share/bcc/tools/mountsnoop
COMM PID TID MNT_NS CALL
PAY ATTENTION!!! USING A DIFFERENT HOST
In your second terminal, establish yourself as described below and then run the following command:
-
host: node2
-
user: root
Let’s try to unmount a mount that we know cannot be unmounted. For this, we’ll pick the root filesystem '/'.
umount /
umount: /: target is busy.
Taking a look at the terminal running mountsnoop
, we see:
umount 20001 20001 4026531840 umount("/", 0x0) = -EBUSY
This shows us that the mount is busy and cannot be unmounted.
Now let’s try to unmount a filesystem that we should be able to unmount. But before doing so, look at the mount options to ensure we can restore it correctly. On node2
run the following:
grep /dev/shm /proc/mounts
tmpfs /dev/shm tmpfs *rw,seclabel,nosuid,nodev,relatime* 0 0
Now proceed to umount /dev/shm
on node2
umount /dev/shm
Back to the mountsnoop
terminal and you should see the following:
umount 20003 20003 4026531840 umount("/dev/shm", 0x0) = 0
The umount command succeeded.
Proceed to restore the /dev/shm mount as follows:
mount -t tmpfs -o rw,nosuid,nodev,relatime,seclabel tmpfs /dev/shm
Finally, back to the mountsnoop
terminal and you should see the following:
mount 20004 20004 4026531840 mount("tmpfs", "/dev/shm", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS|MS_DIRSYNC|MS_NOATIME|MS_NODIRATIME|MS_MOVE|MS_REC|MS_UNBINDABLE|MS_SLAVE|MS_SHARED|MS_I_VERSION|MS_STRICTATIME|MS_LAZYTIME|MS_NOUSER|0x7f2b30000000, "") = 0
This shows us that the mount succeeded and all the options that were passed into the system call.
As you can see, the mountsnoop
tool is very useful for seeing what processes are calling the mount and umount system calls and what the results of those calls are.
In the mountsnoop
terminal, issue a Ctrl-C and this will return you to a prompt.
|
Please verify the filesystem your host is using with the command df -T / . If your host is configured with ext4, then substitute the command ext4slower in place of xfsslower .
|
The purpose of the xfsslower
tool (also ext4slower
and nfsslower
) is to show you filesystem operations slower than a particular threshold, that defaults to 10ms. It traces reads, writes, opens, and syncs and then prints out the timestamp of the operation, the process name, the ID, the type of operation, the file offset in kilobytes, the latency of the I/O measured from when it was issued by VFS to the filesystem to when it was completed, and finally, the filename being operated on.
To run this tool, execute the following and give it a couple of seconds to get set up:
/usr/share/bcc/tools/xfsslower
Tracing xfs operations slower than 10 ms TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
In your second terminal, establish yourself as described below and then run the following command:
-
host: node2
-
user: root
dd if=/dev/urandom of=bigfile bs=1024 count=20000
This writes out a 20M file called bigfile and should not register on your xfsslower
window.
Now, let’s execute the above command in a for loop so that we get more I/O going in parallel:
for i in $(seq 1 10); do dd if=/dev/urandom of=bigfile$i bs=1024 count=20000 & done
Now you should see similar output in your xfsslower
window:
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 20:44:43 b'dd' 32446 W 1024 778 44.11 b'bigfile1' 20:44:43 b'dd' 32455 W 1024 818 55.11 b'bigfile10' 20:44:43 b'dd' 32452 W 1024 1712 44.11 b'bigfile7' 20:44:43 b'dd' 32455 W 1024 1778 55.02 b'bigfile10' 20:44:43 b'dd' 32451 W 1024 2850 44.11 b'bigfile6' 20:44:43 b'dd' 32447 W 1024 3598 44.10 b'bigfile2' 20:44:43 b'dd' 32451 W 1024 3805 55.11 b'bigfile6' 20:44:43 b'dd' 32446 W 1024 4612 44.28 b'bigfile1' 20:44:43 b'dd' 32446 W 1024 5529 33.01 b'bigfile1' 20:44:43 b'dd' 32454 W 1024 4504 55.11 b'bigfile9' 20:44:43 b'dd' 32447 W 1024 7335 44.10 b'bigfile2' 20:44:43 b'dd' 32455 W 1024 7545 44.02 b'bigfile10' 20:44:43 b'dd' 32446 W 1024 8344 49.16 b'bigfile1' 20:44:43 b'dd' 32448 W 1024 8183 44.18 b'bigfile3' 20:44:43 b'dd' 32447 W 1024 9168 55.10 b'bigfile2' 20:44:43 b'dd' 32449 W 1024 9728 54.10 b'bigfile4' 20:44:43 b'dd' 32454 W 1024 10244 33.11 b'bigfile9' 20:44:43 b'dd' 32447 W 1024 10989 55.02 b'bigfile2' 20:44:43 b'dd' 32453 W 1024 11276 54.10 b'bigfile8' 20:44:43 b'dd' 32453 W 1024 12169 33.10 b'bigfile8' 20:44:43 b'dd' 32451 W 1024 13292 91.11 b'bigfile6' 20:44:43 b'dd' 32453 W 1024 13108 47.24 b'bigfile8' 20:44:43 b'dd' 32448 W 1024 13788 44.01 b'bigfile3' 20:44:43 b'dd' 32454 W 1024 14137 44.23 b'bigfile9' 20:44:43 b'dd' 32446 W 1024 16076 44.02 b'bigfile1' 20:44:43 b'dd' 32447 W 1024 15796 44.26 b'bigfile2' 20:44:44 b'dd' 32446 W 1024 17004 44.10 b'bigfile1' 20:44:44 b'dd' 32455 W 1024 16697 44.16 b'bigfile10' 20:44:44 b'dd' 32450 W 1024 18505 44.01 b'bigfile5' 20:44:44 b'dd' 32451 W 1024 19056 44.17 b'bigfile6' 20:44:44 b'dd' 32446 W 1024 19868 44.38 b'bigfile1' 20:44:44 b'dd' 32452 W 1024 19272 44.14 b'bigfile7' 20:44:44 b'dd' 32455 W 1024 19168 30.75 b'bigfile10' 20:44:44 b'dd' 32453 W 1024 19612 31.16 b'bigfile8' 20:44:44 b'dd' 32454 W 1024 19460 24.59 b'bigfile9' 20:44:44 b'dd' 32447 W 1024 19508 36.20 b'bigfile2'
So we can see that when writing these files in parallel, we have xfs operations taking longer than 10ms to complete.
In the xfsslower
terminal, issue a Ctrl-C and this will return you to a prompt.
The cachestat
tool traces kernel page cache functions and prints every five second summaries to aid you in workload characterization.
To run this tool, execute the following and give it a couple of seconds to get set up:
/usr/share/bcc/tools/cachestat
TOTAL MISSES HITS DIRTIES BUFFERS_MB CACHED_MB
In your second terminal, establish yourself as described below and then run the following command:
-
host: node2
-
user: root
echo 3 > /proc/sys/vm/drop_caches
for i in $(seq 1 10); do dd if=/dev/urandom of=bigfile$i bs=1024 count=20000 & done
This flushes the cache and then runs a series of dd
commands to create some I/O.
In the cachestat
window, you should output similar to:
TOTAL MISSES HITS DIRTIES BUFFERS_MB CACHED_MB 0 0 0 0 0 154 14773 901 13872 44133 1 200
This shows that we had 901 page cache misses during a five second period while running the above loop, but during that same second, there were 9,821 hits, indicating great performance from the page cache.
In the cachestat
terminal, issue a Ctrl-C and this will return you to a prompt.
This tool is a swiss army knife allowing you to specify functions to trace and messages to be printed when certain conditions are met. You can read more about this by running:
man 8 trace
Let’s do a simple trace in which we will dynamically trace the do_sys_open() kernel function and print the names of the files opened.
To run this tool, execute the following and give it a couple of seconds to get set up:
/usr/share/bcc/tools/trace 'p::do_sys_open "%s", arg2'
PID TID COMM FUNC -
In your second terminal, establish yourself as described below and then run the following command:
-
host: node2
-
user: root
cat /etc/fstab
In the trace
terminal, you will see something similar to:
32559 32559 cat do_sys_open b'/etc/ld.so.cache' 32559 32559 cat do_sys_open b'/lib64/libc.so.6' 32559 32559 cat do_sys_open b'/usr/lib/locale/locale-archive' 32559 32559 cat do_sys_open b'/usr/share/locale/locale.alias' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_IDENTIFICATION' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION' 32559 32559 cat do_sys_open b'/usr/lib64/gconv/gconv-modules.cache' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_MEASUREMENT' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_MEASUREMENT' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_TELEPHONE' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_TELEPHONE' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_ADDRESS' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_ADDRESS' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_NAME' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_NAME' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_PAPER' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_PAPER' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_MESSAGES' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_MESSAGES' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_MONETARY' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_MONETARY' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_COLLATE' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_COLLATE' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_TIME' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_TIME' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_NUMERIC' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_NUMERIC' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.UTF-8/LC_CTYPE' 32559 32559 cat do_sys_open b'/usr/lib/locale/en_US.utf8/LC_CTYPE' 32559 32559 cat do_sys_open b'/etc/fstab'
Go ahead and Ctrl-C trace
and then let’s do one more trace, this time, tracing the return values out of trace
:
/usr/share/bcc/tools/trace 'r::do_sys_open "ret: %d", retval'
cat /etc/fstab
PID TID COMM FUNC -
In the trace
terminal, you will see something similar to:
PID TID COMM FUNC - 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: -2 32576 32576 cat do_sys_open ret: 3 32576 32576 cat do_sys_open ret: 3
Go to the terminal with trace
running and issue a Ctrl-C. This will end the process and return you to the command line.
You may also now type exit
in one of your tmux panes to return to having just one pane visible.
There is a lot more that you can do with this tool when you actually need to start tracing what is getting passed into kernel functions and what is being returned by those kernel functions.
ℹ️
|
You are not required to reference any additional resources for these exercises. This is informational only. |
-
bcc-tools brings dynamic kernel tracing to Red Hat Enterprise Linux 8.1
-
Why is NetworkManager talking to static.redhat.com? Find out with bcc-tools!
-
Performance observability in practice with bcc-tools: A lab on lab.redhat.com
-
Upstream XDP Tutorial (eXpress Data Path networking is tech preview in RHEL 8.2.)