Figure 4 of the paper shows the bugs in NOVA, NOVA-Fortis, and PMFS that we found with Vinter. In this walkthrough, we reproduce the underlying data of this figure and the details in Section 5.3. As explained in the paper, this process is only partially automated.
-
State extraction works by communicating with the virtual machine over a serial console. The output sometimes also captures part of the state extraction command at the top. As that command is identical within a test, it has no effect on the analysis.
-
report-results.py
outputs ANSI color codes, useless -R
if you want to view the results in a pager.- States printed in green by
report-results.py
have at least one image of the "fully persisted" type. States in purple are those found exclusively by the heuristic.
- States printed in green by
-
VM execution is not entirely deterministic, so some identifiers may vary.
-
Timestamps in the file metadata may lead to non-determinism and extra semantic states reported by Vinter.
Run the full analysis with the command below. Choose the number of parallel
jobs appropriately for your environment (option -P
).
# From vinter repository root
fs-testing/scripts/script_run-parallel-all.sh -P 8 --trace-metadata walkthrough_bugs
The script will write results to results/vm_{nova,nova-protection,pmfs}/walkthrough_bugs/
.
In the following, we walk through the manual analysis of these results.
Run:
vinter_python/report-results.py analyze results/vm_nova/walkthrough_bugs/*
The tool summarizes the results by reading crash_images/index.yaml
and
semantic_states/index.yaml
for each test directory.
From the output, we can immediately identify test cases that do not detect bugs, marked by "single final state" and "atomic" in green text, and a checkmark in Figure 4:
(syntax: <name in Figure 4> = <test name>)
- append = test_append
- atime = test_atime
- [cm]time = test_ctime-mtime
- chmod = test_chmod
- chown = test_chown
- mkdir/rmdir = test_mkdir-rmdir
- unlink = test_unlink
We now go through the test cases where report-results.py
reported multiple
final states or atomicity violations.
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_hello-world
We can see multiple final states:
- state03: content
HelloWorld\n
(correct) - state04: content
HelloWor\0\0\0
- state05: content
HelloWorl\0\0
- state06: content
HelloWorld\0
The last three bytes of file content may thus be lost.
We now reproduce the root-cause analysis given in Section 5.3.1 in the paper.
The output of report-results.py analyze --verbose
includes the numbers of
dirty lines at the checkpoint (e.g., 46784, the memory layout varies across
runs so the number may differ). Run the following:
vinter_python/report-results.py analyze --verbose results/vm_nova/walkthrough_bugs/test_hello-world
vinter_python/report-results.py trace-lines results/vm_nova/walkthrough_bugs/test_hello-world 46784
The last few lines of the output should match Figure 5 and Figure 6, for example (without stack traces):
304192 NT-write line 46784 + 0
content: b'HelloWor'
metadata: True!T __copy_user_nocache!
304193 write line 46784 + 8
content: b'l'
metadata: True!T __copy_user_nocache!
304194 write line 46784 + 9
content: b'd'
metadata: True!T __copy_user_nocache!
304195 write line 46784 + 10
content: b'\n'
metadata: True!T __copy_user_nocache!
The syntax of the metadata is as follows:
<in kernel mode?>!<function at instruction pointer>!<call stack>
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_link-hard
We can see that Vinter identifies three states between checkpoint 2 and 3 where
the test code deletes a hardlink. From the diff output, we can see that in the
second state, the deleted file disappears, and then in the third state, the
hard link counter st_nlink
of the hardlink is decreased. Thus, the deletion
of the hard link is not atomic.
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_link-sym
Vinter identifies four final states that differ in the link target (/mnt/myf, /mnt/myfi, /mnt/myfil, /mnt/myfile).
With an analysis similar to the "write" test, we can see that this bug shares the same root cause.
vinter_python/report-results.py analyze --verbose results/vm_nova/walkthrough_bugs/test_link-sym
vinter_python/report-results.py trace-lines results/vm_nova/walkthrough_bugs/test_link-sym 81088
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_rename
The code creates two files /mnt/myfile
and /mnt/myfile2
and then renames
myfile2 to myfile, which should atomically replace the latter file.
Vinter identifies four crash states. From the diffs, we can see:
- state01: /mnt/myfile has disappeared
- state02: /mnt/myfile2 has disappeared. At that point, we have data loss as both files are now gone.
- state03: The renamed /mnt/myfile has reappeared.
We visualize these states in Figure 7 of the paper. For a correct atomic rename, we would expect only states state00 and state03.
In Section 5.3.2 of the paper, we note that all states have at least one crash
image of type "fully persisted". The report-results
tool prints such states
in green.
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_rename-dir
Checkpoint 1 -> 2 creates a directory and is correct. Checkpoint 2 -> 3 creates a file and writes to that file, which is not expected to be atomic.
Checkpoint 3 -> 4 moves /mnt/newdir to /mnt/newdir2. Vinter identifies three states:
- state04: initial state
- state05: /mnt/newdir disappears, including its content. At this point, we have data loss.
- state06: /mnt/newdir reappears with its content in /mnt/newdir2.
For atomic directory rename, we would expect only state04 and state06.
rename long name = test_rename-long-name: atomicity violation, data loss, read/write fails after recovery
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_rename-long-name
Vinter reports state extraction failure for state03 and state04. Looking at the error messages (see semantic_states/state0[34].txt), we can see that the extraction program cannot access the files, even though they are listed in the directory.
The remaining states also show issues:
- state01: /mnt/myfile disappears completely, resulting in data loss
- state02: the file reappears
For atomic rename, we would only expect state00 and state02 (and no errors).
Run:
vinter_python/report-results.py analyze --diff results/vm_nova/walkthrough_bugs/test_touch
Vinter reports an extra state (state02) occuring during the touch
operation,
where st_blocks
(the number of blocks allocated by the file system for the
file) increases from 0 to 8. We do not consider this a violation of atomicity.
Run:
vinter_python/report-results.py analyze results/vm_nova/walkthrough_bugs/test_touch-long-name
We can see that state extraction fails with three distinct errors (look at
semantic_states/stateXX.txt
):
- state02: accessing file /mnt/eizAKifFfyOn72ieKYxbCraXxNonCfH8CargS4xDIbOGGW6BPBCPEc1RYyNyZWZg
- state03: accessing file /mnt/eizAKifFfyOn72ieKYxbCraXxNonCfH8CargS4xDIbOGGW6BPBCPEc1RYyNyZWZgX
- state04: accessing file /mnt/eizAKifFfyOn72ieKYxbCraXxNonCfH8CargS4xDIbOGGW6BPBCPEc1RYyNyZWZgXX
(note the extra X at the end - the correct file has three X)
Vinter reports multiple final states. We refer to write which shares the same root-cause (unaligned data is not persisted correctly). We thus mark this test case as "correct", since there is no bug specific to updating (as opposed to writing/appending) to a file.
Run:
vinter_python/report-results.py analyze results/vm_nova-protection/walkthrough_bugs/*
Again, we can immediately detect tests without bugs:
- atime = test_atime
- chmod = test_chmod
- chown = test_chown
Since NOVA-Fortis is an extension of NOVA, we focus on the differences here.
Run:
vinter_python/report-results.py analyze --diff results/vm_nova-protection/walkthrough_bugs/test_hello-world
Vinter reports a single final state, so the unpersisted data bug from NOVA does not occur in NOVA-Fortis. The test creates a file and writes to it which is inherently not atomic. However, we see crash images where writing to another file fails after recovery.
Note that the shell command performing this write is not shown in the output of
report-results.py
and can be found in fs-testing/scripts/test_hello-world.yaml
.
Run:
vinter_python/report-results.py analyze --diff results/vm_nova-protection/walkthrough_bugs/test_{append,update-middle}
For both test cases, we can see an additional state where the complete content of the file is missing, resulting in data loss.
We can see that there are crash images for which the state extraction fails.
Looking at the output file (semantic_states/stateXX.txt
) we can see that our
state extraction tool reports an I/O error while reading the directory.
Similar to the previous bug, we see crash images here where writing to a file after recovery fails.
Run:
vinter_python/report-results.py analyze results/vm_pmfs/walkthrough_bugs/*
Again, we can immediately detect tests without bugs:
- append = test_append
- atime = test_atime
- chmod = test_chmod
- chown = test_chown
- link = test_link-hard
- symlink = test_link-sym
- rename long name = test_rename-long-name
- touch = test_touch
Run:
vinter_python/report-results.py analyze --diff results/vm_pmfs/walkthrough_bugs/test_hello-world
The tested operation echo HelloWorld > /mnt/myfile
creates a file, then
writes to it. We can see that these two steps are atomic, so there are no
issues.
Run:
vinter_python/report-results.py analyze --diff results/vm_pmfs/walkthrough_bugs/test_{ctime-mtime,mkdir-rmdir,rename,unlink}
We can see that deleting a file (or directory) first updates that file's ctime and mtime, then removes it from the directory. This could be considered a minor violation of atomicity.
For mkdir/rmdir, rename overwrite, and unlink, Vinter generates crash images where state extraction fails with a Linux kernel crash ("kernel BUG at /mnt/pmfs/fs/pmfs/balloc.c:70!").
Run:
vinter_python/report-results.py analyze --diff results/vm_pmfs/walkthrough_bugs/test_rename-dir
Between checkpoints 2 and 3, the test creates a new file and writes to it. These are two operations, so the atomicity violation that Vinter reports is correct, but not an issue.
Run:
vinter_python/report-results.py analyze --diff results/vm_pmfs/walkthrough_bugs/test_touch-long-name
We can see that when writing to a file, file timestamps are updated separately from the remaining metadata and contents, violating atomicity.
PMFS updates the file contents byte-by-byte. Since the semantics of such a write are not clear, we consider this correct behavior.