This repository serves as a workflow demonstration of 3C, a tool that automatically converts C code to Checked C code.
Checked C is an extension to C which aims to ensure spatial memory safety. It achieves this by supporting three new checked pointer types:
_Ptr<
T>
is either NULL or points to a single object of type T_Array_ptr<
T>
is either NULL or points to an array of zero or more objects of type T_Nt_array_ptr<
T>
is either NULL or points to an array of zero or more objects of type T up to a NULL terminator
The latter two types are associated with bounds declarations, that define how long the pointed-to array can be. For example, _Array_ptr<int> x : count(10);
is a declaration of variable x
which is a pointer to an array of size 10.
Programmers can annotate regions of code as checked; e.g., a code block { ... }
can be marked _Checked { ... }
. Checked regions restrict their contents so as to ensure spatial safety; e.g., checked regions may only use checked pointers. Regions of code deemed to be unchecked may freely intermix checked and legacy pointers. This flexibility permits a partial, incremental conversion of programs.
3C analyzes a codebase to determine how its pointers are used, and whether they are used safely. Safely used pointers have their declarations rewritten so as to have checked types. For example, a pointer declared int *p = &x
might get rewritten to _Ptr<int> p = &x
.
3C is unlikely to completely and correctly convert a whole program in one go, so programmers are encouraged to make additional changes to the converted code. Once they have, they may find it useful to run through 3C again.
This demo therefore takes place over a number of commits, each of which represents the project after a step in the process.
You will need to install 3C and Checked C's clang
compiler before beginning. Since 3C's repository also includes the clang
compiler, you can build both when installing 3C. Follow the instructions linked above and also build TARGET clang
(along with target 3c
).
Make sure to clone the checkedc
project into the checkedc-wrapper
directory as in the instructions. There are important header files that 3C needs. Note the location of the build/bin
directory, which might change if you use an IDE. Add the build directory to your PATH
so that both 3c
and clang
are available.
The demo uses the version of 3C and Checked C's clang
current as of November 22, 2021 (this commit for 3C and clang, and this commit for the Checked C headers, needed when building them both).
We start from a fork of the tiny-bignum-c project. This is a small library with multiple test programs. The rest of this README describes the changes we made starting at this commit (the root of the fork). If you want to go through the tutorial yourself, we recommend cloning this repo and then going to this commit. It's the same as the one above, but we have added a script convert_all.sh
to the root directory and improved the makefile and .gitignore
file. Save this README, though, as it was added to the repo after the work it describes.
To being the conversion, we run 3C as follows, using the convert_all.sh
script:
3c -alltypes -warn-root-cause \
-output-dir=out \
./bn.c \
tests/hand_picked.c \
tests/rsa.c \
tests/factorial.c \
tests/load_cmp.c \
tests/test_div_algo.c \
tests/golden.c \
tests/randomized.c \
-- -Wall -Wextra -I.
This is a single command that executes 3c
on bn.c
, the main library file, and on tests/*.c
, the various test files. In the process of considering all of these files together, it will also consider bn.h
, since the files #include
it. There are several options we are passing to 3c
.
-
-alltypes
directs3c
to attempt infer_Array_ptr
and_Nt_array_ptr
types, and their bounds. If you do not pass in-alltypes
then only_Ptr
types are inferred. -
-warn-root-cause
asks3c
to output information about certain "root causes" that prevent pointers from being made checked. This option currently doesn't output anything fortiny-bignum-c
, but it is useful in general. -
-output-dir=out
tells3c
to not rewrite the original.c
or.h
file, but instead to output any changed file in a mirrored directory structure rooted at directoryout
. E.g.,bn.c
is output toout/bn.c
andtests/load_cmp.c
is output toout/tests/load_cmp.c
. -
The
--
is a standard syntax used with Clang LibTooling-based tools (which3c
is). It says that the following options should be passed to theclang
portion of3c
before running the3c
-specific portion. The three options here say to run the typechecker with extra warnings and to consider the current directory for#include
s.As an alternative to the
--
syntax, you can have3c
read theclang
options from a Clang compilation database (in which case the options can be different for each file), but we don't bother to use a compilation database for this small project. If you are not using a compilation database, you should pass--
to tell3c
not to search for one, even if you are not passing anyclang
options after the--
.
After running this command, we will see modified versions of the library files in the out
directory. We can run cp -r out/* .
to move the checked versions over-top of the originals. Now, if we like, we can see the effect of the conversion by doing git diff
. If we wanted to recover overwritten files we could use git reset
.
We commit these changes with comment "First run of 3c".
A lot of changes were made by this conversion command. Having copied the changed files over-top their originals, we can see these changes with the command git diff
. Most of the diffs involve struct bn*
being changed to _Ptr<struct bn>
. Arrays have the _Checked
or _Nt_checked
keyword added to ensure runtime bounds checks, and all checked pointers are initialized.
Now we can try to compile the code using Checked C's clang
by typing make -k CC=clang
(recall that we are assuming that Checked C's clang
is in your PATH
). Unfortunately, the code does not compile; we see errors like this for code in bignum_from_string
:
bn.c:117:12: error: it is not possible to prove argument meets declared bounds for 1st parameter
sscanf(&str[i], SSCANF_FORMAT_STR, &tmp);
^~~~~~~
bn.c:117:12: note: source bounds are an empty range
bn.c:117:12: note: the expected argument bounds use the variable 'i' and there is no relational information involving 'i' and any of the expressions used by the inferred bounds
bn.c:117:12: note: (expanded) expected argument bounds are 'bounds((const char *)&str[i], (const char *)&str[i] + 0)'
bn.c:117:12: note: (expanded) inferred bounds are 'bounds(str, str + 0)'
sscanf(&str[i], SSCANF_FORMAT_STR, &tmp);
^~~~~~~
The Checked C compiler comes with its own Checked C declarations of the most common C library functions. If we look at the type of sscanf
in the header stdio_checked.h
, we see that the input buffer of sscanf
has an interop type:
int sscanf(const char * restrict s : itype(restrict _Nt_array_ptr<const char>),
const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);
To a first approximation, the interop type states that callers can treat s
and format
as either an unchecked const char *
or a checked _Nt_array_ptr<const char>
without an explicit cast. Interop types (often abbreviated "itypes") provide valuable flexibility: we gain the ability to treat s
as an _Nt_array_ptr<char>
without having to immediately add casts to all code that treats it as a char *
.
This particular itype makes sense for sscanf
because it stops reading s
at a null byte: we can safely pass any _Nt_array_ptr<const char>
and sscanf
will not read out of bounds. But while bignum_from_string
could safely pass str
itself to sscanf
, the same is not necessarily true of &str[i]
: unless we know something more about str
or i
, &str[i]
might be beyond the null terminator of str
and sscanf
might read out of bounds. Unfortunately, while the compiler knows that str
is nbytes
long (since the declaration of str
to _Nt_array_ptr<char> str : count(nbytes)
), it is unable to guess the loop invariant 0 <= i <= nbytes - (2 * WORD_SIZE)
that is needed to prove that &str[i]
is in bounds. We help the compiler out using a _Dynamic_bounds_cast
. Such a cast is an assertion about a pointer's bounds that the compiler can assume to be true from here on, based on a check it will insert at run-time. We replace the sscanf
call with the following:
_Nt_array_ptr<char> read_pos =
_Dynamic_bounds_cast<_Nt_array_ptr<char>>(str + i, count(0));
_Unchecked { sscanf(read_pos, SSCANF_FORMAT_STR, &tmp); }
Why is the bound `count(0)`?
Do not be confused by the count(0)
: here, and for any _Nt_array_ptr
, it specifies the guaranteed minimum length of the array, not the exact length. _Nt_array_ptr
without a count specified defaults to count(0)
. Checked C code is allowed to read elements beyond this minimum as long as it checks that each element is not a null terminator before reading the next, as sscanf
does.
Now bn.c
compiles, but we (eventually, it takes a while to compile) get three similar errors in tests/randomized.c
:
./tests/randomized.c:39:35: error: call expression not allowed in argument for parameter used in function parameter bounds expression
bignum_from_string(&a, argv[2], strlen(argv[2]));
^~~~~~~~~~~~~~~
./tests/randomized.c:40:35: error: call expression not allowed in argument for parameter used in function parameter bounds expression
bignum_from_string(&b, argv[3], strlen(argv[3]));
^~~~~~~~~~~~~~~
./tests/randomized.c:41:35: error: call expression not allowed in argument for parameter used in function parameter bounds expression
bignum_from_string(&c, argv[4], strlen(argv[4]));
^~~~~~~~~~~~~~~
The problem here is that the compiler can't reason about the strlen
call's connection its argument's length. Since this code pattern occurs 3 times and it's going to take several extra lines of code to make the compiler happy, let's start by encapsulating the operation in a helper function:
bignum_from_string_using_strlen(&a, argv[2]);
bignum_from_string_using_strlen(&b, argv[3]);
bignum_from_string_using_strlen(&c, argv[4]);
For the implementation of bignum_from_string_using_strlen
, we use the following:
void bignum_from_string_using_strlen(_Ptr<struct bn> n, _Nt_array_ptr<char> str) {
int len = strlen(str);
_Nt_array_ptr<char> str2 : count(len) =
_Assume_bounds_cast<_Nt_array_ptr<char>>(str, count(len));
bignum_from_string(n, str2, len);
}
Here the _Assume_bounds_cast
is telling the compiler to simply assume that str2
(an alias for str
) is at least as long as count(len)
. This is not a crazy assumption: This is what strlen
is supposed to do! (The Checked C compiler should be able to verify this; it is a feature in progress.)
Now the code compiles without warnings. You can run the tests by typing make test
. We commit these changes with the message "manual commit: migrate bignum_from_string".
What if I don't want to assume strlen's connection to its parameter's bounds?
Since the compiler does not verify the assumption, this construct is not allowed in a checked region, so we have to mark the function _Unchecked
. This version of the code compiles and passes the tests. However, with a little more work, we can eliminate the unchecked region by scanning the string ourselves in a way that the compiler can verify that the null terminator is where we claim it is:
void bignum_from_string_using_strlen(_Ptr<struct bn> n, _Nt_array_ptr<char> str) {
int i = 0;
_Nt_array_ptr<char> str2 : count(i) = str;
while (str2[i]) {
int j = i + 1 _Where str2 : count(j);
i = j;
}
bignum_from_string(n, str2, i);
}
Each time we check that str2[i]
is not the terminator, we're allowed to increase the guaranteed minimum length of str2
to i + 1
. The loop terminates with i == strlen(str)
, so we call the equivalent of the original bignum_from_string(n, str, strlen(str))
but with the Checked C compiler able to verify the spatial memory safety.
While it's nice to know how to verify the length of an _Nt_array_ptr
at runtime in Checked C, in the real world, we might prefer to use the first implementation of bignum_from_string_using_strlen
with the unchecked region and accept responsibility for reviewing it manually (it's very simple) in exchange for using the C library's much better-optimized strlen
implementation.
While the code compiles and runs, not every pointer is yet checked. Looking at bn.h
and bn.c
, we notice the type of str
in this function prototype is an itype:
void bignum_to_string(_Ptr<struct bn> n, char *str : itype(_Array_ptr<char>) count(8191), int maxsize);
Why? It turns out that str
is used unsafely in the body of bignum_to_string
, so 3C automatically assigned it an itype so the body of bignum_to_string
will continue to compile but 3C can go ahead and start updating calls to bignum_to_string
to pass checked pointers. In this way, 3C localizes unsafe behavior in order to convert as much of the code as possible in each pass.
We'll see later why 3C believes str
is used unsafely inside bignum_to_string
. For now, let's consider the bounds 3C has given it: count(8191)
. It turns out that every caller in this code is always passing in a buffer of that size, so 3C was not wrong. But we can also see from looking at the code that the programmer's intention is that maxsize
indicates the size of str
. If we change 8191
to maxsize
and then try to compile again, we now start to see some errors like this:
./tests/golden.c:260:29: error: argument does not meet declared bounds for 2nd parameter
bignum_to_string(&sa, buf, sizeof(buf));
^~~
./tests/golden.c:260:29: note: destination bounds are wider than the source bounds
./tests/golden.c:260:29: note: destination upper bound is above source upper bound
./tests/golden.c:260:29: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + (int)sizeof (buf))'
./tests/golden.c:260:29: note: (expanded) inferred bounds are 'bounds(buf, buf + 8191)'
bignum_to_string(&sa, buf, sizeof(buf));
^~~
Let's look closer at the code flagged in the first error:
char buf _Nt_checked[8192];
// ...
bignum_to_string(&sa, buf, sizeof(buf));
printf(" a = %s \n", buf);
The reason that this is problematic is that buf
has been inferred to be a null-terminated array (_Nt_checked
), which follows special rules in Checked C. (buf
has to be null-terminated because it is passed to printf
, which requires a null-terminated string.) In Checked C, an _Nt_array_ptr : count(n)
points to an array of at least n
elements not counting the null terminator. This makes it safe to convert an _Nt_array_ptr : count(n)
to an _Array_ptr : count(n)
, because the _Array_ptr
cannot be used to overwrite the null terminator. (If the _Nt_array_ptr
count included the null terminator, then conversion to an _Array_ptr
would have to subtract 1 from the count, which would be ugly.) However, this means that when an _Nt_checked
array is declared, Checked C automatically reserves the last element for the null terminator, and the corresponding _Nt_array_ptr
count is one less than the original array size. In our example, buf
becomes an _Nt_array_ptr<char> : count(8191)
. This is not convertible to the _Array_ptr<char> : count(8192)
that we get when we instantiate the declared type of the str
parameter with the specified maxsize = sizeof(buf) = 8192
, hence the compile error.
To find the correct way to fix the error, first we need to confirm what the API of bignum_to_string
actually is: does the maxsize
parameter include the null terminator or not? Either design is valid as long as it is used consistently. Clearly, the call sites assume that maxsize
includes the null terminator. If this were the API, we would just change the declaration of str
to say count(maxsize-1)
, and the errors at the call sites would go away. But examination of the implementation of bignum_to_string
suggests that it assumes that maxsize
(a.k.a. nbytes
) excludes the null terminator: bignum_to_string
asserts that nbytes
is even on the grounds that converting a whole number of bytes to hexadecimal should produce an even number of characters, which only makes sense if nbytes
is the number of characters excluding the null terminator. So Checked C porting has called our attention to an inconsistency in the original C code that presents a potential spatial memory safety violation. If the caller passes an 8192-byte buffer (8191 bytes plus a null terminator) but bignum_to_string
thinks the buffer has space for 8192 bytes plus a null terminator, bignum_to_string
might write the null terminator out of bounds or might just write 8192 non-null bytes, causing the subsequent printf
to read out of bounds.
To resolve the API inconsistency, we can either change the call sites so that maxsize
excludes the null terminator or change the implementation so that maxsize
includes it. Again, either approach would lead to a working program; in general, the decision may depend on the difficulty of updating the call sites as compared to the implementation. In this case, a quick look at both the implementation and callers of bignum_from_string
indicates that its nbytes
parameter excludes the null terminator, so we decide to make bignum_to_string
consistent with bignum_from_string
by changing the call sites so that maxsize
excludes the null terminator. This is a matter of replacing sizeof(buf)
with sizeof(buf)-1
(or whatever the variable is named) at all the call sites. But since bignum_to_string
requires maxsize
to be even, we also have to add 1 to the size of buf
so that sizeof(buf)-1
will be even.
We commit these changes with the message "manual commit: fix bignum_to_string buffer lengths". With this change, the code should compile and run properly (again).
We said above that 3C gave the str
parameter of bignum_to_string
an interop type (itype) because str
was used unsafely inside the function. What exactly is the unsafe use? The easiest way to find out is to manually change the type to be fully checked rather than an itype:
void bignum_to_string(_Ptr<struct bn> n, _Array_ptr<char> str : count(nbytes), int nbytes)
and likewise in bn.h
, and then recompile: this will turn any unsafe use into a compile error:
bn.c:140:13: error: passing '_Array_ptr<char>' to parameter of incompatible type 'char *'
sprintf(&str[i], SPRINTF_FORMAT_STR, n->array[j]);
^~~~~~~
/home/mwh/checkedc-clang/build/lib/clang/12.0.0/include/stdio_checked.h:102:29: note: passing argument to parameter 's' here
int sprintf(char * restrict s,
^
The problem here is a type mismatch in the call to sprintf
. Here's its checked type in stdio_checked.h
:
int sprintf(char * restrict s,
const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);
Notice that format
has an itype, so it can be passed either a const char *
(by C code that has not yet been converted) or an _Nt_array_ptr<const char>
(by fully checked code) without a cast. But s
(the output buffer) is just a char *
, so it does not accept checked pointers at all, hence the error when we try to pass &str[i]
(which is a checked pointer because str
is). s
was declared this way because it's not easy to statically bound the amount of data that sprintf
writes to its output buffer (as C programmers should be well aware!).
If we want to verify our code with Checked C, we'll have to use snprintf
instead. Here's its checked declaration:
int snprintf(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n-1),
size_t n _Where n > 0,
const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);
Here s
accepts an _Nt_array_ptr<char> : count(n-1)
(because the n
parameter to snprintf
includes the null terminator but the count
of an _Nt_array_ptr
does not).
Based on this bounds declaration, we should call snprintf(&str[i], nbytes - i + 1, ...)
: Any case that would previously have caused a buffer overflow will instead silently truncate the output. However, silent truncation may still constitute an application bug, so while we're thinking about buffer lengths, it's worth thinking about truncation too. A quick look at the definition of SPRINTF_FORMAT_STR
shows that we expect it to write 2 * WORD_SIZE
characters. The loop condition nbytes > (i + 1)
(equivalent to nbytes >= i + 2
) ensures that we have space for another 2 characters, but we need 2 * WORD_SIZE
(and WORD_SIZE
defaults to 4). So Checked C porting has called our attention to another buffer overflow in the original C code! It wasn't triggered by any of the test cases since they all use large buffers, but we could demonstrate it by using a small buffer size that is a multiple of 2 but not 2 * WORD_SIZE
.
It appears the intent was to exit the loop if the next iteration would overflow the buffer, so to achieve that, we can simply replace nbytes > (i + 1)
with nbytes >= i + 2 * WORD_SIZE
.
(We will just assume output truncation is OK at this point, to keep things moving; a more debuggable design might be to report an error if the output buffer is not big enough.)
After updating the code as follows:
while ((j >= 0) && (nbytes >= i + 2 * WORD_SIZE))
{
snprintf(&str[i], nbytes - i + 1, SPRINTF_FORMAT_STR, n->array[j]);
we still get a similar compile error:
bn.c:140:14: error: passing '_Array_ptr<char>' to parameter of incompatible type 'char *'
snprintf(&str[i], nbytes - i + 1, SPRINTF_FORMAT_STR, n->array[j]);
^~~~~~~
The problem now is that str
is an _Array_ptr<char>
but snprintf
requires either a char *
or an _Nt_array_ptr<char>
. (The error message leaves something to be desired.) When we originally ran 3C, there was nothing in the code to indicate that str
needed to be an _Nt_array_ptr<char>
, so 3C gave it the more general type _Array_ptr<char>
. We can just change the type to _Nt_array_ptr<char>
in both bn.c
and bn.h
, since the call sites are already passing_Nt_array_ptr<char>
. Then we're left with a compiler warning:
bn.c:140:14: warning: cannot prove argument meets declared bounds for 1st parameter [-Wcheck-bounds-decls-unchecked-scope]
snprintf(&str[i], nbytes - i + 1, SPRINTF_FORMAT_STR, n->array[j]);
^~~~~~~
bn.c:140:14: note: (expanded) expected argument bounds are 'bounds((char *)&str[i], (char *)&str[i] + (size_t)nbytes - i + 1 - 1)'
bn.c:140:14: note: (expanded) inferred bounds are 'bounds(str, str + nbytes)'
snprintf(&str[i], nbytes - i + 1, SPRINTF_FORMAT_STR, n->array[j]);
^~~~~~~
Unfortunately, the compiler's support for algebraic simplification of bounds expressions is currently very limited, and it is unable to verify that (char *)str + i + (size_t)nbytes - i + 1 - 1 = str + nbytes
. We can live with this warning, or we can use a _Dynamic_bounds_cast
to give us greater assurance.
How we can use `_Dynamic_bounds_cast` to eliminate this warning
To make the warning go away, we can use a _Dynamic_bounds_cast
. We also have to introduce several temporary variables: We replace the original snprintf
call with the following:
size_t remaining_space = nbytes - i;
_Nt_array_ptr<char> write_pos : count(remaining_space) =
_Dynamic_bounds_cast<_Nt_array_ptr<char>>(str + i, count(remaining_space));
snprintf(write_pos, remaining_space + 1, SPRINTF_FORMAT_STR, n->array[j]);
Now the code once again compiles without warnings and passes the tests.
Another valid approach would have been to use 2 * WORD_SIZE + 1
rather than nbytes - i + 1
as the size argument to snprintf
to make it clearer when looking at the snprintf
call in isolation that truncation cannot occur. Ideally, the Checked C compiler would still be able to verify that snprintf(str + i, 2 * WORD_SIZE + 1, ...)
is safe because of the loop condition that nbytes >= i + 2 * WORD_SIZE
. However, the compiler currently doesn't support this kind of reasoning, so we'd have to use a _Dynamic_bounds_cast
like the one above anyway. If we want to detect truncation, yet another approach would be to use a wrapper function for snprintf
that raises an assertion error on truncation.
At this point the code compiles again, and the tests work properly. We commit these changes with the message "manual commit: migrate bignum_to_string to snprintf".
We've seen that by default, Checked C lets us mix checked and unchecked operations, but its safety guarantee applies only to code in designated checked regions. Ultimately, we'd like to place as much of the code as possible in checked regions to make it easier to see what (if any) unchecked operations remain whose safety we have to reason about manually. We can do this by adding #pragma CHECKED_SCOPE on
at the beginning of each .c
file, which places all subsequent code in a checked region.
Adding #pragma CHECKED_SCOPE on
to the beginning of a file forces all code within it, including code from expanded #include
s, to be considered in a checked regiond. This isn't a problem for us because we're applying it to the entire program at once and all the external headers we use (such as stdio.h
) are compatible with #pragma CHECKED_SCOPE on
. In a header file, it's good practice to localize the designation of a checked region to that file, thus:
#pragma CHECKED_SCOPE push
#pragma CHECKED_SCOPE on
...code...
#pragma CHECKED_SCOPE pop
We apply the above pattern to bn.h
, and just add #pragma CHECKED_SCOPE on
to the top of bn.c
and the files in tests
. Having done so, the code almost compiles: the compiler is now complaining about the _Assume_bounds_cast
we added in bignum_from_string_using_strlen
. We can address this problem by putting the function in an unchecked region by annotating it with _Unchecked
(or we can modify it to use _Dynamic_bounds_cast
instead, as described above). With this, the code compiles and passes the tests again, and we're guaranteed that the entire program is spatially memory safe except for the _Unchecked
blocks (modulo bugs and limitations of the Checked C compiler). We commit these changes with the message "manual commit: put everything in a checked scope except printf, etc.".
Thus concludes our Checked C port of tiny-bignum-c
; future versions of the compiler should make some steps a little better, but overall that wasn't too bad. We even found two spatial memory safety bugs in the original code in the process.
Good luck porting your own programs! We hope you find that 3C and Checked C provide a good degree of assurance of spatial memory safety in relation to the porting work you put in, compared to more ambitious approaches such as porting to a language like Rust. We've pursued this tutorial to the bitter end in order to illustrate as many issues as possible, but in a real port, you can choose to stop when you believe that additional reworking of unchecked code is not justified by the stronger assurance you get for that code.