Table of contents:
For this tutorial, make sure you are in the CBAT Docker container. Instructions for installing the container can be found here.
All commands for this tutorial should be run from this directory:
/path/to/cbat_tools/docs/tutorial
Let us begin by providing a brief overview of what CBAT is, and what it does.
CBAT is a tool that helps you analyze the behavior of binary executables (i.e., compiled programs). So it does not analyze the source code you have written in C, or Rust, or what have you. Rather, CBAT analyzes the assembly-level machine instructions that the computer executes directly when it runs your compiled program.
This is useful, because the executable is the real program. High level programming languages offer lots of different features, but at the end of the day, it all gets compiled down to the same basic machine instructions, and that is what CBAT analyzes.
At its heart, CBAT checks whether a function in your program (or programs) behaves in a certain way. In other words, it can verify that a function in your program has certain properties.
To illustrate the point, imagine the following. Suppose there is a function foo
in your program, and you want to know if it always produces the number 10
as its output. You can ask CBAT to check this for you.
CBAT will explore all logical possibilities, and find if there is any possible way that foo
could produce anything other than 10
. If it finds a way, it will give you an example of inputs you can feed into foo
to make it do what you didn't expect.
It is worth emphasizing that CBAT does not perform a probabilistic or fuzzy sort of check here. CBAT performs a mathematical check of all logical possibilities.
We do make a few simplifying assumptions for the sake of performance, but for the most part, this means that (up to our assumptions) if there is any way to make foo
produce anything other than 10
, CBAT will find it. By contrast, if CBAT cannot find a way, then CBAT has essentially found a logical proof that foo
always produces 10
.
CBAT can analyze functions in a single binary program, but it can also compare functions in two different programs, and check whether their behavior is the same.
This is useful for checking whether a newer, modified version of a program still works the same way as an older version.
For example, suppose you make some improvements to foo
, and you want to know whether your improvements have broken anything. CBAT can analyze both your original and your improved program, and make sure that the modified version of foo
always produces the same output as the original version of foo
.
CBAT is a family of command line tools built on top of Carnegie Mellon's Binary Analysis Platform (BAP). CBAT and BAP do not analyse high-level code in a source language like C or Rust. Rather, they analyze the compiled, "binary" version of such programs.
When you compile (say) a C program or a Rust program, the compiler converts your source code into machine code, which the computer can execute directly.
Machine code is pretty simple. A machine has a bunch of registers that you can stash values in, often with well-known names like RAX
, RDI
, R9
, and so on:
+----------+---------------+ | Register | Slot | +----------+---------------+ | RAX | 0x000000 | <--- values go in the slots +----------+---------------+ | RDI | 0x000023 | +----------+---------------+ | RBP | 0x000000 | +----------+---------------+ | R9 | 0x000000 | +----------+---------------+ | R10 | 0x000000 | +----------+---------------+ | ... | ... | +----------+---------------+
There are also a number of smaller registers, often call flags, which can really only contain a one or zero, to indicate that they are "on" or "off." They often have names like ZF
, OF
, and so on:
+------+--------+ | Flag | On/off | +------+--------+ | ZF | 0x0 | <--- Flag is off +------+--------+ | OF | 0x1 | <--- Flag is on +------+--------+ | AF | 0x0 | +------+--------+ | ... | ... | +------+--------+
Finally, there are regions of memory you can stash values in too. Memory is basically just another array of slots you can put bytes in, and each slot has an address.
+----------+-------+ | Address | Slot | +----------+-------+ | 0x000000 | 0x00 | <--- bytes go in the slots +----------+-------+ | 0x000001 | 0x23 | +----------+-------+ | 0x000002 | 0xaf | +----------+-------+ | 0x000003 | 0xa2 | +----------+-------+ | ... | ... | +----------+-------+
The machine doesn't let you do very much with all of these slots. Basically, you can put values in and out of them, you can compare values in them, and you can do very basic arithmetic on them.
The addresses and values used by the machine are always binary numbers, i.e., sequences of ones and zeros. These are called bitvectors (or sometimes just "binary numbers").
There is also a GOTO instruction, which tells the machine to jump to a different instruction in your program. So this is all really just simple GOTO-style programming, with a bunch of slots to put your (binary) values in.
One of the problems with binary analysis is that every computer architecture is different. The slots are laid out differently, and the machine instructions take different forms. For this reason, programs compiled for, say, x86_64
look different than those compiled for ARM
.
BAP is a big help here. It takes most of these variations in machine instructions, and it lifts them into a unified and simpler form of machine code, which it calls BIR (short for the "BAP Intermediate Representation"). We'll just call it the "IR" for short.
As was noted already, when code written in, say, C is compiled down to machine code, everything gets converted into simple machine instructions which basically shuffle values in and out of the various slots.
Of course, functions in C code take arguments, and they return values. At the level of machine code, these arguments are handled by putting them in slots, and the return values are handled by putting them into slots too.
In x86_64
programs (which is what we'll be looking at), the first argument to a function is always placed in a register called RDI
, and the function's return value is always placed in a register called RAX
.
For example, suppose we have a function in C that returns the value 7
:
int foo(int x) { ... return 7; }
In BAP's IR, the return value would be handled like this:
00000436: subroutine foo... <-- Start of the function foo 00000443: ... 00000450: RAX := 0x07 <-- Put the result in RAX to return 00000458: return... <-- Return to the caller
The numbers at the start of each line are unique identifiers, one for each line. The instructions come after those numbers, to the right. Instruction identifiers in this tutorial may differ from the ones you see when running CBAT interactively. That's okay; different versions or runs of CBAT can produce different numberings.
You can see here that the value 7
is placed in the register RAX
, and then the function returns. This is how the function returns 7. Return values are always placed in RAX
.
Similarly, the argument to a function always goes in a register called RDI
. For example, to call the above foo
function with 3
as the parameter, that would look something like this in BAP's IR:
00000467: RDI := 0x03 <-- Put the argument in RDI 00000476: call foo <-- Call the function
You can see here that the value 3
is placed in the register RDI
, and then the function foo
is called. This is how functions are called with an argument. The argument always goes in RDI
.
(If your function takes more than one argument, the second argument always goes in a register called RSI
, and there are other established slots for the third argument, the fourth argument, and so on).
The important things to remember in our examples is just that the argument to a function always goes in RDI
, and the return value always goes in RAX
.
To make the examples in this tutorial easy to follow, we have provided C code for each program. However, remember that CBAT actually operates on the machine code — no source code is necessary to use CBAT. On occasion, we will pay attention to the machine code, particularly the argument register (RDI
) and the return value register (RAX
).
BAP itself is a framework for analyzing binary programs. It has a command line interface, which you can extend with your own plugins. What BAP does is it takes a binary program, and lets you run a pass over the program, during which time your plugin can dig in and analyze the internals of that program.
Our CBAT tools are implemented primarily as BAP plugins. The particular plugin that we will focus on here is called wp
. If you are curious what that stands for, it is short for "weakest precondition," which is the technical name in the literature for the kind of analysis that wp
performs.
The wp
plugin has two modes of operation. In the first mode, it can make a pass over a single binary program, and analyze a function in that program. In the second mode, it can make a pass over two programs, and compare a function in them. Here we cover the first mode — the second mode will be covered when we reach the first example that uses it.
To analyze a single program, the command you want to invoke has this basic form:
$ bap wp \ <-- The command is wp --func=FUNC \ <-- The function to examine [options] \ <-- Any extra options /path/to/exe <-- The path to a binary program
This will tell CBAT to go into the program at /path/to/exe
, and analyze the function FUNC
that occurs in that program.
There are various [options]
you can specify, to tell CBAT to do different things as it analyzes your program. We will illustrate the most important options throughout the following tutorial.
The wp
tool is quite general. You can tell it to verify that your functions have certain properties, and you can specify your own custom properties that wp
should check for.
As a convenience, we have provided some predefined properties, which are built-in to the tool. One of them is this: wp
can check if it is possible to trigger an assert in a function in your program.
Consider the following simple C program (which lives at tutorial/01/binary/main.c):
#include <assert.h> int main(int argc, char** argv) { if (argc == 0xdeadbeef) { assert(0); } return 0; }
Here we have a main
function, which takes an argument, argc
(we can ignore argv
).
argc
is dead beef (see how it's 0xdeadbeef
?), then the program trips an assert, which causes it to halt.0
, which means success, and it exits cleanly.Visually, here's the control flow:
------------ | main | ------------ | -------------------------- | | argc == 0xdeadbeef otherwise | | V V ---------------- --------------- | assert(0) | | return 0 | ---------------- --------------- | | V V Program Program fails exits cleanly
As you can see from this diagram, if argc
is 0xdeadbeef
, then the program travels down the left branch, where an assert is tripped, and the program fails. Otherwise, it goes down the right branch, where the program returns 0
, and exits cleanly.
Now, this is a C program, and as we noted earlier, wp
does not analyze the C code itself. Rather, wp
analyzes the compiled version of this program. We started with the C code here, just because it is easy to see what is going on. But the real program is what you get when you compile it.
Let's see what the machine instructions look like just for the branching part of the if
statement in our example program. We can ask BAP to show us the machine instructions as it sees them, like this:
$ bap 01/binary/main -d --print-symbol=main
You don't need to do that right now. But if you did, you could look through the output, until you find the relevant instructions, which should look something like this:
00000481: when #5 goto %0000047b <-------- When #5 is 1, go here ---+ 00000879: goto %000006f9 <------------ Otherwise, go here --+ | | | 000006f9: ... <--------------------------------------------+ | 00000722: call @__assert_fail ... <--- trip the assert | | 0000047b: ... <-----------------------------------------------------+ 000004aa: call #52 with noreturn <--- exit cleanly
At the first instruction printed here (numbered 00000481
), you can see that the machine is going to jump to instruction 0000047b
when the value of a virtual variable called #5
is true (i.e., 1
). You can think of a virtual variable as a temporary variable at an unspecified location.
So if we suppose that #5
is 1
, we can see what happens next, by following the machine down to the instruction at 000047b
. There, we can see that the program exits cleanly.
Alternatively, if #5
is not 1
, then the machine doesn't jump at 00000481
, and instead moves down to the next instruction, at 0000879
. What happens there? The machine jumps directly to instruction 0000006f9
, and if we look there, we can see that it trips the assert we are looking for.
Even though these are machine instructions and not C code, there is still a clear control flow here, similar to what we saw for the C code:
------------ | main | ------------ | 000000481 | ---------------------------- | | ~ #5 #5 | | V V ---------------- -------------------- | 00000722: | | 000004aa: | | assert_fail | | call ... (exit) | ---------------- -------------------- | | V V Program Program fails exits clean
Of course, just because there is a branch in the code where an assert is tripped, that doesn't mean the program can actually travel down that branch. Programs can have "dead" branches, which are impossible to travel down:
if true: do_something() else: never_called() # Dead code, never called
What we want to know in our example is whether there is any way that the main
function can actually get to the branch that trips the assert.
We can ask wp
to find out if this is possible. To do that, invoke wp
like this:
$ bap wp \ --func=main \ --trip-asserts \ 01/binary/main
Here we ask wp
to try to trip an assert inside the main
function of the program at 01/binary/main
. When we run this command, the relevant output comes at the end, which looks like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x000000003f800081 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x00000000deadbeef RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig
The first thing it says is SAT!
, which means that CBAT was able to find a way to trip the assert.
Next, it says Model
, after which it lists some information. There, you can see a bunch of registers on the left, and a bunch of values on the right (and a mention of memory at the bottom, which we can ignore for now).
What this tells us is how to trip the assert. The idea here is that if we set the registers (and memory) to these particular values at the start of the main
function, then main
will trip the assert.
In this particular example, we can see that most of the registers are set to zero. The one that is interesting here is RDI
. Remember that RDI
is where the argument to the function goes. And here, wp
tells us that this register should be set to 0xdeadbeef
. That is to say, wp
is telling us that if we set RDI
to 0xdeadbeef
, then our function main
will trip the assert. And that of course makes sense, given the control flow that we examined in this function.
It can be instructive to step through the machine instructions, and actually see the function exhibit this behavior.
One of the tools in the CBAT family is an interactive debugger that lets you do just that: step through a program, one machine instruction at a time. The tool is called bildb
(short for "BAP Intermediate Language DeBugger"), and the command to invoke it looks like this:
$ bap 01/binary/main \ --pass=run \ --run-entry-point=main \ --bildb-debug
That will start up the debugger at the main
function of the program at 01/binary/main
.
That will start up bildb
at the main
function in 01/binary/main
. You should see something like this:
BIL Debugger Starting up... Architecture Type: x86_64 Address size: 64 Registers: R10 R11 R12 R13 R14 R15 R8 R9 RAX RBP RBX RCX RDI RDX RSI RSP YMM0 YMM1 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7 YMM8 YMM9 Entering subroutine: [%00000868] main 0000088d: main_argc :: in u32 = RDI 0000088e: main_argv :: in out u64 = RSI 0000088f: main_result :: out u32 = RAX Entering block %00000413 0000041a: #46 := RBP >>> (h for help)
You can see that at startup bildb
prints some information about the architecture and its initialization state, then it enters the function (subroutine) main
, where it stops at the first instruction in the first block, namely:
0000041a: #46 := RBP
The instruction's identifying number is on the left of the colon, and the instruction itself is on the right of the colon.
Below that, bildb
gives you a prompt, where you can type a command:
>>> (h for help)
There are various commands you can enter here, but they are fairly straightforward. For example, to see more than just one instruction, type show 5
and hit enter (to see the nearest +/- 5 lines):
>>> (h for help) show 5 %00000413 -> 0000041a: #46 := RBP 0000041d: RSP := RSP - 8 00000420: mem := mem with [RSP, el]:u64 <- #46 00000427: RBP := RSP 00000435: #47 := RSP 00000438: RSP := RSP - 0x10
Another thing you can do is see which values are stored in registers. For instance, to see the value stored in RBP
, type p RBP
(short for "print RPB
") and hit enter
. You should see something like this:
>>> (h for help) p RBP RBP : 0
This tells you that the RBP
register has the value zero stored in it.
Remember how wp
told us that main
will trip the assert if it begins with RDI
set to 0xdeadbeef
? Let's set that value, and see if we can trip the assert.
To set RDI
to 0xdeadbeef
, type set RDI=0xdeadbeef
and hit enter
. You'll see that bildb
sets the value:
>>> (h for help) set RDI=0xdeadbeef RDI : 0xDEADBEEF
Now you should be able to step through this program, instruction by instruction, and you should end up triggering the assert.
To move to the next instruction, hit s
(for "step"), and hit enter
. That moves to the next instruction:
>>> (h for help) s 0000041d: RSP := RSP - 8
You could keep stepping through this program, one instruction at a time, but you can also just set a breakpoint at the instruction you want, and skip ahead to it.
Let's set a breakpoint at the assert, to see if we reach it. Recall the control flow graph we saw before:
------------ | main | ------------ | 000000481 | ---------------------------- | | ~ #5 #5 | | V V ---------------- -------------------- | 00000722: | | 000004aa: | | assert_fail | | call ... (exit) | ---------------- -------------------- | | V V Program Program fails exits clean
You can see that the assert happens at instruction 00000722
, so let's set a breakpoint there. To set a breakpoint, type b %00000722
(short for "breakpoint at instruction 00000722
"):
>>> (h for help) b %00000722 Breakpoint set at %00000722
Now let's see if we get to that breakpoint. To tell the debugger to move forward, type n
(for "next") and hit enter
. The debugger takes us to the next block:
Entering block %000006f9 000006fe: RCX := 0x4005ED
Type n
(and enter
) to skip forward again, and you will see that you hit the breakpoint at 00000722
:
>>> (h for help) n 00000722: call @__assert_fail with return %0000047b
At this point, you have confirmed that the program does indeed trip the assert if RDI is set to 0xdeadbeef
at the beginning of main
.
You could keep stepping through the program, but there is no need at this point. You can just exit the debugger. To quit the debugger, hit q
and then enter
.
wp
up to bildb
Instead of manually setting the values of registers in bildb
to see the behavior that wp
tells you about, you can have wp
dump its output into a YAML file, and then have bildb
read that in, at startup. To have wp
dump its output like this, run the following command:
$ bap wp \ --func=main \ --trip-asserts \ --bildb-output=init.yml \ 01/binary/main
Notice how you simply added the option --bildb-output=init.yml
to the command. That tells wp
to dump the appropriate output into a file called init.yml
.
Once you have that file at hand, you can start bildb
with it, like this:
$ bap 01/binary/main \ --pass=run \ --run-entry-point=main \ --bildb-debug \ --bildb-init=init.yml
Notice how we added the option --bildb-init=init.yml
. When bildb
starts up, it will read init.yml
, and set the registers to the values listed there. From there, you can set the breakpoint at 00000722
and run through the program exactly as before.
As we saw in the last example, wp
can find a way to trip an assert, if it's possible to do so, and it will tell us how to do it. This can be used to solve puzzles of various kinds.
As an example, consider the 8-Queens chess puzzle. The task is this: on a standard 8x8 chess board, place 8 queens on the board in places where none of them can be captured by any of the others in a single move. Queens can move up/down, sideways, and diagonally, so you have to be careful to place the queens out of each other's paths.
For simplicity, let's do a simpler version: the 4-Rooks puzzle. It's just like the 8-Queens puzzle, except we have 4 rooks, and the board is a 4x4 board. Rooks can move up and down, and sideways.
Since this version of the game is less complex, it is somewhat easy to think up a solution. For example, here is one:
+---+---+---+---+ | R | | | | +---+---+---+---+ | | | R | | +---+---+---+---+ | | R | | | +---+---+---+---+ | | | | R | +---+---+---+---+
As you can see, each of these rooks is out of the path of the others, so none can be captured by any of the others in a single move.
Now suppose that you have a function (in C) that checks solutions to the 4-rooks puzzle. Imagine that the function looks something like this:
/* This function checks if solution is correct. */ int check(int solution) { bool correct = true; // Check if solution is correct... if (correct) { assert(0); } return 0; }
This function takes a solution to the 4-rooks game as an argument, it then checks the solution, and if the solution is correct, it trips an assert.
How do we encode a proposed solution, so that we can pass it into this function as an argument? Well, since machine code really only works with binary numbers, we'll need to encode proposed solutions as a bitvector.
How can we do that? There are 16 squares on a 4x4 chess board, so let's say that we'll represent the board with a 16-bit binary number, where each bit represents one of the squares. Let's also say that, for each bit, if it's 1
, that means there is a rook at that position, and if it's 0
, that means there is no rook at that position.
To encode a proposed solution as a 16 bit binary number, we mark each position on the board with a 0
or 1
to indicate whether there is a rook at that position. For example:
+---+---+---+---+ +---+---+---+---+ | R | | | | | 1 | 0 | 0 | 0 | ==> 1000 +---+---+---+---+ +---+---+---+---+ | | | R | | | 0 | 0 | 1 | 0 | ==> 0010 +---+---+---+---+ ==> +---+---+---+---+ | | R | | | | 0 | 1 | 0 | 0 | ==> 0100 +---+---+---+---+ +---+---+---+---+ | | | | R | | 0 | 0 | 0 | 1 | ==> 0001 +---+---+---+---+ +---+---+---+---+
Then we line up those bits, one after another, into one 16-bit binary number:
solution = 1000 0010 0100 0001 (or 0x4281 in hex)
So, the check
function takes a 16-bit number as the argument, it then checks the proposed solution to see if it is correct, and it trips an assert if it is correct.
The full code for this function can be found at tutorial/02/binary/main.c. The compiled version of this program is at 02/binary/main
. You can try it out by passing it a hex version of a possible solution. For example:
$ ./02/binary/main 0x2212 $ ./02/binary/main 0x8421
Since we have a function that trips an assert if the solution is correct, we can ask wp
to try and find a way to trip that assert. If wp
can find a way to trip the assert, it will give us an example of an input to the function that will cause this behavior.
Let's try it. Here is the command. We only want to analyze the check
function, which we specify with --func=check
:
$ bap wp \ --func=check \ --trip-asserts \ 02/binary/main
After a moment, wp
returns a result. At the end of the output, you should see something like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x0000000040000000 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000004281 <-- Argument to function RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig
Here, we can see that wp
did indeed find a way to trip the assert, and it provided an example of how to do it.
Look at the registers listed under Model
. Remember that the first argument to a function is always stored in RDI
, and here wp
tells us that if we set that to 0x4281
, then our function check
will trip the assert. (There's more than one correct solution; your CBAT installation might find a different solution.)
What is 0x4281
? In binary, it is:
0100 0010 1000 0001
If we break that up into a 4x4 matrix, we can see the chess board:
+---+---+---+---+ +---+---+---+---+ 0100 ==> | 0 | 1 | 0 | 0 | | | R | | | +---+---+---+---+ +---+---+---+---+ 0010 ==> | 0 | 0 | 1 | 0 | | | | R | | +---+---+---+---+ ==> +---+---+---+---+ 1000 ==> | 1 | 0 | 0 | 0 | | R | | | | +---+---+---+---+ +---+---+---+---+ 0001 ==> | 0 | 0 | 0 | 1 | | | | | R | +---+---+---+---+ +---+---+---+---+
We can see that this is indeed a solution to the 4-rooks problem. Since none of the rooks are in each other's way, none of them can be captured by any of the others in a single move.
This technique applies to many different kinds of puzzles or problems whose solutions can be encoded as a binary number. For example, solving a sudoku puzzle, reversing a hash, and so on.
Another predefined property that wp
can check is that it can find null dereferences in a function. To illustrate, consider the following C program, which can be found at tutorial/03/binary/main.c:
#includeint main() { // Allocate a byte of memory, at address `addr` char *addr = malloc(sizeof(char)); // Store the character 'z' at that address *addr = 'z'; }
In this program, we first allocate a byte of memory, and then we take the address of the byte that gets allocated and we store it in the pointer called addr
. Next, we attempt to store the character z
in the slot at that address.
Suppose I run this program, and suppose that the byte of memory I ask for here is successfully allocated. Well, then everything works great. I get back an address for the allocated byte, and I can then store the character z
at that address.
But things needn't turn out that way. It's always possible that malloc
won't be able to allocate the memory I've asked for, in which case it can't return an address to me. If that happens, it will return NULL
(i.e., 0
). And then, when I try to store z
at that address . . . well, there is no address, so this will segfault.
We can ask wp
to find null dereferences like this one. To check this particular main
function, we can use the following command (note that we add --check-null-derefs
as a flag):
$ bap wp \ --func=main \ --check-null-derefs \ 03/binary/main
If you run this, you'll see output that looks something like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x0000000040000000 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000000000 RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig malloc_ret_RAX016 |-> 0x0000000000000000
First, it says SAT!
, indicating that it did in fact find a way to trigger the null dereference. Second, it gives us a Model
, which shows us how to do it. Notice in particular the very last line:
malloc_ret_RAX016 |-> 0x0000000000000000
This is tells us that if malloc
returns 0
, then we'll end up with a null dereference. And that's exactly right. As we saw, if malloc
returns NULL
(i.e., 0
, which indicates that no address could be allocated), then trying to store a character there will segfault.
Note that we could rewrite the program at tutorial/03/binary/main.c so that it checks for the NULL
memory address before trying to store something there. There is a corrected version at tutorial/03/binary/main_fixed.c:
#includeint main() { // Allocate a byte of memory, at address `addr` char *addr = malloc(sizeof(char)); // Don't proceed if we got no address if (addr == NULL) { return 0; } // Store the character 'z' at that address *addr = 'z'; }
Let's have wp
check this version of the function. The compiled program lives at 03/binary/main_fixed
. So:
$ bap wp \ --func=main \ --check-null-derefs \ 03/binary/main_fixed
Now wp
gives back output that looks like this:
Evaluating precondition. Checking precondition with Z3. UNSAT!
This time, wp
says UNSAT
, which is the opposite of SAT
. In other words, wp
has found that there are no null dereferences in main
. And that is correct. There is indeed no possible way to dereference a null pointer in this version of the program, which wp
confirms.
In the previous examples, we have been using wp
to examine a function in a single binary program, but as mentioned earlier, wp
can compare how a function behaves in two different programs, to see if they behave the same.
To analyze two programs, the command you want to invoke has this basic form:
bap wp \ <-- The command is wp --func=FUNC \ <-- The function to examine in both programs [options] \ <-- Any extra options /path/to/exe1 \ <-- The path to the first binary /path/to/exe2 <-- The path to the second binary
One of the predefined comparative properties that wp
can verify is the following: wp
can look at a function that occurs in two programs, and it can figure out if those functions can produce different outputs.
This can be very useful if, say, you optimize a function. You can ask wp
to compare the old unoptimized version with the new optimized version, and tell you if it's possible for the two functions to give you different outputs.
Suppose we are writing a C program, and we need to know if two integers have the same sign (positive or negative). Here is a very straightforward way to write a function to check (you can find this code in tutorial/04/binary/main_1.c:
bool same_signs(int x, int y) { // When x is negative if(x < 0){ // They have the same sign if x is negative too if (y < 0) { return true; } else { return false; } // When x is positive } else{ // They have the same sign if y is positive too if (y >= 0) { return true; } else { return false; } } }
First, we check if x
is less than zero. If it is, we then check if y
is also less than zero. If so, they have the same sign. Alternatively, if x
is greater than zero, then we check if y
is also greater than zero, and again, if so, then they have the same sign.
This code is fine, but it has a lot of if
statements, so it branches a lot, and hence is not the fastest code on the planet. Suppose we want to optimize this, and we ask a clever junior engineer write a new version. Suppose they come up with the following (this version can be found in tutorial/04/binary/main_2.c:
bool same_signs(int x, int y) { return !((x ^ y) < 0); }
This looks promising. It has none of the branches, and it uses XOR (the caret ^
) as a bit trick.
We want to know: is this optimized version really the same as the earlier version. That is to say, does this optimized version really produce the same outputs as the first version?
We can ask wp
to compare these two functions, and tell us if there is any possible way for the second one to produce a different output than the first one (when given the same input).
Remember that the output of a function is always placed in the RAX
register, so what we really want to know is if there is any way for these two functions to put different values in RAX, given the same inputs. And this is something we can ask wp
to check for us.
This is our first example of using wp
to compare two programs. You can ask wp
to check if both versions of the same_signs
function produce the same output in RAX
. Here is the command:
$ bap wp \ --func=same_signs \ --compare-post-reg-values=RAX \ 04/binary/main_1 \ 04/binary/main_2
Note the parameter --compare-post-reg-values=RAX
in this command. That tells wp
to compare the value in RAX
after the functions finish execution.
When wp
finishes its check, it prints out the following:
Evaluating precondition. Checking precondition with Z3. UNSAT!
What this means is that wp
was not able to find a way to make these two function produce different outputs. And remember, wp
does a logical check, so this is telling us that (up to our simplifying assumptions) it is logically impossible for these two functions to produce different outputs.
That's good news. This tells us that the optimized version of the function is indeed equivalent to the original, in the sense that it will always produce exactly the same outputs as the original.
Another predefined property that wp
can check for when comparing two functions is this: it can check that both versions of the functions make the same function calls in the course of their execution.
Let's illustrate this with an example. Imagine an old program that has been running on a submarine for some time now, which processes signals from the user, and dispatches the task to the appropriate handlers.
There are various signals it can receive:
SURFACE
, which tells the computer to surface the submarine.NAV
, which tells the computer to alter the navigation course of the submarine.DEPLOY
, which tells the computer to deploy a payload (e.g., fire a missile).LOG
, which tells the computer to log the current status of the system.These various signals are encoded in an enum
:
/* Different types of signals */ typedef enum {SURFACE, NAV, LOG, DEPLOY} signal_t;
There are handlers for each one:
/* Stubbed handler for the SURFACE signal */ int surface() { int status_code = 1; // Handle surfacing... return status_code; } /* Stubbed handler for the NAV signal */ int alter_course() { int status_code = 2; // Handle navigation... return status_code; } /* Stubbed handler for the LOG signal */ int log_system_status() { int status_code = 3; // Log the system's status... return status_code; } /* Stubbed handler for the DEPLOY signal */ int deploy_payload() { int status_code = 4; // Deploy the payload... return status_code; }
To process the signal, there is a switch statement:
/* Process a signal, and dispatch to an appropriate handler */ int process_signal(signal_t signal) { int status_code = 0; switch (signal) { case SURFACE: status_code = surface(); break; case NAV: status_code = alter_course(); case LOG: status_code = log_system_status(); break; case DEPLOY: status_code = deploy_payload(); break; } return status_code; }
The switch statement here simply dispatches the task to the appropriate handler, depending on which signal it receives, and then it returns the resulting status code.
But notice that there is no break
statement in the NAV
case. The programmer wanted to have a log happen after every NAV
signal, so they cleverly left off the break
here. That causes execution to fall through from the NAV
case to the LOG
case. Hence, every time this program handles a NAV
signal, it does that, but then it falls through to LOG
, and so it also logs the system status.
The full code for this example can be found at tutorial/05/binary/main_1.c.
Suppose now that, many years later, we have decided that we do not need to have logging in this program anymore, so we apply an automated patch that strips the logging code out.
The patch removes the LOG
from the enum, so now we just have SURFACE
, NAV
, and DEPLOY
:
/* Different types of signals */ typedef enum {SURFACE, NAV, DEPLOY} signal_t;
It also removes the log_status()
handler, so now we just have surface()
, alter_course()
, and deploy_payload()
:
/* Stubbed handler for the SURFACE signal */ int surface() { int status_code = 1; // Handle surfacing... return status_code; } /* Stubbed handler for the NAV signal */ int alter_course() { int status_code = 2; // Handle navigation... return status_code; } /* Stubbed handler for the DEPLOY signal */ int deploy_payload() { int status_code = 4; // Deploy the payload... return status_code; }
And the patch removes the LOG
case from the switch statement, so now we only have the cases for SURFACE
, NAV
, and DEPLOY
:
/* Process a signal, and dispatch to an appropriate handler */ int process_signal(signal_t signal) { int status_code = 0; switch (signal) { case SURFACE: status_code = surface(); break; case NAV: status_code = alter_course(); case DEPLOY: status_code = deploy_payload(); break; } return status_code; }
Of course, stripping out that LOG
case introduced a bug. Before the patch, every time a NAV
signal was processed, the NAV
case would fall through to the LOG
case, and log the system's status. But now, in the patched version, there is no LOG
case, so every time a NAV
signal gets processed, it falls right on through to the DEPLOY
case, and deploys the payload!
The full code for the patched version can be found at tutorial/05/binary/main_2.c.
We can ask wp
to analyze both versions of the process_signal
function, and check that they call the same functions. Then wp
will find out if there is any way that the original and patched versions differ with respect to function calls.
We can ask wp
to compare both binaries and find out if both versions of the process_signal
functions differ, with respect to the function calls they make. Here is the command:
$ bap wp \ --func=process_signal \ --compare-func-calls \ 05/binary/main_1 \ 05/binary/main_2
Notice that we added the flag --compare-func-calls
. This tells wp
to check the function calls in both versions of the process_signal
function.
The output looks something like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x000000003f800084 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000000001 RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig
This says SAT
, which means wp
did indeed find a way to make the older and patched versions of these functions make different function calls.
Note the Model
, which tells us that this violation occurs when RDI
is set to 0x01
. So if the value 0x01
is stored in RDI
when these functions start, then the second version of the function will call fewer functions than the first version.
Given the code that we looked at above, we can see that this is right. Remember that RDI
always holds the first argument to a function, which in the case of process_signals
is the signal:
/* Process a signal, and dispatch to an appropriate handler */ int process_signal(signal_t signal) { <---------- RDI holds the signal ... }
What is the value 0x01
? Well, a signal is an enum
, which we can think of as an indexed array:
/* Different types of signals */ typedef enum {SURFACE, NAV, LOG, DEPLOY} signal_t; <-- items indexed 0, 1, 2, 3 or typedef enum {SURFACE, NAV, DEPLOY} signal_t; <-- items indexed 0, 1, 2
Hence, index 0x00
refers to the first option (SURFACE
), 0x01
refers to the second option (NAV
), and so on.
With that in mind, we can make sense of what wp
is telling us here. It's telling us that if we call process_signal
with the NAV
signal (i.e., with RDI
holding the index 0x01
), then the functions will behave differently, in the sense that they will make different functions calls.
Which function calls do they differ on, exactly? We can find out by asking wp
to print out its refuted goals. To do that, we add the parameter --show=refuted-goals
when we run wp
, like this:
$ bap wp \ --func=process_signal \ --compare-func-calls \ --show=refuted-goals \ 05/binary/main_1 \ 05/binary/main_2
Then wp
prints output that looks something like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x000000003f800084 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000000001 RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig Refuted goals: deploy_payload not called in modified: Z3 Expression: false
Notice at the bottom:
Refuted goals: deploy_payload not called in modified
What this says is deploy_payload
is called in the modified version of the function and not in the original version, when the function is called with the argument NAV
(i.e., when RDI
is 0x01
at the start of the function).
(Why the double negation in "Refuted goals: deploy_payload not called in modified"? This is because wp
proves what it proves by assuming that the same functions are not called, and then it tries to falsify that assumption.)
In practice, it is difficult for humans to catch these kinds of bugs, especially when the patches are applied automatically, and to many different portions of a program all at once. wp
can be an important help here.
So far we have been looking at predefined properties that are built in to wp
. Let's turn now to defining some custom properties.
To define a custom property of a function in your program, you basically describe what you want the state of your program to look like, after the function finishes execution. Then wp
will check if that description holds true of your program. If wp
can find a way to make it false, it will find it, and provide you with some example inputs that will make your function exhibit that behavior.
Let's do a simple example. Consider this toy C function (which can be found at tutorial/06/binary/main.c:
int main() { // Return my lucky number. return 7; }
As a simple example of a custom property, let us say that this function should always produce 7
for its output. More exactly, let us say that when this function terminates, the value in RAX
will always be 7
(since RAX
is where the output of a function is always placed).
To encode this property, wp
can parse SMTLIB
expressions. SMTLIB
is a special lisp-like language that was designed explicitly for stating custom properties like this.
To encode our property as an SMTLIB
expression, we first need to know the name of the register, which of course we do know:
RAX
Then, we need to know the value that we want to say should be in it. The registers for x86_64
programs hold 64-bit binary numbers, so we want to encode 7 as a 64-bit number. For SMTLIB
, we simply make it a 64-bit hex number, like this:
#x0000000000000007
Next, we want to say that these are equal, which we write like this:
(= RAX #x0000000000000007)
Notice that this is lisp-like. We put the equal sign up front, then we list the two arguments after it, and then we wrap the whole thing in parentheses.
Finally, we want to assert that this holds, so we wrap the whole thing in an assert
, like this:
(assert (= RAX #x0000000000000007))
That is our complete SMTLIB
expression. It asserts that the value in RAX
is the 64-bit number 7
.
Now we can tell wp
to check that this holds for our function. To do that, we ask wp
to analyze our main
function just as we have done before, but we will specify our custom property with the following parameter:
--postcond='(assert (= RAX #x0000000000000007))'
This tells wp
to check that the specified property holds after the function executes. Note that we enclose the SMTLIB
expression in single quotes. This is just to force bash to treat it as a literal string and not perform any shell expansion.
Here is the full command to run:
$ bap wp \ --func=main \ --show=refuted-goals \ --postcond='(assert (= RAX #x0000000000000007))' \ 06/binary/main
When wp
runs this, it will try to falsify this property. That is, it will explore all logical possibilities (up to our simplifying assumptions), and find a way to make our main
function put something other than 7
in RAX
.
wp
can find a way to falsify our property, it will return SAT
and provide an example of how to make main
produce a value other than 7
.UNSAT
, and that means the property holds. It means that RAX
does indeed always contain the 64-bit number 7 at the end of our main
function's execution.When you run the above command, wp
produces output like this:
Evaluating precondition. Checking precondition with Z3. UNSAT!
So, in this case, wp
could not find a way to falsify this property, and hence the property holds. And indeed, that makes sense, because as we can see from the code, our function always returns the number 7
.
Let's have wp
check a different property, one that we know is false. For example, let's have wp
check whether RAX
always ends up with the value 3
in it. To express this property, we would write an SMTLIB
expression just like the previous one, except we'll use the number 3
instead of 7
:
(assert (= RAX #x0000000000000003))
Now have wp
check this:
$ bap wp \ --func=main \ --show=refuted-goals \ --postcond='(assert (= RAX #x0000000000000003))' \ 06/binary/main
This time, wp
outputs something that looks like this:
SAT! Model: RSP |-> 0x000000003f800081 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000000000 RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig Refuted goals: (= RAX0 #x0000000000000003): Concrete values: = #x0000000000000007 #x0000000000000003 Z3 Expression: = #x0000000000000007 #x0000000000000003
This time, wp
comes back with SAT
, meaning that it could find a way to falsify our assertion, and it shows us how to do it. In the model
, most of the values are zero, which makes sense. Since our main
function always returns 7
, the registers can start with pretty much any values (or no values), and main
will not produce 3
in RAX
.
Consider the following C program (the source code can be seen at tutorial/07/binary/main.c):
int main(int argc, char **argv) { if (argc < 10) { return 7; } else { return 254; } }
In this program, you can see that if the number of arguments (argc
) is less than 10
, the main
function returns 7
. Otherwise it returns 254
.
Suppose we want to assert that at the end of this function RAX
will always produce 7
. We could have wp
check that, just as we did before:
$ bap wp \ --func=main \ --show=refuted-goals \ --postcond='(assert (= RAX #x0000000000000007))' \ 07/binary/main
When you run that, you'll see that wp
produces output that looks something like this:
SAT! Model: ZF |-> 0x0 SF |-> 0x0 RSP |-> 0x000000003f800081 RSI |-> 0x0000000000000000 RDX |-> 0x0000000000000000 RDI |-> 0x0000000000400000 RCX |-> 0x0000000000000000 RBP |-> 0x0000000000000000 RAX |-> 0x0000000000000000 R9 |-> 0x0000000000000000 R8 |-> 0x0000000000000000 PF |-> 0x0 OF |-> 0x0 CF |-> 0x0 AF |-> 0x0 mem_orig |-> [ else |-> 0x00] mem_mod = mem_orig Refuted goals: (= RAX0 #x0000000000000007): Concrete values: = #x00000000000000fe #x0000000000000007 Z3 Expression: = #x00000000000000fe #x0000000000000007
Here, wp
tells us that it can falsify our assertion, and it provides an example of how to do it. Notice the RDI
register. If it is set to 0x400000
, then our main
function will put something other than 7
in RAX
. (Your CBAT installation might produce a different example.)
Of course, that makes sense, since we can see in the C code that if the number of arguments is 10
or more, it'll return 254
, and 0x400000
is certainly bigger than 10
.
Let's make our custom property a little more robust. Let's suppose that we want to assert that main
will put 7
in RAX
, given some particular input argument. For example, let's say that, if the input argument is 5
, then RAX
will be 7
.
We can do that by asserting a precondition for the function. In this case, we want the precondition to be that the argument to our main
function, which is stored in the RDI
register, is 5
. Here is how we would express that in SMTLIB
:
(assert (= RDI #x0000000000000005))
And now we can add this as a precondition to our check, using the --precond
parameter:
$ bap wp \ --func=main \ --show=refuted-goals \ --precond='(assert (= RDI #x0000000000000005))' \ --postcond='(assert (= RAX #x0000000000000007))' \ 07/binary/main
Now wp
will check that the postcondition holds, given that the precondition holds. In this case, it will check that, if RDI
is 5
at the start of the function, then RAX
will be 7
at the end of the function.
When you run this, wp
will output something like this:
Evaluating precondition. Checking precondition with Z3. UNSAT!
In other words, wp
was not able to find a way to falsify our assertions, and hence our assertions hold.
In the last two examples, we looked at specifying custom properties that wp
can check for a single binary program. We can also specify custom properties that wp
can check to compare two binary programs.
To illustrate, consider the same_signs
function we looked at before. Here is the first version of the function (which can be seen at tutorial/08/binary/main_1.c:
bool same_signs(int x, int y) { // When x is negative if(x < 0){ // They have the same sign if x is negative too if (y < 0) { return true; } else { return false; } // When x is positive } else{ // They have the same sign if y is positive too if (y >= 0) { return true; } else { return false; } } }
And here's the optimized version (which can be seen at tutorial/08/binary/main_2.c:
bool same_signs(int x, int y) { return !((x ^ y) < 0); }
Before we asked wp
to check that these two functions produce the same output in RAX
, using the --compare-post-reg-values=RAX
parameter. But we can specify this same thing as a custom property.
What we want to say is that, when the function same_signs
finishes executing, the values in RAX
are exactly the same. More exactly, we want to say that at the end of the original function's execution, RAX
is the same as it is at the end of the modified function's execution.
To explicitly refer to a register in the original function, we can append _orig
to the register name, like this:
RAX_orig
Similarly, to explicitly refer to a register in the modified version of the function, we can append _mod
, like this:
RAX_mod
Then, we can formulate an SMTLIB
expression asserting that the two are equal:
(assert (= RAX_orig RAX_mod))
To have wp
check this, we call wp
in comparison mode, with our custom postcondition:
$ bap wp \ --func=same_signs \ --show=refuted-goals \ --postcond='(assert (= RAX_orig RAX_mod))' \ 08/binary/main_1 \ 08/binary/main_2
When you run this, wp
outputs the following:
Evaluating precondition. Checking precondition with Z3. UNSAT!
In other words, wp
could not find a way to falsify our assertion. Hence, the property we've asserted here holds. RAX
in the original and the modified program will indeed always be equal.
This tutorial has introduced many command line options for CBAT. A complete reference for the command line interface, including many options not described int the tutorial, can be found here. This page also includes a guide to common smtlib expressions for use in your custom properties.
As we noted at the outset, CBAT is a family of tools designed to verify the behavior of binary programs.
At the heart of the CBAT toolset is wp
. As we have seen, wp
is a tool that verifies whether functions behave in specified ways. wp
can be used to verify the behavior of a function in a single binary program, but it can also be used to verify the behavior of a function in two binaries.
Because of its ability to compare two programs, wp
can be used to check patches and modifications that you might make to your binary programs. Patches are meant to change some aspects of a program, while preserving certain other behaviors. With wp
, you can check whether the patched version of your program preserves the desired behavior that is present in the original version.