CBAT Tutorial

Table of contents:

For this tutorial, make sure you are in the CBAT Docker container. Instructions for installing the container can be found here.

All commands for this tutorial should be run from this directory:

  /path/to/cbat_tools/docs/tutorial

Overview

Let us begin by providing a brief overview of what CBAT is, and what it does.

What does CBAT analyze?

CBAT is a tool that helps you analyze the behavior of binary executables (i.e., compiled programs). So it does not analyze the source code you have written in C, or Rust, or what have you. Rather, CBAT analyzes the assembly-level machine instructions that the computer executes directly when it runs your compiled program.

This is useful, because the executable is the real program. High level programming languages offer lots of different features, but at the end of the day, it all gets compiled down to the same basic machine instructions, and that is what CBAT analyzes.

What does CBAT do?

At its heart, CBAT checks whether a function in your program (or programs) behaves in a certain way. In other words, it can verify that a function in your program has certain properties.

To illustrate the point, imagine the following. Suppose there is a function foo in your program, and you want to know if it always produces the number 10 as its output. You can ask CBAT to check this for you.

CBAT will explore all logical possibilities, and find if there is any possible way that foo could produce anything other than 10. If it finds a way, it will give you an example of inputs you can feed into foo to make it do what you didn't expect.

It is worth emphasizing that CBAT does not perform a probabilistic or fuzzy sort of check here. CBAT performs a mathematical check of all logical possibilities.

We do make a few simplifying assumptions for the sake of performance, but for the most part, this means that (up to our assumptions) if there is any way to make foo produce anything other than 10, CBAT will find it. By contrast, if CBAT cannot find a way, then CBAT has essentially found a logical proof that foo always produces 10.

Comparing programs

CBAT can analyze functions in a single binary program, but it can also compare functions in two different programs, and check whether their behavior is the same.

This is useful for checking whether a newer, modified version of a program still works the same way as an older version.

For example, suppose you make some improvements to foo, and you want to know whether your improvements have broken anything. CBAT can analyze both your original and your improved program, and make sure that the modified version of foo always produces the same output as the original version of foo.

Binary Programs

CBAT is a family of command line tools built on top of Carnegie Mellon's Binary Analysis Platform (BAP). CBAT and BAP do not analyse high-level code in a source language like C or Rust. Rather, they analyze the compiled, "binary" version of such programs.

When you compile (say) a C program or a Rust program, the compiler converts your source code into machine code, which the computer can execute directly.

Machine code is pretty simple. A machine has a bunch of registers that you can stash values in, often with well-known names like RAX, RDI, R9, and so on:

+----------+---------------+
| Register | Slot          |
+----------+---------------+
| RAX      | 0x000000      | <--- values go in the slots
+----------+---------------+
| RDI      | 0x000023      |
+----------+---------------+
| RBP      | 0x000000      |
+----------+---------------+
| R9       | 0x000000      |
+----------+---------------+
| R10      | 0x000000      |
+----------+---------------+
| ...      | ...           |
+----------+---------------+

There are also a number of smaller registers, often call flags, which can really only contain a one or zero, to indicate that they are "on" or "off." They often have names like ZF, OF, and so on:

+------+--------+
| Flag | On/off |
+------+--------+
| ZF   | 0x0    | <--- Flag is off
+------+--------+
| OF   | 0x1    | <--- Flag is on
+------+--------+
| AF   | 0x0    |
+------+--------+
| ...  | ...    |
+------+--------+

Finally, there are regions of memory you can stash values in too. Memory is basically just another array of slots you can put bytes in, and each slot has an address.

+----------+-------+
| Address  | Slot  |
+----------+-------+
| 0x000000 | 0x00  | <--- bytes go in the slots
+----------+-------+
| 0x000001 | 0x23  |
+----------+-------+
| 0x000002 | 0xaf  |
+----------+-------+
| 0x000003 | 0xa2  |
+----------+-------+
| ...      | ...   |
+----------+-------+

The machine doesn't let you do very much with all of these slots. Basically, you can put values in and out of them, you can compare values in them, and you can do very basic arithmetic on them.

The addresses and values used by the machine are always binary numbers, i.e., sequences of ones and zeros. These are called bitvectors (or sometimes just "binary numbers").

There is also a GOTO instruction, which tells the machine to jump to a different instruction in your program. So this is all really just simple GOTO-style programming, with a bunch of slots to put your (binary) values in.

One of the problems with binary analysis is that every computer architecture is different. The slots are laid out differently, and the machine instructions take different forms. For this reason, programs compiled for, say, x86_64 look different than those compiled for ARM.

BAP is a big help here. It takes most of these variations in machine instructions, and it lifts them into a unified and simpler form of machine code, which it calls BIR (short for the "BAP Intermediate Representation"). We'll just call it the "IR" for short.

Function arguments and return values

As was noted already, when code written in, say, C is compiled down to machine code, everything gets converted into simple machine instructions which basically shuffle values in and out of the various slots.

Of course, functions in C code take arguments, and they return values. At the level of machine code, these arguments are handled by putting them in slots, and the return values are handled by putting them into slots too.

In x86_64 programs (which is what we'll be looking at), the first argument to a function is always placed in a register called RDI, and the function's return value is always placed in a register called RAX.

For example, suppose we have a function in C that returns the value 7:

int foo(int x) {
  ...
    return 7;
}

In BAP's IR, the return value would be handled like this:

00000436: subroutine foo... <-- Start of the function foo
00000443: ...
00000450: RAX := 0x07       <-- Put the result in RAX to return 
00000458: return...         <-- Return to the caller

The numbers at the start of each line are unique identifiers, one for each line. The instructions come after those numbers, to the right.

You can see here that the value 7 is placed in the register RAX, and then the function returns. This is how the function returns 7. Return values are always placed in RAX.

Similarly, the argument to a function always goes in a register called RDI. For example, to call the above foo function with 3 as the parameter, that would look something like this in BAP's IR:

00000467: RDI := 0x03        <-- Put the argument in RDI
00000476: call foo           <-- Call the function

You can see here that the value 3 is placed in the register RDI, and then the function foo is called. This is how functions are called with an argument. The argument always goes in RDI.

(If your function takes more than one argument, the second argument always goes in a register called RSI, and there are other established slots for the third argument, the fourth argument, and so on).

The important things to remember in our examples is just that the argument to a function always goes in RDI, and the return value always goes in RAX.

To make the examples in this tutorial easy to follow, we have provided C code for each program. However, remember that CBAT actually operates on the machine code — no source code is necessary to use CBAT. On occasion, we will pay attention to the machine code, particularly the argument register (RDI) and the return value register (RAX).

CBAT Usage

BAP itself is a framework for analyzing binary programs. It has a command line interface, which you can extend with your own plugins. What BAP does is it takes a binary program, and lets you run a pass over the program, during which time your plugin can dig in and analyze the internals of that program.

Our CBAT tools are implemented primarily as BAP plugins. The particular plugin that we will focus on here is called wp. If you are curious what that stands for, it is short for "weakest precondition," which is the technical name in the literature for the kind of analysis that wp performs.

The wp plugin has two modes of operation. In the first mode, it can make a pass over a single binary program, and analyze a function in that program. In the second mode, it can make a pass over two programs, and compare a function in them. Here we cover the first mode — the second mode will be covered when we reach the first example that uses it.

Analyzing one program

To analyze a single program, the command you want to invoke has this basic form:

    $ bap wp \              <-- The command is wp
        --func=FUNC \       <-- The function to examine
        [options] \         <-- Any extra options
        /path/to/exe        <-- The path to a binary program

This will tell CBAT to go into the program at /path/to/exe, and analyze the function FUNC that occurs in that program.

There are various [options] you can specify, to tell CBAT to do different things as it analyzes your program. We will illustrate the most important options throughout the following tutorial.

Tripping asserts

The wp tool is quite general. You can tell it to verify that your functions have certain properties, and you can specify your own custom properties that wp should check for.

As a convenience, we have provided some predefined properties, which are built-in to the tool. One of them is this: wp can check if it is possible to trigger an assert in a function in your program.

Consider the following simple C program (which lives at tutorial/01/binary/main.c):

#include <assert.h>

int main(int argc, char** argv) {

    if (argc == 0xdeadbeef) {
        assert(0);
    }
    
    return 0;

}

Here we have a main function, which takes an argument, argc (we can ignore argv).

Visually, here's the control flow:

                   ------------
                   |   main   |
                   ------------
                        |
           --------------------------
           |                        |
    argc == 0xdeadbeef          otherwise
           |                        |
           V                        V
     ----------------         ---------------
     |  assert(0)   |         |   return 0  |
     ----------------         ---------------
           |                        |
           V                        V
        Program                  Program
         fails                exits cleanly

As you can see from this diagram, if argc is 0xdeadbeef, then the program travels down the left branch, where an assert is tripped, and the program fails. Otherwise, it goes down the right branch, where the program returns 0, and exits cleanly.

Now, this is a C program, and as we noted earlier, wp does not analyze the C code itself. Rather, wp analyzes the compiled version of this program. We started with the C code here, just because it is easy to see what is going on. But the real program is what you get when you compile it.

Let's see what the machine instructions look like just for the branching part of the if statement in our example program. We can ask BAP to show us the machine instructions as it sees them, like this:

    $ bap 01/binary/main -d --print-symbol=main 

You don't need to do that right now. But if you did, you could look through the output, until you find the relevant instructions, which should look something like this:

00000481: when #5 goto %0000047b  <-------- When #5 is 1, go here ---+
00000879: goto %000006f9 <------------ Otherwise, go here --+        |
                                                            |        |
000006f9: ...  <--------------------------------------------+        |
00000722: call @__assert_fail ...  <--- trip the assert              |
                                                                     |
0000047b: ...  <-----------------------------------------------------+
000004aa: call #52 with noreturn  <--- exit cleanly

At the first instruction printed here (numbered 00000481), you can see that the machine is going to jump to instruction 0000047b when #5 is true (i.e., 1).

So if we suppose that #5 is 1, we can see what happens next, by following the machine down to the instruction at 000047b. There, we can see that the program exits cleanly.

Alternatively, if #5 is not 1, then the machine doesn't jump at 00000481, and instead moves down to the next instruction, at 0000879. What happens there? The machine jumps directly to instruction 0000006f9, and if we look there, we can see that it trips the assert we are looking for.

Even though these are machine instructions and not C code, there is still a clear control flow here, similar to what we saw for the C code:

                   ------------
                   |   main   |
                   ------------
                        |
                    000000481
                        |
           ----------------------------
           |                          |
         ~ #5                        #5
           |                          |
           V                          V
     ----------------         --------------------
     |  00000722:   |         |    000004aa:     |
     | assert_fail  |         |  call ... (exit) |
     ----------------         --------------------
           |                         |
           V                         V
        Program                   Program
         fails                  exits clean

Of course, just because there is a branch in the code where an assert is tripped, that doesn't mean the program can actually travel down that branch. Programs can have "dead" branches, which are impossible to travel down:

if true:
    do_something()
else:
    never_called() # Dead code, never called

What we want to know in our example is whether there is any way that the main function can actually get to the branch that trips the assert.

We can ask wp to find out if this is possible. To do that, invoke wp like this:

    $ bap wp \
          --func=main \
          --trip-asserts \
          01/binary/main

Here we ask wp to try to trip an assert inside the main function of the program at 01/binary/main. When we run this command, the relevant output comes at the end, which looks like this:

SAT!

Model:
  ZF  |->  0x0
  SF  |->  0x0
  RSP |->  0x000000003f800081
  RSI |->  0x0000000000000000
  RDX |->  0x0000000000000000
  RDI |->  0x00000000deadbeef
  RCX |->  0x0000000000000000
  RBP |->  0x0000000000000000
  RAX |->  0x0000000000000000
  R9  |->  0x0000000000000000
  R8  |->  0x0000000000000000
  PF  |->  0x0
  OF  |->  0x0
  CF  |->  0x0
  AF  |->  0x0
  mem_orig |-> [
    else |-> 0x00]
  mem_mod = mem_orig

The first thing it says is SAT!, which means that CBAT was able to find a way to trip the assert.

Next, it says Model, after which it lists some information. There, you can see a bunch of registers on the left, and a bunch of values on the right (and a mention of memory at the bottom, which we can ignore for now).

What this tells us is how to trip the assert. The idea here is that if we set the registers (and memory) to these particular values at the start of the main function, then main will trip the assert.

In this particular example, we can see that most of the registers are set to zero. The one that is interesting here is RDI. Remember that RDI is where the argument to the function goes. And here, wp tells us that this register should be set to 0xdeadbeef. That is to say, wp is telling us that if we set RDI to 0xdeadbeef, then our function main will trip the assert. And that of course makes sense, given the control flow that we examined in this function.

BAP's IR

It can be instructive to step through the machine instructions, and actually see the function exhibit this behavior.

One of the tools in the CBAT family is an interactive debugger that lets you do just that: step through a program, one machine instruction at a time. The tool is called bildb (short for "BAP Intermediate Language DeBugger"), and the command to invoke it looks like this:

    $ bap 01/binary/main \
          --pass=run \
          --run-entry-point=main \
          --bildb-debug

That will start up the debugger at the main function of the program at 01/binary/main.

That will start up bildb at the main function in 01/binary/main. You should see something like this:

BIL Debugger
Starting up...

Architecture
Type: x86_64
Address size: 64
Registers:
R10    R11    R12    R13    R14    R15    R8     R9     RAX    RBP    RBX    
RCX    RDI    RDX    RSI    RSP    YMM0   YMM1   YMM10  YMM11  YMM12  YMM13  
YMM14  YMM15  YMM2   YMM3   YMM4   YMM5   YMM6   YMM7   YMM8   YMM9   

Entering subroutine: [%00000868] main
0000088d: main_argc :: in u32 = RDI
0000088e: main_argv :: in out u64 = RSI
0000088f: main_result :: out u32 = RAX
Entering block %00000413
0000041a: #46 := RBP
>>> (h for help)

You can see that at startup bildb prints some information about the architecture and its initialization state, then it enters the function (subroutine) main, where it stops at the first instruction in the first block, namely:

0000041a: #46 := RBP

The instruction's identifying number is on the left of the colon, and the instruction itself is on the right of the colon.

Below that, bildb gives you a prompt, where you can type a command:

>>> (h for help)

There are various commands you can enter here, but they are fairly straightforward. For example, to see more than just one instruction, type show 5 and hit enter (to see the nearest +/- 5 lines):

>>> (h for help) show 5
   %00000413
-> 0000041a: #46 := RBP
   0000041d: RSP := RSP - 8
   00000420: mem := mem with [RSP, el]:u64 <- #46
   00000427: RBP := RSP
   00000435: #47 := RSP
   00000438: RSP := RSP - 0x10

Another thing you can do is see which values are stored in registers. For instance, to see the value stored in RBP, type p RBP (short for "print RPB") and hit enter. You should see something like this:

>>> (h for help) p RBP
RBP   : 0

This tells you that the RBP register has the value zero stored in it.

Remember how wp told us that main will trip the assert if it begins with RDI set to 0xdeadbeef? Let's set that value, and see if we can trip the assert.

To set RDI to 0xdeadbeef, type set RDI=0xdeadbeef and hit enter. You'll see that bildb sets the value:

>>> (h for help) set RDI=0xdeadbeef
RDI   : 0xDEADBEEF

Now you should be able to step through this program, instruction by instruction, and you should end up triggering the assert.

To move to the next instruction, hit s (for "step"), and hit enter. That moves to the next instruction:

>>> (h for help) s
0000041d: RSP := RSP - 8

You could keep stepping through this program, one instruction at a time, but you can also just set a breakpoint at the instruction you want, and skip ahead to it.

Let's set a breakpoint at the assert, to see if we reach it. Recall the control flow graph we saw before:

                   ------------
                   |   main   |
                   ------------
                        |
                    000000481
                        |
           ----------------------------
           |                          |
         ~ #5                        #5
           |                          |
           V                          V
     ----------------         --------------------
     |  00000722:   |         |    000004aa:     |
     | assert_fail  |         |  call ... (exit) |
     ----------------         --------------------
           |                         |
           V                         V
        Program                   Program
         fails                  exits clean

You can see that the assert happens at instruction 00000722, so let's set a breakpoint there. To set a breakpoint, type b %00000722 (short for "breakpoint at instruction 00000722"):

>>> (h for help) b %00000722
Breakpoint set at %00000722

Now let's see if we get to that breakpoint. To tell the debugger to move forward, type n (for "next") and hit enter. The debugger takes us to the next block:

Entering block %000006f9
000006fe: RCX := 0x4005ED

Type n (and enter) to skip forward again, and you will see that you hit the breakpoint at 00000722:

>>> (h for help) n
00000722: call @__assert_fail with return %0000047b

At this point, you have confirmed that the program does indeed trip the assert if RDI is set to 0xdeadbeef at the beginning of main.

You could keep stepping through the program, but there is no need at this point. You can just exit the debugger. To quit the debugger, hit q and then enter.

Hooking wp up to bildb

Instead of manually setting the values of registers in bildb to see the behavior that wp tells you about, you can have wp dump its output into a YAML file, and then have bildb read that in, at startup. To have wp dump its output like this, run the following command:

    $ bap wp \
          --func=main \
          --trip-asserts \
          --bildb-output=init.yml \
          01/binary/main

Notice how you simply added the option --bildb-output=init.yml to the command. That tells wp to dump the appropriate output into a file called init.yml.

Once you have that file at hand, you can start bildb with it, like this:

    $ bap 01/binary/main \
          --pass=run \
          --run-entry-point=main \
          --bildb-debug \
          --bildb-init=init.yml

Notice how we added the option --bildb-init=init.yml. When bildb starts up, it will read init.yml, and set the registers to the values listed there. From there, you can set the breakpoint at 00000722 and run through the program exactly as before.

4-Rooks

As we saw in the last example, wp can find a way to trip an assert, if it's possible to do so, and it will tell us how to do it. This can be used to solve puzzles of various kinds.

As an example, consider the 8-Queens chess puzzle. The task is this: on a standard 8x8 chess board, place 8 queens on the board in places where none of them can be captured by any of the others in a single move. Queens can move up/down, sideways, and diagonally, so you have to be careful to place the queens out of each other's paths.

For simplicity, let's do a simpler version: the 4-Rooks puzzle. It's just like the 8-Queens puzzle, except we have 4 rooks, and the board is a 4x4 board. Rooks can move up and down, and sideways.

Since this version of the game is less complex, it is somewhat easy to think up a solution. For example, here is one:

    +---+---+---+---+
    | R |   |   |   |
    +---+---+---+---+
    |   |   | R |   |
    +---+---+---+---+
    |   | R |   |   |
    +---+---+---+---+
    |   |   |   | R |
    +---+---+---+---+

As you can see, each of these rooks is out of the path of the others, so none can be captured by any of the others in a single move.

Now suppose that you have a function (in C) that checks solutions to the 4-rooks puzzle. Imagine that the function looks something like this:

/* This function checks if solution is correct. */
int check(int solution) {

    bool correct = true;

  // Check if solution is correct...

  if (correct) {
    assert(0);
  }

  return 0;
}

This function takes a solution to the 4-rooks game as an argument, it then checks the solution, and if the solution is correct, it trips an assert.

How do we encode a proposed solution, so that we can pass it into this function as an argument? Well, since machine code really only works with binary numbers, we'll need to encode proposed solutions as a bitvector.

How can we do that? There are 16 squares on a 4x4 chess board, so let's say that we'll represent the board with a 16-bit binary number, where each bit represents one of the squares. Let's also say that, for each bit, if it's 1, that means there is a rook at that position, and if it's 0, that means there is no rook at that position.

To encode a proposed solution as a 16 bit binary number, we mark each position on the board with a 0 or 1 to indicate whether there is a rook at that position. For example:

    +---+---+---+---+           +---+---+---+---+
    | R |   |   |   |           | 1 | 0 | 0 | 0 |    ==>   1000
    +---+---+---+---+           +---+---+---+---+
    |   |   | R |   |           | 0 | 0 | 1 | 0 |    ==>   0010
    +---+---+---+---+    ==>    +---+---+---+---+
    |   | R |   |   |           | 0 | 1 | 0 | 0 |    ==>   0100
    +---+---+---+---+           +---+---+---+---+
    |   |   |   | R |           | 0 | 0 | 0 | 1 |    ==>   0001
    +---+---+---+---+           +---+---+---+---+

Then we line up those bits, one after another, into one 16-bit binary number:

solution = 1000 0010 0100 0001 (or 0x4281 in hex)

So, the check function takes a 16-bit number as the argument, it then checks the proposed solution to see if it is correct, and it trips an assert if it is correct.

The full code for this function can be found at tutorial/02/binary/main.c. The compiled version of this program is at 02/binary/main. You can try it out by passing it a hex version of a possible solution. For example:

    $ ./02/binary/main 0x2212
    $ ./02/binary/main 0x8421

Since we have a function that trips an assert if the solution is correct, we can ask wp to try and find a way to trip that assert. If wp can find a way to trip the assert, it will give us an example of an input to the function that will cause this behavior.

Let's try it. Here is the command. We only want to analyze the check function, which we specify with --func=check:

    $ bap wp \
          --func=check \
          --trip-asserts \
          02/binary/main

After a moment, wp returns a result. At the end of the output, you should see something like this:

SAT!

Model:
  ZF  |->  0x0
  SF  |->  0x0
  RSP |->  0x0000000040000000
  RSI |->  0x0000000000000000
  RDX |->  0x0000000000000000
  RDI |->  0x0000000000004281  <-- Argument to function
  RCX |->  0x0000000000000000
  RBP |->  0x0000000000000000
  RAX |->  0x0000000000000000
  R9  |->  0x0000000000000000
  R8  |->  0x0000000000000000
  PF  |->  0x0
  OF  |->  0x0
  CF  |->  0x0
  AF  |->  0x0
  mem_orig |-> [
    else |-> 0x00]
  mem_mod = mem_orig

Here, we can see that wp did indeed find a way to trip the assert, and it provided an example of how to do it.

Look at the registers listed under Model. Remember that the first argument to a function is always stored in RDI, and here wp tells us that if we set that to 0x4281, then our function check will trip the assert.

What is 0x4281? In binary, it is:

0100 0010 1000 0001

If we break that up into a 4x4 matrix, we can see the chess board:

                 +---+---+---+---+           +---+---+---+---+
    0100   ==>   | 0 | 1 | 0 | 0 |           |   | R |   |   |
                 +---+---+---+---+           +---+---+---+---+
    0010   ==>   | 0 | 0 | 1 | 0 |           |   |   | R |   |
                 +---+---+---+---+    ==>    +---+---+---+---+
    1000   ==>   | 1 | 0 | 0 | 0 |           | R |   |   |   |
                 +---+---+---+---+           +---+---+---+---+
    0001   ==>   | 0 | 0 | 0 | 1 |           |   |   |   | R |
                 +---+---+---+---+           +---+---+---+---+

We can see that this is indeed a solution to the 4-rooks problem. Since none of the rooks are in each other's way, none of them can be captured by any of the others in a single move.

This technique applies to many different kinds of puzzles or problems whose solutions can be encoded as a binary number. For example, solving a sudoku puzzle, reversing a hash, and so on.

Find a Null Dereference

Another predefined property that wp can check is that it can find null dereferences in a function. To illustrate, consider the following C program, which can be found at tutorial/03/binary/main.c:

#include 

int main() {

    // Allocate a byte of memory, at address `addr`
    char *addr = malloc(sizeof(char));

    // Store the character 'z' at that address
    *addr = 'z';

}

In this program, we first allocate a byte of memory, and then we take the address of the byte that gets allocated and we store it in the pointer called addr. Next, we attempt to store the character z in the slot at that address.

Suppose I run this program, and suppose that the byte of memory I ask for here is successfully allocated. Well, then everything works great. I get back an address for the allocated byte, and I can then store the character z at that address.

But things needn't turn out that way. It's always possible that malloc won't be able to allocate the memory I've asked for, in which case it can't return an address to me. If that happens, it will return NULL (i.e., 0). And then, when I try to store z at that address . . . well, there is no address, so this will segfault.

We can ask wp to find null dereferences like this one. To check this particular main function, we can use the following command (note that we add --check-null-derefs as a flag):

    $ bap wp \
          --func=main \
          --check-null-derefs \
          03/binary/main

If you run this, you'll see output that looks something like this:

SAT!

Model:
  ZF  |->  0x0
  SF  |->  0x0
  RSP |->  0x0000000040000000
  RSI |->  0x0000000000000000
  RDX |->  0x0000000000000000
  RDI |->  0x0000000000000000
  RCX |->  0x0000000000000000
  RBP |->  0x0000000000000000
  RAX |->  0x0000000000000000
  R9  |->  0x0000000000000000
  R8  |->  0x0000000000000000
  PF  |->  0x0
  OF  |->  0x0
  CF  |->  0x0
  AF  |->  0x0
  mem_orig |-> [
    else |-> 0x00]
  mem_mod = mem_orig
  malloc_ret_RAX016 |->  0x0000000000000000

First, it says SAT!, indicating that it did in fact find a way to trigger the null dereference. Second, it gives us a Model, which shows us how to do it. Notice in particular the very last line:

malloc_ret_RAX016 |->  0x0000000000000000

This is tells us that if malloc returns 0, then we'll end up with a null dereference. And that's exactly right. As we saw, if malloc returns NULL (i.e., 0, which indicates that no address could be allocated), then trying to store a character there will segfault.

Note that we could rewrite the program at tutorial/03/binary/main.c so that it checks for the NULL memory address before trying to store something there. There is a corrected version at tutorial/03/binary/main_fixed.c:

#include 

int main() {

    // Allocate a byte of memory, at address `addr`
    char *addr = malloc(sizeof(char));

    // Don't proceed if we got no address
    if (addr == NULL) { return 0; }

    // Store the character 'z' at that address
    *addr = 'z';

}

Let's have wp check this version of the function. The compiled program lives at 03/binary/main_fixed. So:

    $ bap wp \
          --func=main \
          --check-null-derefs \
          03/binary/main_fixed

Now wp gives back output that looks like this:

Evaluating precondition.
Checking precondition with Z3.

UNSAT!

This time, wp says UNSAT, which is the opposite of SAT. In other words, wp has found that there are no null dereferences in main. And that is correct. There is indeed no possible way to dereference a null pointer in this version of the program, which wp confirms.

Comparing Function Outputs

In the previous examples, we have been using wp to examine a function in a single binary program, but as mentioned earlier, wp can compare how a function behaves in two different programs, to see if they behave the same.

Analyzing two programs

To analyze two programs, the command you want to invoke has this basic form:

    bap wp \                    <-- The command is wp
        --func=FUNC \           <-- The function to examine in both programs
        [options] \             <-- Any extra options
        /path/to/exe1 \         <-- The path to the first binary
        /path/to/exe2           <-- The path to the second binary

Example: Verifying an optimization

One of the predefined comparative properties that wp can verify is the following: wp can look at a function that occurs in two programs, and it can figure out if those functions can produce different outputs.

This can be very useful if, say, you optimize a function. You can ask wp to compare the old unoptimized version with the new optimized version, and tell you if it's possible for the two functions to give you different outputs.

Suppose we are writing a C program, and we need to know if two integers have the same sign (positive or negative). Here is a very straightforward way to write a function to check (you can find this code in tutorial/04/binary/main_1.c:

bool same_signs(int x, int y) {

    // When x is negative
    if(x < 0){

        // They have the same sign if x is negative too
        if (y < 0) { return true; }
        else { return false; }

    // When x is positive
    } else{

        // They have the same sign if y is positive too
        if (y >= 0) { return true; }
        else { return false; }

    }

}

First, we check if x is less than zero. If it is, we then check if y is also less than zero. If so, they have the same sign. Alternatively, if x is greater than zero, then we check if y is also greater than zero, and again, if so, then they have the same sign.

This code is fine, but it has a lot of if statements, so it branches a lot, and hence is not the fastest code on the planet. Suppose we want to optimize this, and we ask a clever junior engineer write a new version. Suppose they come up with the following (this version can be found in tutorial/04/binary/main_2.c:

bool same_signs(int x, int y) {
    return !((x ^ y) < 0);
}

This looks promising. It has none of the branches, and it uses XOR (the caret ^) as a bit trick.

We want to know: is this optimized version really the same as the earlier version. That is to say, does this optimized version really produce the same outputs as the first version?

We can ask wp to compare these two functions, and tell us if there is any possible way for the second one to produce a different output than the first one (when given the same input).

Remember that the output of a function is always placed in the RAX register, so what we really want to know is if there is any way for these two functions to put different values in RAX, given the same inputs. And this is something we can ask wp to check for us.

This is our first example of using wp to compare two programs. You can ask wp to check if both versions of the same_signs function produce the same output in RAX. Here is the command:

    $ bap wp \
          --func=same_signs \
          --compare-post-reg-values=RAX \
          04/binary/main_1 \
          04/binary/main_2

Note the parameter --compare-post-reg-values=RAX in this command. That tells wp to compare the value in RAX after the functions finish execution.

When wp finishes its check, it prints out the following:

Evaluating precondition.
Checking precondition with Z3.

UNSAT!

What this means is that wp was not able to find a way to make these two function produce different outputs. And remember, wp does a logical check, so this is telling us that (up to our simplifying assumptions) it is logically impossible for these two functions to produce different outputs.

That's good news. This tells us that the optimized version of the function is indeed equivalent to the original, in the sense that it will always produce exactly the same outputs as the original.

Comparing Function Calls

Another predefined property that wp can check for when comparing two functions is this: it can check that both versions of the functions make the same function calls in the course of their execution.

Let's illustrate this with an example. Imagine an old program that has been running on a submarine for some time now, which processes signals from the user, and dispatches the task to the appropriate handlers.

There are various signals it can receive:

These various signals are encoded in an enum:

/* Different types of signals */
typedef enum {STATUS, NAV, LOG, DEPLOY} signal_t;

There are handlers for each one:

/* Stubbed handler for the SURFACE signal */
int surface() {
    int status_code = 1;
    // Handle surfacing...
    return status_code;
}

/* Stubbed handler for the NAV signal */
int alter_course() {
    int status_code = 2;
    // Handle navigation...
    return status_code;
}

/* Stubbed handler for the LOG signal */
int log_system_status() {
    int status_code = 3;
    // Log the system's status...
    return status_code;
}

/* Stubbed handler for the DEPLOY signal */
int deploy_payload() {
    int status_code = 4;
    // Deploy the payload...
    return status_code;
}

To process the signal, there is a switch statement:

/* Process a signal, and dispatch to an appropriate handler */
int process_signal(signal_t signal) {

    int status_code = 0;

    switch (signal) {
        case SURFACE:
            status_code = surface();
            break;

        case NAV:
            status_code = alter_course();

        case LOG:
            status_code = log_system_status();
            break;

        case DEPLOY:
            status_code = deploy_payload();
            break;
    }

    return status_code;

}

The switch statement here simply dispatches the task to the appropriate handler, depending on which signal it receives, and then it returns the resulting status code.

But notice that there is no break statement in the NAV case. The programmer wanted to have a log happen after every NAV signal, so they cleverly left off the break here. That causes execution to fall through from the NAV case to the LOG case. Hence, every time this program handles a NAV signal, it does that, but then it falls through to LOG, and so it also logs the system status.

The full code for this example can be found at tutorial/05/binary/main_1.c.

Suppose now that, many years later, we have decided that we do not need to have logging in this program anymore, so we apply an automated patch that strips the logging code out.

The patch removes the LOG from the enum, so now we just have SURFACE, NAV, and DEPLOY:

/* Different types of signals */
typedef enum {SURFACE, NAV, DEPLOY} signal_t;

It also removes the log_status() handler, so now we just have surface(), alter_course(), and deploy_payload():

/* Stubbed handler for the SURFACE signal */
int surface() {
    int status_code = 1;
    // Handle surfacing...
    return status_code;
}

/* Stubbed handler for the NAV signal */
int alter_course() {
    int status_code = 2;
    // Handle navigation...
    return status_code;
}

/* Stubbed handler for the DEPLOY signal */
int deploy_payload() {
    int status_code = 4;
    // Deploy the payload...
    return status_code;
}

And the patch removes the LOG case from the switch statement, so now we only have the cases for SURFACE, NAV, and DEPLOY:

/* Process a signal, and dispatch to an appropriate handler */
int process_signal(signal_t signal) {

    int status_code = 0;

    switch (signal) {
        case SURFACE:
            status_code = surface();
            break;

        case NAV:
            status_code = alter_course();

        case DEPLOY:
            status_code = deploy_payload();
            break;
    }

    return status_code;

}

Of course, stripping out that LOG case introduced a bug. Before the patch, every time a NAV signal was processed, the NAV case would fall through to the LOG case, and log the system's status. But now, in the patched version, there is no LOG case, so every time a NAV signal gets processed, it falls right on through to the DEPLOY case, and deploys the payload!

The full code for the patched version can be found at tutorial/05/binary/main_2.c.

We can ask wp to analyze both versions of the process_signal function, and check that they call the same functions. Then wp will find out if there is any way that the original and patched versions differ with respect to function calls.

e can ask wp to compare both binaries and find out if both versions of the process_signal functions differ, with respect to the function calls they make. Here is the command:

    $ bap wp \
          --func=process_signal \
          --compare-func-calls \
          05/binary/main_1 \
          05/binary/main_2

Notice that we added the flag --compare-func-calls. This tells wp to check the function calls in both versions of the process_signal function.

The output looks something like this:

SAT!

Model:
    ZF  |->  0x0
    SF  |->  0x0
    RSP |->  0x000000003f800084
    RSI |->  0x0000000000000000
    RDX |->  0x0000000000000000
    RDI |->  0x0000000000000001
    RCX |->  0x0000000000000000
    RBP |->  0x0000000000000000
    RAX |->  0x0000000000000000
    R9  |->  0x0000000000000000
    R8  |->  0x0000000000000000
    PF  |->  0x0
    OF  |->  0x0
    CF  |->  0x0
    AF  |->  0x0
    mem_orig |-> [
        else |-> 0x00]
    mem_mod = mem_orig

This says SAT, which means wp did indeed find a way to make the older and patched versions of these functions make different function calls.

Note the Model, which tells us that this violation occurs when RDI is set to 0x01. So if the value 0x01 is stored in RDI when these functions start, then the second version of the function will call fewer functions than the first version.

Given the code that we looked at above, we can see that this is right. Remember that RDI always holds the first argument to a function, which in the case of process_signals is the signal:

/* Process a signal, and dispatch to an appropriate handler */
int process_signal(signal_t signal) {   <---------- RDI holds the signal

    ...

}

What is the value 0x01? Well, a signal is an enum, which we can think of as an indexed array:

/* Different types of signals */
typedef enum {SURFACE, NAV, LOG, DEPLOY} signal_t; <-- items indexed 0, 1, 2, 3

    or 

typedef enum {SURFACE, NAV, DEPLOY} signal_t;      <-- items indexed 0, 1, 2

Hence, index 0x00 refers to the first option (SURFACE), 0x01 refers to the second option (NAV), and so on.

With that in mind, we can make sense of what wp is telling us here. It's telling us that if we call process_signal with the NAV signal (i.e., with RDI holding the index 0x01), then the functions will behave differently, in the sense that they will make different functions calls.

Which function calls do they differ on, exactly? We can find out by asking wp to print out its refuted goals. To do that, we add the parameter --show=refuted-goals when we run wp, like this:

    $ bap wp \
        --func=process_signal \
        --compare-func-calls \
        --show=refuted-goals \
        05/binary/main_1 \
        05/binary/main_2

Then wp prints output that looks something like this:

SAT!

Model:
    ZF  |->  0x0
    SF  |->  0x0
    RSP |->  0x000000003f800084
    RSI |->  0x0000000000000000
    RDX |->  0x0000000000000000
    RDI |->  0x0000000000000001
    RCX |->  0x0000000000000000
    RBP |->  0x0000000000000000
    RAX |->  0x0000000000000000
    R9  |->  0x0000000000000000
    R8  |->  0x0000000000000000
    PF  |->  0x0
    OF  |->  0x0
    CF  |->  0x0
    AF  |->  0x0
    mem_orig |-> [
        else |-> 0x00]
    mem_mod = mem_orig

Refuted goals:
deploy_payload not called in modified:
    Z3 Expression: false

Notice at the bottom:

Refuted goals:
deploy_payload not called in modified

What this says is deploy_payload is called in the modified version of the function and not in the original version, when the function is called with the argument NAV (i.e., when RDI is 0x01 at the start of the function).

(Why the double negation in "Refuted goals: deploy_payload not called in modified"? This is because wp proves what it proves by assuming that the same functions are not called, and then it tries to falsify that assumption.)

In practice, it is difficult for humans to catch these kinds of bugs, especially when the patches are applied automatically, and to many different portions of a program all at once. wp can be an important help here.

Custom Postcondition (One Binary)

So far we have been looking at predefined properties that are built in to wp. Let's turn now to defining some custom properties.

To define a custom property of a function in your program, you basically describe what you want the state of your program to look like, after the function finishes execution. Then wp will check if that description holds true of your program. If wp can find a way to make it false, it will find it, and provide you with some example inputs that will make your function exhibit that behavior.

Let's do a simple example. Consider this toy C function (which can be found at tutorial/06/binary/main.c:

int main() {

    // Return my lucky number.
    return 7;

}

As a simple example of a custom property, let us say that this function should always produce 7 for its output. More exactly, let us say that when this function terminates, the value in RAX will always be 7 (since RAX is where the output of a function is always placed).

To encode this property, wp can parse SMTLIB expressions. SMTLIB is a special lisp-like language that was designed explicitly for stating custom properties like this.

To encode our property as an SMTLIB expression, we first need to know the name of the register, which of course we do know:

RAX

Then, we need to know the value that we want to say should be in it. The registers for x86_64 programs hold 64-bit binary numbers, so we want to encode 7 as a 64-bit number. For SMTLIB, we simply make it a 64-bit hex number, like this:

#x0000000000000007

Next, we want to say that these are equal, which we write like this:

(= RAX #x0000000000000007)

Notice that this is lisp-like. We put the equal sign up front, then we list the two arguments after it, and then we wrap the whole thing in parentheses.

Finally, we want to assert that this holds, so we wrap the whole thing in an assert, like this:

(assert (= RAX #x0000000000000007))

That is our complete SMTLIB expression. It asserts that the value in RAX is the 64-bit number 7.

Now we can tell wp to check that this holds for our function. To do that, we ask wp to analyze our main function just as we have done before, but we will specify our custom property with the following parameter:

--postcond='(assert (= RAX #x0000000000000007))'

This tells wp to check that the specified property holds after the function executes. Note that we enclose the SMTLIB expression in single quotes. This is just to force bash to treat it as a literal string and not perform any shell expansion.

Here is the full command to run:

    $ bap wp \
          --func=main \
          --show=refuted-goals \
          --postcond='(assert (= RAX #x0000000000000007))' \
          06/binary/main

When wp runs this, it will try to falsify this property. That is, it will explore all logical possibilities (up to our simplifying assumptions), and find a way to make our main function put something other than 7 in RAX.

When you run the above command, wp produces output like this:

Evaluating precondition.
Checking precondition with Z3.

UNSAT!

So, in this case, wp could not find a way to falsify this property, and hence the property holds. And indeed, that makes sense, because as we can see from the code, our function always returns the number 7.

Let's have wp check a different property, one that we know is false. For example, let's have wp check whether RAX always ends up with the value 3 in it. To express this property, we would write an SMTLIB expression just like the previous one, except we'll use the number 3 instead of 7:

(assert (= RAX #x0000000000000003))

Now have wp check this:

    $ bap wp \
        --func=main \
        --show=refuted-goals \
        --postcond='(assert (= RAX #x0000000000000003))' \
        06/binary/main

This time, wp outputs something that looks like this:

SAT!

Model:
  RSP |->  0x000000003f800081
  RSI |->  0x0000000000000000
  RDX |->  0x0000000000000000
  RDI |->  0x0000000000000000
  RCX |->  0x0000000000000000
  RBP |->  0x0000000000000000
  RAX |->  0x0000000000000000
  R9  |->  0x0000000000000000
  R8  |->  0x0000000000000000
  mem_orig |-> [
    else |-> 0x00]
  mem_mod = mem_orig

Refuted goals:
(= RAX0 #x0000000000000003):
  Concrete values: = #x0000000000000007 #x0000000000000003 
  Z3 Expression: = #x0000000000000007 #x0000000000000003

This time, wp comes back with SAT, meaning that it could find a way to falsify our assertion, and it shows us how to do it. In the model, most of the values are zero, which makes sense. Since our main function always returns 7, the registers can start with pretty much any values (or no values), and main will not produce 3 in RAX.

Custom Pre and Postcondition (One Binary)

Consider the following C program (the source code can be seen at tutorial/07/binary/main.c):

int main(int argc, char **argv) {
    if (argc < 10) {
        return 7;
    } else {
        return 254;
    }
}

In this program, you can see that if the number of arguments (argc) is less than 10, the main function returns 7. Otherwise it returns 254.

Suppose we want to assert that at the end of this function RAX will always produce 7. We could have wp check that, just as we did before:

    $ bap wp \
          --func=main \
          --show=refuted-goals \
          --postcond='(assert (= RAX #x0000000000000007))' \
          07/binary/main

When you run that, you'll see that wp produces output that looks something like this:

SAT!

Model:
  ZF  |->  0x0
  SF  |->  0x0
  RSP |->  0x000000003f800081
  RSI |->  0x0000000000000000
  RDX |->  0x0000000000000000
  RDI |->  0x0000000000400000
  RCX |->  0x0000000000000000
  RBP |->  0x0000000000000000
  RAX |->  0x0000000000000000
  R9  |->  0x0000000000000000
  R8  |->  0x0000000000000000
  PF  |->  0x0
  OF  |->  0x0
  CF  |->  0x0
  AF  |->  0x0
  mem_orig |-> [
    else |-> 0x00]
  mem_mod = mem_orig

Refuted goals:
(= RAX0 #x0000000000000007):
  Concrete values: = #x00000000000000fe #x0000000000000007 
  Z3 Expression: = #x00000000000000fe #x0000000000000007

Here, wp tells us that it can falsify our assertion, and it provides an example of how to do it. Notice the RDI register. If it is set to 0x400000, then our main function will put something other than 7 in RAX.

Of course, that makes sense, since we can see in the C code that if the number of arguments is 10 or more, it'll return 254, and 0x400000 is certainly bigger than 10.

Let's make our custom property a little more robust. Let's suppose that we want to assert that main will put 7 in RAX, given some particular input argument. For example, let's say that, if the input argument is 5, then RAX will be 7.

We can do that by asserting a precondition for the function. In this case, we want the precondition to be that the argument to our main function, which is stored in the RDI register, is 5. Here is how we would express that in SMTLIB:

(assert (= RDI #x0000000000000005))

And now we can add this as a precondition to our check, using the --precond parameter:

    $ bap wp \
          --func=main \
          --show=refuted-goals \
          --precond='(assert (= RDI #x0000000000000005))' \
          --postcond='(assert (= RAX #x0000000000000007))' \
          07/binary/main

Now wp will check that the postcondition holds, given that the precondition holds. In this case, it will check that, if RDI is 5 at the start of the function, then RAX will be 7 at the end of the function.

When you run this, wp will output something like this:

Evaluating precondition.
Checking precondition with Z3.

UNSAT!

In other words, wp was not able to find a way to falsify our assertions, and hence our assertions hold.

Custom Postcondition (Two Binaries)

In the last two examples, we looked at specifying custom properties that wp can check for a single binary program. We can also specify custom properties that wp can check to compare two binary programs.

To illustrate, consider the same_signs function we looked at before. Here is the first version of the function (which can be seen at tutorial/08/binary/main_1.c:

bool same_signs(int x, int y) {

    // When x is negative
    if(x < 0){

        // They have the same sign if x is negative too
        if (y < 0) { return true; }
        else { return false; }

    // When x is positive
    } else{

        // They have the same sign if y is positive too
        if (y >= 0) { return true; }
        else { return false; }

    }

}

And here's the optimized version (which can be seen at tutorial/08/binary/main_2.c:

bool same_signs(int x, int y) {
    return !((x ^ y) < 0);
}

Before we asked wp to check that these two functions produce the same output in RAX, using the --compare-post-reg-values=RAX parameter. But we can specify this same thing as a custom property.

What we want to say is that, when the function same_signs finishes executing, the values in RAX are exactly the same. More exactly, we want to say that at the end of the original function's execution, RAX is the same as it is at the end of the modified function's execution.

To explicitly refer to a register in the original function, we can append _orig to the register name, like this:

RAX_orig

Similarly, to explicitly refer to a register in the modified version of the function, we can append _mod, like this:

RAX_mod

Then, we can formulate an SMTLIB expression asserting that the two are equal:

(assert (= RAX_orig RAX_mod))

To have wp check this, we call wp in comparison mode, with our custom postcondition:

    $ bap wp \
          --func=same_signs \
          --show=refuted-goals \
          --postcond='(assert (= RAX_orig RAX_mod))' \
          08/binary/main_1 \
          08/binary/main_2

When you run this, wp outputs the following:

Evaluating precondition.
Checking precondition with Z3.

UNSAT!

In other words, wp could not find a way to falsify our assertion. Hence, the property we've asserted here holds. RAX in the original and the modified program will indeed always be equal.

Reference Guide

This tutorial has introduced many command line options for CBAT. A complete reference for the command line interface, including many options not described int the tutorial, can be found here. This page also includes a guide to common smtlib expressions for use in your custom properties.

Conclusion

As we noted at the outset, CBAT is a family of tools designed to verify the behavior of binary programs.

At the heart of the CBAT toolset is wp. As we have seen, wp is a tool that verifies whether functions behave in specified ways. wp can be used to verify the behavior of a function in a single binary program, but it can also be used to verify the behavior of a function in two binaries.

Because of its ability to compare two programs, wp can be used to check patches and modifications that you might make to your binary programs. Patches are meant to change some aspects of a program, while preserving certain other behaviors. With wp, you can check whether the patched version of your program preserves the desired behavior that is present in the original version.