This post is long overdue, as breakpoints have been implemented in caesar for a while. What I thought was a simple feature quickly turned into a rabbit hole of debugging mach system calls and staring at meaningless hex values. As a foreword, this post goes into a fair amount of detail, as I personally had to look through many different sources to understand how mach exceptions work. I’m hoping to condense all of that knowledge into this post for anyone looking to learn more about how MacOS works under the hood.

Introduction

In theory, setting a breakpoint is simple.

Read an instruction in memory
Replace it with a trap instruction (on ARM, this would be brk #0)
Handle the exception when it fires
Replace overwritten instruction with original one
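As a minimal sketch of the steps above, the following models the target's code as a plain in-memory buffer rather than a real process. CodeBuffer, plantBreakpoint and restoreInstruction are hypothetical names used only for illustration; the real implementation later in this post goes through the mach API instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// AArch64 encoding of `brk #0`.
constexpr std::uint32_t kBrk0 = 0xD4200000u;

// Hypothetical stand-in for the target's code: one 32-bit word per instruction.
using CodeBuffer = std::vector<std::uint32_t>;

// Steps 1 and 2: read the instruction at `index`, then overwrite it with the trap.
// Returns the original instruction so that it can be restored later.
inline std::uint32_t plantBreakpoint(CodeBuffer& code, std::size_t index) {
  assert(index < code.size());
  const std::uint32_t original = code[index];
  code[index] = kBrk0;
  return original;
}

// Step 4: put the saved instruction back once the trap has been handled (step 3).
inline void restoreInstruction(CodeBuffer& code, std::size_t index,
                               std::uint32_t original) {
  assert(index < code.size());
  code[index] = original;
}
```

The only state a breakpoint needs is the address and the instruction it replaced, which is why plantBreakpoint hands the original word back to the caller.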

On MacOS, each of these steps involves dealing with the mach kernel API, for which the documentation can only be described as subpar. Apple’s own docs only confirm that these functions exist and list their signatures (which can already be seen in any editor with LSP support). For anyone wishing to use the API, I recommend relying exclusively on the mach kernel interface reference, as it actually describes what each function does and what the parameters mean.

With my small “rant” now over, let’s delve into the details.

Mach Internals Overview

Before starting work on the implementation, I had to familiarise myself with a few different terms as Apple likes to think different.

Task: Mach equivalent of a process. This is a container for resources but has no execution context
Thread: The actual execution unit. Tasks contain one or more threads
Port: A kernel-managed message queue used for IPC. Everything in mach communicates via ports; they are like file descriptors, but for passing messages
Port Rights: Capabilities that allow you to send to/receive from a port
Message: Data sent between ports. For example, exceptions are delivered as messages to an exception port

Launching a process

First, I needed to launch my target process. I did this using posix_spawn so that I could start the process suspended, where it waits in dyld_start for a resume signal before continuing.

  pid_t pid = 0;
  int status = 0;
  posix_spawnattr_t attr = nullptr;

  status = posix_spawnattr_init(&attr);
  if (status != 0) return -1;

  status = posix_spawnattr_setflags(&attr, POSIX_SPAWN_START_SUSPENDED);
  if (status != 0) return -1;

  status = posix_spawn(&pid, file_path, nullptr, &attr, argList, nullptr);
  posix_spawnattr_destroy(&attr);
  if (status != 0) return -1;

Controlling a process

I then had to get a handle to the process so that I can control it. This was done using task_for_pid like so.

  const kern_return_t kr = task_for_pid(mach_task_self(), pid, task);
  if (kr != KERN_SUCCESS) {
    error(mach_error_string(kr));
  }

Seems trivial so far, right? When I first tried running this code, I got the error (os/kern) failure. This happens because the binary is unsigned and is therefore not allowed to get a handle to the process. To fix this, I first needed to acquire the right permissions from the OS. This is done via entitlements, which are applied through the codesign utility. I used the following entitlements file, which asks for debugger permissions in the same way lldb does.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.cs.debugger</key>
    <true/>
</dict>
</plist>

The binary can then be signed like so:

codesign --entitlements Entitlements.plist --force -s - caesar

MacOS debugger permissions prompt

Now I can get a handle to the process and control it without any errors. The next step is to set up the exception ports so we can start responding to the mach messages being sent when an exception is triggered.

Mach Exception Ports

Before setting up the exception ports, I first saved the current exception ports of the target so that they can be restored if I want to detach from the target whilst it’s still running. To save the current ports, I created the following struct to store them in.

struct MachExcPorts {
  mach_msg_type_number_t excp_type_count{};
  std::array<exception_mask_t, EXC_TYPES_COUNT> masks{};
  std::array<mach_port_t, EXC_TYPES_COUNT> ports{};
  std::array<exception_behavior_t, EXC_TYPES_COUNT> behaviours{};
  std::array<thread_state_flavor_t, EXC_TYPES_COUNT> flavours{};
};

After this, I call task_get_exception_ports like so in order to populate the struct fields. The bitmask here indicates which exception types you wish to handle. In previous versions of the API, you were able to use EXC_MASK_ALL; however, that now also triggers the exception handler for EXC_RESOURCE. This exception is raised when a process exceeds its resource limits and it’s mainly used for performance monitoring, which is of no interest to me.

MachExcPorts savedPorts{};
task_get_exception_ports(
    task,
    EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC |
        EXC_MASK_EMULATION | EXC_MASK_SOFTWARE | EXC_MASK_BREAKPOINT |
        EXC_MASK_SYSCALL | EXC_MASK_MACH_SYSCALL | EXC_MASK_RPC_ALERT |
        EXC_MASK_CRASH,
    savedPorts.masks.data(), &savedPorts.excp_type_count,
    savedPorts.ports.data(), savedPorts.behaviours.data(),
    savedPorts.flavours.data());
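To make the bitmask above a little less opaque, here is a small sketch of how it is built. Each EXC_MASK_* constant is a single bit, 1 shifted left by the exception type number. The values are mirrored from mach/exception_types.h so that the snippet stands on its own; ExcType, maskFor, handles and kDebuggerMask are illustrative names and not part of the mach API.

```cpp
#include <cassert>
#include <cstdint>

// Exception type numbers mirrored from <mach/exception_types.h>.
// Real code should include that header instead of redefining them.
enum ExcType : std::uint32_t {
  kExcBadAccess = 1,
  kExcBadInstruction = 2,
  kExcArithmetic = 3,
  kExcEmulation = 4,
  kExcSoftware = 5,
  kExcBreakpoint = 6,
  kExcSyscall = 7,
  kExcMachSyscall = 8,
  kExcRpcAlert = 9,
  kExcCrash = 10,
  kExcResource = 11,
};

// Each EXC_MASK_* value is one bit: 1 << exception type.
constexpr std::uint32_t maskFor(ExcType t) { return 1u << t; }

// The same set of exception types passed to task_get_exception_ports above.
constexpr std::uint32_t kDebuggerMask =
    maskFor(kExcBadAccess) | maskFor(kExcBadInstruction) |
    maskFor(kExcArithmetic) | maskFor(kExcEmulation) | maskFor(kExcSoftware) |
    maskFor(kExcBreakpoint) | maskFor(kExcSyscall) | maskFor(kExcMachSyscall) |
    maskFor(kExcRpcAlert) | maskFor(kExcCrash);

// True if `mask` asks for exceptions of type `t`.
constexpr bool handles(std::uint32_t mask, ExcType t) {
  return (mask & maskFor(t)) != 0;
}
```

With this mask, handles(kDebuggerMask, kExcBreakpoint) is true while handles(kDebuggerMask, kExcResource) is false, which is exactly the reason for spelling the mask out by hand.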

I then allocated a mach port to the debugger itself so that it can receive messages on the exception port. This is what will allow the debugger to receive the exceptions through the dedicated handler functions.

kern_return_t kr = mach_port_allocate(mach_task_self(),
                                      MACH_PORT_RIGHT_RECEIVE, &exc_port);
if (kr != KERN_SUCCESS) {
  error(mach_error_string(kr));
}

After this, I had to add a send right to the exception port. task_set_exception_ports expects a send right to the port, which is what allows exception messages to be delivered to it. Replying to those messages is what lets us inspect the cpu state as it was when the exception triggered and set new state.

kr = mach_port_insert_right(mach_task_self(), exc_port, exc_port,
                            MACH_MSG_TYPE_MAKE_SEND);
if (kr != KERN_SUCCESS) {
  error(mach_error_string(kr));
}

Lastly, I set the target task’s exception port to the one created earlier so that the target knows through which port to communicate with the debugger for exceptions.

kr = task_set_exception_ports(
    task,
    EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC |
        EXC_MASK_EMULATION | EXC_MASK_SOFTWARE | EXC_MASK_BREAKPOINT |
        EXC_MASK_SYSCALL | EXC_MASK_MACH_SYSCALL | EXC_MASK_RPC_ALERT |
        EXC_MASK_CRASH,
    exc_port, EXCEPTION_STATE_IDENTITY | MACH_EXCEPTION_CODES,
    ARM_THREAD_STATE64);

if (kr != KERN_SUCCESS) {
  error(mach_error_string(kr));
}

One argument to take note of is EXCEPTION_STATE_IDENTITY | MACH_EXCEPTION_CODES, as this dictates which of the 3 exception handlers will be called. Here, I’ve used EXCEPTION_STATE_IDENTITY to trigger the handler which provides the most information when an exception occurs. The values that can be used here are:

EXCEPTION_DEFAULT - This provides minimal info with just exception type and codes
EXCEPTION_STATE - This includes thread state (registers) but no thread/task ports
EXCEPTION_STATE_IDENTITY - This includes both thread state and thread/task ports

I’ve also used MACH_EXCEPTION_CODES so I can receive 64-bit exception codes as opposed to 32-bit exception codes.
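The behaviour argument therefore packs two things into one integer: the handler kind in the low bits and the MACH_EXCEPTION_CODES flag in the top bit. The sketch below decodes it, assuming the values from mach/exception_types.h (mirrored here so the snippet is self-contained). Behaviour, splitBehaviour and handlerFor are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Values mirrored from <mach/exception_types.h>.
constexpr std::uint32_t kExceptionDefault = 1;        // EXCEPTION_DEFAULT
constexpr std::uint32_t kExceptionState = 2;          // EXCEPTION_STATE
constexpr std::uint32_t kExceptionStateIdentity = 3;  // EXCEPTION_STATE_IDENTITY
constexpr std::uint32_t kMachExceptionCodes = 0x80000000u;  // MACH_EXCEPTION_CODES

struct Behaviour {
  std::uint32_t kind;  // one of the three values above
  bool wideCodes;      // true when 64-bit exception codes were requested
};

// Separate the handler kind from the 64-bit codes flag.
constexpr Behaviour splitBehaviour(std::uint32_t behaviour) {
  return Behaviour{behaviour & ~kMachExceptionCodes,
                   (behaviour & kMachExceptionCodes) != 0};
}

// Name of the MIG handler that ends up being called for a behaviour value.
constexpr std::string_view handlerFor(std::uint32_t behaviour) {
  const Behaviour b = splitBehaviour(behaviour);
  switch (b.kind) {
    case kExceptionDefault:
      return b.wideCodes ? "catch_mach_exception_raise" : "catch_exception_raise";
    case kExceptionState:
      return b.wideCodes ? "catch_mach_exception_raise_state"
                         : "catch_exception_raise_state";
    case kExceptionStateIdentity:
      return b.wideCodes ? "catch_mach_exception_raise_state_identity"
                         : "catch_exception_raise_state_identity";
    default:
      return "unknown";
  }
}
```

The value used in this post, kExceptionStateIdentity | kMachExceptionCodes, maps to catch_mach_exception_raise_state_identity, which is the handler implemented further down.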

Lastly, I call ptrace with PT_ATTACHEXC like so in order to receive exceptions that the target binary triggers.

ptrace(PT_ATTACHEXC, pid, nullptr, 0);

Mach Exception Handlers

Now, I had to set up the actual exception handlers themselves. This is done via the mig utility, which generates the headers containing the signatures of the exception handler functions as well as other mach IPC functions. In order to generate the headers, I ran the following command, pointing it to the file containing the definitions, which is located under the current Xcode installation. Despite the version number in the path, this will not affect compatibility, as mach messages and exception handling have been part of the kernel since its early days.

mig /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX26.2.sdk/usr/include/mach/mach_exc.defs

Running this command will generate the following files which need to be copied along with the other source files.

mach_exc.h
mach_excServer.c
mach_excUser.c

In my case, I only needed mach_exc.h and mach_excServer.c, as I only need to set up the receiving end: my debugger will only be replying to exceptions, not sending new ones. I also renamed mach_excServer.c to mach_excServer.h before including it in my build to avoid any linter errors from using #include on source files.

I then needed to define the actual implementation of the exception handlers. There are 3 of them which need to be defined and can be found in mach_excServer.h as catch_mach_exception_raise, catch_mach_exception_raise_state, and catch_mach_exception_raise_state_identity. The difference between them is how many parameters they have as they include the additional information requested in the exception handler type bitmask.

kern_return_t catch_mach_exception_raise(mach_port_t excPort, mach_port_t threadPort, mach_port_t taskPort,
  exception_type_t excType, mach_exception_data_t codes, mach_msg_type_number_t numCodes);

kern_return_t catch_mach_exception_raise_state(mach_port_t excPort, exception_type_t exc, const mach_exception_data_t code,
  mach_msg_type_number_t codeCnt, int* flavour, const thread_state_t oldState,
  mach_msg_type_number_t oldStateCnt, thread_state_t newState, mach_msg_type_number_t* newStateCnt);

kern_return_t catch_mach_exception_raise_state_identity(mach_port_t excPort, mach_port_t thread, mach_port_t task,
  exception_type_t exc, mach_exception_data_t code, mach_msg_type_number_t codeCnt, int* flavour,
  thread_state_t oldState, mach_msg_type_number_t oldStateCnt, thread_state_t newState,
  mach_msg_type_number_t* newStateCnt);

As noted earlier, my debugger will handle target exceptions through catch_mach_exception_raise_state_identity. The other functions can just have an instant return as they will not be called.

extern "C" {
  kern_return_t catch_mach_exception_raise(...) {
    return KERN_FAILURE;
  }

  kern_return_t catch_mach_exception_raise_state(...) {
    return KERN_FAILURE;
  }

  kern_return_t catch_mach_exception_raise_state_identity(...) {
    task_suspend(task);

    memcpy(newState, oldState, sizeof(arm_thread_state64_t));

    std::cout << exceptionReason(exc, codeCnt, code);
    std::cout << formatRegisterOutput(oldState);

    if (exc == EXC_BREAKPOINT) {
      restorePrevIns(oldArmState->pc - getAslrSlide());
    }

    *newStateCnt = oldStateCnt;
    return KERN_SUCCESS;
  }
}

An important detail to note is that all the handlers are under an extern "C" block as the functions need to link with the mig generated code in mach_excServer.h.

Inside the handler, I first suspend the task so that I can safely control it. I then copy oldState into newState, as the contents of newState are used to set the thread state before execution resumes. After that, I print the exception reason to make sure that the target stopped because of a breakpoint. Another detail to note here is that, as MacOS is based on BSD, a mach exception can also carry a unix signal, though not every mach exception is one. The way to check for this is to see if exc == EXC_SOFTWARE && codeCnt >= 2 && code[0] == EXC_SOFT_SIGNAL, in which case code[1] holds the signal number and you can deal with the exception as if it were a unix signal. Next, I print the state of the registers, which can be accessed via oldState. Then, I check if a breakpoint was hit and, if so, restore the previous instruction so that execution can continue as normal when the task gets resumed. Finally, I set the state count of the new state to the same as the old state, which tells the kernel how much of newState is valid, and return KERN_SUCCESS to indicate that we have dealt with the exception.
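The unix signal check described above can be pulled out into a small helper. This is only a sketch: the two constants are mirrored from mach/exception_types.h so that it is self-contained, and unixSignalFromException is a hypothetical name rather than anything from caesar or the mach API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Values mirrored from <mach/exception_types.h>.
constexpr int kExcSoftwareType = 5;                     // EXC_SOFTWARE
constexpr std::int64_t kExcSoftSignalCode = 0x10003;    // EXC_SOFT_SIGNAL

// Returns the unix signal number if the exception wraps a signal,
// or std::nullopt for any other kind of mach exception.
inline std::optional<int> unixSignalFromException(int exc,
                                                  const std::int64_t* code,
                                                  std::uint32_t codeCnt) {
  if (exc == kExcSoftwareType && codeCnt >= 2 &&
      code[0] == kExcSoftSignalCode) {
    return static_cast<int>(code[1]);  // code[1] carries the signal number
  }
  return std::nullopt;
}
```

Inside the handler this would be called with the exc, code and codeCnt parameters, and a returned value can then be treated like any other signal number (SIGSEGV, SIGINT and so on).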

Writing Breakpoints

In terms of writing the breakpoint, I first needed to read an instruction at an arbitrary address within the target. In order to do this, I had to calculate the ASLR slide, as most if not all programs are compiled with ASLR enabled, meaning their place in memory is never at a fixed, pre-determined location at runtime. In my case, since my debugger does not support attaching to a live process yet, I calculate it by walking the target’s memory regions and checking if they have the executable bit set, indicating that they contain the executable instructions of the target binary. If not, we skip over to the next region and run the same check again.

Calculating ASLR offset

void readAslrSlideFromRegions() {
  mach_vm_address_t addr = 0;
  mach_vm_size_t size = 0;
  vm_region_basic_info_data_64_t info;
  mach_msg_type_number_t infoCnt = VM_REGION_BASIC_INFO_COUNT_64;
  mach_port_t objName = MACH_PORT_NULL;

  while (true) {
    kern_return_t kr = mach_vm_region(
        task, &addr, &size, VM_REGION_BASIC_INFO_64,
        reinterpret_cast<vm_region_info_t>(&info), &infoCnt, &objName);
    if (kr != KERN_SUCCESS) break;

    if ((info.protection & VM_PROT_EXECUTE) != 0) {
      ...
    }
    addr += size;
  }

  error("Could not determine ASLR slide!\n");
}

Once we do land in a region which is executable, I read 4 bytes of memory at the current address and check whether that’s the mach-o magic number. If it is, that means we have found the actual base address of the target in memory. All that’s left is to subtract the preferred base address, which is 0x100000000 by default for a 64-bit mach-o executable.

vm_offset_t headerBuf{};
auto sz = static_cast<mach_msg_type_number_t>(sizeof(u32));
kr = mach_vm_read(task, addr, sizeof(u32), &headerBuf, &sz);
if (kr == KERN_SUCCESS) {
  const u32 magic = *reinterpret_cast<u32*>(headerBuf);
  mach_vm_deallocate(mach_task_self(), headerBuf, sizeof(u32));
  if (magic == MH_MAGIC_64) {
    aslr_slide = addr - 0x100000000;
    return;
  }
}
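The arithmetic behind the slide is worth spelling out. The sketch below runs the same search over a hypothetical list of region snapshots instead of calling mach_vm_region, and then converts addresses in both directions. Region, findSlide, toRuntime and toFile are illustrative names, and the 0x100000000 base is assumed to be the default as described above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::uint32_t kMhMagic64 = 0xFEEDFACFu;       // MH_MAGIC_64 from <mach-o/loader.h>
constexpr std::uint64_t kDefaultBase = 0x100000000ull;  // assumed preferred base address

// Hypothetical snapshot of one memory region of the target.
struct Region {
  std::uint64_t start;
  std::uint64_t size;
  bool executable;
  std::uint32_t firstWord;  // first 4 bytes of the region
};

// The first executable region that starts with the mach-o magic is the image base.
inline std::optional<std::uint64_t> findSlide(const std::vector<Region>& regions) {
  for (const Region& r : regions) {
    if (r.executable && r.firstWord == kMhMagic64) {
      return r.start - kDefaultBase;
    }
  }
  return std::nullopt;
}

// Address as seen in the binary on disk -> address in the running process.
constexpr std::uint64_t toRuntime(std::uint64_t fileAddr, std::uint64_t slide) {
  return fileAddr + slide;
}

// Address in the running process -> address as seen in the binary on disk.
constexpr std::uint64_t toFile(std::uint64_t runtimeAddr, std::uint64_t slide) {
  return runtimeAddr - slide;
}
```

For example, if the image is found at 0x104a3c000, the slide is 0x4a3c000 and a breakpoint requested at 0x100003f50 has to be written at 0x104a3ff50. Going the other way is what the handler does when it subtracts the slide from the program counter.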

Reading and Writing Instructions

To read the instruction at any address, I use mach_vm_read, which hands back a buffer (origBuf) holding the current instruction. I then read it as a uint32_t, as all AArch64 instructions are a fixed 32 bits wide, which makes this part of the process easy.

const u64 actual = addr + aslr_slide;
auto sz = static_cast<mach_msg_type_number_t>(sizeof(u32));
vm_offset_t origBuf = 0;
kern_return_t kr = mach_vm_read(task, actual, sizeof(u32), &origBuf, &sz);
if (kr != KERN_SUCCESS) {
  error(std::format("Error reading memory: {}!\n", mach_error_string(kr)));
}

const u32 origIns = *reinterpret_cast<u32*>(origBuf);
mach_vm_deallocate(mach_task_self(), origBuf, sizeof(u32));

After this, I then had to change the memory protection for the specific instruction we want to overwrite so that we can write the breakpoint instruction there. After that, I reset the memory protections so that the breakpoint can actually be executed.

u32 brk = 0xD4200000;

kr = mach_vm_protect(m_task, actual, sizeof(u32), FALSE,
                     VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY);
if (kr != KERN_SUCCESS) {
  error(std::format("Error changing memory protection: {}!\n", mach_error_string(kr)));
}

kr = mach_vm_write(m_task, actual, reinterpret_cast<vm_offset_t>(&brk),
                   sizeof(u32));
if (kr != KERN_SUCCESS) {
  error(std::format("Error writing memory: {}!\n", mach_error_string(kr)));
}

kr = mach_vm_protect(m_task, actual, sizeof(u32), FALSE,
                     VM_PROT_READ | VM_PROT_EXECUTE);
if (kr != KERN_SUCCESS) {
  error(std::format("Error re-protecting memory: {}!\n", mach_error_string(kr)));
}
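The 0xD4200000 constant above is simply brk #0. In the A64 encoding the 16-bit immediate of brk sits in bits 5 to 20, so other immediates can be encoded and recognised with a couple of shifts and masks. The helper names below are hypothetical and only meant to illustrate the layout.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBrkBase = 0xD4200000u;  // fixed opcode bits of BRK
constexpr std::uint32_t kBrkMask = 0xFFE0001Fu;  // every bit except the imm16 field

// Encode `brk #imm`: the immediate occupies bits 5..20.
constexpr std::uint32_t encodeBrk(std::uint16_t imm) {
  return kBrkBase | (static_cast<std::uint32_t>(imm) << 5);
}

// True if `ins` is a BRK instruction with any immediate.
constexpr bool isBrk(std::uint32_t ins) { return (ins & kBrkMask) == kBrkBase; }

// Extract the immediate from a BRK instruction.
constexpr std::uint16_t brkImmediate(std::uint32_t ins) {
  return static_cast<std::uint16_t>((ins >> 5) & 0xFFFFu);
}
```

A check like isBrk can be handy when reading an instruction back, for instance to avoid saving an already planted breakpoint as the "original" instruction.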

Lastly, I had to implement a way to restore the original instruction so that execution can resume normally after the breakpoint is hit. This is done exactly in the same way as writing the breakpoint instruction but instead using the saved original instruction.

Mach Exception Handling

With that done, I now needed to focus on the listener which handles mach message requests and replies. This has to be set up on another thread, which waits in an event loop and only proceeds once the target stops (which happens when a mach exception is thrown). One caveat here is that mach_msg is blocking, so I used a timeout of 100ms on the receive so that I can periodically check if the target process is still alive. This is necessary because no message is sent via the exception port if the process exits normally.

mach_msg_return_t ret = 0;
__RequestUnion__mach_exc_subsystem msgBuf{};
__ReplyUnion__mach_exc_subsystem rplBuf{};

auto* msg = reinterpret_cast<mach_msg_header_t*>(&msgBuf);
auto* rpl = reinterpret_cast<mach_msg_header_t*>(&rplBuf);

while (state == State::RUNNING) {
  ret = mach_msg(msg, MACH_RCV_MSG | MACH_RCV_TIMEOUT, 0,
                 sizeof(__RequestUnion__mach_exc_subsystem), exc_port, 100,
                 MACH_PORT_NULL);
  if (ret == MACH_RCV_TIMED_OUT) {
    int status = 0;
    if (waitpid(pid, &status, WNOHANG) > 0) {
      state = State::EXITED;
      if (WIFEXITED(status))
        std::cout << "Target exited with code " << WEXITSTATUS(status)
                  << '\n';
      else if (WIFSIGNALED(status))
        std::cout << "Target killed with signal " << WTERMSIG(status) << '\n';
    }
    continue;
  }
  assert(ret == MACH_MSG_SUCCESS && "Did not receive mach message");

  mach_exc_server(msg, rpl);

  ret = mach_msg(rpl, MACH_SEND_MSG, rpl->msgh_size, 0, MACH_PORT_NULL, 0,
                 MACH_PORT_NULL);
  assert(ret == MACH_MSG_SUCCESS && "Did not send mach message");
}
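The liveness check in the timeout branch relies on waitpid with WNOHANG, which returns immediately instead of blocking. As a sketch, it can be isolated into a helper that only uses POSIX calls; pollExit is a hypothetical name and the messages simply mirror the ones printed above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Non-blocking check on a child process. Returns a description once the
// child has exited or been killed, or std::nullopt if there is nothing
// to report yet.
inline std::optional<std::string> pollExit(pid_t pid) {
  int status = 0;
  if (waitpid(pid, &status, WNOHANG) <= 0) {
    return std::nullopt;  // 0: still running, -1: error (e.g. not our child)
  }
  if (WIFEXITED(status)) {
    return "Target exited with code " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "Target killed with signal " + std::to_string(WTERMSIG(status));
  }
  return std::nullopt;
}
```

Calling this once per receive timeout keeps the loop responsive without a second thread dedicated to waiting on the child.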

Another detail to note here is that the main thread blocks while the target is running, allowing for the exception handling to finish first before allowing for another command to be typed in.

Setting a breakpoint

Now with that all done, I am able to finally set a breakpoint like so:

Setting a breakpoint

I can confirm that I did indeed hit a breakpoint, as the exception reason is EXC_BREAKPOINT and the program counter is the address provided for the breakpoint with the ASLR offset added. To resume execution, all that is required is two ptrace calls followed by task_resume like so. The thread port used in the ptrace call can be taken from the mach exception handler function, as it’s passed via the thread parameter. PT_THUPDATE updates the thread state after the exception is handled and PT_CONTINUE signals that execution should continue.

ptrace(PT_THUPDATE, pid, reinterpret_cast<caddr_t>(thread_port), 0);
ptrace(PT_CONTINUE, pid, reinterpret_cast<caddr_t>(1), 0);
task_resume(task);

Conclusion

With all of this in place, my debugger can now set breakpoints, catch them via Mach exceptions, and resume execution. While basic, this covers the core of what any debugger needs to control program execution.

There’s still plenty to improve: persistent breakpoints that survive multiple hits, attaching to running processes, and proper symbol resolution, to name a few. But understanding how Mach exceptions and the kernel API work was the hardest part, and hopefully this post helps anyone else trying to navigate Apple’s sparse documentation.

The full source code is available on https://github.com/f1ammable/caesar for anyone wishing to take a deeper look. As always, if you wish to get in touch with me, you can reach me at dm.leirbag@tulas