
fix(gdb): compute section deltas dynamically from BOOTX64.SYM #22

Open: emilf wants to merge 5 commits into main from fix/gdb-dynamic-section-deltas

emilf commented Mar 22, 2026

What this PR does

Three functional commits plus two documentation follow-ups, together covering the full GDB debugging workflow for TheseusOS.


Commit 1: Fix dynamic section deltas

debug.gdb hardcoded section offsets that went stale as the binary grew. .rdata alone moved from 0x12000 to 0x40000, shifting every subsequent section, so breakpoints landed at the wrong addresses.

Fix: parse the ELF section headers at GDB startup and compute the deltas dynamically. This commit also adds .rodata (previously missing), fixes the PT_LOAD base calculation, and switches the entry breakpoint from hbreak to break.


Commit 2: pexpect GDB driver + make targets

Adds scripts/gdb-auto.py, which drives GDB through an interactive PTY via pexpect with a hard timeout on every wait, so it never hangs. Also adds the make debug-auto and make debug-auto-ci targets.


Commit 3: Debug mailbox + theseus-auto (fully automated)

Solves the root problem: efi_main loads at a different address on every boot (the OVMF allocator is non-deterministic), so no hardcoded address stays correct.

Solution: on entry, efi_main writes its own runtime address to a fixed physical page (0x7000), then writes a magic sentinel. GDB watches the sentinel with a hardware watchpoint; when it fires, GDB reads the address and loads the symbols automatically.

Result: single command, single QEMU run, correct address every time.

Thread 1 hit Hardware watchpoint 1: *(u64*)0x7008 == 0xdeadbeefcafef00d
theseus-auto: mailbox fired — runtime efi_main = 0x7d0e4873
rip = 0x7d0e48c4 <theseus_efi::efi_main+81>
#2  theseus_efi::efi_main at bootloader/src/main.rs:124

Tested across multiple runs with varying load addresses; the correct address was captured on every run.


Usage

make debug-auto       # interactive: QEMU starts, watchpoint fires, GDB drops you at efi_main
make debug-auto-ci    # non-interactive: verify breakpoint, print backtrace, exit

Manual (from any terminal):

make debug            # QEMU paused on :1234
gdb -x debug.gdb
(gdb) target remote localhost:1234
(gdb) theseus-auto    # watchpoint + auto symbol load — no address needed

Requirements

pip install --break-system-packages pexpect

Rowan (OpenClaw) added 5 commits March 22, 2026 22:33
The previous debug.gdb hardcoded section offsets (TEXT_DELTA, RDATA_DELTA,
DATA_DELTA, etc.) that were stale relative to the current build. .rdata alone
moved from the old 0x12000 to 0x40000, shifting every subsequent section.
This caused theseus-load to pass wrong addresses to add-symbol-file, landing
source-level breakpoints in the wrong locations.

Fix: parse the ELF section header table at GDB startup and compute each
section's delta from the link-time image base dynamically. theseus-load now
builds the add-symbol-file command from live values — correct after any
rebuild with no manual intervention.
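A minimal Python sketch of the delta computation, assuming BOOTX64.SYM is a little-endian ELF64 file and the link-time image base is already known. It reads each allocated section's sh_addr from the section header table, subtracts the image base, and rebuilds an add-symbol-file command for a given runtime base. The helper names (section_addrs, section_deltas, add_symbol_file_cmd) are hypothetical, not the code in debug.gdb.

```python
import struct

# Little-endian ELF64 layouts from the ELF specification.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # file header, 64 bytes
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")        # section header, 64 bytes
SHF_ALLOC = 0x2  # section occupies memory at run time


def section_addrs(data: bytes) -> dict:
    """Return {section name: link-time sh_addr} for every allocated section."""
    ehdr = ELF64_EHDR.unpack_from(data, 0)
    ident = ehdr[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    shoff, shentsize, shnum, shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]
    shdrs = [ELF64_SHDR.unpack_from(data, shoff + i * shentsize) for i in range(shnum)]
    strtab = shdrs[shstrndx][4]  # sh_offset of the section-name string table
    addrs = {}
    for sh_name, _sh_type, sh_flags, sh_addr, *_rest in shdrs:
        if sh_flags & SHF_ALLOC:
            start = strtab + sh_name
            name = data[start:data.index(b"\0", start)].decode("ascii")
            addrs[name] = sh_addr
    return addrs


def section_deltas(addrs: dict, image_base: int) -> dict:
    """Offset of each section from the link-time image base."""
    return {name: addr - image_base for name, addr in addrs.items()}


def add_symbol_file_cmd(path: str, runtime_base: int, deltas: dict,
                        sections=(".text", ".rdata", ".rodata", ".data")) -> str:
    """Build a GDB add-symbol-file command for an image loaded at runtime_base."""
    cmd = [f"add-symbol-file {path} {runtime_base + deltas['.text']:#x}"]
    for name in sections:
        if name != ".text" and name in deltas:
            cmd.append(f"-s {name} {runtime_base + deltas[name]:#x}")
    return " ".join(cmd)
```

add-symbol-file takes the .text address positionally and every other section as -s NAME ADDR, so a rebuild that moves .rdata only changes the computed numbers, not the script. How the runtime base is derived from the captured efi_main address is not shown here.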

Also:
- Add .rodata to the section remapping (was missing entirely)
- Fix PT_LOAD base derivation to exclude gap segments (p_offset == 0):
  their vaddr - offset != image_base, which skewed the min() calculation
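The PT_LOAD fix can be sketched the same way. This hypothetical helper takes (p_type, p_offset, p_vaddr) tuples from the program header table and returns min(p_vaddr - p_offset) over file-backed PT_LOAD segments only, so a segment with p_offset == 0 cannot drag the minimum below the real image base.

```python
PT_LOAD = 1  # p_type value for loadable segments (ELF specification)


def image_base_from_phdrs(phdrs) -> int:
    """Link-time image base from (p_type, p_offset, p_vaddr) program-header tuples.

    Segments with p_offset == 0 are skipped: per the commit message their
    p_vaddr - p_offset is not the image base and would skew the minimum.
    """
    candidates = [vaddr - offset
                  for p_type, offset, vaddr in phdrs
                  if p_type == PT_LOAD and offset != 0]
    if not candidates:
        raise ValueError("no file-backed PT_LOAD segments")
    return min(candidates)
```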
Adds scripts/gdb-auto.py — a pexpect-based wrapper that automates the
full GDB debug session for TheseusOS without any manual address copying
or hanging.

## Problem it solves

efi_main loads at a different physical address on every UEFI boot (OVMF
memory allocator is non-deterministic). Previously you had to:
  1. Run QEMU freely once to read "efi_main @ 0x..." from debugcon
  2. Kill it
  3. Restart with -S
  4. Manually type theseus-load <addr> in GDB
  5. Hope the address hadn't shifted again

Also, driving GDB in batch mode (-batch -x script.gdb) against QEMU's
GDB stub has a race: setting a breakpoint sends a Z0/Z1 packet that
causes QEMU's stub to briefly resume the vCPU, leaving GDB's internal
state as "target is running" before the subsequent continue lands.
This caused consistent "Cannot execute this command while the target
is running" failures in batch mode.

## What gdb-auto.py does

  1. Probe run: starts QEMU without -S, tails the debugcon log until
     "efi_main @ 0x<addr>" appears (configurable timeout, never hangs).
  2. Restarts QEMU paused (-S) with a unix-socket GDB stub.
  3. Spawns GDB via pexpect (interactive PTY), sources debug.gdb,
     connects, calls theseus-load with the captured address.
  4. Drops into interactive GDB (or runs non-interactively for CI).
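The probe-run step above amounts to a bounded poll of the debugcon log. A sketch, assuming the log line has the form "efi_main @ 0x<hex>" and that re-reading the whole (small) log on each poll is acceptable; parse_efi_main and wait_for_efi_main are hypothetical names, not functions from gdb-auto.py.

```python
import re
import time
from pathlib import Path
from typing import Optional

EFI_MAIN_RE = re.compile(rb"efi_main @ (0x[0-9a-fA-F]+)")


def parse_efi_main(log: bytes) -> Optional[int]:
    """Return the first efi_main address in the log, or None if not printed yet."""
    m = EFI_MAIN_RE.search(log)
    return int(m.group(1), 16) if m else None


def wait_for_efi_main(log_path, timeout: float = 30.0, poll: float = 0.1) -> int:
    """Poll the debugcon log until the address appears; never wait past timeout."""
    deadline = time.monotonic() + timeout
    path = Path(log_path)
    while True:
        if path.exists():
            addr = parse_efi_main(path.read_bytes())
            if addr is not None:
                return addr
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no 'efi_main @ 0x...' in {log_path} after {timeout}s")
        time.sleep(poll)
```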

pexpect drives GDB as a real interactive TTY — Ctrl-C via
sendcontrol('c') reliably reaches the remote target. Every expect()
has an explicit timeout so the script never hangs silently.

## Also in this commit

- debug.gdb: sw-only entry breakpoint (hbreak removed) — hardware
  breakpoints via GDB's Z1 packet cause QEMU's stub to resume the
  vCPU as a side-effect, breaking batch/scripted workflows. sw
  breakpoints (int3 via Z0) don't have this issue.

- debug.gdb: add theseus-go command (theseus-load + continue in one
  call, issued in Python context to avoid the batch-mode race).

- Makefile: add debug-auto and debug-auto-ci targets.

## Usage

  make debug-auto              # full interactive session
  make debug-auto-ci           # non-interactive, exits after BP check
  make debug-auto ADDR=0x...   # skip probe run, use known address

  python3 scripts/gdb-auto.py --help   # full options
…ession

Implements a zero-manual-steps GDB debug workflow. No address copying,
no probe-then-restart, works reliably regardless of UEFI load address.

## Rust: debug mailbox (bootloader/src/main.rs, shared/src/constants.rs)

efi_main now writes its own runtime address to a fixed physical page at
boot entry, before any other UEFI calls:

  physical 0x7000 + 0x00  u64  runtime efi_main address
  physical 0x7000 + 0x08  u64  magic sentinel 0xDEADBEEF_CAFEF00D

The page is reserved via UEFI AllocateType::Address so the firmware
records our ownership in the memory map. The address and sentinel
constants live in shared/src/constants.rs::debug_mailbox.
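The ordering contract can be modeled in a few lines of Python, with a bytearray standing in for the physical page (the real writer is Rust code in efi_main, not shown in the PR). The address is stored before the sentinel, so any reader that sees the sentinel also sees a valid address. On the Rust side the two stores would presumably need to be volatile (for example core::ptr::write_volatile) so the compiler cannot elide or reorder them; that is an assumption, not a detail from the PR.

```python
import struct

# Layout as described for shared/src/constants.rs::debug_mailbox.
MAILBOX_PHYS = 0x7000           # physical page; modeled below by a local buffer
ADDR_OFFSET = 0x00              # u64: runtime address of efi_main
MAGIC_OFFSET = 0x08             # u64: sentinel, written last
MAGIC = 0xDEADBEEF_CAFEF00D


def write_mailbox(page: bytearray, efi_main_addr: int) -> None:
    """Writer side: publish the address first, then the sentinel."""
    struct.pack_into("<Q", page, ADDR_OFFSET, efi_main_addr)
    struct.pack_into("<Q", page, MAGIC_OFFSET, MAGIC)


def read_mailbox(page: bytes):
    """Reader side: the address, or None while the sentinel is absent."""
    addr, magic = struct.unpack_from("<QQ", page, ADDR_OFFSET)
    return addr if magic == MAGIC else None
```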

## GDB: theseus-auto command (debug.gdb)

New 'theseus-auto' GDB command that:
  1. Sets a hardware watchpoint on the magic sentinel at 0x7008
  2. Issues 'continue' — UEFI boots, efi_main writes address then magic
  3. Watchpoint fires: reads runtime efi_main from 0x7000
  4. Calls theseus-load with the captured address (correct section deltas)
  5. Returns to GDB prompt — execution is stopped inside efi_main with
     full Rust source-level symbols
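A sketch of the same flow using GDB's Python API (gdb.Command, gdb.Breakpoint with BP_WATCHPOINT, Inferior.read_memory). It registers under the hypothetical name theseus-auto-sketch so it cannot shadow the real command, and it assumes theseus-load is already defined. The gdb import is guarded so the decoding helper can be exercised outside GDB.

```python
import struct

try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None

MAILBOX = 0x7000                # physical mailbox page
MAGIC = 0xDEADBEEF_CAFEF00D     # sentinel stored at MAILBOX + 8


def load_command(mailbox: bytes) -> str:
    """Turn the 16 mailbox bytes (address, sentinel) into a theseus-load call."""
    addr, magic = struct.unpack("<QQ", mailbox)
    if magic != MAGIC:
        raise ValueError(f"mailbox sentinel not set (found {magic:#x})")
    return f"theseus-load {addr:#x}"


if gdb is not None:
    class TheseusAutoSketch(gdb.Command):
        """Hypothetical stand-in for theseus-auto: watch, continue, load symbols."""

        def __init__(self):
            super().__init__("theseus-auto-sketch", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            sentinel = f"*(unsigned long long *){MAILBOX + 8:#x}"
            # Write watchpoint on the 8-byte sentinel (hardware if available);
            # the condition keeps it from stopping on unrelated writes.
            wp = gdb.Breakpoint(sentinel, gdb.BP_WATCHPOINT, gdb.WP_WRITE)
            wp.condition = f"{sentinel} == {MAGIC:#x}"
            gdb.execute("continue")  # returns once the watchpoint stops the target
            wp.delete()
            raw = bytes(gdb.selected_inferior().read_memory(MAILBOX, 16))
            gdb.execute(load_command(raw))  # assumes theseus-load already exists

    TheseusAutoSketch()
```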

No reset required. The watchpoint catches efi_main on its first
execution, and the user is dropped at efi_main+81 (offset from the tested
build) with symbols loaded.

## Python: gdb-auto.py updated (scripts/gdb-auto.py)

Simplified to a single-run workflow:
  - Start QEMU running (not paused) with a TCP GDB stub on port 1251
  - Connect GDB, run theseus-auto, wait for watchpoint + symbol load
  - Drop into interactive GDB or report result (--no-interactive)
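For reference, a hypothetical helper that builds a QEMU command line matching this description: OVMF as pflash, an ESP disk image, debugcon written to a file, and a TCP GDB stub on port 1251 with no -S. The firmware and disk-image paths and the machine type are placeholders, not values from the Makefile.

```python
def qemu_argv(ovmf_code: str, disk_image: str, debugcon_log: str,
              gdb_port: int = 1251, paused: bool = False) -> list:
    """Hypothetical QEMU argv: guest runs immediately unless paused=True."""
    argv = [
        "qemu-system-x86_64",
        "-machine", "q35",                                            # placeholder
        "-drive", f"if=pflash,format=raw,readonly=on,file={ovmf_code}",
        "-drive", f"format=raw,file={disk_image}",                    # placeholder ESP
        "-debugcon", f"file:{debugcon_log}",
        "-gdb", f"tcp::{gdb_port}",                                   # GDB stub
    ]
    if paused:
        argv.append("-S")  # halt the vCPU until GDB issues continue
    return argv
```

With paused=False the vCPU starts at once, which is what the single-run flow relies on: the watchpoint is armed before efi_main runs because UEFI takes a while to reach it.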

## Verified

Tested across multiple runs — efi_main loads at different addresses
each time (0x7d0e4873, 0x7d0d7873, 0x7d0d9873 etc.) and the watchpoint
correctly captures the address every run:

  Thread 1 hit Hardware watchpoint 1: *(u64*)0x7008 == 0xdeadbeefcafef00d
  theseus-auto: mailbox fired — runtime efi_main = 0x7d0e4873
  rip = 0x7d0e48c4 <theseus_efi::efi_main+81>
  #2  theseus_efi::efi_main at bootloader/src/main.rs:124

## Usage

  make debug-auto          # interactive session
  make debug-auto-ci       # CI smoke-test, exits after verify
… current workflow

- debug.gdb: replace stale 5-step manual workflow with accurate command
  reference covering theseus-auto, theseus-load, and theseus-go
- debug.gdb: fix theseus-auto docstring — no longer requires -S, no longer
  resets guest, correctly describes single-run watchpoint flow
- gdb-auto.py: fix docstring to say QEMU starts running (not paused -S)
  and accurately describe the full automated sequence
…ioms

- docs/development-and-debugging.md: replace stale one-liner GDB section
  with full guide covering make debug-auto, theseus-auto, theseus-load,
  and the manual fallback workflow; command reference table

- docs/axioms/debug.md: add A3 — the debug mailbox as a binding invariant;
  documents physical layout, ownership via UEFI AllocateType::Address,
  sentinel ordering guarantee, and links to implementing code + tooling;
  renumbers old A3 (runtime monitor) to A4