docs: roadmap + Phase 1 detailed plan #23

Open
emilf wants to merge 6 commits into main from docs/roadmap-and-phase1-plan

@emilf emilf commented Mar 23, 2026

What's in here

Two new documentation files produced during a planning session:

docs/roadmap.md

Full project roadmap from current state (Phase 0, which is largely done) through to POSIX compliance. Covers:

  • Phase 0 summary of what's already built
  • Phases 1–10: CPU hardening → scheduling → user mode → syscalls → VFS → ELF loader → libc port → core userspace tools → networking → polish
  • Rough effort estimates per phase
  • Open design questions (fork vs posix_spawn, static musl vs dynamic linking, SMP timing, etc.)

docs/plans/phase1-cpu-platform.md

Detailed breakdown of Phase 1 with specs and completion criteria for every leaf task. Grounded in a full code audit of the current kernel — not guesswork.

Key findings from the audit:

  • TSS/IST already done: gdt.rs has 4×16KiB IST stacks, TSS loaded, IDT wired. Marked [x] DONE.
  • x2APIC mode detection already exists — only MSR accessors are missing
  • driver_data casts defined but unused — clean break available now
  • PCI enumeration completely disconnected from DriverManager — xHCI hardcoded
  • No post-boot map/unmap API: mapping.rs is boot-only

Suggested implementation order is in the plan. Open to edits on any of the specs or completion criteria.

Rowan (OpenClaw) added 6 commits March 22, 2026 22:33
The previous debug.gdb hardcoded section offsets (TEXT_DELTA, RDATA_DELTA,
DATA_DELTA, etc.) that were stale relative to the current build. .rdata alone
grew to 0x40000 from the old 0x12000, shifting every subsequent section.
This caused theseus-load to pass wrong addresses to add-symbol-file, landing
source-level breakpoints in the wrong locations.

Fix: parse the ELF section header table at GDB startup and compute each
section's delta from the link-time image base dynamically. theseus-load now
builds the add-symbol-file command from live values — correct after any
rebuild with no manual intervention.
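A minimal sketch of the dynamic delta computation described above, assuming a little-endian ELF64 image. The names section_deltas and add_symbol_file_command are hypothetical and are not taken from debug.gdb; only the `add-symbol-file FILE ADDR -s SECTION ADDR` command form is GDB's own.

```python
import struct

# ELF64 little-endian layouts (file header and section header).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
SHDR = struct.Struct("<IIQQQQIIQQ")


def section_deltas(elf: bytes, image_base: int) -> dict:
    """Map each loaded section name to (sh_addr - image_base)."""
    fields = EHDR.unpack_from(elf, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [SHDR.unpack_from(elf, shoff + i * shentsize) for i in range(shnum)]
    strtab_off = headers[shstrndx][4]  # sh_offset of the section name table
    deltas = {}
    for name_idx, _type, _flags, addr, *_rest in headers:
        if addr == 0:  # not mapped at run time, so nothing to relocate
            continue
        start = strtab_off + name_idx
        name = elf[start:elf.index(b"\0", start)].decode("ascii")
        deltas[name] = addr - image_base
    return deltas


def add_symbol_file_command(path: str, runtime_base: int, deltas: dict) -> str:
    """Build the GDB command from live deltas instead of hardcoded constants."""
    parts = [f"add-symbol-file {path} {runtime_base + deltas['.text']:#x}"]
    for name, delta in deltas.items():
        if name != ".text":
            parts.append(f"-s {name} {runtime_base + delta:#x}")
    return " ".join(parts)
```

Because the deltas are read from the file on every start, a section that grows between builds changes the computed addresses without any edit to the script.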

Also:
- Add .rodata to the section remapping (was missing entirely)
- Fix PT_LOAD base derivation to exclude gap segments (p_offset == 0)
  which have vaddr - offset != image_base and skewed the min() calculation
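The PT_LOAD fix can be sketched as follows, assuming a little-endian ELF64 file. The function name link_time_image_base is hypothetical; the rule it applies (skip PT_LOAD entries with p_offset == 0, then take the minimum of vaddr - offset) is the one stated in the bullet above.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header
PT_LOAD = 1


def link_time_image_base(elf: bytes) -> int:
    """Return min(p_vaddr - p_offset) over PT_LOAD segments, ignoring gaps."""
    fields = EHDR.unpack_from(elf, 0)
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    candidates = []
    for i in range(phnum):
        p_type, _flags, p_offset, p_vaddr, *_rest = PHDR.unpack_from(
            elf, phoff + i * phentsize
        )
        # Gap segments (p_offset == 0) would skew the minimum, so skip them.
        if p_type == PT_LOAD and p_offset != 0:
            candidates.append(p_vaddr - p_offset)
    if not candidates:
        raise ValueError("no file-backed PT_LOAD segment found")
    return min(candidates)
```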

Adds scripts/gdb-auto.py, a pexpect-based wrapper that automates the
full GDB debug session for TheseusOS without any manual address copying
or hanging.

## Problem it solves

efi_main loads at a different physical address on every UEFI boot (OVMF
memory allocator is non-deterministic). Previously you had to:
  1. Run QEMU freely once to read "efi_main @ 0x..." from debugcon
  2. Kill it
  3. Restart with -S
  4. Manually type theseus-load <addr> in GDB
  5. Hope the address hadn't shifted again

Also, driving GDB in batch mode (-batch -x script.gdb) against QEMU's
GDB stub has a race: setting a breakpoint sends a Z0/Z1 packet that
causes QEMU's stub to briefly resume the vCPU, leaving GDB's internal
state as "target is running" before the subsequent continue lands.
This caused consistent "Cannot execute this command while the target
is running" failures in batch mode.

## What gdb-auto.py does

  1. Probe run: starts QEMU without -S, tails the debugcon log until
     "efi_main @ 0x<addr>" appears (configurable timeout, never hangs).
  2. Restarts QEMU paused (-S) with a unix-socket GDB stub.
  3. Spawns GDB via pexpect (interactive PTY), sources debug.gdb,
     connects, calls theseus-load with the captured address.
  4. Drops into interactive GDB (or runs non-interactively for CI).

pexpect drives GDB as a real interactive TTY — Ctrl-C via
sendcontrol('c') reliably reaches the remote target. Every expect()
has an explicit timeout so the script never hangs silently.
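A minimal sketch of the probe in step 1, using only the standard library rather than pexpect. The function name wait_for_efi_main and the polling interval are assumptions for illustration; the "efi_main @ 0x<addr>" log format is the one quoted above.

```python
import re
import time
from pathlib import Path

EFI_MAIN_RE = re.compile(r"efi_main @ 0x([0-9a-fA-F]+)")


def wait_for_efi_main(log_path, timeout=30.0, poll=0.05):
    """Poll the debugcon log until the efi_main address appears.

    Raises TimeoutError instead of blocking forever if the line never shows up.
    """
    deadline = time.monotonic() + timeout
    path = Path(log_path)
    while True:
        if path.exists():
            match = EFI_MAIN_RE.search(path.read_text(errors="replace"))
            if match:
                return int(match.group(1), 16)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no efi_main address in {path} after {timeout}s")
        time.sleep(poll)
```

Re-reading the whole file on each poll is wasteful for large logs but keeps the sketch short; the deadline check is what guarantees the call returns.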

## Also in this commit

- debug.gdb: sw-only entry breakpoint (hbreak removed) — hardware
  breakpoints via GDB's Z1 packet cause QEMU's stub to resume the
  vCPU as a side-effect, breaking batch/scripted workflows. sw
  breakpoints (int3 via Z0) don't have this issue.

- debug.gdb: add theseus-go command (theseus-load + continue in one
  call, issued in Python context to avoid the batch-mode race).

- Makefile: add debug-auto and debug-auto-ci targets.

## Usage

  make debug-auto              # full interactive session
  make debug-auto-ci           # non-interactive, exits after BP check
  make debug-auto ADDR=0x...   # skip probe run, use known address

  python3 scripts/gdb-auto.py --help   # full options
…ession

Implements a zero-manual-steps GDB debug workflow. No address copying,
no probe-then-restart, works reliably regardless of UEFI load address.

## Rust: debug mailbox (bootloader/src/main.rs, shared/src/constants.rs)

efi_main now writes its own runtime address to a fixed physical page at
boot entry, before any other UEFI calls:

  physical 0x7000 + 0x00  u64  runtime efi_main address
  physical 0x7000 + 0x08  u64  magic sentinel 0xDEADBEEF_CAFEF00D

The page is reserved via UEFI AllocateType::Address so the firmware
records our ownership in the memory map. The address and sentinel
constants live in shared/src/constants.rs::debug_mailbox.
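For illustration, a host-side tool could decode the 16 bytes read from physical 0x7000 like this. This is a sketch, not the Rust code in the bootloader: parse_mailbox is a hypothetical name, and the little-endian encoding is assumed from the x86-64 target.

```python
import struct

MAILBOX_MAGIC = 0xDEADBEEFCAFEF00D
MAILBOX = struct.Struct("<QQ")  # +0x00 efi_main address, +0x08 magic sentinel


def parse_mailbox(raw: bytes):
    """Return the runtime efi_main address, or None if the sentinel is absent."""
    address, magic = MAILBOX.unpack_from(raw, 0)
    if magic != MAILBOX_MAGIC:
        return None  # mailbox not written yet
    return address
```

Checking the sentinel before trusting the address distinguishes a written mailbox from whatever the page held before efi_main ran.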

## GDB: theseus-auto command (debug.gdb)

New 'theseus-auto' GDB command that:
  1. Sets a hardware watchpoint on the magic sentinel at 0x7008
  2. Issues 'continue' — UEFI boots, efi_main writes address then magic
  3. Watchpoint fires: reads runtime efi_main from 0x7000
  4. Calls theseus-load with the captured address (correct section deltas)
  5. Returns to GDB prompt — execution is stopped inside efi_main with
     full Rust source-level symbols

No reset required. The watchpoint catches efi_main on its first
execution. The user is dropped exactly at efi_main+81 with symbols.

## Python: gdb-auto.py updated (scripts/gdb-auto.py)

Simplified to a single-run workflow:
  - Start QEMU running (not paused) with TCP GDB stub on :1251
  - Connect GDB, run theseus-auto, wait for watchpoint + symbol load
  - Drop into interactive GDB or report result (--no-interactive)

## Verified

Tested across multiple runs — efi_main loads at different addresses
each time (0x7d0e4873, 0x7d0d7873, 0x7d0d9873 etc.) and the watchpoint
correctly captures the address every run:

  Thread 1 hit Hardware watchpoint 1: *(u64*)0x7008 == 0xdeadbeefcafef00d
  theseus-auto: mailbox fired — runtime efi_main = 0x7d0e4873
  rip = 0x7d0e48c4 <theseus_efi::efi_main+81>
  #2  theseus_efi::efi_main at bootloader/src/main.rs:124
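The output above is internally consistent: 0x7d0e48c4 - 0x7d0e4873 = 0x51 = 81, matching the efi_main+81 symbol offset. A non-interactive check along those lines might look like the sketch below; check_session is a hypothetical helper, not part of gdb-auto.py, and the regexes assume the two line formats shown above.

```python
import re

EFI_RE = re.compile(r"runtime efi_main = 0x([0-9a-fA-F]+)")
RIP_RE = re.compile(r"rip = 0x([0-9a-fA-F]+) <[^>]*efi_main\+(\d+)>")


def check_session(transcript: str) -> bool:
    """True if rip sits at the symbol offset GDB reports, relative to the mailbox address."""
    efi = EFI_RE.search(transcript)
    rip = RIP_RE.search(transcript)
    if efi is None or rip is None:
        return False
    base = int(efi.group(1), 16)
    pc = int(rip.group(1), 16)
    return pc - base == int(rip.group(2))
```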

## Usage

  make debug-auto          # interactive session
  make debug-auto-ci       # CI smoke-test, exits after verify
… current workflow

- debug.gdb: replace stale 5-step manual workflow with accurate command
  reference covering theseus-auto, theseus-load, and theseus-go
- debug.gdb: fix theseus-auto docstring — no longer requires -S, no longer
  resets guest, correctly describes single-run watchpoint flow
- gdb-auto.py: fix docstring to say QEMU starts running (not paused -S)
  and accurately describe the full automated sequence
…ioms

- docs/development-and-debugging.md: replace stale one-liner GDB section
  with full guide covering make debug-auto, theseus-auto, theseus-load,
  and the manual fallback workflow; command reference table

- docs/axioms/debug.md: add A3 — the debug mailbox as a binding invariant;
  documents physical layout, ownership via UEFI AllocateType::Address,
  sentinel ordering guarantee, and links to implementing code + tooling;
  renumbers old A3 (runtime monitor) to A4

- docs/roadmap.md: full project roadmap from current state to POSIX
  compliance, covering Phases 0-10 with rough effort estimates and
  open design questions (fork vs spawn, static vs dynamic linking, etc.)

- docs/plans/phase1-cpu-platform.md: detailed Phase 1 breakdown with
  specs and completion criteria for each leaf task, grounded in a full
  code audit of the current kernel. Covers memory subsystem refactor,
  driver formalization, APIC calibration, x2APIC abstraction, and
  CPUID centralization. Notes TSS/IST as already done.