@rgarcia rgarcia commented Feb 4, 2026

Summary

Adds macOS support to hypeman using Apple's Virtualization.framework via the Code-Hex/vz library.

Key changes:

  • Platform abstraction: Split Linux-specific code (resources, devices, network, vmm, ingress) into _linux.go files with stub _darwin.go counterparts
  • vz hypervisor: New lib/hypervisor/vz/ package implementing Hypervisor and VMStarter interfaces
  • vz-shim subprocess: New cmd/vz-shim/ binary that hosts VMs in a separate process, allowing VMs to survive hypeman restarts (mirrors cloud-hypervisor architecture)
  • vz vsock proxy: Uses the Cloud Hypervisor-compatible text protocol over a Unix socket (client sends CONNECT {port}\n, server replies OK {port}\n)
  • ClientFactory pattern: New hypervisor.NewClient() for uniform hypervisor client creation across all VMM types
  • Guest system updates: hvc0 console support, cross-compiled binary embedding for darwin builds
  • OCI image handling: Pull linux/arm64 images regardless of host platform
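
The ClientFactory item above can be read as a small registry keyed by hypervisor type. The following is a minimal sketch of that idea, not the PR's code: the names RegisterClientFactory and NewClient come from this description, while the Hypervisor interface shape, the factory signature, and fakeClient are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypervisor is a stand-in for the real interface (assumed shape).
type Hypervisor interface {
	Ping() error
}

// ClientFactory builds a client for a VMM reachable at socketPath.
// The signature is hypothetical.
type ClientFactory func(socketPath string) (Hypervisor, error)

var (
	mu        sync.RWMutex
	factories = map[string]ClientFactory{}
)

// RegisterClientFactory associates a hypervisor type name with a factory.
// Backends would typically call this from an init() function.
func RegisterClientFactory(hvType string, f ClientFactory) {
	mu.Lock()
	defer mu.Unlock()
	factories[hvType] = f
}

// NewClient looks up the factory for hvType and uses it to build a client.
func NewClient(hvType, socketPath string) (Hypervisor, error) {
	mu.RLock()
	f, ok := factories[hvType]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no client factory registered for hypervisor type %q", hvType)
	}
	return f(socketPath)
}

// fakeClient is a placeholder backend used only by this sketch.
type fakeClient struct{ socket string }

func (c *fakeClient) Ping() error { return nil }

func main() {
	RegisterClientFactory("vz", func(socketPath string) (Hypervisor, error) {
		return &fakeClient{socket: socketPath}, nil
	})
	c, err := NewClient("vz", "/tmp/example/vz.sock")
	fmt.Println(c != nil, err)
	_, err = NewClient("unknown", "")
	fmt.Println(err)
}
```

With a registry like this, callers never need to switch on the VMM type; an unregistered type surfaces as an ordinary error.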

Architecture (vz-shim subprocess model):

┌─────────────────────────────────────────────────────────────┐
│                        hypeman                               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ vz.Starter   │    │ vz.Client    │    │ VsockDialer  │  │
│  │ (spawn shim) │    │ (HTTP→shim)  │    │ (vsock proxy)│  │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘  │
└─────────┼───────────────────┼───────────────────┼──────────┘
          │                   │                   │
          │ spawn             │ vz.sock           │ vz.vsock
          ▼                   ▼                   ▼
┌─────────────────────────────────────────────────────────────┐
│                       vz-shim (PID survives restart)        │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ HTTP Server  │    │ Vsock Proxy  │    │ vz.VM        │  │
│  │ (control API)│    │ (CH protocol)│    │ (actual VM)  │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
└─────────────────────────────────────────────────────────────┘

Platform comparison:

| Aspect        | Linux (CH)                  | Linux (QEMU)      | macOS (vz)                      |
|---------------|-----------------------------|-------------------|---------------------------------|
| Process model | External process            | External process  | vz-shim subprocess              |
| Networking    | Bridge + TAP                | Bridge + TAP      | Built-in NAT (192.168.64.0/24)  |
| Vsock         | Unix socket (text protocol) | AF_VSOCK (kernel) | Unix socket (text protocol)     |
| Persistence   | Survives restart            | Survives restart  | Survives restart                |
| Snapshot      | Yes                         | Yes               | No (Linux guests not supported) |

Snapshot limitation

Virtualization.framework does not support save/restore for Linux guest VMs; only macOS guests work. This is an undocumented Apple limitation confirmed by other projects (Tart #1177 and #796; UTM #6654).

The snapshot infrastructure is implemented in vz-shim for potential future macOS guest support, but the capability is correctly disabled for Linux guests.

vz-shim API:

The shim exposes a Cloud Hypervisor-compatible HTTP API on a Unix socket:

  • GET /api/v1/vm.info - VM state and configuration
  • PUT /api/v1/vm.pause - Pause VM
  • PUT /api/v1/vm.resume - Resume VM
  • PUT /api/v1/vm.shutdown - Graceful shutdown
  • PUT /api/v1/vm.power-button - ACPI power button
  • PUT /api/v1/vm.snapshot - Save VM state (infrastructure for macOS guests)
  • GET /api/v1/vmm.ping - Health check
  • PUT /api/v1/vmm.shutdown - Terminate shim
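
Talking to an HTTP API on a Unix socket from Go mostly comes down to overriding the transport's dialer. The sketch below shows one way to call GET /api/v1/vmm.ping; it is not taken from the PR. The fake server in demo, the "shim" host placeholder, and the 200 status it returns are assumptions, since the shim's actual response is not described here.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// newUnixHTTPClient returns a client whose connections all go to socketPath.
func newUnixHTTPClient(socketPath string) *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// ping issues GET /api/v1/vmm.ping and returns the HTTP status code.
func ping(c *http.Client) (int, error) {
	// The host part is a placeholder; the custom dialer ignores it.
	resp, err := c.Get("http://shim/api/v1/vmm.ping")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo starts a stand-in server on a temporary Unix socket and pings it.
// The stand-in is NOT the real shim; it only answers the ping route.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "shim")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "vz.sock")

	ln, err := net.Listen("unix", sock)
	if err != nil {
		return 0, err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/vmm.ping", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // assumed status for illustration
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)
	defer srv.Close()

	return ping(newUnixHTTPClient(sock))
}

func main() {
	code, err := demo()
	fmt.Println(code, err)
}
```

The PUT endpoints could be reached the same way by building the request with http.NewRequest and http.MethodPut.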

The vsock proxy uses the same text-based handshake as Cloud Hypervisor:

  • Client sends: CONNECT {port}\n
  • Server responds: OK {port}\n
  • Then bidirectional data flow
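
A client for the handshake above could look roughly like the following. This is a sketch under assumptions: handshake, dialVsock, and the bufferedConn layout are illustrative names (the review comments mention a bufferedConn wrapper, but its exact shape is not shown in this PR page). The reply is only checked for the "OK " prefix, so the sketch makes no claim about which port number the server echoes.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strings"
)

// bufferedConn reads through the bufio.Reader used for the handshake so that
// bytes buffered past the OK line are not lost.
type bufferedConn struct {
	net.Conn
	r *bufio.Reader
}

func (c *bufferedConn) Read(p []byte) (int, error) { return c.r.Read(p) }

// handshake sends CONNECT {port}\n and expects a line starting with "OK ".
func handshake(conn net.Conn, port uint32) (net.Conn, error) {
	if _, err := fmt.Fprintf(conn, "CONNECT %d\n", port); err != nil {
		return nil, fmt.Errorf("send CONNECT: %w", err)
	}
	r := bufio.NewReader(conn)
	line, err := r.ReadString('\n')
	if err != nil {
		return nil, fmt.Errorf("read handshake reply: %w", err)
	}
	if !strings.HasPrefix(line, "OK ") {
		return nil, fmt.Errorf("unexpected handshake reply %q", strings.TrimSpace(line))
	}
	return &bufferedConn{Conn: conn, r: r}, nil
}

// dialVsock connects to the proxy's Unix socket and performs the handshake.
func dialVsock(socketPath string, port uint32) (net.Conn, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}
	c, err := handshake(conn, port)
	if err != nil {
		conn.Close()
		return nil, err
	}
	return c, nil
}

// demo runs the handshake against an in-memory stand-in server that sends
// the OK line and the first payload bytes in a single write.
func demo() (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go func() {
		defer server.Close()
		line, _ := bufio.NewReader(server).ReadString('\n')
		port := strings.TrimSpace(strings.TrimPrefix(line, "CONNECT "))
		fmt.Fprintf(server, "OK %s\nhello", port)
	}()
	conn, err := handshake(client, 1024)
	if err != nil {
		return "", err
	}
	buf := make([]byte, 5)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	fmt.Println(demo())
}
```

Returning the wrapper rather than the raw connection matters when the server sends payload immediately after the OK line, as the stand-in server in demo does.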

CI Considerations

Current CI uses self-hosted Linux runners with KVM. For macOS:

  • GitHub-hosted macOS runners (macos-14) support Apple Silicon but may lack virtualization entitlements
  • Self-hosted macOS runners would be needed for full VM testing
  • Recommendation: Add a build-only job on macos-14 to verify compilation, defer VM tests to self-hosted

Example addition to test.yml:

  build-macos:
    runs-on: macos-14
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v6
        with:
          go-version: '1.25'
      - name: Build (macOS)
        run: make build-darwin

Test plan

  • hypeman resources - verified resource detection
  • hypeman pull - verified linux/arm64 image pull
  • hypeman run - verified VM creation and boot
  • hypeman exec - verified command execution in VM
  • hypeman ps - verified instance listing
  • hypeman build - verified Dockerfile→VM image build
  • hypeman ingress - verified external access to VM services
  • hypeman rm - verified instance cleanup
  • VM persistence - verified VM survives hypeman restart
  • VM reconnection - verified hypeman reconnects to running VM after restart
  • Snapshot limitation - verified and documented (Apple framework limitation)
  • Linux regression testing (CI will cover)

Note

High Risk
Large cross-cutting, platform-level change that introduces a new hypervisor backend and OS-specific build/runtime paths; failures could impact VM lifecycle, networking, image/build behavior, and production stability if macOS support codepaths are exercised.

Overview
Enables experimental native macOS (Darwin/arm64) support by introducing a new vz hypervisor implementation backed by a codesigned vz-shim subprocess (Cloud Hypervisor-compatible control API + vsock proxy), and wiring hypervisor client creation through a new hypervisor.RegisterClientFactory/hypervisor.NewClient path.

Refactors multiple subsystems for cross-platform compilation: adds Darwin stubs and Linux build tags for networking (NAT/no bridge), device passthrough (unsupported), embedded ingress binaries (Linux-only; macOS uses caddy from PATH), and resource detection (sysctl/statfs on macOS). Instance creation adjusts kernel console args (hvc0 for vz) and vsock socket paths.

Improves the build system by splitting make build/dev/test into OS-specific targets, adding macOS hot-reload config/signing helpers and CI jobs, cross-compiling embedded guest binaries for Linux, and adding .env.darwin.example docs. Build pipeline behavior changes include: Linux-guest OCI image resolution regardless of host OS, ext4 disk image sector alignment for vz, build log streaming from builder VMs over vsock, and background auto-build of the builder image via a configurable DOCKER_SOCKET with a “retry shortly” guard while the image is prepared.

Written by Cursor Bugbot for commit 9985f40. This will update automatically on new commits.

rgarcia and others added 8 commits February 2, 2026 17:09
Move Linux-specific resource detection (CPU, memory, disk, network) and
device management (discovery, mdev, vfio) into _linux.go files. Add stub
_darwin.go files that return empty/unsupported results for macOS.

This is a pure refactoring with no functional changes on Linux. Prepares
the codebase for macOS support where these Linux-specific features
(cgroups, sysfs, VFIO) are not available.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move Linux bridge/TAP networking into bridge_linux.go. Add bridge_darwin.go
stub since macOS vz uses built-in NAT networking. Extract shared IP allocation
logic to ip.go.

Move VMM binary detection (cloud-hypervisor, qemu paths) into binaries_linux.go.
Add binaries_darwin.go that returns empty paths since vz is in-process.

No functional changes on Linux.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move ingress binary embedding into platform-specific files. Update build
tags on architecture-specific files to also include OS constraint.

Replace checkKVMAccess() with platform-agnostic checkHypervisorAccess():
- Linux: checks /dev/kvm access (existing behavior)
- macOS: verifies ARM64 arch and Virtualization.framework availability

No functional changes on Linux.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ions

Add GetVsockDialer() to instance manager interface. This abstraction handles
the difference between:
- Linux: socket-based vsock (AF_VSOCK or Unix socket proxy)
- macOS vz: in-process vsock via VirtualMachine object

Update API handlers (exec, cp, instances) and build manager to use
GetVsockDialer() instead of directly creating vsock connections.

Add DialVsock() method to VsockDialer interface for explicit dialing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement Hypervisor and VMStarter interfaces using github.com/Code-Hex/vz/v3
library for Apple's Virtualization.framework.

Key differences from Linux hypervisors:
- In-process: VMs run within hypeman process (no separate PID)
- NAT networking: Uses vz built-in NAT (192.168.64.0/24)
- Direct vsock: Connects via VirtualMachine object, not socket files
- Snapshot support: Available on macOS 14+ ARM64

Registers vz starter on macOS via init() in hypervisor_darwin.go.
Linux hypervisor_linux.go is a no-op placeholder.

Co-authored-by: Cursor <cursoragent@cursor.com>
Guest init changes:
- Add hvc0 serial console support (vz uses hvc0, not ttyS0)
- Prioritize /dev/hvc0 for console output in logger and mount

Binary embedding:
- Add darwin-specific embed files for cross-compiled linux/arm64 binaries
- Guest init and agent binaries are embedded when building on macOS

OCI image handling:
- Add vmPlatform() to return linux/arm64 for VM images regardless of host
- Fixes image pull on macOS which would otherwise request darwin/arm64

Instance lifecycle:
- Track active hypervisors for vz (needed for in-process VM references)
- Handle vz-specific cleanup in delete (no PID to kill)
- Support vz in instance queries

Co-authored-by: Cursor <cursoragent@cursor.com>
Build system:
- Add macOS targets to Makefile (build-darwin, run, sign)
- Add .air.darwin.toml for live reload on macOS
- Add vz.entitlements for Virtualization.framework code signing
- Add .env.darwin.example with macOS-specific configuration

Documentation:
- Update DEVELOPMENT.md with macOS setup instructions
- Update README.md to mention macOS support
- Update lib/hypervisor/README.md with vz implementation details
- Update lib/instances/README.md for multi-hypervisor support
- Update lib/network/README.md with platform comparison

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Changes the vz hypervisor from in-process to subprocess model, allowing
VMs to survive hypeman restarts. Mirrors the cloud-hypervisor architecture.

Key changes:
- Add cmd/vz-shim binary that hosts vz VMs in a subprocess
- Shim exposes HTTP API on Unix socket for VM control (matching CH pattern)
- Shim exposes vsock proxy on separate Unix socket using CH protocol
- Update vz starter to spawn shim subprocess instead of in-process VM
- Add vz.Client implementing Hypervisor interface via HTTP to shim
- Update VsockDialer to use Unix socket proxy instead of in-process VM
- Add hypervisor.ClientFactory for uniform hypervisor client creation
- Remove activeHypervisors tracking (no longer needed)
- Simplify vsock_darwin.go (vz now uses same socket pattern as other hypervisors)
- Update Makefile to build and sign vz-shim binary

Co-authored-by: Cursor <cursoragent@cursor.com>
…ation

Add snapshot save/restore infrastructure to vz-shim:
- Snapshot endpoint in shim server (vm.snapshot)
- RestoreVM implementation in starter (loads config from metadata.json)
- Snapshot method in client (adapts directory path to file path)

Document Virtualization.framework limitation:
- Linux guest VMs cannot be reliably saved/restored
- Only macOS guests support this functionality
- This is an undocumented Apple limitation confirmed by Tart and UTM projects
- References: Tart #1177, #796; UTM #6654

The infrastructure is in place for potential future macOS guest support
while correctly disabling snapshot capability for Linux guests.

Also improves MAC address handling and error logging in vm.go.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Makefile: prepend e2fsprogs sbin to PATH for dev-darwin and run-darwin
  targets so mkfs.ext4 is found without requiring shell profile changes
- manager.go: don't cache failed image builds, clean up failed build
  directory to allow retries after fixing the underlying issue
Add -registry-push flag to gen-jwt that adds repo_access claims to the
JWT, enabling push permissions for specific repositories. This is needed
to push the builder image to Hypeman's internal registry during local
development.

Also add documentation for the builder image setup workflow in
DEVELOPMENT.md, covering the full process from building the image to
configuring BUILDER_IMAGE.
Instead of collecting all logs and sending them at the end of a build,
stream log lines incrementally from the builder agent to the manager
via vsock "log" messages. The manager appends each log line to the
build log file immediately, enabling real-time log streaming to clients
via the SSE endpoint.

Changes:
- Add streamingLogWriter in builder agent with channel-based streaming
- Add markClosed() mechanism to prevent panic from writes after channel close
- Handle "log" message type in manager's waitForResult
- Remove redundant final log save since logs are now streamed incrementally
- Remove unused datasize import (CI failure)
- Fix network devices overwritten in loop: collect all devices then set once
- Use exec.Command instead of CommandContext so vz-shim survives context cancel
- Map unknown VM states to StateShutdown instead of StateRunning
- Add stale unix socket cleanup before net.Listen in vz-shim
- Clarify Intel Mac rejection is intentional (kernel panics, no nested virt)
- Merge identical vsock_darwin.go/vsock_linux.go into single vsock.go
- Remove dead dialBuilderVsock code and bufferedConn type
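
The stale-socket cleanup mentioned in the list above addresses a common failure: if a previous process died without unlinking its socket file, net.Listen fails with "address already in use". A minimal sketch of the pattern follows; listenUnix and demo are illustrative names, not functions from vz-shim, and a stricter implementation might first check whether another process is still listening on the path.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenUnix removes any leftover socket file at path, then listens on it.
// Removing unconditionally assumes no live process owns the path.
func listenUnix(path string) (net.Listener, error) {
	if err := os.Remove(path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return nil, fmt.Errorf("remove stale socket %s: %w", path, err)
	}
	return net.Listen("unix", path)
}

// demo simulates a crashed process by leaving a socket file behind, shows
// that a plain Listen fails, and then recovers with listenUnix.
func demo() error {
	dir, err := os.MkdirTemp("", "shim")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "vz.sock")

	first, err := net.Listen("unix", sock)
	if err != nil {
		return err
	}
	// Keep the socket file on disk after Close, as a crash would.
	first.(*net.UnixListener).SetUnlinkOnClose(false)
	first.Close()

	if ln, err := net.Listen("unix", sock); err == nil {
		ln.Close()
		return errors.New("expected plain Listen to fail on a stale socket")
	}

	ln, err := listenUnix(sock)
	if err != nil {
		return err
	}
	return ln.Close()
}

func main() {
	fmt.Println(demo())
}
```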

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rgarcia and others added 2 commits February 8, 2026 10:30
- Hold closedMu RLock through channel send in streamingLogWriter to
  prevent panic from sending on a closed channel
- Remove unused macOSNetworkConfig, GetMacOSNetworkConfig, IsMacOS,
  and ErrRateLimitNotSupported from bridge_darwin.go (vz uses
  framework-level NAT, rate limiting will never apply)
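
The first fix above closes a check-then-send race: if the writer only checks a closed flag and then sends, the channel can be closed in between and the send panics. Holding the read lock across both steps rules that out. The sketch below borrows the names streamingLogWriter, closedMu, and markClosed from these commit messages; the fields, the error value, and the demo are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("log writer closed")

// streamingLogWriter forwards each Write to a channel of log lines.
type streamingLogWriter struct {
	closedMu sync.RWMutex
	closed   bool
	lines    chan string
}

func newStreamingLogWriter(buf int) *streamingLogWriter {
	return &streamingLogWriter{lines: make(chan string, buf)}
}

// Write holds the read lock across the closed check AND the send, so
// markClosed cannot close the channel between the two steps.
func (w *streamingLogWriter) Write(p []byte) (int, error) {
	w.closedMu.RLock()
	defer w.closedMu.RUnlock()
	if w.closed {
		return 0, errClosed
	}
	w.lines <- string(p)
	return len(p), nil
}

// markClosed takes the write lock, so it waits for in-flight sends to finish
// before closing the channel.
func (w *streamingLogWriter) markClosed() {
	w.closedMu.Lock()
	defer w.closedMu.Unlock()
	if !w.closed {
		w.closed = true
		close(w.lines)
	}
}

// demo writes from several goroutines, closes the writer, and then attempts
// one late write. It returns the number of lines received and the late error.
func demo() (int, error) {
	w := newStreamingLogWriter(4)
	received := 0
	done := make(chan struct{})
	go func() {
		for range w.lines {
			received++
		}
		close(done)
	}()

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Fprintf(w, "line %d\n", i)
		}(i)
	}
	wg.Wait()
	w.markClosed()
	<-done

	_, err := w.Write([]byte("late\n"))
	return received, err
}

func main() {
	fmt.Println(demo())
}
```

One trade-off of this design: a send that blocks on a full channel keeps the read lock held, so markClosed can only proceed while the consumer keeps draining the channel.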

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The tmp dir fallback constructed the same path as the initial check,
making it unreachable dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Feb 8, 2026

✱ Stainless preview builds

This PR will update the hypeman SDKs with the following commit message.

feat: add macOS support via Virtualization.framework (vz)

Edit this comment to update it. It will appear in the SDK's changelogs.

hypeman-typescript studio · code · diff

Your SDK built successfully.
generate ⚠️build ✅lint ✅test ✅

npm install https://pkg.stainless.com/s/hypeman-typescript/c5f4a2777fc57ada941611eec78e8480413ca298/dist.tar.gz
hypeman-go studio · code · diff

Your SDK built successfully.
generate ⚠️lint ✅test ✅

go get github.com/stainless-sdks/hypeman-go@213b748abb9350688d287881c472b9cb50f110ed
hypeman-openapi studio · code · diff

Your SDK built successfully.
generate ⚠️

hypeman-cli studio · conflict

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-10 01:35:04 UTC

- install.sh: OS branching for macOS (launchd, codesign, Docker socket
  auto-detection, arm64 check, ~/Library paths)
- uninstall.sh: macOS support (launchctl, vz-shim cleanup)
- Makefile: platform-aware build/test targets (Darwin vs Linux)
- Build manager: auto-build builder image on startup via background
  goroutine with atomic readiness gate and DOCKER_SOCKET config
- CI: test-darwin job on self-hosted macOS ARM64 runner with per-run
  DATA_DIR isolation; e2e-install job for install/uninstall cycle
- e2e-install-test.sh: platform-agnostic install → verify → uninstall
- DEVELOPMENT.md: document DOCKER_SOCKET and builder auto-build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…discovery

- Add //go:build linux to vm_metrics and vmm test files
- Change t.Fatal to t.Skip for /dev/kvm checks across all test files
- Skip TestNetworkResource_Allocated on non-Linux (no rate limiting)
- test-darwin uses go list to discover only compilable packages
- Use separate concurrency groups per CI job

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
rgarcia and others added 2 commits February 8, 2026 15:29
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

	// Wait for one direction to close
	<-done
}

Vsock proxy exits after single direction closes

Medium Severity

The vsock proxy in handleVsockConnection waits for only one direction to close (<-done), then returns — closing both connections via defer. The second goroutine may still be copying data in the other direction. This can cause premature truncation of data still in flight (e.g., a response being written back after the client closes its write side). Waiting for both directions to finish would ensure complete data transfer.
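
One way to wait for both directions, as this comment suggests, is a WaitGroup plus a half-close after each copy finishes. The sketch below is not the shim's code: proxy, closeWriter, and demo are illustrative names, and whether the guest-side vsock connection supports CloseWrite is an assumption, so the code checks for it at runtime and otherwise falls back to a full close.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// closeWriter is implemented by *net.TCPConn and *net.UnixConn.
type closeWriter interface{ CloseWrite() error }

// proxy copies in both directions and returns only after BOTH copies finish.
func proxy(a, b net.Conn) {
	var wg sync.WaitGroup
	pipe := func(dst, src net.Conn) {
		defer wg.Done()
		io.Copy(dst, src) // copy errors are ignored in this sketch
		// Propagate EOF to dst while leaving the reverse direction open.
		if cw, ok := dst.(closeWriter); ok {
			cw.CloseWrite()
		} else {
			dst.Close() // fallback: no half-close, reverse direction may be cut
		}
	}
	wg.Add(2)
	go pipe(a, b)
	go pipe(b, a)
	wg.Wait()
	a.Close()
	b.Close()
}

// demo sends a request through the proxy, half-closes the client side, and
// reads the reply that the backend writes only after it has seen EOF.
func demo() (string, error) {
	backend, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer backend.Close()
	go func() {
		c, err := backend.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		req, _ := io.ReadAll(c) // returns once the proxy half-closes
		c.Write(append([]byte("echo:"), req...))
	}()

	front, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer front.Close()
	go func() {
		in, err := front.Accept()
		if err != nil {
			return
		}
		out, err := net.Dial("tcp", backend.Addr().String())
		if err != nil {
			in.Close()
			return
		}
		proxy(in, out)
	}()

	c, err := net.Dial("tcp", front.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte("ping")); err != nil {
		return "", err
	}
	c.(*net.TCPConn).CloseWrite()
	resp, err := io.ReadAll(c)
	return string(resp), err
}

func main() {
	fmt.Println(demo())
}
```

The demo reproduces the scenario from the comment: the client closes its write side first, and the response still arrives in full because the proxy keeps the other direction open until it also reaches EOF.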

rgarcia and others added 11 commits February 8, 2026 15:44
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The find_release_with_artifact function returns 1 when no darwin CLI
artifact exists, which under set -e kills the script before the
empty-check can handle it gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
launchd doesn't perform shell expansion, so ~ in DATA_DIR causes
"mkdir ~: read-only file system" when the service starts. Also fix
JWT_SECRET sed pattern to match the darwin template's default value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The uninstall script needs sudo on macOS when binaries were installed
to /usr/local/bin with elevated privileges. Also ensures the tilde
expansion fix is included.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The service requires Caddy for the ingress manager, and on macOS it's
not embedded - it must be in PATH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
launchd doesn't inherit the user's PATH, so Homebrew-installed
binaries like caddy aren't found. Add standard Homebrew paths to
the plist's EnvironmentVariables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous check used /v1/instances with a fake Bearer token which
would be rejected, and / which returns 404. Use the unauthenticated
/health endpoint instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The busybox image at queue position 0 can finish quickly, causing
alpine to transition from pending to pulling before the second
CreateImage call. Accept pulling as a valid idempotent status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove snapshot/restore code from vz client, starter, and shim
- Extract doPut/doGet HTTP helpers in vz client
- Embed vz-shim binary via //go:embed (matching guest-agent pattern)
- Extract ShimConfig/DiskConfig/NetworkConfig to shared shimconfig package
- Use VsockSocket path directly instead of deriving from control socket
- Set console=hvc0 for vz instances, vz.vsock as vsock socket name
- Rename additionalStarters to platformStarters, move registration to init
- Split vm_metrics collector into platform-specific files (darwin stubs)
- Add context.Context to downloadKernelHeaders in initrd.go
- Extract shared disk resource code to disk.go, platform-specific NewDiskResource
- Consolidate guest_agent_binary/init_binary (remove needless darwin split)
- Remove dead sysfs constants from discovery_darwin.go
- Clean up verbose/unnecessary comments across multiple files
- Revert unnecessary queryHypervisorState extraction in query.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The //go:embed directive expects the binary at
lib/hypervisor/vz/vz-shim/vz-shim. Update build-vz-shim target to
copy the binary there after building, add it to .gitignore, and
clean it up in make clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

	go func() {
		io.Copy(conn, guestConn)
		done <- struct{}{}
	}()

Vsock proxy reads from raw conn instead of reader

Low Severity

In handleVsockConnection, the handshake is read using a bufio.NewReader(conn), but the bidirectional proxy uses io.Copy(guestConn, conn) reading from the raw conn. Any data buffered by the bufio.Reader beyond the handshake line would be silently lost. The client-side implementations in both lib/hypervisor/vz/vsock.go and lib/hypervisor/cloudhypervisor/vsock.go correctly use a bufferedConn wrapper for this exact reason.
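
The data loss described here is easy to reproduce in isolation. In the sketch below a strings.Reader stands in for the connection (an assumption made to keep the example self-contained); after the handshake line is read through a bufio.Reader, the bytes that followed it are only reachable through that reader, not through the underlying source.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// demo reads a handshake line through a bufio.Reader, then reads the
// remainder once from the raw source and once from the buffered reader.
func demo() (fromRaw, fromReader string) {
	raw := strings.NewReader("CONNECT 1024\npayload")
	reader := bufio.NewReader(raw)

	// Reading the handshake line pulls more than one line into the buffer.
	reader.ReadString('\n')

	// The raw source has already been drained by the buffered reader.
	rest, _ := io.ReadAll(raw)
	fromRaw = string(rest)

	// The bytes after the handshake are still available via the reader.
	rest, _ = io.ReadAll(reader)
	fromReader = string(rest)
	return fromRaw, fromReader
}

func main() {
	a, b := demo()
	fmt.Printf("raw=%q reader=%q\n", a, b)
}
```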

cursor bot commented Feb 10, 2026

Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.

  • ✅ Fixed: Vsock proxy reads from raw conn instead of reader
    • Changed io.Copy(guestConn, conn) to io.Copy(guestConn, reader) so that any data buffered by the bufio.Reader during handshake parsing is not lost in the bidirectional proxy.

Or push these changes by commenting:

@cursor push 692e2e5f1c
Preview (692e2e5f1c)
diff --git a/cmd/vz-shim/server.go b/cmd/vz-shim/server.go
--- a/cmd/vz-shim/server.go
+++ b/cmd/vz-shim/server.go
@@ -246,7 +246,7 @@
 	done := make(chan struct{}, 2)
 
 	go func() {
-		io.Copy(guestConn, conn)
+		io.Copy(guestConn, reader)
 		done <- struct{}{}
 	}()
