feat: autoresearch-at-home integration — GPU marketplace + optimized inference selling #264

@bussyjd

Description

Vision

Three-sided marketplace on obol-stack powered by autoresearch and its distributed fork autoresearch-at-home:

  1. GPU contributors sell compute time to the autoresearch swarm, paid via x402
  2. Researchers run distributed experiments across GPU workers, discovering them via ERC-8004
  3. Service builders take autoresearch-optimized models and sell apps/inference on top via x402

Context

Autoresearch is Andrej Karpathy's autonomous LLM optimization framework — an AI agent iterates on train.py (architecture, hyperparams, optimizer), trains for 5 minutes per experiment, measures val_bpb (bits per byte), keeps improvements, reverts failures. ~12 experiments/hour, ~100 overnight.
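The keep/revert loop described above can be sketched in a few lines. This is a minimal illustration with stubbed training, not autoresearch's actual implementation; all function names here are hypothetical:

```python
import random

def optimize(train_fn, mutate_fn, base_config, n_experiments):
    """Greedy keep/revert loop in the spirit of autoresearch: mutate the
    config, train briefly, and keep the change only if val_bpb
    (bits per byte, lower is better) improves."""
    best_config = dict(base_config)
    best_bpb = train_fn(best_config)        # baseline run
    history = []
    for _ in range(n_experiments):
        candidate = mutate_fn(dict(best_config))
        val_bpb = train_fn(candidate)       # ~5 minutes in the real system
        improved = val_bpb < best_bpb
        if improved:                        # keep improvements
            best_config, best_bpb = candidate, val_bpb
        history.append((candidate, val_bpb, improved))  # revert = do nothing
    return best_config, best_bpb, history

# Stubbed demo: "training loss" is just distance from an ideal lr.
random.seed(0)
def fake_train(cfg):
    return abs(cfg["lr"] - 3e-4) + 1.0
def fake_mutate(cfg):
    cfg["lr"] *= random.choice([0.5, 2.0])
    return cfg

best, bpb, history = optimize(fake_train, fake_mutate, {"lr": 1e-3}, 20)
```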

Autoresearch-at-home is the SETI@home-style fork: multiple agents on different GPUs collaborating through a shared coordination layer (currently Ensue). It adds experiment claiming, result publishing, global-best tracking, and collective intelligence on top of the single-agent loop.

Obol-stack already has the payment and discovery infrastructure: ServiceOffer CRD, x402 ForwardAuth, ERC-8004 registration, buy-side sidecar. The integration connects autoresearch's GPU demand with obol-stack's payment rails.

User Journeys

Journey 1: GPU Contributor (earn money)

Run worker_api.py on bare metal GPU
→ obol sell http gpu-worker --upstream localhost:8080 --per-hour 0.50
→ x402 gates your GPU → researchers pay per-experiment
→ Worker registered on ERC-8004 with OASF skill: machine_learning/model_optimization

Journey 2: Researcher (optimize models)

coordinate.py discover → find GPU workers on 8004scan
→ coordinate.py loop train.py → submit experiments through x402
→ Collect best val_bpb model → publish.py → sell optimized inference
→ Provenance (val_bpb, train hash, param count) flows into registration metadata

Journey 3: Service Builder (build apps on optimized models) ⚠️ GAP

Take autoresearch-optimized model → build web app (CV enhancer, code reviewer, etc.)
→ obol sell http my-app --upstream localhost:3000 --per-request 0.05
→ Users hit frontend, pay via x402, get the service

What's Implemented (this branch)

Phase 1: Provenance + Skills

  • spec.provenance field on ServiceOffer CRD (framework, metric, experimentId, trainHash, paramCount)
  • --provenance-file flag on obol sell inference and obol sell http
  • Provenance injected into .well-known/agent-registration.json by monetize.py
  • New embedded skill: autoresearch (SKILL.md + publish.py + references)
  • New embedded skill: autoresearch-coordinator (SKILL.md + coordinate.py + references)
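For illustration, a ServiceOffer carrying the new provenance block might look like the sketch below. The field names under provenance are the ones this branch adds; everything else (apiVersion, pricing keys, values) is illustrative, not the authoritative CRD schema:

```yaml
# Sketch only: provenance field names are from this PR, the rest of the
# CRD shape is assumed for illustration.
apiVersion: obol.tech/v1alpha1
kind: ServiceOffer
metadata:
  name: gpu-worker
spec:
  upstream: http://host.k3d.internal:8080
  pricing:
    perHour: "0.50"
  provenance:
    framework: autoresearch
    metric: val_bpb
    experimentId: exp-0042          # hypothetical
    trainHash: sha256:deadbeef      # hypothetical
    paramCount: 124000000           # hypothetical
```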

Phase 2: GPU Marketplace

  • worker_api.py — Flask HTTP API wrapping train.py (POST /experiment, GET /health, GET /status, GET /best)
  • Dockerfile.worker — CUDA 12.4 container for the worker
  • Coordinator reimplements Ensue's THINK/CLAIM/RUN/PUBLISH using 8004scan discovery + x402 payments
  • GPU workers sold via existing obol sell http (no new CRD type needed)
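The real worker is the Flask app in worker_api.py; the contract its endpoints imply (POST /experiment runs a job, GET /best returns the lowest-val_bpb result) can be sketched transport-free with stdlib Python. Class and method names below are hypothetical:

```python
import hashlib
import json

class WorkerState:
    """Stdlib-only sketch of the worker_api.py contract: submit an
    experiment, record its result, and track the best one seen.
    `run` stands in for invoking train.py."""

    def __init__(self, run):
        self.run = run          # callable: config dict -> val_bpb float
        self.results = []

    def post_experiment(self, config):
        """Corresponds to POST /experiment."""
        payload = json.dumps(config, sort_keys=True).encode()
        experiment_id = hashlib.sha256(payload).hexdigest()[:12]
        val_bpb = self.run(config)
        result = {"experimentId": experiment_id,
                  "val_bpb": val_bpb,
                  "config": config}
        self.results.append(result)
        return result

    def get_best(self):
        """Corresponds to GET /best: lowest val_bpb wins."""
        return min(self.results, key=lambda r: r["val_bpb"], default=None)

# Usage with a stubbed trainer:
worker = WorkerState(run=lambda cfg: cfg["lr"])
worker.post_experiment({"lr": 0.5})
latest = worker.post_experiment({"lr": 0.1})
```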

Design Decision: Bare Metal GPU

k3d doesn't support GPU passthrough. Workers run on the host, obol-stack proxies via --upstream http://host.k3d.internal:<port>.

The Gap: App-on-Top-of-Inference (Journey 3)

Today you can sell raw inference (obol sell inference) or gate any HTTP service (obol sell http). But there's no scaffolding for the common pattern:

"I want to build a web app that uses an LLM internally and charge users per-use via x402"

For example, a CV enhancer service:

  • Frontend: upload form for resumes
  • Backend: calls the in-cluster LiteLLM with an autoresearch-optimized model
  • Payment: x402 gates the whole service per-request

What's missing:

  1. App template / scaffold — no obol app create that generates a web app skeleton with LLM backend wired up
  2. Internal LLM routing — the app needs to call LiteLLM internally (no x402 on internal calls) while the app itself is x402-gated externally
  3. Frontend payment UX — no x402 payment widget/SDK for browser-based payment flows (today x402 is API-to-API)
  4. Deployment pattern — no documented pattern for "deploy my app container into the cluster and gate it"

Proposed solution direction:

  • obol app create <name> --template inference-app — scaffolds a Next.js/Flask app with LiteLLM client pre-configured
  • App deployed into cluster with internal access to litellm.llm.svc:4000 (no payment on internal calls)
  • obol sell http <name> --upstream <app-svc> gates the external-facing endpoint
  • x402 browser SDK or payment redirect flow for end-user UX
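To make the internal-routing idea concrete: LiteLLM exposes an OpenAI-compatible chat completions endpoint, so the app backend's unpaid internal hop is a plain POST to the in-cluster service while only the app's own endpoint is x402-gated. The URL below is the in-cluster address from this proposal; the helper name is hypothetical:

```python
import json
import urllib.request

# In-cluster LiteLLM address from the proposal; no x402 on this hop.
LITELLM_URL = "http://litellm.llm.svc:4000/v1/chat/completions"

def build_chat_request(model, prompt, url=LITELLM_URL):
    """Build the internal LLM request the app backend would send to
    LiteLLM's OpenAI-compatible endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# In the app handler:
#   resp = urllib.request.urlopen(build_chat_request(model, prompt))
req = build_chat_request("autoresearch-best", "Improve this resume bullet.")
```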

This needs proper spec work — filing separately or expanding here once we have a concrete prototype (starting with a CV enhancer).

Phase 3 (Future, not in scope)

  • On-chain experiment registry smart contract (experiments, results, bounties)
  • GPU metering sidecar (x402-meter) for time-based billing
  • IPFS/Filecoin model artifact storage (content-addressed by hash)
  • Frontend: leaderboard page, experiment dashboard, GPU marketplace view

Ensue → obol-stack Mapping

| Ensue concept | obol-stack replacement |
|---|---|
| join_hub() | obol sell http (register GPU as ServiceOffer) |
| claim_experiment() | POST /experiment to discovered worker via x402 |
| publish_result() | ERC-8004 setMetadata("provenance", {...}) |
| pull_best_config() | Query 8004scan API for best val_bpb |
| ask_swarm() | Query registered agents' .well-known metadata |
| Leaderboard | 8004scan filtered by machine_learning/model_optimization skill |
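One coordinator round under this mapping can be sketched with the obol-stack mechanisms injected as callables, which keeps the control flow visible without committing to coordinate.py's actual API (all names here are hypothetical):

```python
def swarm_round(discover, pay_and_post, publish_metadata, propose):
    """One THINK -> CLAIM/RUN -> PUBLISH round with Ensue primitives
    replaced per the mapping above: discover() lists workers (8004scan),
    pay_and_post(worker, config) submits POST /experiment via x402,
    publish_metadata(result) writes ERC-8004 provenance, and propose()
    is the THINK step producing the next config to try."""
    workers = discover()
    if not workers:
        return None                               # nothing to claim
    config = propose()                            # THINK
    result = pay_and_post(workers[0], config)     # CLAIM + RUN (paid call)
    publish_metadata(result)                      # PUBLISH
    return result

# Usage with fakes standing in for 8004scan, x402, and ERC-8004:
published = []
result = swarm_round(
    discover=lambda: ["worker-a"],
    pay_and_post=lambda w, cfg: {"worker": w, "val_bpb": 0.92, "config": cfg},
    publish_metadata=published.append,
    propose=lambda: {"lr": 3e-4},
)
```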
