feat: real user flow validation scripts + heartbeat timing fixes#282
Introduces pi-autoresearch integration with 10 flow scripts that exercise every documented user journey from getting-started.md and monetize-inference.md.

Flow scripts (flows/):
- flow-01: prerequisites (Docker, Ollama, obol binary)
- flow-02: stack init + up + verify (getting-started §1-2)
- flow-03: LLM inference chain (getting-started §3a-3d)
- flow-04: agent init + inference (getting-started §4-5)
- flow-05: network add/remove (getting-started §6)
- flow-06: sell setup: pricing, ServiceOffer, heartbeat wait (monetize §1.1-1.4)
- flow-07: sell verify: tunnel, routes, 402, metrics (monetize §1.5-1.7)
- flow-08: buy: discovery, 402 parse, EIP-712 payment, settlement (monetize §2)
- flow-09: lifecycle: stop, delete, cleanup verification (monetize §4)
- flow-10: Anvil fork + x402-rs facilitator setup (monetize §3)

Bug fixes found by the flow scripts:
- ensureHeartbeatActive(): heartbeat stuck at 30m default after obol agent init
- patchHeartbeatAfterSync(): SyncAgentBaseURL helmfile sync was resetting heartbeat
- validateRPCEndpoint(): obol network add was accepting invalid URLs
- Doc corrections: LiteLLM auth, eRPC path, /skill.md vs /.well-known

Test account keys are derived at runtime from the Anvil mnemonic via `cast wallet private-key`; no private keys in source.

Closes #280
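For reviewers who want a concrete picture of the validateRPCEndpoint() fix, here is a minimal sketch of the kind of check it might perform. The function name `validateRPCEndpointSketch`, the accepted schemes, and the error wording are all assumptions for illustration; the PR's actual function may apply different rules.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateRPCEndpointSketch is a hypothetical stand-in for the PR's
// validateRPCEndpoint(). It accepts only URLs with an http(s) or ws(s)
// scheme and a non-empty host. The scheme list is an assumption.
func validateRPCEndpointSketch(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse RPC endpoint %q: %w", raw, err)
	}
	switch u.Scheme {
	case "http", "https", "ws", "wss":
		// supported transports for a JSON-RPC endpoint
	default:
		return fmt.Errorf("RPC endpoint %q: unsupported scheme %q", raw, u.Scheme)
	}
	if u.Hostname() == "" {
		return fmt.Errorf("RPC endpoint %q: missing host", raw)
	}
	return nil
}

func main() {
	// example.org is a placeholder host, not an endpoint from the PR.
	for _, ep := range []string{"https://rpc.example.org", "not-a-url", "ftp://host", "http://"} {
		if err := validateRPCEndpointSketch(ep); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", ep)
	}
}
```

Checking the scheme and host explicitly matters because `url.Parse` is lenient: a bare string such as `not-a-url` parses without error as a relative path, so a nil error from `url.Parse` alone does not establish that the input is a usable endpoint.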
I'm a little unsure of the accuracy of the flows; at the least, they imply features I don't expect. I also think we probably need the heartbeat to be agent-writable. I tried to get my obolclaw to modify key facts about itself and it was unable to (tuning down a 2-minute heartbeat was one). I don't know if giving it the auth to write to a ConfigMap is worth it if we can instead get these files all mounted into the openclaw PV and have the obol CLI just edit the files on the host (e.g. how obol model sync can change the openclaw agent definition).
To that end, I might pull some of the fixes in separately from this PR so they aren't held up waiting for feedback. (The fact that the flow tests aren't in the user hot path makes me less worried about them being imperfect.)
I'm a little confused by some of the changes in this file (but it's a docs file, so nbd).
| # Use qwen3.5:9b — it is configured in LiteLLM's model_list (FLOW_MODEL qwen3:0.6b | ||
| # is only registered in Ollama directly; the x402 sell/buy flows use it via that path) |
qwen3:0.6b, should we have a qwen3.5:?
| out=$("$OBOL" kubectl exec -n llm deployment/litellm -c litellm -- \ | ||
| python3 -c " | ||
| import urllib.request | ||
| r = urllib.request.urlopen('http://ollama.llm.svc.cluster.local:11434/api/tags', timeout=10) |
Do we have internal Ollama code? Maybe we should internalise a llama.cpp pod.
| fail "LiteLLM inference failed — ${out:0:300}" | ||
| fi | ||
|
|
||
| # §3d: Tool-call passthrough |
Tool calls at the LiteLLM layer seem like they might confuse things.
| func ensureHeartbeatActive(cfg *config.Config, u *ui.UI) error { | ||
| namespace := fmt.Sprintf("openclaw-%s", DefaultInstanceID) | ||
| kubectlBin := filepath.Join(cfg.BinDir, "kubectl") | ||
| kubeconfigPath := filepath.Join(cfg.ConfigDir, "kubeconfig.yaml") | ||
| env := append(os.Environ(), fmt.Sprintf("KUBECONFIG=%s", kubeconfigPath)) | ||
|
|
||
| // Read current ConfigMap. | ||
| getCmd := exec.Command(kubectlBin, | ||
| "get", "configmap", "openclaw-config", | ||
| "-n", namespace, | ||
| "-o", "jsonpath={.data.openclaw\\.json}") | ||
| getCmd.Env = env | ||
| var outBuf bytes.Buffer | ||
| getCmd.Stdout = &outBuf | ||
| if err := getCmd.Run(); err != nil { | ||
| return fmt.Errorf("read openclaw-config: %w", err) | ||
| } | ||
|
|
||
| var cfgJSON map[string]interface{} | ||
| if err := json.Unmarshal(outBuf.Bytes(), &cfgJSON); err != nil { | ||
| return fmt.Errorf("parse openclaw.json: %w", err) | ||
| } | ||
|
|
||
| // Check whether heartbeat is already present. | ||
| agents, _ := cfgJSON["agents"].(map[string]interface{}) | ||
| defaults, _ := agents["defaults"].(map[string]interface{}) | ||
| _, alreadySet := defaults["heartbeat"] | ||
| if alreadySet { | ||
| u.Success("Heartbeat config already active") | ||
| return nil | ||
| } | ||
|
|
||
| // Inject heartbeat. | ||
| if agents == nil { | ||
| agents = map[string]interface{}{} | ||
| cfgJSON["agents"] = agents | ||
| } | ||
| if defaults == nil { | ||
| defaults = map[string]interface{}{} | ||
| agents["defaults"] = defaults | ||
| } | ||
| defaults["heartbeat"] = map[string]interface{}{ | ||
| "every": "5m", | ||
| "target": "none", | ||
| } | ||
|
|
||
| patched, err := json.MarshalIndent(cfgJSON, "", " ") | ||
| if err != nil { | ||
| return fmt.Errorf("marshal patched config: %w", err) | ||
| } | ||
|
|
||
| applyPayload := map[string]interface{}{ | ||
| "apiVersion": "v1", | ||
| "kind": "ConfigMap", | ||
| "metadata": map[string]interface{}{ | ||
| "name": "openclaw-config", | ||
| "namespace": namespace, | ||
| }, | ||
| "data": map[string]string{ | ||
| "openclaw.json": string(patched), | ||
| }, | ||
| } | ||
| applyRaw, _ := json.Marshal(applyPayload) | ||
|
|
||
| applyCmd := exec.Command(kubectlBin, | ||
| "apply", "-f", "-", | ||
| "--server-side", "--field-manager=helm", "--force-conflicts") | ||
| applyCmd.Env = env | ||
| applyCmd.Stdin = bytes.NewReader(applyRaw) | ||
| var applyErr bytes.Buffer | ||
| applyCmd.Stderr = &applyErr | ||
| if err := applyCmd.Run(); err != nil { | ||
| return fmt.Errorf("patch heartbeat config: %w\n%s", err, applyErr.String()) | ||
| } | ||
|
|
||
| // OpenClaw watches for ConfigMap file changes and hot-reloads config. | ||
| // No pod restart is needed: the running pod will detect the update within | ||
| // ~30-60s and apply [reload] config hot reload, switching the heartbeat | ||
| // interval to 5m immediately without losing the running pod or its state. | ||
| u.Success("Heartbeat config injected — OpenClaw hot reload will activate it (every 5m)") | ||
| return nil |
Mounting OpenClaw config like this as a ConfigMap means it's read-only to the runtime. This has UX problems because OpenClaw can't tune things for itself. If we instead do this with files on the host, maybe they'll be mutable by the OpenClaw service?
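To make the host-file alternative concrete, here is a rough sketch of what editing the heartbeat in a plain `openclaw.json` on disk could look like, instead of a server-side apply against the ConfigMap. Everything here is hypothetical: the helper names (`withHeartbeat`, `setHeartbeatInFile`, `heartbeatEvery`), the file location, and the assumption that the file is mounted writable into the OpenClaw PV. Unlike ensureHeartbeatActive(), it overwrites an existing heartbeat, since the point is to allow tuning.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// child returns parent[key] as a JSON object, creating it when absent.
func child(parent map[string]interface{}, key string) map[string]interface{} {
	if m, ok := parent[key].(map[string]interface{}); ok {
		return m
	}
	m := map[string]interface{}{}
	parent[key] = m
	return m
}

// withHeartbeat (hypothetical) sets agents.defaults.heartbeat in the raw
// config and returns the re-marshalled JSON. Other keys are preserved.
func withHeartbeat(raw []byte, every string) ([]byte, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse openclaw.json: %w", err)
	}
	if cfg == nil { // the document was a literal JSON null
		cfg = map[string]interface{}{}
	}
	defaults := child(child(cfg, "agents"), "defaults")
	defaults["heartbeat"] = map[string]interface{}{"every": every, "target": "none"}
	return json.MarshalIndent(cfg, "", "  ")
}

// heartbeatEvery reads agents.defaults.heartbeat.every, or "" if unset.
func heartbeatEvery(raw []byte) string {
	var cfg struct {
		Agents struct {
			Defaults struct {
				Heartbeat struct {
					Every string `json:"every"`
				} `json:"heartbeat"`
			} `json:"defaults"`
		} `json:"agents"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return ""
	}
	return cfg.Agents.Defaults.Heartbeat.Every
}

// setHeartbeatInFile (hypothetical) patches the file in place, writing to a
// temporary file first so a reader never sees a half-written config.
func setHeartbeatInFile(path, every string) error {
	raw, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read %s: %w", path, err)
	}
	patched, err := withHeartbeat(raw, every)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, patched, 0o644); err != nil {
		return fmt.Errorf("write %s: %w", tmp, err)
	}
	if err := os.Rename(tmp, path); err != nil {
		return fmt.Errorf("replace %s: %w", path, err)
	}
	return nil
}

func main() {
	// A temp dir stands in for the host directory backing the PV.
	dir, err := os.MkdirTemp("", "openclaw-sketch-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "openclaw.json")
	if err := os.WriteFile(path, []byte(`{"agents":{"defaults":{}}}`), 0o644); err != nil {
		panic(err)
	}
	if err := setHeartbeatInFile(path, "2m"); err != nil {
		panic(err)
	}
	raw, _ := os.ReadFile(path)
	fmt.Println("heartbeat every:", heartbeatEvery(raw))
}
```

Whether OpenClaw would hot-reload a host-mounted file the same way it does the ConfigMap mount is not something this sketch can establish; that would need to be verified against the running service.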
| { | ||
| "workingDir": "/Users/bussyjd/Development/Obol_Workbench/obol-stack/.worktrees/autoresearch" | ||
| } |
| ### Session 1 (baseline → 61/61) | ||
|
|
||
| **Baseline: 44/57** — 13 failures across all flows. | ||
|
|
Should we move this to ./flows or somewhere other than top level if it's not for users?
| @@ -0,0 +1,48 @@ | |||
| #!/bin/bash | |||
Summary
Introduces pi-autoresearch integration for automated real user flow validation, plus Go bug fixes and doc corrections discovered during the overnight run.
Commit 1: Flow scripts + autoresearch harness
10 bash flow scripts that exercise every documented user journey from `docs/getting-started.md` and `docs/guides/monetize-inference.md`.
All commands use the real `obol` binary: no `go run`, no direct `kubectl`, no pod IP access.
Commit 2: Bug fixes found by the flow scripts
Timing (closes #280):
- `ensureHeartbeatActive()`: heartbeat was stuck at 30m default after `obol agent init`
- `patchHeartbeatAfterSync()`: `SyncAgentBaseURL` helmfile sync was resetting heartbeat config
- `readCurrentAgentBaseURL()`: skip sync when tunnel URL unchanged (idempotency)
CLI validation:
- `validateRPCEndpoint()` in `obol network add`: reject invalid URLs
Doc corrections:
- `/rpc/evm/{chainId}` (local-only, not via tunnel)
- `/skill.md` (always available) vs `/.well-known` (requires `--register`)
Discovery method
Overnight pi-autoresearch session: 44 → 133 steps passing across 50+ experiments. The agent autonomously ran flow scripts, diagnosed failures, fixed Go code and flow scripts, and re-validated, all while `go build` / `go test` acted as backpressure.
Test plan
- `go build ./...` passes
- `go test ./internal/{agent,tunnel,network,openclaw,model}/` passes
- `bash flows/flow-01-prerequisites.sh` through `flow-09-lifecycle.sh` on fresh cluster
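The idempotency fix listed under Timing (skip the sync when the tunnel URL is unchanged) could look roughly like the following. This is a sketch under assumptions: `needsSync` and `syncIfChanged` are hypothetical names, the `sync` callback stands in for whatever SyncAgentBaseURL actually does, and treating a trailing slash as insignificant is an illustrative choice rather than confirmed behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// needsSync reports whether the desired base URL differs from the one
// currently configured. Surrounding whitespace and trailing slashes are
// ignored (an assumption made for this sketch).
func needsSync(current, desired string) bool {
	norm := func(s string) string {
		return strings.TrimRight(strings.TrimSpace(s), "/")
	}
	return norm(current) != norm(desired)
}

// syncIfChanged (hypothetical) calls sync only when the URL changed and
// reports whether it ran. Skipping the call avoids a helmfile sync that
// would otherwise reset unrelated config such as the heartbeat.
func syncIfChanged(current, desired string, sync func(string) error) (bool, error) {
	if !needsSync(current, desired) {
		return false, nil
	}
	if err := sync(desired); err != nil {
		return false, fmt.Errorf("sync agent base URL: %w", err)
	}
	return true, nil
}

func main() {
	// Placeholder URLs; a real caller would read the current value from
	// the cluster and the desired value from the tunnel.
	sync := func(u string) error {
		fmt.Println("syncing agent base URL to", u)
		return nil
	}
	ran, _ := syncIfChanged("https://a.example/", "https://a.example", sync)
	fmt.Println("unchanged URL, sync ran:", ran)
	ran, _ = syncIfChanged("https://a.example", "https://b.example", sync)
	fmt.Println("changed URL, sync ran:", ran)
}
```

Passing the sync step as a function keeps the comparison logic testable without a cluster, which fits the `go test` backpressure loop described under Discovery method.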