feat: real user flow validation scripts + heartbeat timing fixes#282

Open
bussyjd wants to merge 1 commit into main from feat/autoresearch-flows

@bussyjd (Collaborator) commented Mar 19, 2026

Summary

Introduces pi-autoresearch integration for automated real user flow validation, plus Go bug fixes and doc corrections discovered during the overnight run.

Commit 1: Flow scripts + autoresearch harness

10 bash flow scripts that exercise every documented user journey from docs/getting-started.md and docs/guides/monetize-inference.md:

flows/
├── lib.sh                     # Shared harness (step/pass/fail/poll, Anvil credentials)
├── flow-01-prerequisites.sh   # Docker, Ollama, obol binary
├── flow-02-stack-init-up.sh   # getting-started §1-2
├── flow-03-inference.sh       # getting-started §3a-3d (LiteLLM, tool-calls)
├── flow-04-agent.sh           # getting-started §4-5 (agent init, inference)
├── flow-05-network.sh         # getting-started §6 (network add/remove)
├── flow-06-sell-setup.sh      # monetize §1.1-1.4 (pricing, ServiceOffer, heartbeat)
├── flow-07-sell-verify.sh     # monetize §1.5-1.7 (tunnel, routes, 402, metrics)
├── flow-08-buy.sh             # monetize §2 (discovery, EIP-712 payment, settlement)
├── flow-09-lifecycle.sh       # monetize §4 (stop, delete, cleanup)
└── flow-10-anvil-facilitator.sh  # monetize §3 (Anvil fork, x402-rs facilitator)

All commands use the real obol binary — no go run, no direct kubectl, no pod IP access.

Commit 2: Bug fixes found by the flow scripts

Timing (closes #280):

  • ensureHeartbeatActive(): heartbeat was stuck at 30m default after obol agent init
  • patchHeartbeatAfterSync(): SyncAgentBaseURL helmfile sync was resetting heartbeat config
  • readCurrentAgentBaseURL(): skip sync when tunnel URL unchanged (idempotency)
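
The idempotency check in the last bullet can be illustrated with a small comparison function. This is a sketch under assumptions: `needsSync` is a hypothetical name, and the normalization (trimming whitespace and trailing slashes) is an illustrative choice, not necessarily what readCurrentAgentBaseURL() does.

```go
package main

import (
	"fmt"
	"strings"
)

// needsSync reports whether the agent base URL must be re-synced.
// Surrounding whitespace and trailing slashes are ignored, so
// "https://x/" and "https://x" count as the same tunnel URL.
// Hypothetical helper, not taken from the PR.
func needsSync(current, desired string) bool {
	norm := func(s string) string {
		return strings.TrimRight(strings.TrimSpace(s), "/")
	}
	return norm(current) != norm(desired)
}

func main() {
	// Same URL apart from the trailing slash: no sync required.
	fmt.Println(needsSync("https://abc.example.com/", "https://abc.example.com"))
	// Nothing configured yet: a sync is required.
	fmt.Println(needsSync("", "https://abc.example.com"))
}
```

Skipping the sync when nothing changed also avoids the helmfile run that was resetting the heartbeat config in the first place.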

CLI validation:

  • validateRPCEndpoint() in obol network add — reject invalid URLs
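
The body of validateRPCEndpoint() is not included in this description. A minimal sketch of this kind of check, using only `net/url`, might look like the following; the function name `validateEndpoint` and the accepted scheme list (http, https, ws, wss) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateEndpoint rejects strings that are not absolute URLs with a
// supported scheme and a host. The scheme list is an assumption.
func validateEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid RPC endpoint %q: %w", raw, err)
	}
	switch u.Scheme {
	case "http", "https", "ws", "wss":
		// supported
	default:
		return fmt.Errorf("invalid RPC endpoint %q: unsupported scheme %q", raw, u.Scheme)
	}
	if u.Hostname() == "" {
		return fmt.Errorf("invalid RPC endpoint %q: missing host", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{"http://localhost:8545", "not a url", "https://"} {
		fmt.Printf("%q -> %v\n", raw, validateEndpoint(raw))
	}
}
```

Checking the scheme explicitly matters because `url.Parse` is lenient and accepts many strings that are not usable endpoints.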

Doc corrections:

  • getting-started §3c-3d: add LiteLLM Bearer auth header (was returning 401)
  • monetize §1.6: correct eRPC path to /rpc/evm/{chainId} (local-only, not via tunnel)
  • monetize §2.1: clarify /skill.md (always available) vs /.well-known (requires --register)
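
The first doc correction (a missing Bearer header producing a 401) can be reproduced against a stand-in server. In this sketch `newFakeProxy` is a hypothetical test double rather than LiteLLM itself, and the key `sk-test` is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusWithKey sends a GET to url, attaching "Authorization: Bearer <key>"
// when key is non-empty, and returns the HTTP status code.
func statusWithKey(url, key string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if key != "" {
		req.Header.Set("Authorization", "Bearer "+key)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newFakeProxy stands in for an authenticated proxy: it answers 401
// unless the request carries the expected Bearer key.
func newFakeProxy(key string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+key {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFakeProxy("sk-test")
	defer srv.Close()
	without, _ := statusWithKey(srv.URL, "")
	with, _ := statusWithKey(srv.URL, "sk-test")
	fmt.Println(without, with)
}
```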

Discovery method

Overnight pi-autoresearch session: 44 → 133 steps passing across 50+ experiments. The agent autonomously ran flow scripts, diagnosed failures, fixed Go code and flow scripts, and re-validated — all while go build/go test acted as backpressure.

Test plan

  • go build ./... passes
  • go test ./internal/{agent,tunnel,network,openclaw,model}/ passes
  • pi-autoresearch: 133/133 steps, 50+ runs
  • Manual: bash flows/flow-01-prerequisites.sh through flow-09-lifecycle.sh on fresh cluster

@bussyjd bussyjd force-pushed the feat/autoresearch-flows branch from c040301 to e9683b1 Compare March 19, 2026 10:03
Introduces pi-autoresearch integration with 10 flow scripts that exercise
every documented user journey from getting-started.md and monetize-inference.md.

Flow scripts (flows/):
- flow-01: prerequisites (Docker, Ollama, obol binary)
- flow-02: stack init + up + verify (getting-started §1-2)
- flow-03: LLM inference chain (getting-started §3a-3d)
- flow-04: agent init + inference (getting-started §4-5)
- flow-05: network add/remove (getting-started §6)
- flow-06: sell setup — pricing, ServiceOffer, heartbeat wait (monetize §1.1-1.4)
- flow-07: sell verify — tunnel, routes, 402, metrics (monetize §1.5-1.7)
- flow-08: buy — discovery, 402 parse, EIP-712 payment, settlement (monetize §2)
- flow-09: lifecycle — stop, delete, cleanup verification (monetize §4)
- flow-10: Anvil fork + x402-rs facilitator setup (monetize §3)

Bug fixes found by the flow scripts:
- ensureHeartbeatActive() — heartbeat stuck at 30m default after obol agent init
- patchHeartbeatAfterSync() — SyncAgentBaseURL helmfile sync was resetting heartbeat
- validateRPCEndpoint() — obol network add was accepting invalid URLs
- Doc corrections: LiteLLM auth, eRPC path, /skill.md vs /.well-known

Test account keys are derived at runtime from the Anvil mnemonic via
`cast wallet private-key` — no private keys in source.

Closes #280
@bussyjd bussyjd force-pushed the feat/autoresearch-flows branch from e9683b1 to e56ae6d Compare March 19, 2026 10:09
@bussyjd bussyjd requested a review from OisinKyne March 19, 2026 10:20
@bussyjd bussyjd changed the title from "feat: autoresearch flow scripts + heartbeat/CLI/doc fixes" to "feat: real user flow validation scripts + heartbeat timing fixes" Mar 19, 2026
@OisinKyne (Contributor) left a comment

I'm a little unsure of the accuracy of the flows; at the least, they imply features I don't expect. I also think we probably need the heartbeat to be agent-writable. I tried to get my obolclaw to modify key facts about itself and it was unable to (tuning down a 2-minute heartbeat was one). I don't know if giving it the auth to write to a ConfigMap is worth it if we can instead get these files all mounted into the openclaw PV, so that the obol CLI just modifies the files on the host (e.g. how obol model sync can change the openclaw agent definition).

To that end, I might pull some of the fixes in separately from this PR, so they aren't delayed while waiting for feedback. (The fact that the flow tests aren't in the user hot path makes me less worried about them being imperfect.)

I'm a little confused by some of the changes in this file (but it's a docs file, so no big deal).

Comment on lines +44 to +45
# Use qwen3.5:9b — it is configured in LiteLLM's model_list (FLOW_MODEL qwen3:0.6b
# is only registered in Ollama directly; the x402 sell/buy flows use it via that path)

qwen3:0.6b. Should we have a qwen3.5 tag here?

out=$("$OBOL" kubectl exec -n llm deployment/litellm -c litellm -- \
python3 -c "
import urllib.request
r = urllib.request.urlopen('http://ollama.llm.svc.cluster.local:11434/api/tags', timeout=10)

Do we have internal Ollama code? Maybe we should internalise a llama.cpp pod.

fail "LiteLLM inference failed — ${out:0:300}"
fi

# §3d: Tool-call passthrough

Tool calls at the LiteLLM layer seem like they might confuse things.

Comment on lines +147 to +227
func ensureHeartbeatActive(cfg *config.Config, u *ui.UI) error {
	namespace := fmt.Sprintf("openclaw-%s", DefaultInstanceID)
	kubectlBin := filepath.Join(cfg.BinDir, "kubectl")
	kubeconfigPath := filepath.Join(cfg.ConfigDir, "kubeconfig.yaml")
	env := append(os.Environ(), fmt.Sprintf("KUBECONFIG=%s", kubeconfigPath))

	// Read current ConfigMap.
	getCmd := exec.Command(kubectlBin,
		"get", "configmap", "openclaw-config",
		"-n", namespace,
		"-o", "jsonpath={.data.openclaw\\.json}")
	getCmd.Env = env
	var outBuf bytes.Buffer
	getCmd.Stdout = &outBuf
	if err := getCmd.Run(); err != nil {
		return fmt.Errorf("read openclaw-config: %w", err)
	}

	var cfgJSON map[string]interface{}
	if err := json.Unmarshal(outBuf.Bytes(), &cfgJSON); err != nil {
		return fmt.Errorf("parse openclaw.json: %w", err)
	}

	// Check whether heartbeat is already present.
	agents, _ := cfgJSON["agents"].(map[string]interface{})
	defaults, _ := agents["defaults"].(map[string]interface{})
	_, alreadySet := defaults["heartbeat"]
	if alreadySet {
		u.Success("Heartbeat config already active")
		return nil
	}

	// Inject heartbeat.
	if agents == nil {
		agents = map[string]interface{}{}
		cfgJSON["agents"] = agents
	}
	if defaults == nil {
		defaults = map[string]interface{}{}
		agents["defaults"] = defaults
	}
	defaults["heartbeat"] = map[string]interface{}{
		"every":  "5m",
		"target": "none",
	}

	patched, err := json.MarshalIndent(cfgJSON, "", " ")
	if err != nil {
		return fmt.Errorf("marshal patched config: %w", err)
	}

	applyPayload := map[string]interface{}{
		"apiVersion": "v1",
		"kind":       "ConfigMap",
		"metadata": map[string]interface{}{
			"name":      "openclaw-config",
			"namespace": namespace,
		},
		"data": map[string]string{
			"openclaw.json": string(patched),
		},
	}
	applyRaw, _ := json.Marshal(applyPayload)

	applyCmd := exec.Command(kubectlBin,
		"apply", "-f", "-",
		"--server-side", "--field-manager=helm", "--force-conflicts")
	applyCmd.Env = env
	applyCmd.Stdin = bytes.NewReader(applyRaw)
	var applyErr bytes.Buffer
	applyCmd.Stderr = &applyErr
	if err := applyCmd.Run(); err != nil {
		return fmt.Errorf("patch heartbeat config: %w\n%s", err, applyErr.String())
	}

	// OpenClaw watches for ConfigMap file changes and hot-reloads config.
	// No pod restart is needed: the running pod will detect the update within
	// ~30-60s and apply [reload] config hot reload, switching the heartbeat
	// interval to 5m immediately without losing the running pod or its state.
	u.Success("Heartbeat config injected — OpenClaw hot reload will activate it (every 5m)")
	return nil

Mounting openclaw stuff like this as a ConfigMap means it's read-only to the runtime. This has UX problems because openclaw can't tune stuff for itself. If instead we do this with files on the host, maybe they'll be mutable by the openclaw service?
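
Wherever the config ends up living (ConfigMap or host file), the nested-map injection in ensureHeartbeatActive could be separated from the kubectl calls so it can be unit-tested without a cluster. The sketch below is one way to do that; `injectHeartbeat` is a hypothetical name, and only the "5m" and "none" values are taken from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectHeartbeat adds agents.defaults.heartbeat to cfg when it is absent
// and reports whether cfg was modified. An existing heartbeat entry is
// left untouched. Hypothetical refactoring, not code from the PR.
func injectHeartbeat(cfg map[string]interface{}) bool {
	agents, _ := cfg["agents"].(map[string]interface{})
	if agents == nil {
		agents = map[string]interface{}{}
		cfg["agents"] = agents
	}
	defaults, _ := agents["defaults"].(map[string]interface{})
	if defaults == nil {
		defaults = map[string]interface{}{}
		agents["defaults"] = defaults
	}
	if _, ok := defaults["heartbeat"]; ok {
		return false
	}
	defaults["heartbeat"] = map[string]interface{}{
		"every":  "5m",
		"target": "none",
	}
	return true
}

func main() {
	var cfg map[string]interface{}
	raw := []byte(`{"agents":{"defaults":{"model":"x"}}}`)
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	changed := injectHeartbeat(cfg)
	out, _ := json.Marshal(cfg)
	fmt.Println(changed, string(out))
}
```

Returning a boolean lets the caller skip the write entirely when nothing changed, regardless of whether the write is a server-side apply or a file on the host.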

Comment on lines +1 to +3
{
"workingDir": "/Users/bussyjd/Development/Obol_Workbench/obol-stack/.worktrees/autoresearch"
}

remove

Comment on lines +174 to +177
### Session 1 (baseline → 61/61)

**Baseline: 44/57** — 13 failures across all flows.


Should we move this to ./flows, or somewhere other than the top level, if it's not for users?

@@ -0,0 +1,48 @@
#!/bin/bash

subdirectory

Development

Successfully merging this pull request may close these issues.

fix: heartbeat timing bugs — reset on sell, missing activation, ConfigMap race