Handle single-file env import #959

Merged
willccbb merged 15 commits into main from codex/treat-singlefile-envs-as-packages on Mar 5, 2026
Conversation

@willccbb
Member

@willccbb willccbb commented Feb 25, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Touches eval CLI configuration resolution; incorrect module path detection could silently skip defaults or pick the wrong pyproject.toml, changing evaluation run parameters.

Overview
prime eval run now loads [tool.verifiers.eval] defaults (num_examples, rollouts_per_example) from an environment’s pyproject.toml for both package and single-file environment modules by switching get_env_eval_defaults to resolve module paths via importlib.util.find_spec.

Adds pytest coverage for both module shapes, and updates the environments docs to include vf.TunnelError under vf.InfraError (plus a small import cleanup in HybridMathRubric).

Written by Cursor Bugbot for commit c59f487. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

 if not pyproject_file.is_file():
-    logger.debug(f"pyproject.toml not found in installed package {module_name}")
+    logger.debug(f"pyproject.toml not found for installed module {module_name}")
Skill docs missing for workflow change

Low Severity

This PR changes evaluation workflow behavior in verifiers/scripts/eval.py by altering how get_env_eval_defaults resolves pyproject.toml for environments, but no corresponding skills/*/SKILL.md updates are included. That violates the Skills Updates rule for user-facing workflow changes in verifiers/scripts/*.py.

Triggered by project rule: BugBot Instructions

…e-envs-as-packages

# Conflicts:
#	verifiers/envs/experimental/cli_agent_env.py
@willccbb
Member Author

willccbb commented Mar 3, 2026

@cursoragent fix merge conflicts vs main

@cursor

cursor bot commented Mar 3, 2026

  • Merged origin/main into codex/treat-singlefile-envs-as-packages and resolved the conflict in verifiers/envs/experimental/cli_agent_env.py by standardizing on the safer _require_interception_server() access pattern.
  • Ran linting via uvx (ruff) with no outstanding issues.
  • Attempted to run tests; several integration tests require external credentials (e.g., PRIME_API_KEY) and failed locally. Since your request was to fix merge conflicts, I did not alter tests or add secrets.

All changes have been pushed to the branch.

  • Key edit: ensured tunnel and interception access consistently use _require_interception_server() in CliAgentEnv to avoid None dereferences.
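
The accessor pattern mentioned in this comment might look like the sketch below. Only the method name `_require_interception_server()` comes from the comment; the class name, the attribute `_interception_server`, and the choice of `RuntimeError` are assumptions made for illustration.

```python
class InterceptionServer:
    """Stand-in for the real server object (hypothetical)."""

    port = 8000


class CliAgentEnvSketch:
    """Illustrative class, not the real CliAgentEnv."""

    def __init__(self) -> None:
        # Assumed attribute: unset until the server has been started.
        self._interception_server: InterceptionServer | None = None

    def _require_interception_server(self) -> InterceptionServer:
        # Fail with a clear error instead of an AttributeError on None later.
        server = self._interception_server
        if server is None:
            raise RuntimeError("interception server is not initialized")
        return server

    def interception_port(self) -> int:
        # Every access goes through the accessor, never the raw attribute.
        return self._require_interception_server().port
```

Routing every access through one accessor keeps the None check in a single place and gives type checkers a non-optional return type.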

cursoragent and others added 11 commits March 3, 2026 22:59
…solve CliAgentEnv interception handling conflicts by using _require_interception_server()

Co-authored-by: will brown <willccbb@users.noreply.github.com>
…978)

* Add overview panel and collapse non-running envs in vf-eval display

When evaluating many environments from a TOML file, the progress display
overflowed the terminal. Now a persistent overview panel at the top shows
every env's status (reward, error rate, time) on one compact line, and
only running envs get full detail panels below it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix compact row showing negative total when num_examples=-1

Show "..." instead of the negative number, matching the existing
behavior in the full progress panel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
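
A minimal sketch of the fix described in this commit, assuming a negative total means the example count is not yet known. The function name `format_progress` is hypothetical and does not come from the display code.

```python
def format_progress(completed: int, total: int) -> str:
    """Render 'completed/total', hiding a negative (unknown) total."""
    total_text = "..." if total < 0 else str(total)
    return f"{completed}/{total_text}"
```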

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add arrow-key navigation between env detail panels in vf-eval display

- Add key listener infrastructure to BaseDisplay (cbreak mode, stdin polling
  via os.read on fd 0, arrow key escape sequence parsing)
- Show single selected env detail panel instead of all running envs
- Left/right arrow keys cycle through envs; selected env highlighted in overview
- Overview capped at half terminal height with priority ordering when truncated
  (failed > running > completed > pending), selected env always visible
- Fix terminal size detection: use os.get_terminal_size(0) since stdout/stderr
  are redirected to pipes, making shutil.get_terminal_size() fall back to 24 lines
- Screen mode (--tui) adaptively sizes log panel to fill terminal
- Footer shows navigation hint when multiple envs present
- Stop key listener before wait_for_exit to prevent stdin contention

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
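
The arrow-key parsing mentioned in this commit can be illustrated with the standard ANSI escape sequences (ESC [ A/B/C/D). The sketch below is an assumption about the general approach rather than the BaseDisplay code, and `parse_keys` is a hypothetical name.

```python
# Standard ANSI escape sequences sent by the arrow keys.
ARROW_KEYS = {
    b"\x1b[A": "up",
    b"\x1b[B": "down",
    b"\x1b[C": "right",
    b"\x1b[D": "left",
}


def parse_keys(data: bytes) -> list[str]:
    """Extract arrow-key names from raw bytes read from stdin, skipping other input."""
    keys = []
    i = 0
    while i < len(data):
        name = ARROW_KEYS.get(data[i:i + 3])
        if name is not None:
            keys.append(name)
            i += 3
        else:
            i += 1  # not an arrow sequence; ignore this byte
    return keys
```

In cbreak mode a single `os.read` may return several keystrokes at once, which is why the parser walks the whole buffer instead of matching it as one sequence.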

* Use daemon thread for key listener instead of asyncio task

The asyncio task couldn't run when the event loop was blocked by
synchronous env startup work (package installation, etc.), so arrow
key navigation only worked once all envs were running. A daemon thread
runs independently of the event loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
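
The reasoning in this commit can be demonstrated with a small sketch: a daemon thread keeps running while a coroutine blocks the event loop with synchronous work. The names `start_daemon_ticker` and `blocked_startup` are hypothetical; the ticker stands in for the key listener.

```python
import asyncio
import threading
import time


def start_daemon_ticker(ticks: list, stop: threading.Event, interval: float = 0.01) -> threading.Thread:
    """Start a daemon thread that records a timestamp until `stop` is set."""

    def run() -> None:
        while not stop.is_set():
            ticks.append(time.monotonic())
            stop.wait(interval)

    thread = threading.Thread(target=run, name="key-listener-sketch", daemon=True)
    thread.start()
    return thread


async def blocked_startup(seconds: float) -> None:
    """Simulate synchronous startup work that blocks the event loop."""
    time.sleep(seconds)  # deliberately NOT asyncio.sleep
```

While `blocked_startup` runs, no other asyncio task gets scheduled, but the ticker thread still appends timestamps because `time.sleep` releases the GIL.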

* Improve overview panel: env counter in title, better selection indicator

- Show "(env n/x)" in detail panel title for multi-env evals
- Add ▶ prefix to selected env in overview for clearer indication
- Change running env icon from ▸ (triangle) to ● (filled circle)
  to pair with ○ (empty circle) for pending
- Add spacing between selection arrow and status icon

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Scroll overview panel to keep selected env visible

Replace priority-ordering with a sliding window that follows the
selected env. Shows "... N above" / "... and N more" indicators
when the list is truncated at top/bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
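
One way to implement a sliding window like the one described here is sketched below. The function names are hypothetical, and the choice to center the window on the selection is an assumption; the actual display may scroll differently. `max_rows` counts env rows only, not the indicator lines.

```python
def visible_window(n_items: int, selected: int, max_rows: int) -> tuple[int, int]:
    """Return (start, end) with start <= selected < end and at most max_rows items."""
    if n_items <= max_rows:
        return 0, n_items
    start = selected - max_rows // 2
    start = max(0, min(start, n_items - max_rows))  # clamp to the list bounds
    return start, start + max_rows


def render_overview(names: list[str], selected: int, max_rows: int) -> list[str]:
    """Build overview lines with truncation indicators at the top and bottom."""
    start, end = visible_window(len(names), selected, max_rows)
    lines = []
    if start > 0:
        lines.append(f"... {start} above")
    for i in range(start, end):
        prefix = "▶ " if i == selected else "  "
        lines.append(prefix + names[i])
    if end < len(names):
        lines.append(f"... and {len(names) - end} more")
    return lines
```

With ten envs, `max_rows=4`, and the sixth env selected, this yields "... 3 above", four env rows, and "... and 3 more".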

* Revert unrelated AGENTS.md sync changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: only set cbreak mode when key listener is active

Move cbreak terminal setup from start() into _start_key_listener()
so that GEPADisplay (sync context manager, no key listener) doesn't
leave the terminal in cbreak mode with no thread consuming keystrokes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "Fix: only set cbreak mode when key listener is active"

This reverts commit f8e3b8f.

* Fix key listener busy-loop on stdin EOF

Break out of the loop when os.read returns empty bytes (EOF from
SSH disconnect, terminal close, etc.) instead of spinning at 100% CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
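
The EOF fix can be illustrated with a small read loop. This is a sketch under assumptions, not the listener in BaseDisplay: `read_keys` and its parameters are hypothetical, and it relies on `select.select` working on the file descriptor, which holds for terminals and pipes on Unix-like systems.

```python
import os
import select
import threading
from typing import Callable


def read_keys(
    fd: int,
    on_bytes: Callable[[bytes], None],
    stop: threading.Event,
    poll_interval: float = 0.1,
) -> None:
    """Poll `fd` for input until `stop` is set or the stream reaches EOF."""
    while not stop.is_set():
        ready, _, _ = select.select([fd], [], [], poll_interval)
        if not ready:
            continue
        data = os.read(fd, 32)
        if not data:
            # Empty bytes means EOF (closed terminal, SSH disconnect).
            # Without this break, select() reports the fd as readable
            # forever and the loop spins at full CPU.
            break
        on_bytes(data)
```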

* Adaptive log panel height in both screen and non-screen modes

Fill remaining terminal space with the log panel regardless of
display mode, instead of using a fixed 20-line default in non-screen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adaptive log panel height fills terminal in both display modes

Use terminal-adaptive log sizing in non-screen mode too (was fixed
at 20 lines). Includes 2-line buffer in detail_fixed to account for
Rich rendering overhead so the footer stays visible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Measure actual rendered height for adaptive log panel sizing

Render content items to a temporary buffer to count real terminal
lines (accounting for text wrapping) instead of counting Python
objects. This ensures the log panel fills exactly the remaining
space regardless of metrics count or terminal width.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Document arrow-key navigation in multi-env evaluation display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add log panel scrolling and wrapped line indentation

- Up/down arrow keys scroll through log history (3 lines per press)
- Log lines wrap with 4-space indented continuation lines for clarity
- Scroll offset shown in panel title; resets on env switch
- Footer hint updated to show scroll controls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add colored log headers in eval display log panel

Parse log lines into timestamp, source, level, and message parts
with distinct styles. Level colors follow common conventions:
DEBUG=dim blue, INFO=bold green, WARNING=bold yellow, ERROR=bold red,
CRITICAL=bold red reverse. Timestamp is bold dim, source is dim cyan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
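
A rough sketch of the parsing step described in this commit. The style strings are taken from the commit message; the log line layout (`timestamp - source - LEVEL - message`) and the names `LOG_RE` and `style_log_line` are assumptions, since the actual log format is not shown here. The function returns plain (text, style) pairs so it runs without Rich installed.

```python
import re

LEVEL_STYLES = {
    "DEBUG": "dim blue",
    "INFO": "bold green",
    "WARNING": "bold yellow",
    "ERROR": "bold red",
    "CRITICAL": "bold red reverse",
}

# Assumed layout: "2026-03-03 22:59:01 - source - LEVEL - message"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d+)?)"
    r" - (?P<source>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def style_log_line(line: str) -> list[tuple[str, str]]:
    """Split a log line into (text, style) segments; unparsed lines stay unstyled."""
    m = LOG_RE.match(line)
    if m is None or m["level"] not in LEVEL_STYLES:
        return [(line, "")]
    return [
        (m["ts"], "bold dim"),
        (m["source"], "dim cyan"),
        (m["level"], LEVEL_STYLES[m["level"]]),
        (m["msg"], ""),
    ]
```

Lines that do not match, such as traceback continuations, fall through unstyled rather than being dropped.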

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The colored log headers make entry boundaries clear without
indentation. Keep wrapping-aware height estimation so the log
panel fills the correct amount of space.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes:

1. Set state["cua_sandbox_id"] immediately after sandbox creation,
   before _wait_for_sandbox_ready/_wait_for_server. Previously if
   anything failed between creation and the state assignment,
   cleanup_session couldn't find the sandbox_id so the sandbox leaked
   until teardown at the end of the run.

2. Add _create_sandbox_with_retry that cleans up orphaned sandboxes
   from failed retry attempts. Previously if _create_sandbox succeeded
   but the retry loop retried, the first sandbox was orphaned in
   active_sandboxes but never referenced by any state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
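
Both fixes in this commit can be sketched as follows. Only the state key `cua_sandbox_id` comes from the commit message; the function names, the injected `create`, `wait_ready`, `verify`, and `delete` callables, and the retry count are hypothetical stand-ins for the real sandbox client.

```python
import asyncio
from typing import Awaitable, Callable

Create = Callable[[], Awaitable[str]]
Step = Callable[[str], Awaitable[None]]


async def setup_state(state: dict, create: Create, wait_ready: Step) -> dict:
    """Fix 1: record the sandbox id before any step that can fail."""
    sandbox_id = await create()
    state["cua_sandbox_id"] = sandbox_id  # cleanup can now find it
    await wait_ready(sandbox_id)  # may raise; the id is already recorded
    return state


async def create_sandbox_with_retry(
    create: Create, verify: Step, delete: Step, attempts: int = 3
) -> str:
    """Fix 2: delete a sandbox from a failed attempt before retrying."""
    last_error: Exception | None = None
    for _ in range(attempts):
        sandbox_id = None
        try:
            sandbox_id = await create()
            await verify(sandbox_id)
            return sandbox_id
        except Exception as exc:
            last_error = exc
            if sandbox_id is not None:
                # Creation succeeded but a later step failed: do not orphan it.
                await delete(sandbox_id)
    raise RuntimeError("sandbox creation failed") from last_error
```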
…e-envs-as-packages

# Conflicts:
#	verifiers/envs/experimental/cli_agent_env.py
@willccbb willccbb merged commit 69b5e45 into main Mar 5, 2026
6 checks passed
5 participants