Handle single-file env import by willccbb · Pull Request #959 · PrimeIntellect-ai/verifiers

willccbb · 2026-02-25T06:13:54Z

Description

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Note

Medium Risk
Touches eval CLI configuration resolution; incorrect module path detection could silently skip defaults or pick the wrong pyproject.toml, changing evaluation run parameters.

Overview
prime eval run now loads [tool.verifiers.eval] defaults (num_examples, rollouts_per_example) from an environment’s pyproject.toml for both package and single-file environment modules by switching get_env_eval_defaults to resolve module paths via importlib.util.find_spec.

Adds pytest coverage for both module shapes, and updates the environments docs to include vf.TunnelError under vf.InfraError (plus a small import cleanup in HybridMathRubric).

Written by Cursor Bugbot for commit c59f487. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

cursor · 2026-02-25T06:28:14Z

verifiers/scripts/eval.py


        if not pyproject_file.is_file():
-            logger.debug(f"pyproject.toml not found in installed package {module_name}")
+            logger.debug(f"pyproject.toml not found for installed module {module_name}")


Skill docs missing for workflow change

Low Severity

This PR changes evaluation workflow behavior in verifiers/scripts/eval.py by altering how get_env_eval_defaults resolves pyproject.toml for environments, but no corresponding skills/*/SKILL.md updates are included. That violates the Skills Updates rule for user-facing workflow changes in verifiers/scripts/*.py.

Triggered by project rule: BugBot Instructions

…e-envs-as-packages

# Conflicts:
#	verifiers/envs/experimental/cli_agent_env.py

willccbb · 2026-03-03T22:57:19Z

@cursoragent fix merge conflicts vs main

cursor · 2026-03-03T22:57:25Z

Merged origin/main into codex/treat-singlefile-envs-as-packages and resolved the conflict in verifiers/envs/experimental/cli_agent_env.py by standardizing on the safer _require_interception_server() access pattern.
Ran linting via uvx (ruff) with no outstanding issues.
Attempted to run tests; several integration tests require external credentials (e.g., PRIME_API_KEY) and failed locally. Since your request was to fix merge conflicts, I did not alter tests or add secrets.

All changes have been pushed to the branch.

Key edit: ensured tunnel and interception access consistently use _require_interception_server() in CliAgentEnv to avoid None dereferences.
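_require_interception_server() is the accessor the merge standardized on. The sketch below shows the general "require" pattern it suggests. Only the method name comes from the PR. The class EnvWithInterception, the InterceptionServer dataclass with its host/port fields, and the choice of RuntimeError are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterceptionServer:  # hypothetical stand-in for the real server object
    host: str
    port: int


class EnvWithInterception:  # hypothetical; not the real CliAgentEnv
    def __init__(self) -> None:
        self._interception_server: InterceptionServer | None = None

    def _require_interception_server(self) -> InterceptionServer:
        """Return the running server, or fail loudly instead of dereferencing None."""
        server = self._interception_server
        if server is None:
            raise RuntimeError("interception server is not running; start it before use")
        return server

    def interception_url(self) -> str:
        # Every access goes through the helper, so a missing server raises a clear
        # error instead of AttributeError: 'NoneType' object has no attribute 'host'.
        server = self._require_interception_server()
        return f"http://{server.host}:{server.port}"
```

A side benefit, assuming the project's type checker narrows Optional the usual way, is that callers get a non-optional return type and need no per-call None checks or suppressions.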

…solve CliAgentEnv interception handling conflicts by using _require_interception_server()

Co-authored-by: will brown <willccbb@users.noreply.github.com>

…978)

* Add overview panel and collapse non-running envs in vf-eval display

When evaluating many environments from a TOML file, the progress display overflowed the terminal. Now a persistent overview panel at the top shows every env's status (reward, error rate, time) on one compact line, and only running envs get full detail panels below it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix compact row showing negative total when num_examples=-1

Show "..." instead of the negative number, matching the existing behavior in the full progress panel.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add arrow-key navigation between env detail panels in vf-eval display

- Add key listener infrastructure to BaseDisplay (cbreak mode, stdin polling via os.read on fd 0, arrow key escape sequence parsing)
- Show single selected env detail panel instead of all running envs
- Left/right arrow keys cycle through envs; selected env highlighted in overview
- Overview capped at half terminal height with priority ordering when truncated (failed > running > completed > pending), selected env always visible
- Fix terminal size detection: use os.get_terminal_size(0) since stdout/stderr are redirected to pipes, making shutil.get_terminal_size() fall back to 24 lines
- Screen mode (--tui) adaptively sizes log panel to fill terminal
- Footer shows navigation hint when multiple envs present
- Stop key listener before wait_for_exit to prevent stdin contention

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Use daemon thread for key listener instead of asyncio task

The asyncio task couldn't run when the event loop was blocked by synchronous env startup work (package installation, etc.), so arrow key navigation only worked once all envs were running. A daemon thread runs independently of the event loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Improve overview panel: env counter in title, better selection indicator

- Show "(env n/x)" in detail panel title for multi-env evals
- Add ▶ prefix to selected env in overview for clearer indication
- Change running env icon from ▸ (triangle) to ● (filled circle) to pair with ○ (empty circle) for pending
- Add spacing between selection arrow and status icon

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Scroll overview panel to keep selected env visible

Replace priority-ordering with a sliding window that follows the selected env. Shows "... N above" / "... and N more" indicators when the list is truncated at top/bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert unrelated AGENTS.md sync changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: only set cbreak mode when key listener is active

Move cbreak terminal setup from start() into _start_key_listener() so that GEPADisplay (sync context manager, no key listener) doesn't leave the terminal in cbreak mode with no thread consuming keystrokes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "Fix: only set cbreak mode when key listener is active"

This reverts commit f8e3b8f.

* Fix key listener busy-loop on stdin EOF

Break out of the loop when os.read returns empty bytes (EOF from SSH disconnect, terminal close, etc.) instead of spinning at 100% CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adaptive log panel height in both screen and non-screen modes

Fill remaining terminal space with the log panel regardless of display mode, instead of using a fixed 20-line default in non-screen.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Adaptive log panel height fills terminal in both display modes

Use terminal-adaptive log sizing in non-screen mode too (was fixed at 20 lines). Includes 2-line buffer in detail_fixed to account for Rich rendering overhead so the footer stays visible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Measure actual rendered height for adaptive log panel sizing

Render content items to a temporary buffer to count real terminal lines (accounting for text wrapping) instead of counting Python objects. This ensures the log panel fills exactly the remaining space regardless of metrics count or terminal width.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Document arrow-key navigation in multi-env evaluation display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add log panel scrolling and wrapped line indentation

- Up/down arrow keys scroll through log history (3 lines per press)
- Log lines wrap with 4-space indented continuation lines for clarity
- Scroll offset shown in panel title; resets on env switch
- Footer hint updated to show scroll controls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add colored log headers in eval display log panel

Parse log lines into timestamp, source, level, and message parts with distinct styles. Level colors follow common conventions: DEBUG=dim blue, INFO=bold green, WARNING=bold yellow, ERROR=bold red, CRITICAL=bold red reverse. Timestamp is bold dim, source is dim cyan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
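These commits describe a key listener that reads stdin with os.read, parses arrow-key escape sequences, runs on a daemon thread so a blocked event loop can't starve it, and exits on EOF instead of busy-looping. Here is a minimal sketch of those three pieces. It is not BaseDisplay's code, and the function names are hypothetical. The escape sequences are the standard ANSI/VT100 cursor keys (ESC [ A-D, or ESC O A-D in application cursor mode). The commits mention polling, but this sketch simply blocks in os.read. It does not handle a sequence split across two reads.

```python
import os
import threading
from typing import Callable

# ESC [ X in normal cursor mode, ESC O X in application cursor mode.
_ARROWS = {"A": "up", "B": "down", "C": "right", "D": "left"}
ARROW_KEYS = {
    prefix + letter.encode(): name
    for prefix in (b"\x1b[", b"\x1bO")
    for letter, name in _ARROWS.items()
}


def parse_keys(buf: bytes) -> list[str]:
    """Map a chunk of raw input bytes to arrow-key names, skipping anything else."""
    keys: list[str] = []
    i = 0
    while i < len(buf):
        name = ARROW_KEYS.get(buf[i : i + 3])
        if name is not None:
            keys.append(name)
            i += 3
        else:
            i += 1
    return keys


def _listen(fd: int, on_key: Callable[[str], None]) -> None:
    while True:
        chunk = os.read(fd, 64)  # blocks until input arrives
        if not chunk:
            # EOF (SSH disconnect, terminal closed): stop instead of spinning at 100% CPU.
            break
        for key in parse_keys(chunk):
            on_key(key)


def start_key_listener(fd: int, on_key: Callable[[str], None]) -> threading.Thread:
    """Run the read loop on a daemon thread, independent of any asyncio event loop."""
    thread = threading.Thread(target=_listen, args=(fd, on_key), daemon=True)
    thread.start()
    return thread
```

With a real terminal you would pass fd 0 after switching it to cbreak mode (for example with tty.setcbreak), saving and restoring the termios settings on exit. Taking the fd as a parameter also makes the loop easy to exercise with an os.pipe().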

The colored log headers make entry boundaries clear without indentation. Keep wrapping-aware height estimation so the log panel fills the correct amount of space.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
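The colored headers split each log line into timestamp, source, level, and message, with the styles listed in the "Add colored log headers" commit. Below is a minimal sketch of that parsing step. The log line format matched by LOG_RE is an assumption, and style_log_line is a hypothetical name. The output is a list of (text, style) pairs, the shape that Rich's Text.assemble accepts, so wrapping is left to Rich.

```python
import re

LEVEL_STYLES = {
    "DEBUG": "dim blue",
    "INFO": "bold green",
    "WARNING": "bold yellow",
    "ERROR": "bold red",
    "CRITICAL": "bold red reverse",
}

# Assumed format: "2026-03-03 22:57:19 - verifiers.scripts.eval - INFO - message"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)"
    r" - (?P<source>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def style_log_line(line: str) -> list[tuple[str, str]]:
    """Split a log line into styled (text, style) segments; unparsed lines stay plain."""
    m = LOG_RE.match(line)
    if m is None or m["level"] not in LEVEL_STYLES:
        return [(line, "")]
    sep = (" - ", "")
    return [
        (m["ts"], "bold dim"),
        sep,
        (m["source"], "dim cyan"),
        sep,
        (m["level"], LEVEL_STYLES[m["level"]]),
        sep,
        (m["msg"], ""),
    ]
```

Because the separators are kept, joining the segment texts reproduces the original line exactly. This keeps any wrapping-aware height estimate consistent with what is rendered.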

Two fixes:

1. Set state["cua_sandbox_id"] immediately after sandbox creation, before _wait_for_sandbox_ready/_wait_for_server. Previously, if anything failed between creation and the state assignment, cleanup_session couldn't find the sandbox_id, so the sandbox leaked until teardown at the end of the run.
2. Add _create_sandbox_with_retry, which cleans up orphaned sandboxes from failed retry attempts. Previously, if _create_sandbox succeeded but the retry loop retried, the first sandbox was orphaned in active_sandboxes but never referenced by any state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
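The two fixes can be sketched together under assumptions. Everything in this sketch is hypothetical except the state key cua_sandbox_id: the async callables create, post_create_check, delete, and wait_ready, and the names create_with_orphan_cleanup and setup_sandbox. In particular, post_create_check stands in for whatever step inside the retried attempt can fail after the sandbox already exists. This is not the PR's _create_sandbox_with_retry.

```python
import contextlib
from typing import Awaitable, Callable


async def create_with_orphan_cleanup(
    create: Callable[[], Awaitable[str]],
    post_create_check: Callable[[str], Awaitable[None]],
    delete: Callable[[str], Awaitable[None]],
    max_attempts: int = 3,
) -> str:
    last_exc: Exception | None = None
    for _ in range(max_attempts):
        sandbox_id: str | None = None
        try:
            sandbox_id = await create()
            await post_create_check(sandbox_id)  # may fail after the sandbox exists
            return sandbox_id
        except Exception as exc:
            last_exc = exc
            if sandbox_id is not None:
                # No state will ever reference this sandbox, so delete it before retrying.
                # Best effort: a failing delete must not mask the original error.
                with contextlib.suppress(Exception):
                    await delete(sandbox_id)
    raise RuntimeError(f"sandbox creation failed after {max_attempts} attempts") from last_exc


async def setup_sandbox(state: dict, create, post_create_check, delete, wait_ready) -> str:
    sandbox_id = await create_with_orphan_cleanup(create, post_create_check, delete)
    # Record the id before any readiness wait, so cleanup can still find the
    # sandbox if a later step fails.
    state["cua_sandbox_id"] = sandbox_id
    await wait_ready(sandbox_id)
    return sandbox_id
```

The split mirrors the commit: failures inside a retried attempt are cleaned up on the spot. Failures after the retry loop leave the id in state for the normal session cleanup to handle.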

…#987)

willccbb added 2 commits February 24, 2026 22:13

Handle single-file env import

69207b4

Treat single-file env modules as pkg

de23884

cursor bot reviewed Feb 25, 2026

willccbb added 2 commits February 24, 2026 22:28

Fix interception server attr checks

42059b4

Merge remote-tracking branch 'origin/main' into codex/treat-singlefil…

4bfde49

cursoragent and others added 11 commits March 3, 2026 22:59

Merge origin/main into codex/treat-singlefile-envs-as-packages and re…

c94ea00

add preview true uv (#981)

3839e8b

Remove manual log wrapping, let Rich handle it (#985)

26671c0

vf-eval: Add settings panel in summary and --abbreviated-summary flag (…

0cc1b30

sanitize illegible binary characters in Rich display (#986)

337824f

Will/misc ty (#989)

62a1171

Merge remote-tracking branch 'origin/main' into codex/treat-singlefil…

10e81db

Sync generated AGENTS docs and clean ty suppression

c59f487

willccbb merged commit 69b5e45 into main Mar 5, 2026
6 checks passed

willccbb merged 15 commits into main from codex/treat-singlefile-envs-as-packages

5 participants
