
fix(function): isolated-vm worker pool to prevent single-worker bottleneck #3155

Open
waleedlatif1 wants to merge 2 commits into staging from fix/pool

Conversation

@waleedlatif1
Collaborator

Summary

  • Replaced the single isolated-vm worker process with a configurable pool (default: 4 workers)
  • Executions are distributed across workers using least-loaded selection
  • Workers spawn lazily and are cleaned up after an idle timeout
  • Added env vars for pool tuning (IVM_POOL_SIZE, IVM_MAX_CONCURRENT, IVM_MAX_PER_WORKER, IVM_WORKER_IDLE_TIMEOUT_MS, IVM_QUEUE_TIMEOUT_MS)
  • Defaults are permissive (10,000 concurrent executions, 2,500 per worker, 5-minute queue timeout), so there is no behavior change for existing users
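To make least-loaded selection with lazy spawning concrete, here is a minimal TypeScript sketch. It is not the PR's code: `WorkerPool`, `acquire`, and `release` are hypothetical names, spawning is simulated by a factory callback instead of forking a real child process, and the policy of reusing an idle worker first, then growing the pool, then stacking work on the least-loaded worker is an assumption.

```typescript
// Minimal sketch of least-loaded worker selection with lazy spawning.
// Hypothetical names; spawning is simulated rather than forking a child process.

interface PoolWorker {
  id: number
  activeExecutions: number
}

class WorkerPool {
  private readonly workers: PoolWorker[] = []
  private nextId = 0

  constructor(
    private readonly poolSize: number,
    private readonly maxPerWorker: number,
    // Stand-in for spawning a real worker process (assumption).
    private readonly spawn: (id: number) => PoolWorker = (id) => ({ id, activeExecutions: 0 }),
  ) {}

  get workerCount(): number {
    return this.workers.length
  }

  /** Claims a slot on some worker, or returns null when every worker is at its cap. */
  acquire(): PoolWorker | null {
    // Find the least-loaded worker that is still below the per-worker cap.
    let best: PoolWorker | null = null
    for (const w of this.workers) {
      if (w.activeExecutions >= this.maxPerWorker) continue
      if (best === null || w.activeExecutions < best.activeExecutions) best = w
    }
    // Assumed policy: reuse an idle worker first, then grow the pool lazily,
    // and only then stack work onto the least-loaded existing worker.
    if (best !== null && best.activeExecutions === 0) return this.claim(best)
    if (this.workers.length < this.poolSize) {
      const fresh = this.spawn(this.nextId++)
      this.workers.push(fresh)
      return this.claim(fresh)
    }
    return best === null ? null : this.claim(best)
  }

  /** Releases one slot; must be called exactly once per successful acquire(). */
  release(worker: PoolWorker): void {
    worker.activeExecutions--
  }

  private claim(worker: PoolWorker): PoolWorker {
    worker.activeExecutions++
    return worker
  }
}
```

A `null` from `acquire()` corresponds to the "all saturated" branch in the sequence diagram, where the execution would be queued instead of dispatched.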

Type of Change

  • Bug fix

Testing

Tested manually. Existing vitest suite (30 tests) passes unchanged.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Feb 6, 2026

The latest updates on your projects.

  • docs: deployment Ready (Preview available), updated Feb 6, 2026 7:42pm (UTC)


@waleedlatif1
Collaborator Author

@cursor review

@greptile-apps
Contributor

greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

  • Introduces a lazy-spawned isolated-vm worker pool (default 4) and distributes executions by least-loaded worker instead of a single subprocess.
  • Adds global/per-worker concurrency caps plus queueing with a configurable queue timeout and worker idle cleanup.
  • Updates env config to expose pool tuning knobs (IVM_POOL_SIZE, IVM_MAX_CONCURRENT, IVM_MAX_PER_WORKER, IVM_WORKER_IDLE_TIMEOUT_MS, IVM_QUEUE_TIMEOUT_MS).
  • Main risk area is correctness of pool accounting (startup/exit paths and active execution counters) which impacts saturation decisions and queue draining.
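The global cap plus queue described above can be sketched as follows. This hypothetical `ConcurrencyGate` is not the PR's implementation: the clock is injected so expiry is deterministic, and queued entries are only checked against the queue timeout when the queue drains. A production version would more likely arm a timer per entry (for the `IVM_QUEUE_TIMEOUT_MS` budget) so waiting callers are rejected on time even when nothing finishes.

```typescript
// Hypothetical sketch: global concurrency cap with a FIFO wait queue and a
// queue timeout. Names are illustrative; expiry is checked lazily on drain.

interface QueuedJob {
  id: string
  run: () => void
  onTimeout: () => void
  enqueuedAt: number
}

class ConcurrencyGate {
  private active = 0
  private readonly queue: QueuedJob[] = []

  constructor(
    private readonly maxConcurrent: number,
    private readonly queueTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get activeCount(): number {
    return this.active
  }

  get queuedCount(): number {
    return this.queue.length
  }

  /** Runs the job immediately if under the cap, otherwise queues it. */
  submit(id: string, run: () => void, onTimeout: () => void): void {
    if (this.active < this.maxConcurrent) {
      this.active++
      run()
    } else {
      this.queue.push({ id, run, onTimeout, enqueuedAt: this.now() })
    }
  }

  /** Call exactly once when a dispatched job completes, on success or failure. */
  finish(): void {
    this.active--
    this.drainQueue()
  }

  private drainQueue(): void {
    while (this.active < this.maxConcurrent && this.queue.length > 0) {
      const job = this.queue.shift()!
      // Entries that waited longer than the queue timeout are rejected, not run.
      if (this.now() - job.enqueuedAt > this.queueTimeoutMs) {
        job.onTimeout()
        continue
      }
      this.active++
      job.run()
    }
  }
}
```

If `finish()` is ever called twice for one job, `active` undercounts and the gate over-admits work, which is exactly the kind of accounting drift the review flags.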

Confidence Score: 3/5

  • Mergeable after fixing pool accounting bugs that can break worker capacity tracking.
  • The worker-pool approach is straightforward and mostly self-contained, but there are concrete counter/accounting issues (startup failure double-decrement and per-worker activeExecution drift during cleanup) that can miscompute pool capacity and queue draining behavior under failures.
  • File needing the most attention: apps/sim/lib/execution/isolated-vm.ts
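One common way to avoid the double-decrement flagged above is to wrap each counter release in a once-only guard, since Node's `child_process` documentation notes that the `'exit'` event may or may not fire after an `'error'` event. The sketch assumes that pattern; `PoolCounters`, `makeReleaseOnce`, and `cleanupWorker` are hypothetical names, and subtracting a worker's remaining `activeExecutions` from the global total on cleanup is an assumed fix for the drift, not the PR's code.

```typescript
// Hypothetical sketch of two accounting guards for a worker pool.

class PoolCounters {
  totalActive = 0
  spawning = 0
}

/** Wraps a release callback so overlapping failure paths run it at most once. */
function makeReleaseOnce(release: () => void): () => void {
  let released = false
  return () => {
    if (released) return // later calls (e.g. 'exit' after 'error') are no-ops
    released = true
    release()
  }
}

interface TrackedWorker {
  activeExecutions: number
}

/**
 * On teardown, remove exactly what the worker still holds from the global
 * total, then zero its own counter so the two cannot drift apart.
 */
function cleanupWorker(counters: PoolCounters, worker: TrackedWorker): void {
  counters.totalActive -= worker.activeExecutions
  worker.activeExecutions = 0
}
```

In a real pool, the same guarded callback would be registered on both the child's `'error'` and `'exit'` handlers during startup.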

Important Files Changed

  • apps/sim/lib/core/config/env.ts: Adds env vars for isolated-vm worker pool sizing and queue/idle timeouts; no functional logic changes here.
  • apps/sim/lib/execution/isolated-vm.ts: Replaces the single isolated-vm worker with a lazy worker pool plus global/per-worker concurrency limits and queueing; found counter/accounting bugs in worker startup failure handling and active execution tracking.
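The five tuning knobs could be read with validated fallbacks along these lines. This is a sketch, not the contents of `env.ts`: `readPoolConfig` and `parsePositiveInt` are hypothetical, the defaults for pool size, global cap, per-worker cap, and queue timeout follow the PR summary, and the idle-timeout default is a placeholder because the PR does not state it.

```typescript
// Hypothetical sketch of reading the pool knobs named in the PR.

interface PoolConfig {
  poolSize: number
  maxConcurrent: number
  maxPerWorker: number
  workerIdleTimeoutMs: number
  queueTimeoutMs: number
}

/** Returns the parsed positive integer, or the fallback for missing/invalid input. */
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback
  const n = Number(raw)
  return Number.isInteger(n) && n > 0 ? n : fallback
}

function readPoolConfig(env: Record<string, string | undefined>): PoolConfig {
  return {
    poolSize: parsePositiveInt(env.IVM_POOL_SIZE, 4),
    maxConcurrent: parsePositiveInt(env.IVM_MAX_CONCURRENT, 10_000),
    maxPerWorker: parsePositiveInt(env.IVM_MAX_PER_WORKER, 2_500),
    // ASSUMPTION: placeholder default; the PR does not state this value.
    workerIdleTimeoutMs: parsePositiveInt(env.IVM_WORKER_IDLE_TIMEOUT_MS, 60_000),
    queueTimeoutMs: parsePositiveInt(env.IVM_QUEUE_TIMEOUT_MS, 5 * 60 * 1000),
  }
}
```

In a Node process this would be called as `readPoolConfig(process.env)`. With the stated defaults, 4 workers × 2,500 executions per worker is exactly 10,000, so the combined per-worker caps and the global cap saturate at the same point.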

Sequence Diagram

sequenceDiagram
  participant Caller as Caller
  participant Pool as isolated-vm.ts
  participant Worker as Child worker
  participant VM as isolated-vm-worker.cjs

  Caller->>Pool: "executeInIsolatedVM(req)"
  alt "totalActiveExecutions >= MAX_CONCURRENT"
    Pool->>Pool: "enqueueExecution(req)"
  else "capacity available"
    Pool->>Pool: "acquireWorker()"
    alt "existing worker found"
      Pool->>Worker: "dispatchToWorker(req)"
    else "spawn new worker"
      Pool->>Worker: "spawnWorker()"
      Worker->>VM: "spawn node isolated-vm-worker.cjs"
      VM-->>Worker: "IPC ready"
      Pool->>Worker: "dispatchToWorker(req)"
    else "all saturated"
      Pool->>Pool: "enqueueExecution(req)"
    end
  end

  Worker->>VM: "IPC execute(executionId, request)"
  opt "VM requests fetch"
    VM-->>Pool: "IPC fetch(fetchId, url, optionsJson)"
    Pool->>Pool: "secureFetch(validateProxyUrl + fetch)"
    Pool-->>VM: "IPC fetchResponse(fetchId, response)"
  end

  VM-->>Pool: "IPC result(executionId, result)"
  Pool->>Pool: "decrement counts + reset idle timeout"
  Pool->>Pool: "drainQueue()"
  opt "worker idle"
    Pool->>Worker: "cleanupWorker() after idle timeout"
  end
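The IPC exchange in the diagram can be expressed as a discriminated union with a parent-side router. The message names and the `executionId`, `fetchId`, `url`, and `optionsJson` fields come from the diagram labels; the `ParentHooks` interface, `handleWorkerMessage`, and typing payloads as `unknown` or `string` are assumptions for illustration.

```typescript
// Hypothetical message shapes inferred from the sequence diagram.

type ParentToWorker =
  | { type: "execute"; executionId: string; request: unknown }
  | { type: "fetchResponse"; fetchId: string; response: string }

type WorkerToParent =
  | { type: "result"; executionId: string; result: unknown }
  | { type: "fetch"; fetchId: string; url: string; optionsJson: string }

interface ParentHooks {
  /** Settles the waiting caller, decrements counts, resets the idle timer, drains the queue. */
  onResult(executionId: string, result: unknown): void
  /** Performs the proxied fetch and later replies with a fetchResponse message. */
  onFetch(fetchId: string, url: string, optionsJson: string): void
}

function handleWorkerMessage(msg: WorkerToParent, hooks: ParentHooks): void {
  switch (msg.type) {
    case "result":
      hooks.onResult(msg.executionId, msg.result)
      break
    case "fetch":
      hooks.onFetch(msg.fetchId, msg.url, msg.optionsJson)
      break
    default: {
      // Compile-time exhaustiveness check: adding a message type without a case fails here.
      const unreachable: never = msg
      throw new Error(`unknown message: ${JSON.stringify(unreachable)}`)
    }
  }
}
```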

Contributor

greptile-apps bot left a comment


2 files reviewed, 2 comments



cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


@icecrasher321 changed the title from "fix(executor): isolated-vm worker pool to prevent single-worker bottleneck" to "fix(function): isolated-vm worker pool to prevent single-worker bottleneck" on Feb 6, 2026