
Scheduling Queue and Event-driven Scheduling Attempts #421

@Varsius

Description

Introduce a scheduling queue in Cortex that keeps track of unschedulable workloads. These workloads should be re-evaluated when the cluster state changes (e.g. nodes added, workloads complete), rather than relying only on exponential backoff.

Objectives

  • Implement a pending queue for unschedulable pods, ordered by a defined priority (e.g. submission time)
  • Register and handle cluster events to trigger scheduling attempts for pending pods
  • Integrate queueing hints so that only workloads affected by an event are retried
  • Documentation

Acceptance Criteria

  • Unschedulable workloads are queued and retried when relevant cluster events occur
  • Basic e2e tests

Dependencies

N/A

Additional Notes
