ci(railway): add Railway OSS deployment framework and preview environment CI #3787

Open
mmabrouk wants to merge 6 commits into main from ci/railway-preview-environments

@mmabrouk mmabrouk commented Feb 19, 2026

Summary

  • Add complete Railway OSS deployment infrastructure under hosting/railway/oss/
  • Add 3 GitHub Actions workflows for automated per-PR preview environments
  • Add design docs covering architecture decisions, caveats, and phased rollout plan

What's included

Deployment scripts (hosting/railway/oss/scripts/)

  • bootstrap.sh -- create Railway project, services, volumes (idempotent)
  • configure.sh -- set all environment variables per service
  • deploy-from-images.sh -- full deploy flow from pre-built GHCR images
  • smoke.sh -- health check validation for /w, /api/health, /services/health
  • preview-create-or-update.sh -- create/update PR preview project
  • preview-destroy.sh -- delete PR preview project
  • preview-cleanup-stale.sh -- delete previews older than configurable TTL
  • Plus: build-and-push-images.sh, deploy-gateway.sh, deploy-services.sh, init-databases.sh, upgrade.sh
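
As a rough illustration of the kind of health-check validation that smoke.sh is described as performing, the sketch below probes the three listed paths and returns non-zero if any of them fails. The function names (http_status, check_endpoint, smoke), the 10-second timeout, and the expectation of an HTTP 200 response are assumptions for illustration, not taken from the PR's script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a smoke check; names and thresholds are assumptions.

# Print the HTTP status code for a URL ("000" if the request itself fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Succeed only when the endpoint answers with HTTP 200.
check_endpoint() {
  [ "$(http_status "$1")" = "200" ]
}

# Probe every path and return non-zero if any single check fails.
smoke() {
  base_url="$1"
  rc=0
  for path in /w /api/health /services/health; do
    if check_endpoint "${base_url}${path}"; then
      echo "ok   ${path}"
    else
      echo "FAIL ${path}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Because smoke returns the accumulated status instead of an unconditional 0, a caller such as a CI step can fail the job when any endpoint is unhealthy.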

Gateway (hosting/railway/oss/gateway/)

  • Nginx config with Railway IPv6 DNS resolver ([fd12::10])
  • Variable-based proxy_pass for dynamic DNS re-resolution
  • Rewrite rules for path prefix stripping
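
The three gateway points above can be sketched as a small shell function that prints an nginx server block. This is only an illustration: the function name render_gateway_conf, the upstream host and port, the /api/ prefix, and the 10-second resolver TTL are assumptions; only the resolver address [fd12::10] comes from the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print an nginx server block for one upstream.
# Host, port, path prefix, and the 10s TTL are illustrative assumptions.
render_gateway_conf() {
  upstream_host="$1"   # e.g. api.railway.internal (assumed name)
  upstream_port="$2"
  cat <<EOF
server {
    listen 80;
    # Resolver address from the PR description; cache answers for 10s.
    resolver [fd12::10] valid=10s;

    location /api/ {
        # A variable in proxy_pass makes nginx resolve the name at
        # request time instead of once at startup.
        set \$upstream http://${upstream_host}:${upstream_port};
        # Strip the /api prefix before the request is proxied.
        rewrite ^/api/(.*)\$ /\$1 break;
        proxy_pass \$upstream;
    }
}
EOF
}
```

Calling render_gateway_conf api.railway.internal 8000 prints the block to stdout. Since proxy_pass carries no URI part of its own, the URI rewritten by the rewrite rule is what reaches the upstream.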

CI Workflows (.github/workflows/)

  • 06-railway-preview-build.yml -- build and push PR-tagged images to GHCR (Docker Buildx + GHA cache)
  • 07-railway-preview-deploy.yml -- deploy preview and post URL as PR comment
  • 08-railway-preview-cleanup.yml -- destroy on PR close + daily stale cleanup cron

Design docs (docs/design/railway-preview-environments/)

  • Context, research, plan, status, deployment notes, QA strategy

Testing

This PR itself tests the CI workflows. The build workflow should trigger on this PR, build the 3 images, then deploy a preview environment and post the URL as a comment.

Requires RAILWAY_TOKEN GitHub Actions secret (already configured).

…ment CI

Add complete Railway OSS deployment infrastructure:
- Bootstrap, configure, deploy, and smoke test scripts
- Nginx gateway with Railway IPv6 DNS resolver and dynamic proxy_pass
- Wrapper Dockerfiles for all 11 services (api, web, services, workers, cron, alembic, etc.)
- Preview lifecycle scripts (create/update, destroy, stale cleanup)
- Three GitHub Actions workflows for automated PR preview environments:
  - 06: build and push PR-tagged images to GHCR
  - 07: deploy preview environment and post URL as PR comment
  - 08: destroy on PR close + daily stale cleanup cron
- Design docs covering architecture, caveats, and phased rollout plan
@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Feb 19, 2026

vercel bot commented Feb 19, 2026

Project: agenta-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Feb 19, 2026 8:10pm

The deploy job calls a reusable workflow that posts PR comments.
The caller's permissions block must include pull-requests: write,
because a called workflow's GITHUB_TOKEN permissions can be reduced
but never elevated relative to the caller.
devin-ai-integration[bot]

This comment was marked as resolved.

github-actions bot commented Feb 19, 2026

Railway Preview Environment

Image tag: pr-3787-aa306bb
Status: Failed
Logs: View workflow run

Updated at 2026-02-19T20:11:09.961Z
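
The image tag reported above appears to combine the PR number with a seven-character commit SHA. A minimal sketch of deriving such a tag follows, assuming that format; the function name preview_image_tag is hypothetical and the build workflow may compute the tag differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a "pr-<number>-<short sha>" tag.
# The format is inferred from the tag shown in the bot comment.
preview_image_tag() {
  pr_number="$1"
  commit_sha="$2"
  # %.7s truncates the SHA to its first seven characters.
  printf 'pr-%s-%.7s\n' "$pr_number" "$commit_sha"
}
```

Including the SHA keeps the images of successive pushes to the same PR distinct, so a deploy can be traced back to a specific commit.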

devin-ai-integration[bot]

This comment was marked as resolved.

Comment on lines +79 to +80
ENV AGENTA_AUTH_KEY=0000000000000000000000000000000000000000000000000000000000000000
ENV AGENTA_CRYPT_KEY=1111111111111111111111111111111111111111111111111111111111111111

🚩 Hardcoded default auth/crypt keys in deploy-from-images.sh wrappers

The render_api_like_wrapper, render_api_wrapper, render_services_wrapper, render_web_wrapper, and render_alembic_wrapper functions in deploy-from-images.sh all hardcode AGENTA_AUTH_KEY=000... and AGENTA_CRYPT_KEY=111... as ENV defaults in the generated Dockerfiles. These are the same defaults used in configure.sh:9-10.

For preview environments this is acceptable since they're ephemeral. However, configure.sh also uses these as defaults for production deployments. If an operator runs configure.sh without setting AGENTA_AUTH_KEY/AGENTA_CRYPT_KEY, the deployment will use these well-known placeholder keys, which could be a security concern for non-preview deployments. The README and deployment notes don't explicitly warn about this.
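
One way to address this concern would be a guard that refuses the placeholder keys outside preview deployments. The sketch below is an assumption-laden illustration, not code from the PR: the function names, the "preview" deploy-kind argument, and the rule that an empty, all-zero, or all-one key counts as a placeholder are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard; names and the placeholder rule are assumptions.

# Treat an empty key, or one made only of '0' or only of '1', as a placeholder.
is_placeholder_key() {
  key="$1"
  if [ -z "$key" ]; then
    return 0
  fi
  case "$key" in
    *[!0]*) ;;          # contains a character other than '0'
    *) return 0 ;;      # all zeros
  esac
  case "$key" in
    *[!1]*) ;;          # contains a character other than '1'
    *) return 0 ;;      # all ones
  esac
  return 1
}

# Allow placeholders for previews only; fail for any other deploy kind.
require_real_keys() {
  deploy_kind="$1"
  if [ "$deploy_kind" = "preview" ]; then
    return 0
  fi
  if is_placeholder_key "${AGENTA_AUTH_KEY:-}" ||
     is_placeholder_key "${AGENTA_CRYPT_KEY:-}"; then
    echo "error: set AGENTA_AUTH_KEY and AGENTA_CRYPT_KEY to real secrets" >&2
    return 1
  fi
}
```

An operator could generate suitable values with, for example, openssl rand -hex 32, which prints 64 hexadecimal characters.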

The Railway CLI uses the --version flag, not a version subcommand.

Comment on lines +163 to +165
render_api_like_wrapper worker-tracing '["python", "-m", "entrypoints.worker_tracing"]'
render_api_like_wrapper worker-evaluations '["python", "-m", "entrypoints.worker_evaluations"]'
render_api_like_wrapper cron '["cron", "-f"]'

📝 Info: Worker Dockerfiles use bare python but this is safe due to PATH in base image

The static Dockerfiles at hosting/railway/oss/worker-tracing/Dockerfile:14 and hosting/railway/oss/worker-evaluations/Dockerfile:14, as well as the dynamic wrappers generated by hosting/railway/oss/scripts/deploy-from-images.sh:163-164, all use bare python in their CMD. The deployment notes document Bug 3 where alembic failed because bare python resolved to the system python without packages.

However, this is not a bug for workers. The base image api/oss/docker/Dockerfile.gh sets PATH="/opt/venv/bin:${PATH}" in the runner stage, so bare python resolves to /opt/venv/bin/python with all packages. The alembic bug was specifically caused by using sh -lc (login shell), which sources /etc/profile and can reset PATH. Workers use exec-form CMD (["python", "-m", ...]) which doesn't invoke a shell, so PATH from the image ENV is preserved.

Railway CLI uses two different env vars:
- RAILWAY_TOKEN: project-scoped actions only
- RAILWAY_API_TOKEN: account/workspace-level actions (create/list/delete projects)

Our preview scripts need account-level access. Updated all scripts
to accept either variable, and CI workflows to set RAILWAY_API_TOKEN.
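
A minimal sketch of what "accept either variable" could look like is shown below. The function name resolve_railway_token and the precedence (RAILWAY_API_TOKEN first, then RAILWAY_TOKEN) are assumptions; the PR's scripts may resolve the token differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick whichever Railway token variable is set.
# Preferring RAILWAY_API_TOKEN is an assumption, chosen because the
# preview scripts are said to need account-level access.
resolve_railway_token() {
  if [ -n "${RAILWAY_API_TOKEN:-}" ]; then
    printf '%s\n' "$RAILWAY_API_TOKEN"
  elif [ -n "${RAILWAY_TOKEN:-}" ]; then
    printf '%s\n' "$RAILWAY_TOKEN"
  else
    echo "error: set RAILWAY_API_TOKEN (or RAILWAY_TOKEN)" >&2
    return 1
  fi
}
```

Failing early with a clear message when neither variable is set avoids a less obvious authentication error from the CLI later in the script.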

Passes through COMPOSIO_API_KEY to the api service if set.
Skipped silently if not provided.

- preview-cleanup-stale.sh: use process substitution instead of
  pipe-to-while so DELETED/SKIPPED counters are not lost in subshell
- smoke.sh: propagate check_endpoint exit code after repair instead
  of unconditional return 0
- 06-railway-preview-build.yml: add path filters so docs-only PRs
  don't trigger full image builds and Railway deploys
- README.md: add security note about placeholder auth/crypt keys
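
The first fix in the list above concerns a common bash pitfall: a while loop on the right-hand side of a pipe runs in a subshell, so counters updated inside it are lost. The sketch below contrasts the two forms. It requires bash (process substitution is not POSIX sh), and list_projects with its sample data is a hypothetical stand-in for the real project listing.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: "<project name> <state>" per line.
list_projects() {
  printf '%s\n' 'pr-1 stale' 'pr-2 fresh' 'pr-3 stale'
}

# Pipe form: the loop body runs in a subshell, so the increment
# never reaches the function's own "deleted" variable.
count_with_pipe() {
  local deleted=0 name state
  list_projects | while read -r name state; do
    if [ "$state" = "stale" ]; then
      deleted=$((deleted + 1))
    fi
  done
  echo "$deleted"
}

# Process substitution: the loop runs in the current shell,
# so both counters keep their values after the loop ends.
count_with_process_substitution() {
  local deleted=0 skipped=0 name state
  while read -r name state; do
    if [ "$state" = "stale" ]; then
      deleted=$((deleted + 1))
    else
      skipped=$((skipped + 1))
    fi
  done < <(list_projects)
  echo "$deleted $skipped"
}
```

With bash's default settings, count_with_pipe reports 0 even though two projects are stale, while count_with_process_substitution reports the real totals.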

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 11 additional findings in Devin Review.

Comment on lines +37 to +40
created_at="$(printf "%s" "$project" | jq -r '.createdAt')"

# Parse ISO 8601 timestamp to epoch seconds.
created_epoch="$(date -d "$created_at" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%S" "${created_at%%.*}" +%s 2>/dev/null || echo 0)"

🟡 Stale preview cleanup uses project creation time instead of last-update time, deleting active previews

The preview-cleanup-stale.sh script determines staleness by comparing a project's createdAt timestamp against the max age threshold. Since preview-create-or-update.sh updates preview environments in-place (it calls bootstrap.sh which links to the existing project rather than recreating it), the createdAt field never changes even when a preview receives new deploys.

Root Cause and Impact

At hosting/railway/oss/scripts/preview-cleanup-stale.sh:37, the script extracts createdAt:

created_at="$(printf "%s" "$project" | jq -r '.createdAt')"

And at lines 65-66 the jq filter only passes through createdAt:

'.[] | select(.name | startswith($prefix)) | {name: .name, createdAt: .createdAt}'

As a result, if a PR is opened, its preview environment is created, and the developer keeps pushing commits over the next 2 days, the daily cron (running at 06:00 UTC with the default 24h TTL) will delete the preview environment because createdAt is more than 24h old, even though the preview was updated only minutes earlier.

The preview will be recreated on the next push (the build workflow chains to deploy), but there's a window where the preview URL returns nothing, causing confusion for reviewers who click the link.

The fix should use updatedAt instead of createdAt to reflect actual activity on the project.

Suggested change
- created_at="$(printf "%s" "$project" | jq -r '.createdAt')"
- # Parse ISO 8601 timestamp to epoch seconds.
- created_epoch="$(date -d "$created_at" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%S" "${created_at%%.*}" +%s 2>/dev/null || echo 0)"
+ name="$(printf "%s" "$project" | jq -r '.name')"
+ updated_at="$(printf "%s" "$project" | jq -r '.updatedAt')"
+ # Parse ISO 8601 timestamp to epoch seconds.
+ created_epoch="$(date -d "$updated_at" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%S" "${updated_at%%.*}" +%s 2>/dev/null || echo 0)"
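
The staleness decision itself can be sketched separately from the Railway calls. The helper names iso_to_epoch and is_stale below are hypothetical, and so is the choice to treat an unparseable timestamp (epoch 0) as not stale. The sketch also assumes that the project listing exposes an updatedAt field, as the review comment suggests, and that the jq filter is changed to pass it through.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and the epoch-0 policy are assumptions.

# Convert an ISO 8601 UTC timestamp to epoch seconds.
# Tries GNU date first, then BSD date, and prints 0 if both fail.
iso_to_epoch() {
  ts="$1"
  date -u -d "$ts" +%s 2>/dev/null ||
    date -u -j -f '%Y-%m-%dT%H:%M:%S' "${ts%%.*}" +%s 2>/dev/null ||
    echo 0
}

# Succeed when the last activity is older than max_age_seconds.
# An epoch of 0 means the timestamp could not be parsed; this sketch
# keeps such projects rather than deleting them on bad data.
is_stale() {
  last_activity_epoch="$1"
  now_epoch="$2"
  max_age_seconds="$3"
  if [ "$last_activity_epoch" -le 0 ]; then
    return 1
  fi
  [ $((now_epoch - last_activity_epoch)) -gt "$max_age_seconds" ]
}
```

A cleanup loop could then call is_stale "$(iso_to_epoch "$updated_at")" "$(date +%s)" 86400 for each project, where 86400 seconds corresponds to the default 24h TTL mentioned above.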

Labels

ci/cd, size:XXL (This PR changes 1000+ lines, ignoring generated files.)
