Channels: HTTP proxy with observability for Lightning #4419

Draft
stuartc wants to merge 27 commits into main from channels
stuartc (Member) commented Feb 11, 2026

Summary

Channels adds lightweight reverse-proxy functionality with observability to Lightning. Deploy a Channel between two systems to get immediate visibility into requests and responses, with a full audit trail.

  • Source: Authenticates inbound requests (API key / Basic Auth via project credentials)
  • Sink: Forwards to target system with credential injection
  • Observability: Every proxied request logged with headers, body preview, SHA256 hash, timing
  • Snapshots: Channel config captured at request time for auditable history

Streaming proxy powered by Weir — constant memory regardless of payload size.
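
The observability fields and the constant-memory claim fit together: a hash and a bounded preview can both be computed while chunks stream through, without buffering the body. The following is a minimal language-neutral sketch in Python, not Weir's or Lightning's actual Elixir code. `BodyObserver`, `PREVIEW_LIMIT`, and the assumption that the SHA256 covers the full body are all hypothetical.

```python
import hashlib
from typing import Iterable, Iterator

PREVIEW_LIMIT = 64  # hypothetical preview size in bytes; the real limit is not stated


class BodyObserver:
    """Computes a SHA-256 and a bounded preview while chunks pass through."""

    def __init__(self, preview_limit: int = PREVIEW_LIMIT) -> None:
        self._hash = hashlib.sha256()
        self._preview = bytearray()
        self._limit = preview_limit
        self.size = 0

    def observe(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        # Each chunk is hashed, counted, and forwarded immediately, so memory
        # use is bounded by the chunk size plus the preview limit.
        for chunk in chunks:
            self._hash.update(chunk)
            self.size += len(chunk)
            room = self._limit - len(self._preview)
            if room > 0:
                self._preview.extend(chunk[:room])
            yield chunk

    @property
    def sha256(self) -> str:
        return self._hash.hexdigest()

    @property
    def preview(self) -> bytes:
        return bytes(self._preview)
```

The hash and preview are only complete once the consumer has drained the iterator, which mirrors recording them when the response has finished.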

Spec: #4322 | Go-live target: 26 Feb 2026

Stories

Phase 1 — Foundation

Phase 2 — Core Proxy

Phase 3 — Observability

Phase 4 — UI

Phase 5 — Performance Confidence

Dependency Graph

#4399 Schema ──┬──→ #4401 Proxy ──→ #4403 Source auth
               │       │           #4404 Sink auth (parallel)
               │       │
               ├───────┤──→ #4405 Observer ──→ #4406 Snapshots
               │       │
               ├──→ #4400 Audit trail
               ├──→ #4407 Channel UI
               └───────┴──→ #4408 History page

#4409 Mock sink ──┐
                  ├──→ #4410 K6 load tests
#4401 Proxy ──────┘

@github-project-automation github-project-automation bot moved this to New Issues in Core Feb 11, 2026
* Add channels tables migration (#4399)

Create the four core database tables for the channels feature:
channels, channel_snapshots, channel_requests, and channel_events
with all indexes, foreign keys, and constraints per the data model spec.

* Add channel schema modules (#4399)

Create four Ecto schemas under lib/lightning/channels/:
- Channel: core config with optimistic locking
- ChannelSnapshot: immutable point-in-time copies
- ChannelRequest: proxy request tracking with Ecto.Enum state
- ChannelEvent: detailed request/response event log

* Add Channels context and Project association (#4399)

Create Lightning.Channels context with CRUD operations:
list_channels_for_project/1, get_channel!/1, create_channel/1,
update_channel/2, delete_channel/1 (with :has_history guard).
Wire up has_many :channels on Project schema.

* Add channel factories and context tests (#4399)

Add ExMachina factories for all 4 channel schemas. Create context test
covering CRUD operations, uniqueness constraints, optimistic locking,
and deletion protection for channels with history. Fix lock_version
default to 0 (matching Workflow pattern) and use stale_error_field
for clean changeset errors on version conflicts.
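
For readers unfamiliar with optimistic locking, the idea behind `lock_version` can be sketched as follows. This is an in-memory Python illustration only, not the Ecto implementation; `ChannelStore` and `StaleEntryError` are hypothetical names.

```python
from dataclasses import dataclass, replace


class StaleEntryError(Exception):
    """Raised when an update is based on an outdated copy of the row."""


@dataclass(frozen=True)
class Channel:
    name: str
    sink_url: str
    lock_version: int = 0  # starts at 0, as in the commit above


class ChannelStore:
    """Holds a single row; stands in for the database table."""

    def __init__(self, channel: Channel) -> None:
        self._row = channel

    def get(self) -> Channel:
        return self._row

    def update(self, loaded: Channel, **changes) -> Channel:
        # The write succeeds only if nobody bumped the version since `loaded`
        # was read; a real database does this check in the UPDATE's WHERE clause.
        if loaded.lock_version != self._row.lock_version:
            raise StaleEntryError("channel was modified by someone else")
        self._row = replace(loaded, **changes, lock_version=loaded.lock_version + 1)
        return self._row
```

Two editors who load the same version cannot both save: the second write fails and can be surfaced as a form error instead of silently overwriting the first.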

* Add @moduledoc to channel schema modules (#4399)

Add module documentation to all four channel schemas to satisfy
credo --strict readability checks.

* Fix lock_version default and delete_channel error handling (#4399)

- Align migration lock_version default to 0, matching Workflow pattern
- Replace rescue ConstraintError with foreign_key_constraint changeset
codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 91.87500% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.41%. Comparing base (a3126d7) to head (9c01b93).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/lightning_web/live/channel_live/index.ex | 89.74% | 12 Missing ⚠️ |
| lib/lightning_web/plugs/channel_proxy_plug.ex | 87.77% | 11 Missing ⚠️ |
| lib/lightning/channels/handler.ex | 83.63% | 9 Missing ⚠️ |
| lib/lightning/channels.ex | 91.93% | 5 Missing ⚠️ |
| .../lightning_web/live/channel_live/form_component.ex | 97.72% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4419      +/-   ##
==========================================
+ Coverage   89.36%   89.41%   +0.04%     
==========================================
  Files         425      437      +12     
  Lines       20187    20650     +463     
==========================================
+ Hits        18041    18465     +424     
- Misses       2146     2185      +39     


stuartc and others added 25 commits February 11, 2026 15:14
* Add channel proxy plug with endpoint routing (#4401)

Wire Weir as a path dependency and create ChannelProxyPlug that
intercepts /channels/:id/*path requests before Plug.Parsers,
looks up the channel, sets proxy headers, and forwards to the
sink via Weir.proxy/2 with streaming.

* Add sink_url validation, simplify proxy plug, add path traversal tests (#4401)

- Validate sink_url with Validators.validate_url/2 in Channel changeset
  to reject non-http(s) URLs at creation time rather than at proxy time
- Remove manual x-request-id handling from ChannelProxyPlug since Weir
  now injects it into upstream requests automatically
- Add path traversal tests documenting that ".." as channel_id fails
  UUID validation (404) and ".." in subpaths is forwarded as-is
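
The two checks described above can be illustrated with a short Python sketch. It is not `Validators.validate_url/2` or the plug's actual lookup; `valid_sink_url` and `parse_channel_id` are hypothetical helpers, and Python's `uuid.UUID` accepts a few more spellings (braces, no hyphens) than a strict validator might.

```python
import uuid
from typing import Optional
from urllib.parse import urlsplit


def valid_sink_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def parse_channel_id(segment: str) -> Optional[uuid.UUID]:
    """Return a UUID, or None so the caller can respond with 404."""
    try:
        return uuid.UUID(segment)
    except ValueError:
        return None
```

Because the channel ID must parse as a UUID, a segment such as `..` never reaches the database lookup.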

* Use GitHub source for weir dep, with local path override via WEIR_PATH env var
* Add mock sink server for channel proxy testing (#4409)

Standalone Elixir script using Bandit + Plug that simulates configurable
sink behaviour. Supports fixed, delay, timeout, auth, and mixed modes
with CLI-configurable status codes, delays, and body sizes.

* Add load test runner for channel proxy benchmarking (#4409)

Standalone Elixir script that drives HTTP traffic through the channel
proxy to a mock sink. Connects to Lightning via distributed Erlang for
channel setup and memory sampling. Supports happy_path, ramp_up,
large_payload, large_response, mixed_methods, and slow_sink scenarios.
Reports latency percentiles, throughput, error rates, and BEAM memory.
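
As a reference for reading the latency report, here is one common way to compute percentiles (nearest-rank). The load test's actual method is not described in this PR, so treat this Python sketch and its `percentile` helper as an assumption.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```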

* Add README for channel proxy benchmarking tools (#4409)

Documents mock sink modes, load test scenarios, CLI options, quick start
guide, and how to interpret results (especially memory delta for verifying
streaming proxy behaviour).

* Add run_all.sh runner and move mock sink delay to query params (#4409)

Replace the mock sink's --delay CLI arg and delay mode with a ?delay=N
query parameter, matching the existing ?response_size=N pattern. Also add
?status=N for per-request status overrides. This lets the load test drive
all scenarios against a single mock sink instance without restarts.
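
A rough Python sketch of per-request overrides driven by query parameters is shown below. The parameter names `delay`, `status`, and `response_size` come from the commit message; the helper name, the defaults, and the fallback on malformed values are assumptions, and the real mock sink is an Elixir script.

```python
from urllib.parse import parse_qs, urlsplit


def sink_behaviour(url: str, default_status: int = 200) -> dict:
    """Read per-request overrides from the query string."""
    query = parse_qs(urlsplit(url).query)

    def int_param(name: str, default: int) -> int:
        # Missing or non-numeric values fall back to the default (an assumption).
        try:
            return int(query[name][0])
        except (KeyError, ValueError):
            return default

    return {
        "delay_ms": int_param("delay", 0),
        "status": int_param("status", default_status),
        "response_size": int_param("response_size", 0),
    }
```

Since every override travels with the request, one sink process can serve all scenarios without a restart.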

Add --delay option to the load test script (defaults to 2000ms for
slow_sink scenario) which appends ?delay=N to the request URL.

Add run_all.sh which runs all 7 scenarios in sequence with preflight
checks, timestamped log/CSV output, and bail-on-first-failure.
…#4410)

Wrap ChannelProxyPlug in three telemetry spans (request, fetch_channel,
upstream) to identify where proxy latency originates. Split the 1044-line
load_test.exs monolith into 7 focused modules under lib/load_test/. Add
Bench.TelemetryCollector (GenServer+ETS) that gets deployed onto the
Lightning BEAM via RPC to collect server-side timing during load tests.
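
The nesting of the three spans can be pictured with this Python sketch. It only illustrates the idea of timing an outer request span and two inner spans; it does not use the Elixir telemetry library, and `SpanCollector` and `handle` are hypothetical.

```python
import time
from contextlib import contextmanager


class SpanCollector:
    """Records wall-clock durations (in seconds) per span name."""

    def __init__(self) -> None:
        self.durations = {}

    @contextmanager
    def span(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even when the wrapped code raises.
            self.durations.setdefault(name, []).append(time.perf_counter() - start)


def handle(collector: SpanCollector, fetch_channel, call_upstream):
    # The outer span covers the whole request; the inner spans split it
    # into the channel lookup and the upstream call.
    with collector.span("request"):
        with collector.span("fetch_channel"):
            channel = fetch_channel()
        with collector.span("upstream"):
            return call_upstream(channel)
```

Subtracting the inner durations from the request span gives the time spent in the proxy itself.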
* Add Channels.Handler module and ttfb_ms to ChannelEvent (#4405)

Implements Weir.Handler behaviour for persisting channel proxy requests:
- handle_request_started: sync ChannelRequest creation with header redaction
- handle_response_started: captures TTFB and response headers
- handle_response_finished: async persistence via Task.Supervisor

Adds ttfb_ms column to channel_events for time-to-first-byte tracking.
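
Header redaction before persistence might look roughly like this Python sketch. The set of sensitive header names and the `[REDACTED]` placeholder are assumptions; the PR only confirms elsewhere that the Authorization header is redacted.

```python
from typing import Iterable, List, Tuple

# Assumed list of sensitive headers; only "authorization" is confirmed by this PR.
SENSITIVE_HEADERS = frozenset({"authorization", "x-api-key", "cookie"})


def redact_headers(
    headers: Iterable[Tuple[str, str]],
    sensitive: frozenset = SENSITIVE_HEADERS,
) -> List[List[str]]:
    """Replace sensitive values and return [name, value] lists ready for JSON."""
    return [
        [name, "[REDACTED]" if name.lower() in sensitive else value]
        for name, value in headers
    ]
```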

* Add get_or_create_current_snapshot to Channels context (#4405)

Upserts a ChannelSnapshot for the channel's current lock_version,
handling concurrent creation races via ON CONFLICT DO NOTHING + re-fetch.
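
The get-or-create pattern named above (insert with ON CONFLICT DO NOTHING, then re-fetch) can be tried out with SQLite from Python. This is a sketch, not the Ecto query: the table layout, the unique key on `(channel_id, lock_version)`, and the function names are assumptions, and it needs SQLite 3.24 or newer.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table with one snapshot per (channel_id, lock_version)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE channel_snapshots (
            id INTEGER PRIMARY KEY,
            channel_id TEXT NOT NULL,
            lock_version INTEGER NOT NULL,
            sink_url TEXT NOT NULL,
            UNIQUE (channel_id, lock_version)
        )
        """
    )
    return conn


def get_or_create_snapshot(conn, channel_id, lock_version, sink_url):
    # If another writer already inserted this version, the insert is a no-op
    # instead of an error.
    conn.execute(
        "INSERT INTO channel_snapshots (channel_id, lock_version, sink_url) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (channel_id, lock_version) DO NOTHING",
        (channel_id, lock_version, sink_url),
    )
    # Re-fetch so the winner's and the loser's callers see the same row.
    return conn.execute(
        "SELECT id, channel_id, lock_version, sink_url FROM channel_snapshots "
        "WHERE channel_id = ? AND lock_version = ?",
        (channel_id, lock_version),
    ).fetchone()
```

Calling it twice for the same version returns the same row, while a bumped `lock_version` produces a new snapshot.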

* Wire Channels.Handler into proxy plug with Task.Supervisor (#4405)

Connect the handler to ChannelProxyPlug so every proxied request creates
a ChannelRequest (sync) and ChannelEvent (async). Add snapshot
get-or-create before proxying, pass handler state with channel/snapshot
context, and add Task.Supervisor for async persistence.

* Add handler, snapshot, and proxy plug tests (#4405)

- Handler unit tests: request creation, rejection, header redaction,
  TTFB capture, async event persistence, state transitions
- Snapshot context tests: create, idempotent return, new on version bump
- Proxy plug integration tests: full flow with DB verification
- Fix encode_headers to convert tuples to lists for Jason encoding

* Replace Process.sleep with PubSub-based async coordination in tests (#4405)

Broadcast {:channel_request_completed, request_id} from persist_completion/2
so tests can assert_receive instead of sleeping. Eliminates race conditions
in CI and lays groundwork for #4408 real-time history.

* Squash ttfb_ms column into main channels migration (#4405)

Since neither migration has shipped, merge the separate
add_ttfb_ms_to_channel_events migration into create_channels_tables
to keep migration history clean.

* Classify proxy errors and document skipped callback contract (#4405)

Replace raw inspect() of Weir error structs with classify_error/1 that
maps known transport errors (nxdomain, econnrefused, etc.) and timeout
tuples to stable string identifiers for persistence. Expand the moduledoc
to document when handle_response_started is skipped and which handler
state fields will be absent.
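
The motivation for stable identifiers over raw `inspect()` output is that persisted values can be grouped and filtered. A Python analogue is sketched below using Python's own exception types. The mapping is illustrative only: the real `classify_error/1` matches Weir error structs, and the `unknown_error` fallback is an assumption.

```python
import socket


def classify_error(exc: BaseException) -> str:
    """Map a transport failure to a short, stable identifier."""
    if isinstance(exc, socket.gaierror):
        # Name resolution failed; treated here as the nxdomain case.
        return "nxdomain"
    if isinstance(exc, ConnectionRefusedError):
        return "econnrefused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # Hypothetical fallback so that nothing is persisted as a raw repr().
    return "unknown_error"
```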

* Fix request_path, tighten factories, and review cleanups (#4405)

- Use forward_path instead of conn.path_info for request_path so
  persisted path reflects the upstream path, not the internal route
- Remove default associations from channel factories so callers must
  provide their own channel/snapshot, preventing cross-entity mismatches
- Simplify delete_channel test to match the actual constraint tested
- Add TODO on broadcast noting it fires even on partial persistence failure

* Update weir

* Report which lint checks failed in CI instead of generic message

* Fix Credo alias style in Channels.Handler

Adds the saturation scenario (ramp through concurrency levels to find
throughput ceiling) and an independent --charts flag that generates
gnuplot PNG charts with timestamped output paths.

Standard scenarios produce a combined throughput + latency chart (dual
y-axis: RPS as filled area, p50/p95/p99 latency as line plots over
1-second buckets). Saturation generates throughput and latency vs
concurrency line charts from CSV data.

--charts auto-creates a timestamped CSV for saturation when --csv is
not specified. The two flags are fully independent.
* Add channel_auth_methods join table and schema (#4403)

Create the channel_auth_methods table with role discriminator and
polymorphic FKs (webhook_auth_method_id, project_credential_id).
Remove source_project_credential_id from channels table.
Add ChannelAuthMethod schema with exclusive FK and role-target
consistency validations. Update Channel schema with filtered
has_many associations for source and sink auth methods.

* Extract shared LightningWeb.Auth module from WebhookAuth plug (#4403)

Move pure auth validation functions (valid_key?, valid_user?,
has_credentials?) into a shared module so both WebhookAuth (triggers)
and ChannelProxyPlug (channels) can reuse them without duplicating
security-sensitive code.

* Add get_channel_with_source_auth/1 and update proxy plug to preload source auth (#4403)

* Add source authentication validation to ChannelProxyPlug (#4403)

Insert authenticate_source/2 between channel fetch and upstream proxying.
Channels with no source auth methods remain publicly accessible (fail-open).
Valid credentials pass through, missing credentials return 401, and wrong
credentials return 404 to hide channel existence.
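
The decision table described above (open when unconfigured, 401 when nothing is presented, 404 when the credential is wrong) can be sketched in Python as follows. Only API keys are modelled; Basic Auth is omitted, and the function name and return values are hypothetical rather than the plug's real interface.

```python
import hmac
from typing import Optional, Sequence, Union


def authenticate_source(
    configured_keys: Sequence[str], presented_key: Optional[str]
) -> Union[str, int]:
    """Return "ok" to continue proxying, or the HTTP status to respond with."""
    if not configured_keys:
        return "ok"  # no source auth methods: channel is publicly accessible
    if presented_key is None:
        return 401  # credentials missing
    # Constant-time comparison against every configured key.
    matches = [
        hmac.compare_digest(presented_key.encode(), key.encode())
        for key in configured_keys
    ]
    if any(matches):
        return "ok"
    return 404  # wrong credentials: do not reveal that the channel exists
```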

* Add tests and factory for channel source authentication (#4403)

- Add channel_auth_method factory to ExMachina factories
- Add get_channel_with_source_auth/1 context tests
- Add ChannelAuthMethod changeset validation tests (exclusive FKs,
  role-target consistency, unique constraints)
- Add LightningWeb.Auth unit tests (API key, Basic Auth, credential
  detection)
- Add source auth proxy plug integration tests (auth enforcement,
  401/404 responses, multiple auth methods, mixed types)
- Add unique_constraint declarations to ChannelAuthMethod changeset

* Return JSON error responses from ChannelProxyPlug (#4403)

Replace plain-text send_resp errors with structured JSON responses
(e.g. {"error": "Not Found"}) to match WebhookAuth and other API
error paths in the codebase.

* Consolidate channel_auth_methods into original migration and remove source_project_credential_id (#4403)

Merge the separate channel_auth_methods migration into the base
create_channels_tables migration and remove the now-unused
source_project_credential_id column from channels and channel_snapshots.
* Channels: Schema cleanup, SinkAuth module, and handler state-based request_id (#4404)

Drop sink_project_credential_id direct FK from channels and channel_snapshots
(all sink credentials now go through channel_auth_methods join table). Add
SinkAuth module mapping credential schema+body to Authorization headers for
http, dhis2, and oauth schemas. Replace get_channel_with_source_auth/1 with
get_channel_with_auth/1 that preloads both source and sink auth methods.
Update handler to read request_id from state (Weir no longer provides it in
metadata). Pass request_id from plug conn into handler state.

* Channels: Sink auth resolution, outbound headers, and credential error handling (#4404)

Wire sink auth into the proxy flow: resolve credential at request time,
build outbound headers with auth injection, and pass to Weir via headers
option. Credential errors produce observable ChannelRequest/ChannelEvent
records and return 502.

* Channels: SinkAuth unit tests and proxy plug sink auth integration tests (#4404)

Add comprehensive test coverage for sink authentication:
- SinkAuth.build_auth_header/2 unit tests for http, dhis2, oauth schemas
- Priority rules (access_token > username/password, pat > username/password)
- Unsupported schema error cases
- Proxy plug integration tests verifying auth headers arrive at upstream
- Credential error tests (environment_not_found, missing auth fields → 502)
- Authorization header redaction verification in persisted ChannelEvents
- Proxy headers still forwarded alongside auth header
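
The priority rules listed above can be illustrated with a Python sketch. It is not `SinkAuth.build_auth_header/2`: the header formats (`Bearer`, `Basic`, and `ApiToken` for a DHIS2 personal access token) and the error type are assumptions about what each schema maps to.

```python
import base64
from typing import Mapping, Optional


def _basic(body: Mapping[str, str]) -> Optional[str]:
    if body.get("username") and body.get("password"):
        raw = f"{body['username']}:{body['password']}".encode()
        return "Basic " + base64.b64encode(raw).decode()
    return None


def build_auth_header(schema: str, body: Mapping[str, str]) -> str:
    """Pick the Authorization value, preferring tokens over username/password."""
    if schema == "http":
        header = f"Bearer {body['access_token']}" if body.get("access_token") else _basic(body)
    elif schema == "dhis2":
        # Assumed header format for DHIS2 personal access tokens.
        header = f"ApiToken {body['pat']}" if body.get("pat") else _basic(body)
    elif schema == "oauth":
        header = f"Bearer {body['access_token']}" if body.get("access_token") else None
    else:
        raise ValueError(f"unsupported schema: {schema}")
    if header is None:
        raise ValueError("missing auth fields")
    return header
```

In the proxy flow described in this PR, either failure case would surface as a 502 with an observable request record rather than an unauthenticated upstream call.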

* Channels: Synchronous handler, remove TaskSupervisor and PubSub broadcast (#4404)

Make the Weir handler fully synchronous — persist_completion now runs
inline during handle_response_finished instead of being spawned as an
async task. This eliminates the need for the Channels TaskSupervisor,
the PubSub broadcast used to signal completion, and all the test
subscribe/assert_receive/on_exit synchronisation machinery.

Bump weir to 41e20bf (Observer refactor replacing Agent-based state).

* Channels: Refactor ChannelProxyPlug to fix credo warnings (#4404)

Flatten nested case chain in do_proxy using with/else and an extracted
proxy_with_auth helper. Move inline aliases to module level, extract
error_response/3 helper, and use status atoms throughout.

* Channels: Fix dialyzer errors in classify_credential_error/1 (#4404)

Match actual OAuth refresh error atoms (:temporary_failure,
:reauthorization_required) instead of non-existent {:oauth_refresh_failed, _}
tuple, and remove unreachable catch-all clause.
* Add channel create, edit, and delete (part 2 of #4407)

* Move Channels menu item under experimental features

* Fix cast_assoc channel auth method creation and add test coverage

* Add channel index stats, fix delete confirmation, and improve form links

- Add `get_channel_stats_for_project/1` context function returning total
  channels and requests in a single LEFT JOIN query
- Render a 2-card metrics grid (Total Channels / Total Requests) above
  the channel table on the index page
- Fix delete confirmation: replace `phx-confirm` (silently dropped by
  the `<.button>` global-attrs whitelist) with `data-confirm`
- Move "Create one in project settings" links inline with section titles
  in the form, always visible regardless of list emptiness
- Add context unit tests for `get_channel_stats_for_project/1`
- Add LiveView tests: stats cards, delete confirm attribute, settings
  links always visible, pre-selected auth methods, and a remove/keep/add
  auth method scenario

* fix failing test

* increase coverage

* Address PR review feedback on Channels CRUD

- Add :update_channel permission to ProjectUsers policy
- Add Channels.get_channel_for_project/2 to scope fetches to the
  current project in a single query
- Add server-side authorization checks to toggle and delete handlers
- Fix apply_action(:edit) to check can_edit_channel, not can_create_channel
- Gate the enabled toggle on can_edit_channel in the template
- Fix merge_selections/2 truthy semantics bug (|| on booleans)
- Use to_form/1 for the changeset in the form component
- Refactor toggle/delete handlers with with and private helpers

* fix edit modal fails to open sometimes

* audit channel_auth_methods

* Add LIVE_DEBUGGER_IP and LIVE_DEBUGGER_EXTERNAL_URL env vars

Allow configuring LiveDebugger's bind address and external URL for
remote/container access via optional env vars in dev mode.

* Fix demo reset by deleting channel tables before projects

Channel requests and snapshots have foreign keys that must be
cleared before their parent channels and projects are deleted.

* Fix new channel modal ID when selected_channel struct has nil id

The %Channel{id: nil} struct is truthy, so the && short-circuit
produced "edit-channel-" instead of falling through to :new. Add
an explicit check on selected_channel.id.

---------

Co-authored-by: Stuart Corbishley <corbish@gmail.com>

The proxy now strips the original x-request-id from inbound headers
and replaces it with the authoritative one from Plug.RequestId, which
prefers the caller's ID when valid (20-200 chars) or generates a new
one. Also adds request ID logging to the benchmarking mock sink and
a moduledoc for ChannelProxyPlug.
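
The request ID rule described above (keep the caller's ID when it is 20 to 200 characters long, otherwise generate one) can be sketched in Python. `resolve_request_id` is a hypothetical helper, and the generated format differs from what Plug.RequestId produces.

```python
import secrets
from typing import Optional


def resolve_request_id(inbound: Optional[str]) -> str:
    """Prefer a valid caller-supplied ID; otherwise mint a fresh one."""
    if inbound is not None and 20 <= len(inbound) <= 200:
        return inbound
    # 16 random bytes encode to 22 URL-safe characters, within the valid range.
    return secrets.token_urlsafe(16)
```

Whatever this returns is the single authoritative value, so the inbound header is dropped and this value is sent upstream in its place.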

Introduce a SinkRequest struct to bundle resolved proxy context,
eliminating triple x-request-id extraction and reducing function
arity. Replace inline header manipulation with composable
reject_header/set_header primitives and a clean pipeline in
build_outbound_headers. Add test for x-request-id forwarding.

Cover the case where a channel's sink_url has a trailing slash,
ensuring the proxy does not produce a double-slash when forwarding
to the root path or a subpath.
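
One way to join a sink URL and a forwarded path without doubling the slash is sketched below in Python. `join_sink_url` is hypothetical, query strings are ignored, and the choice to end the root path with a single slash is an assumption about the intended behaviour.

```python
def join_sink_url(sink_url: str, forward_path: str) -> str:
    """Join base URL and path with exactly one slash between them."""
    base = sink_url.rstrip("/")
    path = forward_path.lstrip("/")
    return f"{base}/{path}" if path else f"{base}/"
```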

Move the header pipeline (reject_header, set_header, add_proxy_headers,
build_outbound_headers) into a nested Headers module with public functions,
enabling direct unit testing of the pure header logic. Add
create_source_auth_channel/2 helper to deduplicate repeated channel
setup across 7 source authentication tests.
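
Pure header functions of this kind are easy to test in isolation, which is the point of the extraction. The Python sketch below mirrors the names `reject_header`, `set_header`, and `build_outbound_headers` from the commit message, but the signatures and behaviour are assumptions, and `add_proxy_headers` is left out because its contents are not described here.

```python
from typing import List, Optional, Tuple

Headers = List[Tuple[str, str]]


def reject_header(headers: Headers, name: str) -> Headers:
    """Drop every header whose name matches, case-insensitively."""
    return [(k, v) for k, v in headers if k.lower() != name.lower()]


def set_header(headers: Headers, name: str, value: str) -> Headers:
    """Replace any existing header of that name with a single new value."""
    return reject_header(headers, name) + [(name, value)]


def build_outbound_headers(
    inbound: Headers, request_id: str, auth_header: Optional[str] = None
) -> Headers:
    # Pipeline: swap in the authoritative request ID, then inject sink auth.
    headers = set_header(inbound, "x-request-id", request_id)
    if auth_header is not None:
        headers = set_header(headers, "authorization", auth_header)
    return headers
```

Each function returns a new list and never mutates its input, so the steps compose in any order and can be unit tested one at a time.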

Add a Proxy URL column to the channels index table and a URL display
in the edit modal, both with copy-to-clipboard buttons using the
existing Copy + Tooltip hooks. URLs use dir="rtl" for left-truncation.

Also fix the Copy hook to fall back to execCommand('copy') in insecure
contexts where navigator.clipboard is unavailable.

The server-rendered timestamp briefly flashed before the LocalTimeConverter
hook replaced it with a relative time string. Now the datetime-text span
renders as an appropriately-sized skeleton bar that the hook strips away
after conversion, eliminating the visual distraction.