Port forwarding session dies after hours due to insufficient reconnection retry budget on WebSocket close 1000 #135

@menvol3

Environment:

SSM Agent: 3.3.x (on-prem hybrid instance, Amazon Linux 2023)
EC2 Security Group: Allow all outbound traffic
session-manager-plugin: latest (on EC2 t4g.micro, AL2023)
Region: us-west-2
Document: AWS-StartPortForwardingSession
Target: Hybrid managed instance (mi-*)
Idle timeout: 20m + ResumeSession
Max Session Timeout: Not set

Description:

We're running long-lived SSM port forwarding sessions (12-24h) from an EC2 instance to an on-premises hybrid managed instance.
The tunnel works correctly for hours (we use ResumeSession to handle the idle timeout), but it eventually dies silently -- all traffic starts timing out with no response.

Root cause from logs:

The SSM service closes WebSocket connections approximately every 60 minutes, sending "websocket: close 1000 (normal): Bye". Both the on-prem agent and the EC2-side session-manager-plugin then attempt to reconnect. The on-prem agent consistently reconnects successfully. The session-manager-plugin usually reconnects too, but occasionally fails.

When the plugin fails to reconnect, the on-prem agent receives "Session is already terminated" when trying to recreate the data channel -- meaning the EC2 side has already given up and killed the session.
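
For illustration only, here is a minimal sketch of what a larger reconnection retry budget with exponential backoff and jitter could look like. It is not the plugin's actual code: `reconnect_with_backoff`, the `connect` callback, and every parameter and default below are hypothetical names and values chosen for the example.

```python
import random
import time
from typing import Callable


def reconnect_with_backoff(
    connect: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call connect() until it succeeds or the retry budget is spent.

    connect is a hypothetical callback that returns True once the
    WebSocket (data channel) has been re-established.
    """
    for attempt in range(max_attempts):
        if connect():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            # Exponential backoff, capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return False
```

With a budget like this, a transient failure right after the service sends close 1000 would cost a few delayed attempts rather than the whole session, provided the retry window stays shorter than whatever makes the service consider the session terminated.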

On-prem agent log -- successful reconnections every ~60 min (same session, same pattern):

2026-03-28 13:27:39 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 14:27:41 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 15:27:43 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 16:27:44 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye

All of the above resulted in successful reconnections -- tunnel continued working.

On-prem agent log -- the fatal disconnect (same pattern, but EC2 side failed):

2026-03-28 16:57:51 WARN  [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 16:57:51 INFO  [pluginName=Port] The session was cancelled
2026-03-28 16:57:51 ERROR [pluginName=Port] Unable to read from connection: use of closed network connection
2026-03-28 16:57:51 ERROR [pluginName=Port] Unable to accept stream: io: read/write on closed pipe
2026-03-28 16:57:52 INFO  [pluginName=Port] Setting task to cancelled as session is already terminated
2026-03-28 16:57:52 ERROR [pluginName=Port] CreateDataChannel failed: Session is already terminated
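
To check the timing of the close events, a small script can compute the gaps between the "close 1000" lines quoted above. This is only a sketch: `close_gaps` is a helper written for this example, and it assumes every log line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpts.

```python
from datetime import datetime

# The five "close 1000" lines from the agent log excerpts above.
LOG = """\
2026-03-28 13:27:39 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 14:27:41 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 15:27:43 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 16:27:44 WARN [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
2026-03-28 16:57:51 WARN  [pluginName=Port] Reach the retry limit 5 for receive messages. Error: websocket: close 1000 (normal): Bye
"""


def close_gaps(log_text: str) -> list[int]:
    """Return the seconds elapsed between consecutive close-1000 events."""
    stamps = [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in log_text.splitlines()
        if "websocket: close 1000" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(close_gaps(LOG))  # -> [3602, 3602, 3601, 1807]
```

The first three gaps are about 60 minutes, while the close that preceded the fatal disconnect arrived roughly 30 minutes after the previous one. Whether that shorter interval is related to the failure is not something these logs alone can confirm.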
