
chore(deps): update terraform #514

Open
renovate[bot] wants to merge 1 commit into main from renovate/terraform

Conversation


@renovate renovate bot commented Mar 27, 2026

This PR contains the following updates:

Package          Type               Update  Change
aws (source)     required_provider  minor   < 6.38 → < 6.40
aws (source)     required_provider  minor   6.37.0 → 6.39.0
google (source)  required_provider  minor   7.25.0 → 7.26.0
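The constraint bumps above would correspond to a required_providers block along these lines (a sketch only; the repository's actual module layout is not shown in this PR, and the two aws rows suggest one module uses a range constraint while another pins an exact version):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "< 6.40"   # was "< 6.38"; a second module pins "6.39.0" (was "6.37.0")
    }
    google = {
      source  = "hashicorp/google"
      version = "7.26.0"   # was "7.25.0"
    }
  }
}
```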

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

hashicorp/terraform-provider-aws (aws)

v6.39.0

Compare Source

NOTES:

  • data-source/aws_eks_access_entry: The tags_all attribute is deprecated and will be removed in a future major version (#​47133)

FEATURES:

  • New Data Source: aws_iam_role_policies (#​46936)
  • New Data Source: aws_iam_role_policy_attachments (#​47119)
  • New Data Source: aws_networkmanager_core_network (#​45798)
  • New Data Source: aws_uxc_services (#​47115)
  • New List Resource: aws_eks_cluster (#​47133)
  • New List Resource: aws_organizations_aws_service_access (#​46993)
  • New List Resource: aws_sagemaker_training_job (#​46892)
  • New List Resource: aws_workmail_group (#​47131)
  • New List Resource: aws_workmail_user (#​47131)
  • New Resource: aws_organizations_aws_service_access (#​46993)
  • New Resource: aws_sagemaker_training_job (#​46892)
  • New Resource: aws_uxc_account_customizations (#​47115)
  • New Resource: aws_workmail_group (#​47131)
  • New Resource: aws_workmail_user (#​47131)

ENHANCEMENTS:

  • data-source/aws_outposts_asset: Add instance_families attribute (#​47153)
  • resource/aws_eks_cluster: Add resource identity support (#​47133)
  • resource/aws_eks_cluster: Support tier-8xl as a valid value for control_plane_scaling_config.tier (#​46976)
  • resource/aws_network_acl_rule: Add Resource Identity support (#​47090)
  • resource/aws_observabilityadmin_centralization_rule_for_organization: Add source.source_logs_configuration.data_source_selection_criteria argument. Change source.source_logs_configuration.log_group_selection_criteria to Optional (#​47154)
  • resource/aws_prometheus_scraper: Add source.vpc argument. Change source.eks to Optional (#​47155)
  • resource/aws_s3_bucket_metric: Support bucket metrics for directory buckets (#​47184)
  • resource/aws_s3control_storage_lens_configuration: Add storage_lens_configuration.account_level.advanced_performance_metrics and storage_lens_configuration.account_level.bucket_level.advanced_performance_metrics arguments (#​46865)

BUG FIXES:

  • data-source/aws_eks_access_entry: Fixed tags not being returned (#​47133)
  • data-source/aws_service_principal: Fix service principal names for EC2 and S3 in the aws-cn partition (#​47141)
  • resource/aws_config_organization_conformance_pack: Fix creation timeout when using a delegated administrator account (#​47072)
  • resource/aws_dynamodb_table: Fix "Error: waiting for creation AWS DynamoDB Table (xxxxx): couldn't find resource" errors in highly active accounts by restoring a 5s delay before polling for table status. This fixes a regression introduced in v6.28.0. (#​47143)
  • resource/aws_eks_cluster: Set bootstrap_self_managed_addons to true when importing (#​47133)
  • resource/aws_elasticache_serverless_cache: Fix InvalidParameterCombination error when cache_usage_limits is removed (#​46134)
  • resource/aws_glue_catalog_table: Detect and report failed view creation (#​47101)

v6.38.0

Compare Source

FEATURES:

  • New Action: aws_dms_start_replication_task_assessment_run (#​47058)
  • New Data Source: aws_dynamodb_backups (#​47036)
  • New Data Source: aws_msk_topic (#​46490)
  • New Data Source: aws_savingsplans_offerings (#​47081)
  • New List Resource: aws_msk_cluster (#​46490)
  • New List Resource: aws_msk_serverless_cluster (#​46490)
  • New List Resource: aws_msk_topic (#​46490)
  • New List Resource: aws_route53_resolver_rule (#​47063)
  • New List Resource: aws_sagemaker_algorithm (#​47051)
  • New List Resource: aws_ssm_document (#​46974)
  • New List Resource: aws_ssoadmin_account_assignment (#​47067)
  • New List Resource: aws_vpc_endpoint (#​46977)
  • New List Resource: aws_workmail_domain (#​46931)
  • New Resource: aws_msk_topic (#​46490)
  • New Resource: aws_observabilityadmin_telemetry_enrichment (#​47089)
  • New Resource: aws_sagemaker_algorithm (#​47051)
  • New Resource: aws_workmail_default_domain (#​46931)
  • New Resource: aws_workmail_domain (#​46931)

ENHANCEMENTS:

  • data-source/aws_networkfirewall_firewall_policy: Add firewall_policy.enable_tls_session_holding attribute (#​47065)
  • resource/aws_bedrockagentcore_agent_runtime: Add authorizer_configuration.custom_jwt_authorizer.custom_claim configuration block (#​47049)
  • resource/aws_bedrockagentcore_gateway: Add authorizer_configuration.custom_jwt_authorizer.custom_claim configuration block (#​47049)
  • resource/aws_bedrockagentcore_gateway_target: Add target_configuration.mcp.api_gateway configuration block (#​46916)
  • resource/aws_dynamodb_table: Add restore_backup_arn argument (#​47068)
  • resource/aws_fis_experiment_template: Support KinesisStreams as a value for action.target.key (#​47010)
  • resource/aws_fis_experiment_template: Support VPCEndpoints as a value for action.target.key (#​47045)
  • resource/aws_mq_broker: Change user block to Optional (#​46883)
  • resource/aws_msk_cluster: Add resource identity support (#​46490)
  • resource/aws_msk_serverless_cluster: Add resource identity support (#​46490)
  • resource/aws_networkfirewall_firewall_policy: Add firewall_policy.enable_tls_session_holding argument (#​47065)
  • resource/aws_securityhub_insight: Add filters.aws_account_name configuration block (#​47027)
  • resource/aws_securityhub_insight: Add filters.compliance_associated_standards_id configuration block (#​47027)
  • resource/aws_securityhub_insight: Add filters.compliance_security_control_id configuration block (#​47027)
  • resource/aws_securityhub_insight: Add filters.compliance_security_control_parameters_name configuration block (#​47027)
  • resource/aws_securityhub_insight: Add filters.compliance_security_control_parameters_value configuration block (#​47027)
  • resource/aws_ssoadmin_account_assignment: Add Resource Identity support (#​47067)

BUG FIXES:

  • resource/aws_api_gateway_method: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_apigatewayv2_integration: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_apigatewayv2_route: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_apigatewayv2_stage: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_gateway_route: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_route: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_virtual_gateway: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_virtual_node: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_virtual_router: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_appmesh_virtual_service: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_cloudfront_distribution_tenant: Fix panic when managed certificate is not found during creation (#​46982)
  • resource/aws_controltower_control: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_default_route_table: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_gateway_association: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_private_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_private_virtual_interface_accepter: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_public_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_public_virtual_interface_accepter: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_transit_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_hosted_transit_virtual_interface_accepter: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_private_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_public_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_dx_transit_virtual_interface: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_ecs_express_gateway_service: Fix "Provider produced inconsistent result after apply" error when environment variables are defined in non-alphabetical order (#​46771)
  • resource/aws_elasticache_reserved_cache_node: Fix "Provider returned invalid result object after apply" errors where computed attributes remained unknown after create (#​47012)
  • resource/aws_kinesis_stream: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_mq_broker: Fix non-idempotent behavior for RabbitMQ brokers with user block (#​46883)
  • resource/aws_network_acl: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_network_interface_sg_attachment: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_opensearch_domain: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_route53recoverycontrolconfig_routing_control: Fix panic on concurrent creates when API returns ConflictException (#​47038)
  • resource/aws_route_table_association: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_serverlessapplicationrepository_cloudformation_stack: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_servicecatalog_product: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_ses_active_receipt_rule_set: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_ssm_default_patch_baseline: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_vpc_dhcp_options_association: Fix import to honor @region suffix when using resource-level region attribute (#​47043)
  • resource/aws_wafv2_web_acl_rule: Fix "Unable to unmarshal DynamicValue" error when statement.managed_rule_group_statement.rule_action_override block is specified (#​46998)
  • resource/aws_wafv2_web_acl_rule_group_association: Fix WAFOptimisticLockException errors when multiple associations target the same Web ACL (#​47037)

hashicorp/terraform-provider-google (google)

v7.26.0

Compare Source


Configuration

📅 Schedule: Branch creation - "before 10am on friday" in timezone Europe/London, Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the dependencies (Renovatebot and dependabot updates) and terraform labels Mar 27, 2026
@renovate renovate bot enabled auto-merge (squash) March 27, 2026 00:52

github-actions bot commented Mar 27, 2026

Open in Overmind ↗


✨ Encryption Key State Risk · ✨ KMS Key Creation

🔴 Change Signals

Routine 🔴 ▇▅▃▂▁ Multiple AWS compute and load balancing resources showing unusual infrequent updates at 1 event/week for the last 2-3 months, with related attachments and monitoring resources also at 1 event/week for the last 5-7 weeks, which is rare compared to typical patterns.
Policies 🔴 ▃▂▁ Multiple policy violations detected, including missing tags and lack of server-side encryption on S3 buckets, and security risks from open SSH access.

View signals ↗


🔥 Risks

Replacing the directly exposed API instance will interrupt the public EIP endpoint during cutover · ‼️ High · Open Risk ↗
The production API server instance 540044833068.eu-west-2.ec2-instance.i-0ea52e0eb6476a224 is being replaced, and the public Elastic IP 13.134.236.98 attached to its primary ENI eni-065db8208d75f1855 will be reassociated during that replacement. Blast-radius data shows this is a real direct public endpoint today: the instance has public DNS ec2-13-134-236-98.eu-west-2.compute.amazonaws.com and security group sg-03cf38efd953aa056 allows customer traffic to 443. This conflicts with the organization’s requirement that EC2 instances must not be directly reachable from the internet and with the REL02-BP01 / SEC05-BP01 guidance against using individual instances as public endpoints.

Because the instance is not being updated in place but replaced, the EIP and public DNS mapping have to move from the old instance/ENI to a new one. That creates a real cutover risk: any external client, monitor, or runbook using the current direct EC2 identity rather than a managed endpoint will lose continuity during the reassociation, and there is no load balancer or edge service absorbing that swap on the public path. The internal api-health-terraform-example target remains keyed to private IP 10.0.101.142:9090, so the exact mechanism is endpoint handoff and identity drift rather than the NLB depending on the EIP itself, but the result is still production reachability loss during the replacement.

Attachment replacement will leave both load balancers with zero healthy targets during re-registration · ‼️ High · Open Risk ↗
This change replaces both aws_lb_target_group_attachment resources while the affected target groups currently have only one healthy backend each. api-health-terraform-example has a single healthy IP target, 10.0.101.142:9090, and api-207c90ee-tg has a single healthy instance target, i-09d6479fb9b97d123:80. Replacing either attachment will temporarily remove the only registered target from its load balancer, leaving the target group with no healthy capacity despite the load balancers themselves spanning two AZs.

The NLB path is especially exposed because the backend instance that owns 10.0.101.142 is also being replaced. AWS only resumes routing to a newly registered target after registration completes and the target passes initial health checks, and IP targets cannot be re-registered until deregistration completes. During that window, customer API health traffic and internal monitoring through api-health-terraform-example will fail, and the ALB-backed API can also return errors while its sole instance target is detached and reattached. This violates the static-stability expectation in REL10-BP01 and the gradual rollout guidance in REL10-BP03.
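A common mitigation for the zero-healthy-targets window described above is create_before_destroy on the attachment, so the replacement target is registered before the old one is deregistered. A hedged sketch (resource and attribute names are illustrative, not taken from this plan; note that AWS will not let the same IP re-register until deregistration completes, so this only helps when the replacement target has a new ID or IP, as it does when the backing instance is replaced):

```hcl
resource "aws_lb_target_group_attachment" "api_health" {
  target_group_arn = aws_lb_target_group.api_health.arn
  target_id        = aws_instance.api_access.private_ip  # ip-type target
  port             = 9090

  # Register the new target before destroying the old registration,
  # so the target group is never left with zero targets mid-apply.
  lifecycle {
    create_before_destroy = true
  }
}
```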

Simultaneous EC2 replacement can remove all healthy load balancer targets if the new AMIs/bootstrap differ · ‼️ High · Open Risk ↗
This change replaces all three standalone EC2 instances in the serving tier and also replaces the target-group attachment resources that register them with the load balancers. The api-health-terraform-example service is especially fragile: its NLB target group uses direct ip targets on port 9090, and current state shows only one healthy registered target, 10.0.101.142:9090. The new instances will come up from different AMIs, and two of them switch from reviewable plaintext user_data to opaque base64 blobs, so the bootstrap logic that must start the health responder is no longer inspectable in the plan.

Because these instances are not in an Auto Scaling Group and there is no evidence of a canary or phased rollout, any regression in the new AMI or bootstrap script will remove healthy targets faster than the load balancers can recover. api-health-terraform-example can lose all healthy 9090 targets during the rotation, and api-207c90ee-tg is also a single-instance target behind an ALB, so its replacement creates the same failure mode on port 80. This conflicts with the org’s reliability guidance against simultaneous fleet-wide rollout and makes service interruption likely if the new images do not reproduce the current health-check behavior exactly.
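Since part of the concern above is that the bootstrap became an opaque base64 blob, one option is to keep user_data as plaintext, which Terraform encodes transparently and which stays diff-able in plan output. A minimal sketch with an illustrative service name:

```hcl
resource "aws_instance" "api_health" {
  ami           = var.ami_id     # illustrative variable
  instance_type = "t4g.nano"

  # Plaintext user_data remains reviewable in `terraform plan`; prefer it
  # over user_data_base64 unless the payload is genuinely binary.
  user_data = <<-EOF
    #!/bin/bash
    systemctl enable --now health-responder  # illustrative bootstrap step
  EOF
}
```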

Single healthy backend replacement will create a real service gap during ALB/NLB and EIP cutover · ‼️ High · Open Risk ↗
This change replaces both single-instance frontends that currently carry live traffic: the ALB-backed api_server instance 540044833068.eu-west-2.ec2-instance.i-09d6479fb9b97d123 and the api_access instance 540044833068.eu-west-2.ec2-instance.i-0ea52e0eb6476a224, along with their target-group attachments and the Elastic IP 13.134.236.98. Blast-radius data shows there is only one healthy target in each target group today, and both backends are in eu-west-2a, so there is no spare capacity or cross-AZ redundancy to absorb the replacement.

When Terraform replaces these resources, the old instance target and old IP attachment cannot continue serving indefinitely while the new instances bootstrap, register, and satisfy load balancer health checks. The ALB target group requires two successful /health checks at 30-second intervals before the replacement target is considered healthy, and the EIP must be reassociated from the old instance to the new one. During that convergence window, the ALB or NLB can have zero healthy backends and direct clients hitting the EIP can lose connectivity, causing a real availability gap. This violates the static stability expectations in REL10-BP01 and REL11-BP05.
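One way to reduce the EIP side of this cutover risk is to manage the association as its own resource, so replacing the instance only re-points the association while the EIP allocation (and its public DNS mapping) survives. A sketch under that assumption (names illustrative, not from this plan):

```hcl
resource "aws_eip" "api_access" {
  domain = "vpc"
}

# Keeping the association separate means instance replacement re-creates
# only this resource; the allocation itself is never released.
resource "aws_eip_association" "api_access" {
  allocation_id = aws_eip.api_access.id
  instance_id   = aws_instance.api_access.id
}
```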


🧠 Reasoning · ✔ 4 · ✖ 2

Direct public EIP on EC2 instance, EIP lifecycle, and endpoint identity risks

Observations 10

Hypothesis

Direct exposure and operational changes to Elastic IP 13.134.236.98 on ENI eni-065db8208d75f1855 / instances (e.g., i-0ea52e0e...) create multiple risks. Having a public EIP directly on an EC2 instance is an anti-pattern versus terminating internet traffic on managed endpoints (ALB/NLB/CloudFront) protected by WAF/Shield, increasing attack surface and bypassing central controls.

Re-association, update, or disassociation of this EIP can change or remove the public mapping (including DNS name resolution) for private IP 10.0.101.142 and any attached instance, potentially breaking public-facing services and health checks and leaving DNS A records stale or routing to unintended resources. Updating or disassociating this EIP directly removes public reachability for services behind the internal api-health target group that rely on this address for external access or monitoring, causing loss of external health checks or client traffic.

Endpoint identity can also drift during instance/EIP moves: when the api_access node is replaced and the EIP association is shifted, the instance's public IP, DNS, and primary network interface all change, so any external clients, health checks, or runbooks that still use literal public IP/DNS instead of the EIP or another stable name may fail during or after the swap.

This conflicts with REL02-BP01 / SEC05-BP01 / SEC05-BP03; public exposure should be via managed load balancers or edge endpoints, and EIP lifecycle changes must be carefully sequenced and validated against DNS, monitoring, and client dependencies. EIP 13.134.236.98 is also associated to ENIs whose primary private IPs back internal load balancer targets, so changing, updating, or releasing the EIP can remove the public mapping that external health checks or clients rely on to reach those backends, further reducing availability and reinforcing the anti-pattern of exposing instances directly as public endpoints.

Investigation

I treated the concern area as availability and endpoint-exposure risk around the public Elastic IP and the instance it fronts. I first checked organizational knowledge. That guidance is explicit that EC2 instances must not be directly reachable from the internet, and the AWS network-security guidance also flags direct EC2 public endpoints as an anti-pattern. Blast-radius data confirms this is not hypothetical: Elastic IP 13.134.236.98 is currently attached to instance i-0ea52e0eb6476a224 via primary ENI eni-065db8208d75f1855, and that instance is directly reachable on 443 through security group sg-03cf38efd953aa056 from a customer IP allowlist. The same instance is also registered as target 10.0.101.142:9090 in the internal NLB target group api-health-terraform-example, so it is serving both as a directly exposed public endpoint and as an internal health target.

I then examined the planned changes. The EC2 instance is being replaced, not updated in place, because the AMI changes and many identity fields become unknown after apply. The Elastic IP resource also changes from a concrete instance: i-0ea52e0eb6476a224 to instance: (known after apply), which indicates Terraform will have to detach and reattach the EIP to the replacement instance. Additional planned changes show the target-group attachments are also being replaced, reinforcing that endpoint identity is moving during this apply. AWS documentation says an Elastic IP is associated to an instance or network interface and can be reassociated to a different instance; disassociating it from a primary ENI causes the instance to get a different public IPv4 address, and reassociation moves the public mapping to the new attachment. That means this change necessarily creates a handoff window where the old instance identity is destroyed and the EIP/public DNS mapping is shifted to a new instance/ENI rather than remaining continuously stable. (docs.aws.amazon.com)

The hypothesis overreaches in one place: the internal NLB target group is keyed on private IP 10.0.101.142:9090, so the EIP itself is not what makes that internal target healthy. However, there is still a closely related and real availability risk involving the same resources and change type. Because the instance is force-replaced and the EIP association is being moved as part of that replacement, any external clients, monitors, or operational runbooks using the current instance public DNS/IP or expecting uninterrupted continuity on the direct EC2 endpoint will break or see a gap during the cutover. The blast-radius state shows the current public DNS is the EC2-generated name ec2-13-134-236-98.eu-west-2.compute.amazonaws.com, which is tied to the EIP mapping on the existing primary ENI. Moving the EIP to a newly created instance/ENI necessarily changes the underlying endpoint identity even if the EIP is preserved, and any consumers pinned to the instance’s ephemeral identity rather than the stable EIP will fail. Because this is a production public endpoint and the instance is directly internet-facing, I consider the risk real and high severity: the change introduces a real cutover hazard for live traffic and monitoring while also continuing a configuration that violates org security standards.

✔ Hypothesis proven


Load balancer/NLB target deregistration during attachment replacement causing transient unavailability

Observations 21

Hypothesis

Changing ALB/NLB target group attachments can cause temporary deregistration of EC2 or IP targets (e.g., IP 10.0.101.142 in target group api-health-terraform-example), leading to brief loss of healthy targets, failed health checks, and transient traffic interruption. Replacing aws_lb_target_group_attachment resources may momentarily remove and re-add targets, reducing available capacity and impacting services that depend on those target groups.

This risk applies to ALBs (e.g., api-207c90ee-alb) and internal/external NLBs (e.g., mon-internal-terraform-example) where backend IP changes or short-lived deregistration phases can reduce the pool of healthy targets, degrade user traffic handling, or mislead autoscaling and failure detection. For api-health-terraform-example (IP target type on port 9090, TCP health checks), repeated deregistration/re-registration of 10.0.101.142 can cause customer-facing API downtime, internal monitoring and Prometheus scrape interruptions, and failed health checks governed by security group sg-03cf38efd953aa056 and SG rules for ports 8080/9090.

Concentrating targets in a single AZ or operating with low surplus capacity can breach multi-AZ, static stability, and HA best practices (REL02-BP03, REL10-BP01, SEC05-BP01/SEC05-BP02). Replacing ALB target group attachments for API backends similarly reduces the number of healthy backends and can increase load or cause failures for API and database traffic. These changes should be coordinated with connection draining, health check grace periods, maintenance windows, and rolling or blue/green deployments so there is always sufficient healthy capacity behind each load balancer.

Investigation

I treated the concern area as transient service unavailability caused by target deregistration during replacement of aws_lb_target_group_attachment resources. I first checked relevant organizational guidance: aws-high-availability says to flag workloads that apply changes to all instances simultaneously or that have ELB targets only in one AZ, and to maintain static stability across AZs. I then inspected the planned changes and current blast-radius state.

The strongest evidence is that both attachment resources are explicitly being replaced, while each corresponding target group currently has only a single healthy backend. api-health-terraform-example has one healthy IP target, 10.0.101.142:9090, behind the internal NLB. api-207c90ee-tg has one healthy instance target, i-09d6479fb9b97d123:80, behind the public ALB. The load balancers themselves are multi-AZ, but the actual registered targets shown in blast radius are singular, so removing either attachment leaves that target group with zero healthy targets at least transiently. The change also replaces instance i-0ea52e0eb6476a224, which owns private IP 10.0.101.142, so the IP attachment replacement is not a harmless metadata churn: the backend itself is being replaced and the EIP association is shifting to a not-yet-known instance.

I verified AWS behavior in official docs. AWS states that deregistering a target stops routing requests to it immediately and places it into draining, and that a newly registered target only starts receiving traffic after registration completes and it passes an initial health check. AWS also specifically notes that when an IP target is deregistered, you must wait for the deregistration delay to complete before registering the same IP again. That means a replace operation on an attachment can create a real gap between old target removal and new target usability, especially for the IP target on 10.0.101.142. Because each affected target group currently has only one healthy target, there is no surplus capacity to absorb that gap. This makes the hypothesis real, with high impact for the ALB-backed API and the internal NLB health/monitoring path.
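The deregistration-delay behavior described above maps to a target group attribute that can be tuned to narrow the gap. A hedged sketch (only the target group name and port come from the analysis; the rest of the body is illustrative):

```hcl
resource "aws_lb_target_group" "api_health" {
  name        = "api-health-terraform-example"  # name from the blast-radius state above
  port        = 9090
  protocol    = "TCP"
  target_type = "ip"
  vpc_id      = var.vpc_id  # illustrative variable

  # Shortening the deregistration delay (default 300s) narrows the window
  # before the same IP can be re-registered after removal.
  deregistration_delay = 30
}
```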

✔ Hypothesis proven


EC2 AMI/user_data and fleet-wide rotation affecting hardening, behavior, and health checks

Observations 12

Hypothesis

AMI changes for EC2 instances (e.g., i-0464c4413cb0c54aa from ami-076e6b911c0417157 to ami-0dd010f28b091c0fd; i-0ea52e0eb6476a224 from ami-01b1bd34b1f8288c8 to ami-094a6f8df2b26dfde; t4g.nano instances serving api-health-terraform-example) introduce security, compatibility, and availability risks. New AMIs may have unpatched software, misconfigured OS, missing monitoring/agent software, or different networking stack or architecture (e.g., ARM vs x86), which can affect instance hardening, reliability, and performance.

Replacing explicit plaintext user_data (e.g., inline httpd install) with opaque base64 blobs obscures initialization logic, making it harder to verify security posture and expected network-facing behavior on private IPs such as 10.0.101.133 and 10.0.101.142. These changes can break ALB/NLB health checks on port 9090 or other monitored ports if health-check responders, listeners, or firewall rules differ, causing targets to be marked unhealthy and disrupting traffic via api-health-terraform-example and mon-internal-terraform-example.

Fleet-wide or simultaneous AMI rotation across both servers in the serving tier magnifies the impact of any bootstrap or configuration bug: boot-time failures, package-install regressions, metadata/IMDS behavior changes, or filesystem layout differences can take down all instances at once, eliminating configuration diversity and making rollback harder if old AMIs/user_data are discarded. This conflicts with SEC06-BP02, REL02-BP03, and related guidance; AMI provenance, architecture, installed agents, user_data content, and health-check behavior should be validated in staging, and AMI rollouts staged or canaried so at least one known-good path remains during updates.

Investigation

I treated the concern area as availability loss in the EC2-backed serving tier, with secondary security/compliance concerns around the new AMIs and opaque bootstrap. I first loaded the relevant organizational knowledge. That established explicit requirements and best practices: hardened AMIs should be used and fleet-wide simultaneous rollouts are a reliability risk (SEC06-BP02, REL10-BP03), and production EC2 instances must not be directly reachable from the internet or use unencrypted EBS volumes. I then compared the diffs with current blast-radius state.

The strongest evidence is not speculative AMI quality but the update pattern and topology. All three EC2 instances in scope are being replaced, not updated in place. The two instances behind api-health-terraform-example are standalone EC2 instances, not members of an Auto Scaling Group or any staged rollout mechanism. Current state shows the NLB target group 540044833068.eu-west-2.elbv2-target-group.api-health-terraform-example has target type ip, port 9090, and only one visible healthy target today: 10.0.101.142:9090. The attachment resources for that service are also being replaced in this same plan, confirming target registration churn alongside instance replacement. That means the service depends on direct instance/IP registration while both backing instances are rotated in one change. If the new AMI or user-data bootstrap fails to expose the health responder on 9090, the NLB has no autoscaling buffer or canary path and will lose healthy targets. This is exactly the kind of simultaneous rollout risk called out by the org guidance.

For the api-207c90ee ALB-backed instance, current state shows a single instance target behind api-207c90ee-tg with HTTP /health on port 80, and that instance is also being replaced. Since there is only one healthy instance target, replacement inherently creates a window where health depends entirely on successful bootstrap of the new node.

I did not find concrete evidence that the new AMIs are wrong-architecture: current i-0ea52e0eb6476a224 is arm64 on t4g.nano, and the change keeps the instance type the same, so the specific ARM-vs-x86 concern is unproven. I also did not find a concrete security-group or target-group port mismatch in current state; 9090 is allowed by sg-089e5107637083db5 and the current target is healthy. However, the risk remains real through a closely related mechanism: this plan replaces the entire small fleet and its LB attachments at once, while obscuring new bootstrap behavior behind base64 user data for two instances. With no staged rollout, no ASG self-healing layer, and only one currently healthy NLB target visible, any bootstrap regression in the new AMI/user-data will directly translate into unhealthy targets and service interruption. That is strong, change-specific evidence of availability risk, independent of whether the exact failure is architecture, packages, IMDS behavior, or service startup.

✔ Hypothesis proven


Simultaneous instance, target, and endpoint replacement creating composite availability risk

Observations 1

Hypothesis

Coordinated replacement of multiple EC2 instances, AMIs, and load balancer target attachments—including api_server EC2 replacement (with new AMI and bootstrap) and simultaneous replacement of its ALB target attachment, together with dependent api_access instance/EIP resources—can create composite availability risk. Terraform will sequence explicit dependencies, but there is still a window where old instances or targets are deregistered or terminated before new ones are healthy, registered in ALB/NLB target groups, and reachable via stable endpoints (EIP or DNS). If bootstrap or health check registration is slow, the ALB or NLB may temporarily have no healthy backends, and any clients that reach the EIP directly can encounter a gap. When capacity and redundancy are limited, this undermines static stability, multi-AZ resilience, and blue/green safety margins (REL10-BP01, REL11-BP05). Deployment plans should ensure surplus capacity or parallel environments so that simultaneous replacements do not leave the service without a healthy target or stable endpoint.

Investigation

I treated the concern area as temporary service unavailability during coordinated replacement of the API-facing EC2 instances and their load-balancer/EIP attachments. I first checked organizational guidance. The aws-high-availability knowledge file explicitly says production workloads should be pre-provisioned across AZs and flags both ELBs with targets in only one AZ and changes applied to all instances simultaneously as risks, especially when recovery depends on launching new capacity (REL10-BP01, REL10-BP03, REL11-BP05).

The planned diffs show both relevant compute nodes are being replaced, not updated in place: 540044833068.eu-west-2.ec2-instance.i-09d6479fb9b97d123 changes AMI and is replaced, and 540044833068.eu-west-2.ec2-instance.i-0ea52e0eb6476a224 also changes AMI and is replaced. Both target-group attachments are also replaced, and the Elastic IP 540044833068.eu-west-2.ec2-address.13.134.236.98 is reassociated to a new instance ID only after apply. Blast-radius state shows each load balancer currently has exactly one healthy backend: the ALB target group api-207c90ee-tg has a single healthy instance target i-09d6479fb9b97d123, and the NLB target group api-health-terraform-example has a single healthy IP target 10.0.101.142 from instance i-0ea52e0eb6476a224. Although the load balancers themselves span two AZs, the backends do not; both instances are in eu-west-2a, so there is no surplus capacity or multi-AZ target redundancy.

I then checked AWS/Terraform behavior. AWS ALB health checks require consecutive successful checks before a target is returned to service, and this target group is configured with HealthCheckIntervalSeconds=30 and HealthyThresholdCount=2, so a replacement instance will not become healthy immediately even after the service is up. AWS documents that target registration progresses through a registration state before serving traffic. The EIP also has to be reassociated from the old instance/ENI to the new one, which creates a direct-endpoint cutover rather than parallel capacity. I did not find evidence in the plan of create_before_destroy, an autoscaling group, duplicate attachments, or any other mechanism that preserves overlap. Because each frontend path has only one current healthy target, replacing the instance and its attachment necessarily creates a window where the old target/EIP is removed before the new target has passed health checks and the endpoint has settled.
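The health-check arithmetic above can be sketched as rough bounds. This is a simplification I am introducing (the function name and bounds model are mine, not an exact ELB timing model): a freshly registered target must pass `HealthyThresholdCount` consecutive checks spaced `HealthCheckIntervalSeconds` apart before it takes traffic.

```python
def time_to_healthy_bounds(interval_s: int, healthy_threshold: int) -> tuple[int, int]:
    """Rough bounds on seconds between target registration and the target
    being returned to service. Lower bound assumes the first check fires
    immediately; upper bound assumes it fires a full interval after
    registration. The ELB "initial" registration state and check jitter
    add further delay on top of these bounds."""
    lower = interval_s * (healthy_threshold - 1)
    upper = interval_s * healthy_threshold
    return lower, upper

# HealthCheckIntervalSeconds=30 and HealthyThresholdCount=2, per the plan:
lo, hi = time_to_healthy_bounds(30, 2)
print(f"{lo}-{hi}s before the replacement can take traffic")
# → 30-60s before the replacement can take traffic
```

Even in the best case, the sole backend of each target group is out of service for tens of seconds per replacement, before accounting for instance boot and application bootstrap time.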

That makes the hypothesis real. This is not a speculative "maybe there is downtime if bootstrap is slow"; the concrete evidence is that each target group has only one healthy backend today, both backends are being replaced, and restoration of service depends on health-check convergence and EIP reassociation. The impact is high because either the public ALB path or the direct-EIP/health endpoint path can temporarily have zero healthy backends during apply.

✔ Hypothesis proven


EC2 instance IAM role/profile association changes impacting permissions

Observations 1

Hypothesis

Changes to EC2 instance IAM role or instance profile associations (e.g., for i-09d6479fb9b97d123) may not appear clearly in diffs but can silently alter the permissions available to workloads. If an instance profile is removed or replaced, applications depending on that role’s credentials may lose access to required AWS APIs or gain excessive permissions, leading to failures or privilege escalation. IAM associations for instances must be explicitly verified during updates, ensuring least-privilege roles and stable profile bindings for dependent workloads.

Investigation

I investigated the concern area the hypothesis raises: loss or escalation of application permissions due to an EC2 IAM role or instance profile association change. I first checked relevant organizational knowledge for IAM and EC2 guidance. The internal standards say EC2 instances should have IAM instance profiles attached and roles should follow least privilege, but they do not require any special handling beyond explicitly verifying the binding.

I then examined the actual planned change and current state. The replaced instance 540044833068.eu-west-2.ec2-instance.i-09d6479fb9b97d123 keeps iam_instance_profile: api-207c90ee-api-profile unchanged in the diff, while the instance replacement is driven by an AMI change from ami-076e6b911c0417157 to ami-0dd010f28b091c0fd. The current live instance also has that same instance profile attached, and live IAM queries show that api-207c90ee-api-profile currently contains exactly the role api-207c90ee-api-role. There is no planned diff for the IAM role or the instance profile, and no evidence that the instance is switching to a different profile, losing its profile, or receiving a broader role.

I also checked AWS documentation to validate behavior. AWS documents that EC2 credentials are delivered through the role contained in the instance profile attached to the instance; the role's permissions change by editing its policies, while removing or swapping the role inside a profile propagates to instance credentials only after a delay. None of that is happening here. This change is an instance replacement that retains the same profile name, not an IAM association change. Because there is no evidence of a changed or removed profile binding, the hypothesis is speculative for this specific change.
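Where the guidance asks for the binding to be explicitly verified, that verification can be encoded in configuration rather than done by hand. A hedged sketch (the resource layout and variable are illustrative assumptions, though the profile and role names come from the investigation above):

```hcl
# Illustrative sketch, not the actual module source.
data "aws_iam_instance_profile" "api" {
  name = "api-207c90ee-api-profile"
}

resource "aws_instance" "api_server" {
  ami                  = var.api_ami # hypothetical variable
  instance_type        = "t3.micro"  # illustrative value
  iam_instance_profile = data.aws_iam_instance_profile.api.name

  lifecycle {
    # Fails the plan if the profile's role ever drifts away from the
    # expected least-privilege role.
    precondition {
      condition     = data.aws_iam_instance_profile.api.role_name == "api-207c90ee-api-role"
      error_message = "api instance profile no longer binds the expected role"
    }
  }
}
```

A precondition like this turns the "explicitly verify the binding" standard into a plan-time check instead of a manual review step.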

✖ Hypothesis disproven


Monitoring gaps and coordinated root-volume recreation amplifying boot/storage failures

Observations 3

Hypothesis

Monitoring and storage lifecycle changes during instance/AMI/root-volume replacement can create operational blind spots and amplify boot/storage failures across the fleet. When a high-CPU CloudWatch alarm is dimensioned to a specific EC2 instance ID for the CPU-stress workload and that instance is replaced, there can be a period where the alarm is repointed to the new instance before meaningful CPU history accumulates, while the old instance’s metrics and alarms become effectively orphaned. This is especially risky when combined with heavy bootstrap user_data, instance-type/AMI swaps, and concurrent target/endpoint changes, as observability is weakest exactly when new nodes are most likely to misbehave (REL06-BP03, OPS04-BP02). In parallel, coordinated recreation of root EBS volumes (e.g., delete_on_termination=true on vol-0a61278f4602fc12b plus new AMI/root disk settings and encryption or gp3 changes across both servers) can turn a single boot or storage regression—such as encrypted root provisioning issues, gp3 initialization delays, or altered boot-time service start order—into a full fleet outage. Because both root chains are refreshed at once, there is no unaffected sibling left to absorb traffic or validate the new volume/image path under live load, increasing data-loss and availability risk (REL09-BP01, SEC06-BP02, SEC08-BP02). Monitoring, alarm dimensions, backup/restore, and rollback procedures should be verified and staged alongside compute and volume replacement.

Investigation

I investigated the concern area as two related risks: an observability gap during EC2 replacement and a fleet-wide outage from simultaneous root-volume/AMI refreshes. I first checked the relevant organizational guidance. The knowledge base does say production workloads should have CloudWatch alarms for key metrics (REL06-BP03) and warns against applying changes to all instances simultaneously without gradual rollout (REL10-BP03), and the security policy requires EBS volumes to be encrypted. Those standards make the configuration imperfect, but they do not by themselves prove this Terraform change will cause the specific failure described.

The actual evidence shows only one of the replaced instances is behind the application load balancer. The blast radius for 540044833068.eu-west-2.elbv2-target-group.api-207c90ee-tg contains a single healthy target, 540044833068.eu-west-2.ec2-instance.i-09d6479fb9b97d123, and the companion api-207c90ee-unhealthy-targets alarm watches target-group health rather than a hard-coded instance ID. That means the most important service-availability monitoring remains attached to the stable ALB/target-group identifiers and will continue to detect a failed replacement target even if the per-instance CPU alarm spends time in INSUFFICIENT_DATA. The high-CPU alarm is indeed instance-scoped today, and its diff shows dimensions becoming computed because the instance ID will be replaced, but CloudWatch alarms treat missing data as missing by default and this alarm explicitly has TreatMissingData: missing, so the expected effect is a temporary monitoring blind spot for CPU history, not a service outage.
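The contrast between the two alarm scopes can be sketched in Terraform. This is a hypothetical reconstruction of the target-group-scoped alarm, not the actual configuration; attribute values are illustrative:

```hcl
# Illustrative sketch: an alarm keyed to the stable target-group and
# load-balancer dimensions survives instance replacement, unlike an
# alarm dimensioned on a specific instance ID.
resource "aws_cloudwatch_metric_alarm" "unhealthy_targets" {
  alarm_name          = "api-207c90ee-unhealthy-targets"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "breaching" # missing data during churn counts as unhealthy

  dimensions = {
    TargetGroup  = aws_lb_target_group.api.arn_suffix
    LoadBalancer = aws_lb.api.arn_suffix
  }
}
```

Because the dimensions reference the target group and load balancer rather than an instance ID, this alarm never becomes orphaned when the backing instance is replaced.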

I also examined the storage side. Both replaced instances currently use delete_on_termination=true root EBS volumes, so replacement will recreate their root disks. However, the diffs do not show an explicit storage configuration change such as new volume size, type, encryption flag, KMS key, or throughput; the concrete before-values remain gp3, 8 GiB on the affected instances, and the current root volume status for vol-0a61278f4602fc12b is healthy and fully initialized. The instance replacement is driven by AMI changes from ami-076e6b911c0417157 to ami-0dd010f28b091c0fd, not by an identified root-volume misconfiguration. Without evidence that the new AMI is broken, that encryption defaults in this account will conflict with the chosen instance types, or that Terraform is also changing load balancer health checks, target group ports, or rollback controls in a harmful way, the hypothesis remains speculative. There is a standards/compliance issue because the current root volume is unencrypted and the proposed diff leaves encryption unresolved as (known after apply), but that is not sufficient evidence that this specific change will fail at boot or cause full-fleet unavailability.
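The encryption finding could be closed out by making the root-volume settings explicit in configuration instead of inheriting them from the AMI. A hedged sketch (resource name and variable are illustrative; sizes match the before-values noted above):

```hcl
# Illustrative sketch: pinning root-volume settings so encryption is not
# left as "(known after apply)".
resource "aws_instance" "api_node" {
  ami           = var.api_ami # hypothetical variable
  instance_type = "t4g.nano"

  root_block_device {
    volume_type           = "gp3"
    volume_size           = 8
    encrypted             = true # satisfies the EBS-encryption policy
    delete_on_termination = true
  }
}
```

Setting `encrypted = true` explicitly resolves the compliance gap at the next replacement without relying on account-level EBS encryption defaults.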

So the concern area is directionally reasonable, but after checking the planned changes, current target health, alarm configuration, and root-volume state, I found no strong evidence that this Terraform update will actually produce the hypothesized outage or storage failure. The change weakens CPU-instance-level observability during replacement, yet the ALB unhealthy-target alarm still covers the live service path, and the storage failure mechanism is unproven from the available data.

✖ Hypothesis disproven


💥 Blast Radius

Items 138

Edges 201


@github-actions github-actions bot left a comment


Overmind

⛔ Auto-Blocked


🔴 Decision

Auto-blocked: Routine score (-5) is below minimum (-1)


📊 Signals Summary

Routine 🔴 -5


🔥 Risks Summary

High 0 · Medium 0 · Low 0


💥 Blast Radius

Items 26 · Edges 63


View full analysis in Overmind ↗

@renovate renovate bot force-pushed the renovate/terraform branch from ba884cb to 12213ca Compare March 31, 2026 20:58
@renovate renovate bot changed the title chore(deps): update terraform aws to v6.38.0 chore(deps): update terraform Mar 31, 2026
@renovate renovate bot force-pushed the renovate/terraform branch from 12213ca to 97c66a8 Compare April 1, 2026 23:30

@github-actions github-actions bot left a comment


Overmind

⛔ Auto-Blocked


🔴 Decision

Found 4 high risks requiring review


📊 Signals Summary

Routine 🔴 -5

Policies 🔴 -3


🔥 Risks Summary

High 4 · Medium 0 · Low 0


💥 Blast Radius

Items 138 · Edges 201


View full analysis in Overmind ↗


Labels

dependencies Renovatebot and dependabot updates terraform
