fix(lib): write drop-in configs to active network file directory #147
mjnowen wants to merge 1 commit into amazonlinux:main
When another network config generator (e.g. netplan, cloud-init) creates a .network file with a lower numerical prefix than the 70- prefix used by amazon-ec2-net-utils, systemd-networkd selects that file as the active config for the interface. Drop-in files written under 70-<iface>.network.d/ are then silently ignored because they belong to an inactive network file.

This causes secondary IPv4 addresses (ec2net_alias.conf) and policy routing rules (ec2net_policy_*.conf) to never be applied, even though they are correctly fetched from IMDS and written to disk.

Add _get_active_dropin_dir() which queries the systemd-networkd runtime state to discover the actual active NETWORK_FILE for an interface, then returns that file's drop-in directory. Falls back to the original 70-<iface>.network.d path when detection is unavailable (e.g. during early boot before networkd has initialised the interface).

Use _get_active_dropin_dir() in create_ipv4_aliases() and create_rules() instead of the hardcoded 70- path.
    local default_dir="${unitdir}/70-${iface}.network.d"
    local ifindex network_file
    ifindex=$(cat "/sys/class/net/${iface}/ifindex" 2> /dev/null) || { echo "$default_dir"; return; }
    network_file=$(sed -n 's/^NETWORK_FILE=//p' "/run/systemd/netif/links/${ifindex}" 2> /dev/null) || { echo "$default_dir"; return; }
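The excerpt above shows only the first lines of the helper. As a rough illustration of how the complete function might fit together, here is a minimal sketch built from the PR description. It is not the PR's actual code: the argument order (interface, then unit directory), the explicit empty-value check, and the SYSFS_NET_DIR and NETIF_LINKS_DIR override variables are all assumptions introduced here so the function can be run against a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_NET_DIR and NETIF_LINKS_DIR are hypothetical overrides
# (not in the PR); the excerpt hardcodes /sys/class/net and
# /run/systemd/netif/links.
: "${SYSFS_NET_DIR:=/sys/class/net}"
: "${NETIF_LINKS_DIR:=/run/systemd/netif/links}"

# Assumed signature: _get_active_dropin_dir <iface> <unitdir>
_get_active_dropin_dir() {
    local iface="$1" unitdir="$2"
    local default_dir="${unitdir}/70-${iface}.network.d"
    local ifindex network_file

    # Step 1: read the interface index from sysfs; fall back if it is missing.
    ifindex=$(cat "${SYSFS_NET_DIR}/${iface}/ifindex" 2> /dev/null) || { echo "$default_dir"; return; }

    # Step 2: extract NETWORK_FILE= from networkd's per-link state file.
    network_file=$(sed -n 's/^NETWORK_FILE=//p' "${NETIF_LINKS_DIR}/${ifindex}" 2> /dev/null) || { echo "$default_dir"; return; }

    # sed exits 0 even when no line matched, so also guard against an empty
    # value (assumption: the real helper handles this case in some way).
    if [ -z "$network_file" ]; then
        echo "$default_dir"
        return
    fi

    # Step 3: the drop-in directory is the active file's path plus ".d".
    echo "${network_file}.d"
}
```

With a state file containing `NETWORK_FILE=/run/systemd/network/10-netplan-ens5.network`, calling `_get_active_dropin_dir ens5 /run/systemd/network` would print `/run/systemd/network/10-netplan-ens5.network.d`; for an interface with no sysfs entry it prints the `70-<iface>.network.d` fallback under the given unit directory.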
Could there be a race condition between netplan and ec2-net-utils during a restart, where the netplan config is not written first and you end up with two separate configs, since netplan has no logic to append to the ec2-net-utils config?
The policy-routes@.service start path calls systemd-networkd-wait-online -i $iface before _get_active_dropin_dir() is ever invoked, so networkd has already selected its active .network file (including any netplan-generated one) by that time. The refresh path (timer-driven) does not call wait-online explicitly, but the timer only fires 30s+ after the start path completes (OnActiveSec=30, OnUnitInactiveSec=60), so the interface is already fully online and NETWORK_FILE= is populated.
On Ubuntu, netplan's systemd generator runs during the early generator phase — before any services start — so 10-netplan-*.network files are on disk well before systemd-networkd.service begins matching interfaces to files.
On Amazon Linux 2023 there is no competing generator — 70-<iface>.network is the only (and therefore active) file, so the helper returns the same path as the old hardcoded logic and no race is possible.
On RHEL the default network manager is NetworkManager rather than systemd-networkd, so this codepath does not apply.
If detection somehow fails (e.g. a truly degenerate early-boot ordering), the helper falls back to 70-<iface>.network.d — the same path used before this change — so the worst case is parity with the current behaviour, not a new failure mode.
The coexistence of 70-<iface>.network alongside a lower-numbered netplan file is pre-existing and unchanged by this PR; it's precisely that coexistence that causes the bug this fix addresses.
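To make that coexistence concrete, here is a small sketch, not part of the PR, that models only the ordering rule: systemd-networkd applies the first matching .network file in alphanumeric order of file name. The function name first_network_file is hypothetical, and the sketch ignores [Match] sections entirely, so it assumes every candidate passed in already matches the interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of amazon-ec2-net-utils): print the candidate
# file name that sorts first in byte-wise alphanumeric order. [Match] handling
# is deliberately left out; all arguments are assumed to match the interface.
first_network_file() {
    printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

# The two files that coexist on an Ubuntu instance in the scenario above.
first_network_file 70-ens5.network 10-netplan-ens5.network
# prints: 10-netplan-ens5.network
```

Because 10-netplan-ens5.network sorts ahead of 70-ens5.network, drop-ins placed under 70-ens5.network.d/ are attached to a file that is never selected.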
Fixes: #146
Summary
- Add _get_active_dropin_dir() helper that discovers the actual active .network file for an interface via systemd-networkd runtime state
- create_ipv4_aliases() and create_rules() use the helper instead of hardcoded 70-<iface>.network.d paths
- [...] .network file

Problem

systemd-networkd selects exactly one .network file per interface: the first matching file in alphanumeric order, which in practice is the one with the lowest numerical prefix. Drop-in files are only read from the active file's .d/ directory.

create_ipv4_aliases() and create_rules() hardcode their drop-in directory to 70-<iface>.network.d/. When netplan or cloud-init generates a 10-netplan-<iface>.network, systemd-networkd uses that file and silently ignores 70-<iface>.network and all its drop-ins.

This means ec2net_alias.conf (secondary IPv4 addresses) and ec2net_policy_*.conf (policy routing rules) are written to disk correctly but never applied.

Solution
A new helper _get_active_dropin_dir():

1. Reads ifindex from sysfs.
2. Parses the NETWORK_FILE= line from /run/systemd/netif/links/<ifindex>.
3. Returns <NETWORK_FILE>.d as the drop-in directory.
4. Falls back to 70-<iface>.network.d when detection fails (e.g. early boot, missing sysfs node).

Test plan
Tested on Ubuntu 24.04 (c6i.2xlarge) where netplan generates /run/systemd/network/10-netplan-ens5.network.

1. Add a single secondary IP

    aws ec2 assign-private-ip-addresses \
      --network-interface-id eni-058ecf72704d3be0c \
      --secondary-private-ip-address-count 1

    # Wait ~30s for refresh timer
    ip addr show ens5
    # 2: ens5: ...
    #     inet 172.31.79.251/32 scope global noprefixroute ens5   <-- secondary
    #     inet 172.31.79.44/20 metric 100 brd ... scope global dynamic ens5

    networkctl status ens5
    #   Network File: /run/systemd/network/10-netplan-ens5.network
    #                 └─/run/systemd/network/10-netplan-ens5.network.d/ec2net_alias.conf
    #        Address: 172.31.79.44 (DHCP4 via 172.31.64.1)
    #                 172.31.79.251

2. Add multiple secondary IPs

    aws ec2 assign-private-ip-addresses \
      --network-interface-id eni-058ecf72704d3be0c \
      --secondary-private-ip-address-count 3

    # Wait ~30s
    ip addr show ens5 | grep inet
    #     inet 172.31.64.253/32 scope global noprefixroute ens5
    #     inet 172.31.71.37/32 scope global noprefixroute ens5
    #     inet 172.31.73.123/32 scope global noprefixroute ens5
    #     inet 172.31.79.44/20 metric 100 ...

3. Partial removal (remove 1 of 3)

    aws ec2 unassign-private-ip-addresses \
      --network-interface-id eni-058ecf72704d3be0c \
      --private-ip-addresses 172.31.73.123

    # Wait ~30s
    ip addr show ens5 | grep inet
    #     inet 172.31.64.253/32 scope global noprefixroute ens5
    #     inet 172.31.71.37/32 scope global noprefixroute ens5
    #     inet 172.31.79.44/20 metric 100 ...

4. Remove all secondary IPs

    aws ec2 unassign-private-ip-addresses \
      --network-interface-id eni-058ecf72704d3be0c \
      --private-ip-addresses $(aws ec2 describe-network-interfaces \
        --network-interface-ids eni-058ecf72704d3be0c \
        --query 'NetworkInterfaces[0].PrivateIpAddresses[?Primary==`false`].PrivateIpAddress' \
        --output text)

    # Wait ~30s
    ip addr show ens5 | grep inet
    #     inet 172.31.79.44/20 metric 100 ...   <-- only primary remains

5. Syslog confirms drop-in written to correct path
6. Fallback behaviour
When 70-ens5.network IS the active file (no netplan/cloud-init conflict), the helper returns 70-ens5.network.d and behaviour is identical to before this change.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.