fix(lib): write drop-in configs to active network file directory#147

Open
mjnowen wants to merge 1 commit into amazonlinux:main from mjnowen:fix/dropin-dir-follows-active-network-file
@mjnowen mjnowen commented Mar 30, 2026

fix(lib): write drop-in configs to active network file directory

Fixes: #146

Summary

  • Adds _get_active_dropin_dir() helper that discovers the actual active .network file for an interface via systemd-networkd runtime state
  • create_ipv4_aliases() and create_rules() use the helper instead of hardcoded 70-<iface>.network.d paths
  • Fixes secondary IPv4 aliases and policy routing rules being silently ignored on distributions where another network config generator (netplan, cloud-init) creates a lower-numbered .network file

Problem

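systemd-networkd applies exactly one .network file per interface: the first file, in lexical order of file name across all configuration directories, whose [Match] section matches the interface. A file with a lower numerical prefix therefore wins. Drop-in files are read only from the active file's .d/ directory.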

create_ipv4_aliases() and create_rules() hardcode their drop-in directory to 70-<iface>.network.d/. When netplan or cloud-init generates a 10-netplan-<iface>.network, systemd-networkd uses that file and silently ignores 70-<iface>.network and all its drop-ins.

This means ec2net_alias.conf (secondary IPv4 addresses) and ec2net_policy_*.conf (policy routing rules) are written to disk correctly but never applied.
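
The ordering described above can be illustrated with a small sketch. It only compares file names; it assumes both files match the interface and ignores the [Match] evaluation that networkd also performs. The function name first_by_name is hypothetical and is not part of amazon-ec2-net-utils.

```shell
#!/bin/sh
# Sketch: which candidate .network file name sorts first?
# Assumption: every candidate's [Match] section matches the interface.
first_by_name() {
    # Print the argument that sorts first in byte-wise (C locale) order.
    printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

first_by_name 70-ens5.network 10-netplan-ens5.network
# prints: 10-netplan-ens5.network
```

Because 10-netplan-ens5.network sorts ahead of 70-ens5.network, anything written under 70-ens5.network.d/ belongs to a file that networkd never applies to ens5.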

Solution

A new helper _get_active_dropin_dir():

  1. Reads the interface's ifindex from sysfs.
  2. Parses the NETWORK_FILE= line from /run/systemd/netif/links/<ifindex>.
  3. Returns <NETWORK_FILE>.d as the drop-in directory.
  4. Falls back to the original 70-<iface>.network.d when detection fails (e.g. early boot, missing sysfs node).
+_get_active_dropin_dir() {
+    local iface=$1
+    local default_dir="${unitdir}/70-${iface}.network.d"
+    local ifindex network_file
+    ifindex=$(cat "/sys/class/net/${iface}/ifindex" 2> /dev/null) || { echo "$default_dir"; return; }
+    network_file=$(sed -n 's/^NETWORK_FILE=//p' "/run/systemd/netif/links/${ifindex}" 2> /dev/null) || { echo "$default_dir"; return; }
+    if [ -n "$network_file" ]; then
+        echo "${network_file}.d"
+    else
+        echo "$default_dir"
+    fi
+}
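
One way to exercise this lookup without a real interface is to point it at a temporary directory tree. The sketch below is a hypothetical test harness, not code from this PR: get_active_dropin_dir_sketch mirrors the helper's logic but takes the sysfs root, the networkd state root, and unitdir as parameters, and the fixture contents are made up for illustration.

```shell
#!/bin/bash
# Hypothetical harness: same lookup as _get_active_dropin_dir(), with the
# sysfs and networkd state roots passed in so a temp tree can stand in.
get_active_dropin_dir_sketch() {
    local iface=$1 sysfs_root=$2 state_root=$3 unitdir=$4
    local default_dir="${unitdir}/70-${iface}.network.d"
    local ifindex network_file
    # No ifindex file (unknown interface): fall back to the 70- path.
    ifindex=$(cat "${sysfs_root}/${iface}/ifindex" 2> /dev/null) || { echo "$default_dir"; return; }
    # No state file for this ifindex: fall back as well.
    network_file=$(sed -n 's/^NETWORK_FILE=//p' "${state_root}/${ifindex}" 2> /dev/null) || { echo "$default_dir"; return; }
    if [ -n "$network_file" ]; then
        echo "${network_file}.d"
    else
        echo "$default_dir"
    fi
}

# Fake tree: ens5 has ifindex 2 and the state file names a netplan file.
tmp=$(mktemp -d)
mkdir -p "$tmp/sys/ens5" "$tmp/links"
echo 2 > "$tmp/sys/ens5/ifindex"
printf 'NETWORK_FILE=/run/systemd/network/10-netplan-ens5.network\n' > "$tmp/links/2"

get_active_dropin_dir_sketch ens5 "$tmp/sys" "$tmp/links" /run/systemd/network
# prints: /run/systemd/network/10-netplan-ens5.network.d

# Unknown interface: no ifindex file, so the 70- default is returned.
get_active_dropin_dir_sketch eth9 "$tmp/sys" "$tmp/links" /run/systemd/network
# prints: /run/systemd/network/70-eth9.network.d

rm -rf "$tmp"
```

Parameterising the roots is only a testing convenience; the helper in this PR reads /sys/class/net and /run/systemd/netif/links directly.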

Test plan

Tested on Ubuntu 24.04 (c6i.2xlarge) where netplan generates /run/systemd/network/10-netplan-ens5.network.

1. Add a single secondary IP

aws ec2 assign-private-ip-addresses \
    --network-interface-id eni-058ecf72704d3be0c \
    --secondary-private-ip-address-count 1
# Wait ~30s for refresh timer

ip addr show ens5
# 2: ens5: ...
#     inet 172.31.79.251/32 scope global noprefixroute ens5   <-- secondary
#     inet 172.31.79.44/20 metric 100 brd ... scope global dynamic ens5

networkctl status ens5
# Network File: /run/systemd/network/10-netplan-ens5.network
#               └─/run/systemd/network/10-netplan-ens5.network.d/ec2net_alias.conf
# Address: 172.31.79.44 (DHCP4 via 172.31.64.1)
#          172.31.79.251

2. Add multiple secondary IPs

aws ec2 assign-private-ip-addresses \
    --network-interface-id eni-058ecf72704d3be0c \
    --secondary-private-ip-address-count 3
# Wait ~30s

ip addr show ens5 | grep inet
#     inet 172.31.64.253/32 scope global noprefixroute ens5
#     inet 172.31.71.37/32 scope global noprefixroute ens5
#     inet 172.31.73.123/32 scope global noprefixroute ens5
#     inet 172.31.79.44/20 metric 100 ...

3. Partial removal (remove 1 of 3)

aws ec2 unassign-private-ip-addresses \
    --network-interface-id eni-058ecf72704d3be0c \
    --private-ip-addresses 172.31.73.123
# Wait ~30s

ip addr show ens5 | grep inet
#     inet 172.31.64.253/32 scope global noprefixroute ens5
#     inet 172.31.71.37/32 scope global noprefixroute ens5
#     inet 172.31.79.44/20 metric 100 ...

4. Remove all secondary IPs

aws ec2 unassign-private-ip-addresses \
    --network-interface-id eni-058ecf72704d3be0c \
    --private-ip-addresses $(aws ec2 describe-network-interfaces \
        --network-interface-ids eni-058ecf72704d3be0c \
        --query 'NetworkInterfaces[0].PrivateIpAddresses[?Primary==`false`].PrivateIpAddress' \
        --output text)
# Wait ~30s

ip addr show ens5 | grep inet
#     inet 172.31.79.44/20 metric 100 ...   <-- only primary remains

5. Syslog confirms drop-in written to correct path

ec2net[...]: install_and_reload detected change for: /run/systemd/network/10-netplan-ens5.network.d/ec2net_alias.conf
ec2net[...]: Reloaded networkd

6. Fallback behaviour

When 70-ens5.network IS the active file (no netplan/cloud-init conflict), the helper returns 70-ens5.network.d and behaviour is identical to before this change.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

When another network config generator (e.g. netplan, cloud-init) creates
a .network file with a lower numerical prefix than the 70- prefix used
by amazon-ec2-net-utils, systemd-networkd selects that file as the
active config for the interface. Drop-in files written under
70-<iface>.network.d/ are then silently ignored because they belong to
an inactive network file.

This causes secondary IPv4 addresses (ec2net_alias.conf) and policy
routing rules (ec2net_policy_*.conf) to never be applied, even though
they are correctly fetched from IMDS and written to disk.

Add _get_active_dropin_dir() which queries the systemd-networkd runtime
state to discover the actual active NETWORK_FILE for an interface, then
returns that file's drop-in directory. Falls back to the original
70-<iface>.network.d path when detection is unavailable (e.g. during
early boot before networkd has initialised the interface).

Use _get_active_dropin_dir() in create_ipv4_aliases() and
create_rules() instead of the hardcoded 70- path.
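
As a rough illustration of how a caller might consume the returned directory, the sketch below writes an alias drop-in into whatever directory it is given. write_alias_dropin_sketch is a hypothetical stand-in, not the PR's create_ipv4_aliases(); the exact stanza contents that amazon-ec2-net-utils writes are an assumption here, chosen to match the /32 addresses seen in the test plan.

```shell
#!/bin/bash
# Hypothetical caller: writes one [Address] stanza per secondary IP into
# <dropin_dir>/ec2net_alias.conf. Stanza layout is an assumption.
write_alias_dropin_sketch() {
    local dropin_dir=$1; shift
    local addr
    mkdir -p "$dropin_dir" || return 1
    {
        for addr in "$@"; do
            printf '[Address]\nAddress=%s/32\n' "$addr"
        done
    } > "${dropin_dir}/ec2net_alias.conf"
}

# In the PR the directory would come from: _get_active_dropin_dir "$iface"
# A temporary directory stands in for it here.
demo_root=$(mktemp -d)
demo_dir="${demo_root}/10-netplan-ens5.network.d"
write_alias_dropin_sketch "$demo_dir" 172.31.79.251 172.31.64.253
cat "${demo_dir}/ec2net_alias.conf"
rm -rf "$demo_root"
```

The point of the design is that the caller never builds the 70-<iface>.network.d path itself; it writes wherever the helper says the active file's drop-ins live.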
local default_dir="${unitdir}/70-${iface}.network.d"
local ifindex network_file
ifindex=$(cat "/sys/class/net/${iface}/ifindex" 2> /dev/null) || { echo "$default_dir"; return; }
network_file=$(sed -n 's/^NETWORK_FILE=//p' "/run/systemd/netif/links/${ifindex}" 2> /dev/null) || { echo "$default_dir"; return; }
Contributor

Could there be a race condition between netplan and ec2-net-utils during a restart, where the netplan config is not written first and you end up with two separate configs, since netplan has no logic to append to the ec2-net-utils config?

Contributor Author

@mjnowen mjnowen Apr 1, 2026

The policy-routes@.service start path calls systemd-networkd-wait-online -i $iface before _get_active_dropin_dir() is ever invoked, so networkd has already selected its active .network file (including any netplan-generated one) by that time. The refresh path (timer-driven) does not call wait-online explicitly, but the timer only fires 30s+ after the start path completes (OnActiveSec=30, OnUnitInactiveSec=60), so the interface is already fully online and NETWORK_FILE= is populated.

On Ubuntu, netplan's systemd generator runs during the early generator phase — before any services start — so 10-netplan-*.network files are on disk well before systemd-networkd.service begins matching interfaces to files.

On Amazon Linux 2023 there is no competing generator — 70-<iface>.network is the only (and therefore active) file, so the helper returns the same path as the old hardcoded logic and no race is possible.

On RHEL the default network manager is NetworkManager rather than systemd-networkd, so this codepath does not apply.

If detection somehow fails (e.g. a truly degenerate early-boot ordering), the helper falls back to 70-<iface>.network.d — the same path used before this change — so the worst case is parity with the current behaviour, not a new failure mode.

The coexistence of 70-<iface>.network alongside a lower-numbered netplan file is pre-existing and unchanged by this PR; it's precisely that coexistence that causes the bug this fix addresses.

Successfully merging this pull request may close these issues.

Secondary IPv4 addresses and policy routing rules silently ignored when a lower-priority network file exists
