
ifup finishes successfully but interfaces are still down (bridge on a bond on SR-IOV VFs) #319

kaysond commented Jan 13, 2025

I'm using ifupdown2 3.2.0-1+pmx11 (the 3.2.0 Debian package with Proxmox patches; see https://git.proxmox.com/?p=ifupdown2.git;a=summary).

My /etc/network/interfaces:

auto lo
iface lo inet loopback

auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0v0 enp1s0f1v0
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f0v0

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        bridge-stp off
        bridge-fd 0

auto vmbr0.10
iface vmbr0.10 inet static
        address 10.7.0.20/24
        gateway 10.7.0.1

source /etc/network/interfaces.d/*
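
For reference, once this stack does come up (via the workaround described further down), the VLAN-aware bridge can be sanity-checked with plain iproute2 commands, nothing ifupdown2-specific:

bridge vlan show dev bond0   # should list the configured VIDs 2-4094 on the bridge port
bridge link show             # bond0 should show as a port of vmbr0 in forwarding state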

On boot, start-networking runs successfully via systemd (i.e. it exits 0), but the interfaces are all down:

> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:84 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: enp1s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:85 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
4: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8c:16:45:92:88:9b brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
5: enp1s0f0v0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
6: enp1s0f1v0: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
7: enp1s0f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:62:db:16:bf:ef brd ff:ff:ff:ff:ff:ff
8: enp1s0f1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e6:93:a7:a5:17:b9 brd ff:ff:ff:ff:ff:ff
9: enp1s0f1v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether de:7d:9a:5c:a5:a3 brd ff:ff:ff:ff:ff:ff
10: enp1s0f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 96:cd:ab:44:2c:57 brd ff:ff:ff:ff:ff:ff
11: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue master vmbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
12: vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
13: vmbr0.10@vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether ba:4e:ac:fb:fe:ef brd ff:ff:ff:ff:ff:ff
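
Note that the VF slaves have the UP flag set but report NO-CARRIER and state DOWN. That state can be inspected directly through sysfs, independent of ifupdown2 (generic kernel interfaces):

cat /sys/class/net/enp1s0f0v0/operstate   # reads "down" here despite the administrative UP flag
cat /sys/class/net/enp1s0f0v0/carrier     # 0 while there is no carrier (the read fails with EINVAL if the link is admin-down)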

dmesg shows:

[    6.891996] bond0: (slave enp1s0f0v0): Enslaving as a backup interface with a down link
[    6.895778] bond0: (slave enp1s0f1v0): Enslaving as a backup interface with a down link
[    6.911862] vmbr0: port 1(bond0) entered blocking state
[    6.911866] vmbr0: port 1(bond0) entered disabled state
[    6.911877] bond0: entered allmulticast mode

and the debug logs show that ifupdown2 is running the commands to bring the interfaces up:

2025-01-10 12:32:50,576: MainThread: ifupdown: scheduler.py:105:run_iface_op(): debug: bond0: pre-up : running module bond
2025-01-10 12:32:50,576: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-mode active-backup
2025-01-10 12:32:50,576: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-miimon 100
2025-01-10 12:32:50,577: MainThread: ifupdown.bond: bond.py:697:get_ifla_bond_attr_from_user_config(): info: bond0: set bond-primary enp1s0f0v0
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:3130:link_add_bond_with_info_data(): info: bond0: netlink: ip link add dev bond0 type bond (with attributes)
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:3134:link_add_bond_with_info_data(): debug: attributes: OrderedDict([(1, 1), (3, 100), (11, 5)])
2025-01-10 12:32:50,577: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2744:link_set_master(): info: enp1s0f0v0: netlink: ip link set dev enp1s0f0v0 master bond0
2025-01-10 12:32:50,579: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2610:link_up_force(): info: enp1s0f0v0: netlink: ip link set dev enp1s0f0v0 up
2025-01-10 12:32:50,581: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2744:link_set_master(): info: enp1s0f1v0: netlink: ip link set dev enp1s0f1v0 master bond0
2025-01-10 12:32:50,585: MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2610:link_up_force(): info: enp1s0f1v0: netlink: ip link set dev enp1s0f1v0 up

(full logs: ifupdown2.debug.failed.log)

Running ifreload -a or ifup enp1s0f0v0 enp1s0f1v0 succeeds but doesn't actually bring the interfaces up. If I first run ifdown enp1s0f0v0 enp1s0f1v0, however, then either command successfully brings the interfaces up, and once everything is up, the network behaves as expected.
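
So the recovery sequence that works looks like this (the VF slaves have to go admin-down before anything will bring them up):

ifdown enp1s0f0v0 enp1s0f1v0   # force the VF slaves down first
ifup enp1s0f0v0 enp1s0f1v0     # now this actually brings the links up (ifreload -a works here too)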


Initially, I suspected some kind of race condition on boot, because I'm setting up and using SR-IOV VFs with udev rules:

ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="enp1s0f0np0", ATTR{device/sriov_numvfs}="3"
ACTION=="add", SUBSYSTEM=="net", ENV{INTERFACE}=="enp1s0f1np1", ATTR{device/sriov_numvfs}="3"

but systemd is set up to run udev settle first, and in a systemd-analyze plot I can see the VF interfaces appear before start-networking runs. Furthermore, if I add long sleep commands to delay ifup, nothing changes.
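
For completeness, this is roughly how I verified the ordering (standard systemd tooling, assuming the usual networking.service unit name):

systemd-analyze plot > boot.svg                # shows the VF device units appearing before networking.service starts
systemctl show networking.service -p After    # confirms the ordering after systemd-udev-settle.service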

Weirdly, if I get rid of the bridge and assign a static IP to the bond, it comes up as expected:

auto lo
iface lo inet loopback

auto enp1s0f0v0
iface enp1s0f0v0 inet manual

auto enp1s0f1v0
iface enp1s0f1v0 inet manual

auto bond0
iface bond0 inet static
        bond-slaves enp1s0f0v0 enp1s0f1v0
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp1s0f0v0
        address 10.1.0.20/24
        gateway 10.1.0.1

source /etc/network/interfaces.d/*

> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:84 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
3: enp1s0f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 68:05:ca:9b:d6:85 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
4: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8c:16:45:92:88:9b brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
5: enp1s0f1v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff
6: enp1s0f0v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff
7: enp1s0f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:12:4a:95:c6:54 brd ff:ff:ff:ff:ff:ff
8: enp1s0f1v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e6:8a:d1:a7:58:de brd ff:ff:ff:ff:ff:ff
9: enp1s0f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ce:ef:ad:81:8a:28 brd ff:ff:ff:ff:ff:ff
10: enp1s0f1v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:f5:80:f1:b6:35 brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 42:9d:5d:b1:61:e1 brd ff:ff:ff:ff:ff:ff

and now dmesg shows:

[    6.720277] bond0: (slave enp1s0f0v0): Enslaving as a backup interface with a down link
[    6.724151] bond0: (slave enp1s0f1v0): Enslaving as a backup interface with a down link
[    6.805535] iavf 0000:02:02.0 enp1s0f0v0: NIC Link is Up Speed is 10 Gbps Full Duplex
[    6.809171] iavf 0000:02:0a.0 enp1s0f1v0: NIC Link is Up Speed is 10 Gbps Full Duplex
[    6.831866] bond0: (slave enp1s0f0v0): link status definitely up, 10000 Mbps full duplex
[    6.831876] bond0: (slave enp1s0f1v0): link status definitely up, 10000 Mbps full duplex
[    6.831878] bond0: (slave enp1s0f0v0): making interface the new active one
[    6.831893] bond0: active interface up!

full logs for the successful case: ifupdown2.debug.succeeded.log


Finally, if I get rid of the SR-IOV VFs and set up the bridge on a bond on two physical interfaces, it works as expected. I'm suspicious that it may be MAC-address related, but since ifup runs successfully, I'm not sure where to look.

If I diff the two logs linked above, the key difference I see is in how the bond is brought up. The successful boot uses netlink:

MainThread: ifupdown2.NetlinkListenerWithCache: nlcache.py:2606:link_up(): info: bond0: netlink: ip link set dev bond0 up

while the failed boot shells out to ip in batch mode:

MainThread: ifupdown: utils.py:305:_log_command_exec(): info: executing /bin/ip -force -batch - [link set dev bond0 up]
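
For reference, that second invocation is roughly equivalent to running the following by hand (a sketch of the logged batch call, not ifupdown2's internal code path):

echo "link set dev bond0 up" | ip -force -batch -   # "-batch -" reads ip(8) subcommands from stdin; -force keeps going on errors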

I've been debugging this for several days and am stuck, so any help is greatly appreciated. Ultimately, I do think this is an ifupdown2 bug: at the very least, it shouldn't exit 0 if it isn't actually able to bring the interfaces up.

@williamdes

I have a similar issue, but I added allow-hotplug lines and it seems to do the trick.
Can you confirm?

Ref: #213


kaysond commented Jan 27, 2025

> I have a similar issue, but I added allow-hotplug lines and it seems to do the trick. Can you confirm?
>
> Ref: #213

What did you add allow-hotplug to? I tried it for the VF interfaces but it didn't work. I don't think you're supposed to use allow-hotplug for bonds/bridges.


williamdes commented Jan 27, 2025

It solves the following bug: after a reboot no IP is configured, and I need to run ifreload -a.
I added lines for each interface in my /etc/network/interfaces file (Debian 12):

allow-hotplug ens18
allow-hotplug ens19
allow-hotplug ens20

For reasons that I do not understand, it works, even though the interfaces are clearly already present at boot time.

And yes, maybe only add it for non-bridge interfaces.


kaysond commented Jan 27, 2025

> It solves the following bug: after a reboot no IP is configured, and I need to run ifreload -a. I added lines for each interface in my /etc/network/interfaces file (Debian 12):
>
> allow-hotplug ens18
> allow-hotplug ens19
> allow-hotplug ens20
>
> For reasons that I do not understand, it works, even though the interfaces are clearly already present at boot time.
>
> And yes, maybe only add it for non-bridge interfaces.

Ah, that unfortunately didn't work for me. Running ifreload -a after boot doesn't work either, so I suspect there might be something going on with the driver too...
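
Since the driver is now a suspect, one quick way to compare what iavf logs across the two cases (generic kernel-log filtering; the failed boot above is missing the iavf "NIC Link is Up" lines that the successful one has):

dmesg | grep -E 'iavf|bond0'     # current boot
journalctl -k -b -1 -g iavf      # same check against the previous boot's kernel log (-g needs journalctl with pattern-matching support)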
