Under the hood: The administrative state of Linux network interfaces

Recently we were wondering why node_exporter, in all the nice metrics it exposes about a Linux system, does not show if a Linux network interface is configured to be UP or DOWN, but only the operational state. So we started digging…

On the CLI, using iproute2 tooling, the operational state is shown explicitly, for example in the 2nd column in the following output:

$ ip -br l

lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 DOWN aa:bb:cc:dd:ee:ff <NO-CARRIER,BROADCAST,MULTICAST,UP>
wlan0 UP 00:08:15:ab:cd:ef <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1 DOWN 01:31:17:00:47:11 <BROADCAST,MULTICAST>
ffho-ops UNKNOWN <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP>

If you look closely you can see that the administrative state is encoded within the last column, namely it’s up if the keyword “UP” is part of the list, and down otherwise.

We started digging through /sys/class/net/* but didn’t find any entry which seemed to correspond to the administrative state of the interface. Digging further the flags caught my eye and playing with an interface revealed that the last bit seemed to indicate if the interface should be UP or DOWN.

While crafting a small PR for node_exporter I dug further to figure out why that is. The first stop was the Linux Kernel cross reference, which revealed the flags seem to stem from BSD. Searching for those yielded the netdevice(7) man page containing a definition for the flags:

SIOCGIFFLAGS, SIOCSIFFLAGS
Get or set the active flag word of the device.
ifr_flags contains a bit mask of the following values:

Device flags
IFF_UP Interface is running.
IFF_BROADCAST Valid broadcast address set.
IFF_DEBUG Internal debugging flag.
IFF_LOOPBACK Interface is a loopback interface.
IFF_POINTOPOINT Interface is a point-to-point link.
IFF_RUNNING Resources allocated.
...

Now it all makes sense, and hopefully soon everyone can just check the adminstate of Linux networking interfaces in Prometheus 🙂

Update (May 2023): The PR has been merged.

Influencing Linux Source Address Selection on routes installed by bird and FRR

The use of dynamic routing protocols – mainly IS-IS, OSPF and BGP – is quite common in contemporary networks, even on the host networking stacks. In some situations it is desirable to not only control the path packets will take to any given destination, but also the source address of locally sourced traffic.

By default Linux systems will use the primary IP address of the egress interfaces, which has global scope and has the same address family of the flow in question. This decision can be overridden per destination by setting the src attribute of a route to a specific locally configured IP address, for example:

ip route add 2001:db8:0815::/48 via 2001:db8:1::1 src 2001:db8::42

For routes installed by a routing daemon this has to happen inside the routing daemon, so the the NETLINK call will know about the source address to set. For bird this is rather straight forward, for FRR it took me – and apparently others – a bit of time to find the right knobs, so I’ll document both ways here for future me and like present you looking for it 🙂

bird

In bird that’s fairly simple and can be done via the export filter of protocol kernel. In bird 1 this could look like this:

define LO_IP = 192.0.2.42;

protocol kernel {
scan time 20;
import none;
export filter {
# <Apply any required filtering here/>

# Set src attr of all routers installed in FIB to LO_IP
krt_prefsrc = LO_IP;
accept;
};
}

For bird 2 this has to happen inside the address family specific configuration:

define LO_IP = 2001:db8::42;

protocol kernel {
scan time 20;

ipv6 {
import none;
export filter {
# <Apply any required filtering here/>

# Set src attr of all routers installed in FIB to LO_IP
krt_prefsrc = LO_IP;
accept;
};
}

FRR

For FRR the configuration is a little bit more involved and requires a route-map to be applied to the protocols the routes are learned from! So if you want to set the source address for routes learned from OSPF this would look like this:

route-map set-loopback-src-ip permit 1
set src 192.0.2.42
!
ip protocol ospf route-map set-loopback-src-ip

If you want to filter on which prefixes this applies, this could be done by adding a prefix list into the route-map, e.g.

route-map set-loopback-src-ip permit 1
match ip address prefix-list YOUR_PREFIX_LIST
set src 192.0.2.42

If you run OSPF + BGP and want the source address to be set for both protocols, you need to add an additional line to the config example from above:

ip protocol bgp route-map set-loopback-src-ip

MPLS Lab – Playing with static LSPs and VRFs on Linux

At DENOG13 I held a workshop Fun with PBR, VRFs and NetNS on Linux (in German) where I showcased forwarding IP packets within a VRF via static MPLS LSPs. I’ve been asked to publish the configuration for this lab, so here we are 🙂

Consider the following topology consisting of a core ring build with 5 routers, a border router (br-01) connected to core router E (cr-E) as well as to the Internet. All routers take part of OSPF area 0 and run iBGP with br-01 as the route reflector which is providing a default route. This is the same setup used for most of the FrOSCon Network Track.

Topology of MPLS lab
Continue reading MPLS Lab – Playing with static LSPs and VRFs on Linux

Conditional SLAAC with bird – Only advertise default route if one is present on the router

When people want to deploy a Linux box as a 1st hop router in an IPv6 network RAdvd usually is the first choice for sending out Router Advertisments. The configuration is rather straight forward and on most distributions extensive example configuration files exist. Once configured RAdvd will silently do it’s job and happilly send out periodic RAs and answer any Router Solicitations, even thought the router itself may not have any network connectivity (anymore).

While designing the Freifunk Hochstift Backbone a while back I was looking for a way to only send out RAs with a default route if the router actually has one itself to prevent clients from getting broken IPv6 connectivity (and people complaining).

I found that the Bird Internet Routing Daemon – which already was in use for OSPF/BGP – also provides a ravd protocol which does what I wanted using a trigger. The following configuration will keep sending out RAs but will zero the router lifetime, which will allow clients to configure an IP address from this prefix but not use this router as default gateway, which is OK for my particular setup.

protocol radv {
    # ONLY advertise prefix, IF default route is available
    import all;
    export all;
    trigger ::/0;

    interface "vlan0815" {
        default lifetime 600 sensitive yes;

        prefix 2001:db8:23:42::/64 {
        preferred lifetime 3600;
    };
}

This is the default behaviour of the radv trigger but it can be fine tuned to your requirements:

trigger prefix

RAdv protocol could be configured to change its behavior based on availability of routes. When this option is used, the protocol waits in suppressed state until a trigger route (for the specified network) is exported to the protocol, the protocol also returns to suppressed state if the trigger route disappears. Note that route export depends on specified export filter, as usual. This option could be used, e.g., for handling failover in multihoming scenarios.

During suppressed state, router advertisements are generated, but with some fields zeroed. Exact behavior depends on which fields are zeroed, this can be configured by sensitive option for appropriate fields. By default, just default lifetime (also called router lifetime) is zeroed, which means hosts cannot use the router as a default router. preferred lifetime and valid lifetime could also be configured as sensitive for a prefix, which would cause autoconfigured IPs to be deprecated or even removed.

The radv protocol has a long list of knobs and switches for fine tuning, see the bird 1.6 and 2.0 documentation for further details.

OpenVPN and VRFs

Using VRFs on Linux enables a whole new set of network setups.

As described in the previous post about VRFs on Linux, VRFs allow to isolate different interfaces at layer 3. In the Freifunk Hochstift network we chose to consider the main or default VRF as the internal network and move any internet facing interfaces into an external VRF. Following this concept allows, to safely contain traffic within the internal network and only at designated border routers leak eligible traffic into the internet.

In an distributed environment like the Freifunk Hochstift network, it is inevitable to connect different islands using VPN tunnels over the Internet. This could be done by the means of GRE tunnels as shown in the previous article about the border routers, or by means of encrypted VPNs like OpenVPN, IPsec or Wireguard. As the old infrastructure quite heavily relied on OpenVPN tunnels, and they worked quite well, the new setup should keep this building block in place.

Continue reading OpenVPN and VRFs

(Vlan-aware) Bridges on Linux

Lately I’ve had some conversations about how Linux sucks at bridging tagged VLANs into VMs, which just isn’t true anymore.

With recent Kernels Linux bridges have become vlan-aware and now allow configuring any bridge port like a port of any decent network switch with respect to 802.1q VLANS. A port can present a VLAN as untagged traffic as well as a number of VLANs in tagged mode. As can be expected, SVIs can be configured as vlan interfaces of a bridge, too.

The old brctl utility has been integrated within the iproute suite as part of of ip link. The commands map as follows:

brctl addbr br0
ip link add br0 type bridge
    [ forward_delay FORWARD_DELAY ]
    ...
    [ vlan_filtering VLAN_FILTERING ]
    [ vlan_default_pvid VLAN_D_PVID ]
    ...
    [ nf_call_iptables NF_CALL_IPT ]
    [ nf_call_ip6tables NF_CALL_IP6TABLES ]
    [ nf_call_arptables NF_CALL_ARPTABLES ]


brctl addif br0 eth0
ip link set eth0 master br0

Continue reading (Vlan-aware) Bridges on Linux

Seriously predictable interface names – An introduction to systemd .link files

Predictable interface names are a new thing. The most common argument made is that they are not really predictable though, depending on the point of view. How about making interface names predictable and meaningful in the same time?

Most admins will probably think of udev right now, which previously was heavily used to achieve exactly that. In times of systemd the new hotness are .link files which provide similar capabilities and allow even more options to be set for interfaces.

Continue reading Seriously predictable interface names – An introduction to systemd .link files

FrOSCon 13 Network Track – Videos and Slides

A month ago the 13th Free and Open Source Software Conference (FrOSCon) took place in St. Augustin, Germany.  At this years event I organized a two day Network Track designed for a broad audience of Linux folks, system administrators and developers to answer questions about networking topics they were afraid to ask or didn’t realize they wanted or had to know! As the lines between system engineering and network engineering keep on blurring it’s getting more and more important to broaden the focus in both worlds, keywords being things like SDN, IP-Fabric, Segment Routing etc. here.

The track started with a lecture about networking basics and over both days advanced to technically more sophisticated topics following a red line.

On Saturday the focus was on Layer 2 and Layer 3 fundamentals (Ethernet switching and routing), dynamic routing protocols as well as the Linux packet-filter. Sundays track started gently with VLANs, Bonding and Bridging and advanced to more sophisticated topics like policy-based routing, VRFs, Open vSwitch, Segment Routing and Software Defined Networking. The track concluded with an overview about Best Current Operational Practices and a Q&A sessions.

All these talks are – thanks to the nice folks at CCC-VOC – available on Video at media.ccc.de (german audio) as are the slides (english):

Day 1

Day 2

  • Adv. topics in Layer2/3 – Light and dark magic with the Linux network stack  (Video) (Slides)
  • Segment Routing  (Video) (Slides)
  • Open vSwitch – The switch within your machine  (Video) (Slides)
  • Best Current Opertional Practices – Dos, Don’ts and lessons learned
    (Video) (Slides)
  • Building your own SDN – The penguin orchestrates the network  (Video) (Slides)
  • Grand Q&A  (Video)

Anycasted services with Debian, bird, anycast-healthchecker and Cisco Nexus 7000

A while ago we started getting alerts, that one of our Kerberos KDCs had problem with the Kerberos database replication. A little digging revealed, that the problems are caused by load spikes on the KDC which were the result of a burst of legitimate queries fired by some systems we didn’t have much control over. Additionally we found that the MIT Kerberos implementation queries all KDCs provided in the configuration file in sequential order, so the first KDC get’s nearly all queries. While thinking about load balancing solutions, quickly anycast came to mind, so we decided to set it up. Anycast leverages the Equal Cost Multipath Routing (ECMP)  capability of common routers to distribute traffic to multiple next-hops for the same destination.

The solution consists of three corner stones:

  1. anycast-healtchecker as a means to check service availability
  2. bird as a BGP speaker on the KDCs and route reflectors
  3. Data center routers (Cisco Nexus 7010) speaking BGP to the route reflectors

The topology is as follows:

Topology KDCs

Continue reading Anycasted services with Debian, bird, anycast-healthchecker and Cisco Nexus 7000

Beware of the details (and VLANs)

A friend today challenged me with this problem on an Ubuntu box:

auto eth2
iface eth2 inet static
    address 10.23.0.32
    netmask 31

auto eth2.210
iface eth2.210 inet static
    address 10.42.0.32
    netmask 31

root@box:~# ifup eth2.210
Cannot find device "eth2.210"
Failed to bring up eth2.210.

The first thing coming to mind is “package vlan missing?”, which it was. After installing it, it got more interesting:

root@box:~# ifup eth2.210
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
ifup: recursion detected for interface eth2 in parent-lock phase
ERROR: trying to add VLAN #210 to IF -:eth2:-  error: File exists
Cannot find device "eth2.210"
Failed to bring up eth2.210

A little poking around showed, that there’s no interface eth2.210 present in the system, but

root@box:~# cat /proc/net/vlan/config 
VLAN Dev name | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
rename10       | 210  | eth2

root@box:~# ip l
[...]
10: rename10@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether DE:AD:BE:EF:23:42 brd ff:ff:ff:ff:ff:ff
[...]

Deleting the renameNN interface an running ifup again, just created a renameNN+1 interface with kernel log entries like

[7366672.699018] rename14: renamed from eth2.210

I suspected some bugs in /etc/network/if-{,pre-}up.d/ but this even happend, when manually running

ip link add link eth2 name eth2.210 type vlan id 210

or

ip link add link eth2 name vlan210 type vlan id 210

Dafuq?

Continue reading Beware of the details (and VLANs)