Linux – SDN Clinic

WireGuard and VRFs

With WireGuard having found its way into the Linux kernel and all major distributions out there, we’ve got a pretty powerful new card into our decks, that we can play when we’re in need of a VPN solution. The website emphasizes the aim for performance, simplicity, and avoiding headaches, which reality seems to mostly agree with.

As long as WireGuard required to build kernel modules, e.g. by using DKMS, I wasn’t very interested (as I’m not a fan of compilers present on my servers). However, as this need went away, I got curious.

Advanced Linux Routing – VRFs

Linux has been a first class networking citizen for quite a long time now. Every system running a Linux kernel out of the box has at least three routing tables and is supporting multiple mechanisms for advanced routing features – from policy based routing (PBR), to VRFs(-lite), and network namespaces (NetNS). Each of these provide different levels of separation and features, with PBR being the oldest one and VRFs the most recent addition (starting with Kernel 4.3).

This article is the third part of the Linux Routing series and will provide an overview on Virtual Routing and Forwarding (VRFs) and its applications. The first post in the series about Linux Routing Fundamentals covers the basics and plumbings of Linux routing tables, what happens when an IP packet is sent from or through a Linux box, and how to figure out why. The second post on Policy-based routing covers the concepts and plumbing of PBR on Linux. Both are a good read if you don’t feel familiar with these topics and provide knowledge about corner stones referenced here.

Advanced Linux Routing – Policy-based routing

Linux has been a first class networking citizen for quite a long time now. Every system running a Linux kernel out of the box has at least three routing tables and is supporting multiple mechanisms for advanced routing features – from policy based routing (PBR), to VRFs(-lite), and network namespaces (NetNS). Each of these provide different levels or separation and features, with PBR being the oldest one and VRFs the most recent addition (starting with Kernel 4.3).

This article is the second part of the Linux Routing series and will provide an overview on Policy-based routing (PBR) and its applications. The previous post about Linux Routing Fundamentals covers the basics and plumbings of Linux routing tables, what happens when an IP packet is sent from or through a Linux box, and how to figure out why. It is a good read if you don’t feel familiar with these topics. Posts about VRFs and Network Namespaces will follow.

Linux Routing Fundamentals

Linux has been a first class networking citizen for quite a long time now. Every system running a Linux kernel out of the box has at least three routing tables and is supporting multiple mechanisms for advanced routing features from policy based routing (PBR), to VRFs(-lite), and network namespaces (NetNS). Each of these provide different levels or separation and features, with PBR being the oldest one and VRFs the most recent addition (starting with kernel 4.3).

This article is the first part of the Linux Routing series and will provide an overview of the basics and plumbings of Linux routing tables, what happens when an IP packet is sent from or through a Linux box, and how to figure out why. It’s the baseline for future articles on PBR, VRFs, and NetNSes, their differences as well and applications.

Under the hood: The administrative state of Linux network interfaces

Recently we were wondering why node_exporter, in all the nice metrics it exposes about a Linux system, does not show if a Linux network interface is configured to be UP or DOWN, but only the operational state. So we started digging…

On the CLI, using iproute2 tooling, the operational state is shown explicitly, for example in the 2nd column in the following output:

$ ip -br l

lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0             DOWN           aa:bb:cc:dd:ee:ff <NO-CARRIER,BROADCAST,MULTICAST,UP>
wlan0            UP             00:08:15:ab:cd:ef <BROADCAST,MULTICAST,UP,LOWER_UP> 
eth1             DOWN           01:31:17:00:47:11 <BROADCAST,MULTICAST>
ffho-ops         UNKNOWN         <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP>

If you look closely you can see that the administrative state is encoded within the last column, namely it’s up if the keyword “UP” is part of the list, and down otherwise.

We started digging through /sys/class/net/* but didn’t find any entry which seemed to correspond to the administrative state of the interface. Digging further the flags caught my eye and playing with an interface revealed that the last bit seemed to indicate if the interface should be UP or DOWN.

While crafting a small PR for node_exporter I dug further to figure out why that is. The first stop was the Linux Kernel cross reference, which revealed the flags seem to stem from BSD. Searching for those yielded the netdevice(7) man page containing a definition for the flags:

SIOCGIFFLAGS, SIOCSIFFLAGS
    Get or set the active flag word of the device.
    ifr_flags contains a bit mask of the following values:

    Device flags
    IFF_UP            Interface is running.
    IFF_BROADCAST     Valid broadcast address set.
    IFF_DEBUG         Internal debugging flag.
    IFF_LOOPBACK      Interface is a loopback interface.
    IFF_POINTOPOINT   Interface is a point-to-point link.
    IFF_RUNNING       Resources allocated.
    ...

Now it all makes sense, and hopefully soon everyone can just check the adminstate of Linux networking interfaces in Prometheus 🙂

Update (May 2023): The PR has been merged.

Influencing Linux Source Address Selection on routes installed by bird and FRR

The use of dynamic routing protocols – mainly IS-IS, OSPF and BGP – is quite common in contemporary networks, even on the host networking stacks. In some situations it is desirable to not only control the path packets will take to any given destination, but also the source address of locally sourced traffic.

By default Linux systems will use the primary IP address of the egress interfaces, which has global scope and has the same address family of the flow in question. This decision can be overridden per destination by setting the src attribute of a route to a specific locally configured IP address, for example:

ip route add 2001:db8:0815::/48 via 2001:db8:1::1 src 2001:db8::42

For routes installed by a routing daemon this has to happen inside the routing daemon, so the the NETLINK call will know about the source address to set. For bird this is rather straight forward, for FRR it took me – and apparently others – a bit of time to find the right knobs, so I’ll document both ways here for future me and – likely – present you looking for it 🙂

bird

In bird that’s fairly simple and can be done via the export filter of protocol kernel. In bird 1 this could look like this:

define LO_IP = 192.0.2.42;

protocol kernel {
    scan time 20;
    import none;
    export filter {
        # <Apply any required filtering here/>

        # Set src attr of all routers installed in FIB to LO_IP
        krt_prefsrc = LO_IP;
        accept;
    };
}

For bird 2 this has to happen inside the address family specific configuration:

define LO_IP = 2001:db8::42;

protocol kernel {
    scan time 20;

    ipv6 {
        import none;
        export filter {
            # <Apply any required filtering here/>

            # Set src attr of all routers installed in FIB to LO_IP
            krt_prefsrc = LO_IP;
            accept;
        };
}

FRR

For FRR the configuration is a little bit more involved and requires a route-map to be applied to the protocols the routes are learned from! So if you want to set the source address for routes learned from OSPF this would look like this:

route-map set-loopback-src-ip permit 1
 set src 192.0.2.42
!
ip protocol ospf route-map set-loopback-src-ip

If you want to filter on which prefixes this applies, this could be done by adding a prefix list into the route-map, e.g.

route-map set-loopback-src-ip permit 1
 match ip address prefix-list YOUR_PREFIX_LIST
 set src 192.0.2.42

If you run OSPF + BGP and want the source address to be set for both protocols, you need to add an additional line to the config example from above:

ip protocol bgp route-map set-loopback-src-ip

MPLS Lab – Playing with static LSPs and VRFs on Linux

At DENOG13 I held a workshop Fun with PBR, VRFs and NetNS on Linux (in German) where I showcased forwarding IP packets within a VRF via static MPLS LSPs. I’ve been asked to publish the configuration for this lab, so here we are 🙂

Consider the following topology consisting of a core ring build with 5 routers, a border router (br-01) connected to core router E (cr-E) as well as to the Internet. All routers take part of OSPF area 0 and run iBGP with br-01 as the route reflector which is providing a default route. This is the same setup used for most of the FrOSCon Network Track.

Conditional SLAAC with bird – Only advertise default route if one is present on the router

When people want to deploy a Linux box as a 1st hop router in an IPv6 network RAdvd usually is the first choice for sending out Router Advertisments. The configuration is rather straight forward and on most distributions extensive example configuration files exist. Once configured RAdvd will silently do it’s job and happilly send out periodic RAs and answer any Router Solicitations, even thought the router itself may not have any network connectivity (anymore).

While designing the Freifunk Hochstift Backbone a while back I was looking for a way to only send out RAs with a default route if the router actually has one itself to prevent clients from getting broken IPv6 connectivity (and people complaining).

I found that the Bird Internet Routing Daemon – which already was in use for OSPF/BGP – also provides a ravd protocol which does what I wanted using a trigger. The following configuration will keep sending out RAs but will zero the router lifetime, which will allow clients to configure an IP address from this prefix but not use this router as default gateway, which is OK for my particular setup.

protocol radv {
    # ONLY advertise prefix, IF default route is available
    import all;
    export all;
    trigger ::/0;

    interface "vlan0815" {
        default lifetime 600 sensitive yes;

        prefix 2001:db8:23:42::/64 {
        preferred lifetime 3600;
    };
}

This is the default behaviour of the radv trigger but it can be fine tuned to your requirements:

trigger prefix

RAdv protocol could be configured to change its behavior based on availability of routes. When this option is used, the protocol waits in suppressed state until a trigger route (for the specified network) is exported to the protocol, the protocol also returns to suppressed state if the trigger route disappears. Note that route export depends on specified export filter, as usual. This option could be used, e.g., for handling failover in multihoming scenarios.

During suppressed state, router advertisements are generated, but with some fields zeroed. Exact behavior depends on which fields are zeroed, this can be configured by sensitive option for appropriate fields. By default, just default lifetime (also called router lifetime) is zeroed, which means hosts cannot use the router as a default router. preferred lifetime and valid lifetime could also be configured as sensitive for a prefix, which would cause autoconfigured IPs to be deprecated or even removed.

The radv protocol has a long list of knobs and switches for fine tuning, see the bird 1.6 and 2.0 documentation for further details.

OpenVPN and VRFs

Using VRFs on Linux enables a whole new set of network setups.

VRFs allow to isolate different interfaces at layer 3 (see the Advanced Linux Routing – VRFs article for details). In the Freifunk Hochstift network we chose to consider the main or default VRF as the internal network and move any internet facing interfaces into an external VRF. Following this concept allows, to safely contain traffic within the internal network and only at designated border routers leak eligible traffic into the internet.

In an distributed environment like the Freifunk Hochstift network, it is inevitable to connect different islands using VPN tunnels over the Internet. This could be done by the means of GRE tunnels as shown in the previous article about the border routers, or by means of encrypted VPNs like OpenVPN, IPsec or Wireguard. As the old infrastructure quite heavily relied on OpenVPN tunnels, and they worked quite well, the new setup should keep this building block in place.

(Vlan-aware) Bridges on Linux

Lately I’ve had some conversations about how Linux sucks at bridging tagged VLANs into VMs, which just isn’t true anymore.

With recent Kernels Linux bridges have become vlan-aware and now allow configuring any bridge port like a port of any decent network switch with respect to 802.1q VLANS. A port can present a VLAN as untagged traffic as well as a number of VLANs in tagged mode. As can be expected, SVIs can be configured as vlan interfaces of a bridge, too.

The old brctl utility has been integrated within the iproute suite as part of of ip link. The commands map as follows:

brctl addbr br0
ip link add br0 type bridge
    [ forward_delay FORWARD_DELAY ]
    ...
    [ vlan_filtering VLAN_FILTERING ]
    [ vlan_default_pvid VLAN_D_PVID ]
    ...
    [ nf_call_iptables NF_CALL_IPT ]
    [ nf_call_ip6tables NF_CALL_IP6TABLES ]
    [ nf_call_arptables NF_CALL_ARPTABLES ]


brctl addif br0 eth0
ip link set eth0 master br0

Continue reading (Vlan-aware) Bridges on Linux