NetBox scripts – now with clean error handling

Recently I wrote a scripts which aids in provisioning things inside our network, which will do some sanity checks and if all is good set up a number of things (prefixes, IPs, sub-interfaces, IPs on them, etc.). The reason to do this inside a script is to do this in an atomic operation, so either the full provision process is done, or nothing is changed at all.

If the sanity checks fail (invalid input, trying to create something with overlapping resources, etc.) the script should fail and ideally report a clear error message one what was wrong, which should be reported to the caller (via API).

Now I was looking into exiting the script on an error and only found the option to throw any Exception which will produce a stack trace of all internal Exceptions which had been caught and handled. There is a AbortTransaction Exception, which allows to terminate the script and thereby the DB transaction, but it was not designed to carry an error message.

Looking into the code it seemed like adding support to gracefully abort a would be rather straight forward to add, so I did (issue, PR). Today the PR got merged and NetBox v3.4.4 (and later) includes the AbortScript exception to elegantly abort scripts, which you can use like this:

from utilities.exceptions import AbortScript

if some_error:
     raise AbortScript("Some meaningful error message")

Insights into Network Automation with Go

A great new book on Network Automation with Go just dropped recently and if you want to get into automation parts of your network or wants to start doing so with Go, it’s definitely for you!

Network Automation with Go book

It contains a lot of background on the Golang programming language, its concepts and how to use them to build reliable, scalable, testable, and observable applications. The authors also discuss Network Automation and configuration management approaches in general, and dig into APIs and network monitoring.

Continue reading Insights into Network Automation with Go

Under the hood: The administrative state of Linux network interfaces

Recently we were wondering why node_exporter, in all the nice metrics it exposes about a Linux system, does not show if a Linux network interface is configured to be UP or DOWN, but only the operational state. So we started digging…

On the CLI, using iproute2 tooling, the operational state is shown explicitly, for example in the 2nd column in the following output:

$ ip -br l

lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 DOWN aa:bb:cc:dd:ee:ff <NO-CARRIER,BROADCAST,MULTICAST,UP>
wlan0 UP 00:08:15:ab:cd:ef <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1 DOWN 01:31:17:00:47:11 <BROADCAST,MULTICAST>
ffho-ops UNKNOWN <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP>

If you look closely you can see that the administrative state is encoded within the last column, namely it’s up if the keyword “UP” is part of the list, and down otherwise.

We started digging through /sys/class/net/* but didn’t find any entry which seemed to correspond to the administrative state of the interface. Digging further the flags caught my eye and playing with an interface revealed that the last bit seemed to indicate if the interface should be UP or DOWN.

While crafting a small PR for node_exporter I dug further to figure out why that is. The first stop was the Linux Kernel cross reference, which revealed the flags seem to stem from BSD. Searching for those yielded the netdevice(7) man page containing a definition for the flags:

SIOCGIFFLAGS, SIOCSIFFLAGS
Get or set the active flag word of the device.
ifr_flags contains a bit mask of the following values:

Device flags
IFF_UP Interface is running.
IFF_BROADCAST Valid broadcast address set.
IFF_DEBUG Internal debugging flag.
IFF_LOOPBACK Interface is a loopback interface.
IFF_POINTOPOINT Interface is a point-to-point link.
IFF_RUNNING Resources allocated.
...

Now it all makes sense, and hopefully soon everyone can just check the adminstate of Linux networking interfaces in Prometheus 🙂

Influencing Linux Source Address Selection on routes installed by bird and FRR

The use of dynamic routing protocols – mainly IS-IS, OSPF and BGP – is quite common in contemporary networks, even on the host networking stacks. In some situations it is desirable to not only control the path packets will take to any given destination, but also the source address of locally sourced traffic.

By default Linux systems will use the primary IP address of the egress interfaces, which has global scope and has the same address family of the flow in question. This decision can be overridden per destination by setting the src attribute of a route to a specific locally configured IP address, for example:

ip route add 2001:db8:0815::/48 via 2001:db8:1::1 src 2001:db8::42

For routes installed by a routing daemon this has to happen inside the routing daemon, so the the NETLINK call will know about the source address to set. For bird this is rather straight forward, for FRR it took me – and apparently others – a bit to find the right knobs, so I’ll document both ways here for future me and others looking for it 🙂

bird

In bird that’s fairly simple and can be done via the export filter of protocol kernel. In bird 1 this could look like this:

define LO_IP = 192.0.2.42;

protocol kernel {
scan time 20; # Scan kernel routing table every 20s
import none;
export filter {
# <Apply any required filtering here/>

# Set src attr of all routers installed in FIB to LO_IP
krt_prefsrc = LO_IP;
accept;
};
}

For bird 2 this has to happen inside the address family specific configuration:

define LO_IP = 2001:db8::42;

protocol kernel {
scan time 20; # Scan kernel routing table every 20s

ipv6 {
import none;
export filter {
# <Apply any required filtering here/>

# Set src attr of all routers installed in FIB to LO_IP
krt_prefsrc = LO_IP;
accept;
};
}

FRR

For FRR the configuration is a little bit more involved and requires a route-map to be applied to the protocols the routes are learned from! So if you want to set the source address for routes learned from OSPF this would look like this:

route-map set-loopback-src-ip permit 1
set src 192.0.2.42
!
ip protocol ospf route-map set-loopback-src-ip

If you want to filter on which prefixes this applies, this could be done by adding a prefix list into the route-map, e.g.

match ip address prefix-list YOUR_PREFIX_LIST

If you run OSPF + BGP and want the source address to be set for both protocols, you need to add an additional line to the config example from above:

ip protocol bgp route-map set-loopback-src-ip

Hardware platforms of a Freifunk network

This post is part of the series Building your own Software Defined Network with Linux and Open Source Tools and covers the hardware platforms used within the backbone network infrastructure.

In the early days into the project we didn’t have much funds but thankfully received quite some donations in terms of old hardware as well as money. As we were young and didn’t know what we know today, we went down quite some different roads, made lots of experiences along the way, eventually reaching the setup we have today. This posts lists most the platforms we used within the last years, basically only leaving out early wireless platforms and sponsored server machines.

As most Freifunk communities rely heavily on products from the portfolio of Ubiquiti Networks, quite some devices will be covered. In the following I will just call them ubnt.

Continue reading Hardware platforms of a Freifunk network

Modelling (C)WDM MUXes in NetBox/Nautobot – the universal way

A while ago we had a discussion in #DENOG on how to best model CWDM MUXes in NetBox/Nautobot, so they can be used to build a network topology, which can be leveraged for holistic automation, and are prepared for augments, repairs and network changes.

8Ch CWDM MUX

“Natural” way of modelling

The “natural” way of modelling for example an 8Ch CWDM MUX – as shown above – would be do create a DeviceType containing one RearPort with 8 position as well as 8 FrontPorts which map to one position each. So logically this would look like the following

Inside NetBox the Rear Port and Front Ports view of the DeviceType could look like this

One Rear Port with 8 positions
Eight Front Ports mapped to one position each

This model works absolutely, allows tracing connections through the MUXes etc. as long as we always use exactly the same MUX on both ends off the fiber for one WDM setup.

Limitations

If one MUX would decay over time and needs to be replaced by another model e.g. due to supply chain issues, which for example might have 10 channels and does not skip 1390 and 1410, the mapping would fail. This obviously would also be the case if for an expansion the setup should be changed to 18Ch MUXes and this could only be done one side after another.

Universal way of modelling

As the ITU CWDM standard only defines 18 CWDM channels in total with a fixed spacing of 20nm we can use one simple trick to overcome this limitation: Always define all 18 positions and only map the existing Front Ports to their associated position as shown in the following table.

PositionChannel
1
1270
21290
31310
41330
51350
61370
71390
81410
91430
101450
111470
121490
131510
141530
151550
161570
171590
181610

So the Front Ports of the MUX shown above would be mapped to positions 1-6, 9, and 10. This way all CWDM MUXes can be connected in NetBox/Nautobot and all channels which exist on both ends can be connected and traced through.

Eight Front Ports mapped to one their associate (existing) positions

What about DWDM?

For DWDM systems this obviously would be somewhat harder given the bigger amount of possible channel, especially when different spacing options (25, 50, 100GHz) are taken into account. It would be possible though to go for 160 positions and map the channels accordingly given strict adherence to a to be predefined map.

This is the way: Holistic approach on network automation

In the past months and years I’ve had a number of great discussion with a lot of fantastic networking people on how to (not) do network automation (at scale), what worked for them and what didn’t. At a smaller scale I made quite some experiences myself at previous roles as well as consulting engagements and in particular by building and operating the Freifunk Hochstift community ISP network plus it’s SDN. This article is the distilled result of those discussions and experiences and might be somewhat opinionated.

Network automation done right

Continue reading This is the way: Holistic approach on network automation

MPLS Lab – Playing with static LSPs and VRFs on Linux

At DENOG13 I held a workshop Fun with PBR, VRFs and NetNS on Linux (in German) where I showcased forwarding IP packets within a VRF via static MPLS LSPs. I’ve been asked to publish the configuration for this lab, so here we are 🙂

Consider the following topology consisting of a core ring build with 5 routers, a border router (br-01) connected to core router E (cr-E) as well as to the Internet. All routers take part of OSPF area 0 and run iBGP with br-01 as the route reflector which is providing a default route. This is the same setup used for most of the FrOSCon Network Track.

Topology of MPLS lab
Continue reading MPLS Lab – Playing with static LSPs and VRFs on Linux

Custom Links in Netbox – Shortcut to device WebUI/IPMI

Netbox allows to create Custom Links to a number of models for a while now. I’ve been reminded of that feature quite recently and follow the idea to have a direct link on the device page which leads to the management interface of that device.

In the Freifunk Hochstift environment we have a number of wireless backbone links which happen to have issues now and then and require some love in that case. As we use lots of Ubnt gear that love has to be applied via the Web interface of the device which is reachable via the management IP, which is configured as the primary IP of the device in Netbox.

Up to now this meant searching for the device in Netbox and copy the IP address of the device in the browsers URL bar and hit return. This can be achieved much nicer with a custom link which directly points to https://primary_ip and is accessible via a button on the device page.

This can be achieved by creating a Custom Link in Netbox (Customization -> Custom Link) for the appropriate Content Type, Link text + URL. As the intention is to add a Custom Link to wireless backbone devices, the appropriate Content Type is DCIM > device. Button class allows to configure the color of the button.

Using a little Jinja2 in the Link text it’s possible to only show this link on WBBL devices and certain switches: If the expression(s) given there renders as empty text the link will not be shown. My first instinct was to check for the device role and manufacturer:

{% if obj.device_role.slug == 'wbbl' or
      obj.device_type.manufacturer.slug == 'netonix'
%}
WebUI
{% endif %}

But checking for the platforms makes more sense as that’s the primary indicator for the devices which are managed via a web interface and they may be used in different roles (we have wireless local links too for example):

{% if obj.platform.name in [ 'AirOS', 'Netonix' ] %}
WebUI
{% endif %}

As shown above the device or “thing” referenced by the Content Type is represented via the obj object and it’s attributes are directly accessible. A look into the API representation or the Django model(s) of the device (or “thing” for that matter) may help to figure out the correct name.

What I did not find in the docs nor API nor model is that how you can access the IP part of an IP address, which in Netbox always carries the netmask. While searching I stumbled across an issue and learned that the following is possible:

https://{{ obj.primary_ip4.address.ip }}

Netbox also allows to group custom links within a drop down. All links sharing the same Group name will show up in a drop down named like the group.

With those pieces it’s easily possible to create custom links for special devices which point to the primary IP and provide the OPS team with a few clicks solution in case of trouble. Thanks to Tim for the hint 🙂

Conditional SLAAC with bird – Only advertise default route if one is present on the router

When people want to deploy a Linux box as a 1st hop router in an IPv6 network RAdvd usually is the first choice for sending out Router Advertisments. The configuration is rather straight forward and on most distributions extensive example configuration files exist. Once configured RAdvd will silently do it’s job and happilly send out periodic RAs and answer any Router Solicitations, even thought the router itself may not have any network connectivity (anymore).

While designing the Freifunk Hochstift Backbone a while back I was looking for a way to only send out RAs with a default route if the router actually has one itself to prevent clients from getting broken IPv6 connectivity (and people complaining).

I found that the Bird Internet Routing Daemon – which already was in use for OSPF/BGP – also provides a ravd protocol which does what I wanted using a trigger. The following configuration will keep sending out RAs but will zero the router lifetime, which will allow clients to configure an IP address from this prefix but not use this router as default gateway, which is OK for my particular setup.

protocol radv {
    # ONLY advertise prefix, IF default route is available
    import all;
    export all;
    trigger ::/0;

    interface "vlan0815" {
        default lifetime 600 sensitive yes;

        prefix 2001:db8:23:42::/64 {
        preferred lifetime 3600;
    };
}

This is the default behaviour of the radv trigger but it can be fine tuned to your requirements:

trigger prefix

RAdv protocol could be configured to change its behavior based on availability of routes. When this option is used, the protocol waits in suppressed state until a trigger route (for the specified network) is exported to the protocol, the protocol also returns to suppressed state if the trigger route disappears. Note that route export depends on specified export filter, as usual. This option could be used, e.g., for handling failover in multihoming scenarios.

During suppressed state, router advertisements are generated, but with some fields zeroed. Exact behavior depends on which fields are zeroed, this can be configured by sensitive option for appropriate fields. By default, just default lifetime (also called router lifetime) is zeroed, which means hosts cannot use the router as a default router. preferred lifetime and valid lifetime could also be configured as sensitive for a prefix, which would cause autoconfigured IPs to be deprecated or even removed.

The radv protocol has a long list of knobs and switches for fine tuning, see the bird 1.6 and 2.0 documentation for further details.