Deploying a Freifunk Hochstift backbone POP with Netbox Scripts

Some weeks ago Network to Code held the first (virtual) Netbox Day (YouTube playlist, Slides repo on github). John Anderson gave a great NetBox Extensibility Overview and introduced me to Netbox Scripts (Video, Slide deck, Slide 28), which allow you to add your own procedures to netbox as custom Python code. I was hooked. About three to four hours of fiddling, digging through the docs, and a few hundred lines of Python later, I had put together a procedure to provision a complete Freifunk Hochstift Backbone POP within Netbox according to our design. I’m going to share my proof of concept code here and walk you through the key parts of the script.

Netbox scripts provide a great and really simple interface for codifying the procedures and design principles that apply to your infrastructure. They let you fire up complex network setups within netbox just by entering a set of config parameters in a form like the following and clicking a single button.

So first write down what information is needed for your procedure. For provisioning a new site within the Freifunk Hochstift network I need:

The site (which has to be predefined in netbox)
The number of this site (mgmt VLAN and prefix will be derived from this)
The name of the rack to be created
The number of ports connected on the patch panel
The setup of poles and cables on the roof
The asset tag of the backbone router
The asset tag of the switch
The node id of the backbone router

It would also be possible to create the site with all needed information (name, slug, description, GPS coordinates, …) from within the same form. As we are migrating to netbox and all sites have already been migrated, I chose to work with a site drop-down menu. While writing this, I thought about making the site number a custom attribute of a site in a future iteration, as it’s supposed to be statically allocated to a site anyway.

Setting up the input form

Let’s get to code! As we know what data we need, we can create a Python file for our procedure. It has to be located in /etc/netbox/scripts/ (unless configured otherwise); mine is called ProvisionBackbonePOP.py. You need to define a class which inherits from Script and declares the variables (and their types) required for your procedure. The part of ProvisionBackbonePOP defining the variables looks like the following. See the netbox docs for a reference on which variable types are available.

from dcim.models import Site
from extras.scripts import Script, ObjectVar, IntegerVar, StringVar

class ProvisionBackbonePOP (Script):
    class Meta:
        name = "Provision Backbone POP"
        description = "Provision a new backbone POP"
        field_order = ['site', 'site_no', 'rack_name', 'rack_units', 'panel_ports', 'pole_setup']
        commit_default = False

    # Drop down for sites
    site = ObjectVar (
        description = "Site to be deployed",
        queryset = Site.objects.all ()
    )

    # Site No.
    site_no = IntegerVar (description = "Site number (for Mgmt VLAN + prefix)")

    # Rack name
    rack_name = StringVar (description = "Name of the rack")

    # Rack units
    rack_units = IntegerVar (description = "Number of units of this rack")

    # BBR Asset Tag
    bbr_asset_tag = StringVar (description = "Asset tag of backbone router")

    # Switch asset tag
    sw_asset_tag = StringVar (description = "Asset tag of switch")

    # Panel ports
    panel_ports = IntegerVar (description = "Number of ports on the patch panel (if 19\")")

    # Pole setup
    pole_setup = StringVar (description = "Space separated list of <pole no>:<num_cables>")

    # BBR ID
    node_id = IntegerVar (description = "Node ID of BBR")

Ok, we have a nicely generated form with all values required for our procedure.

Let the magic begin

The next integral part is the run (self, data, commit) method of your class. It is called when the “Run Script” button is clicked. data is a dict holding all form values, and commit indicates whether changes shall be committed or whether this is a dry run. In my case I didn’t need to care about the latter, as netbox handles the database transaction itself and commits or rolls back the transaction accordingly.

def run (self, data, commit):
    site = data['site']
    site_no = data['site_no']
    rack_name = data['rack_name']
    rack_units = data['rack_units'] 
    panel_ports = data['panel_ports']
    pole_setup = data['pole_setup']
    sw_asset_tag = data['sw_asset_tag']
    bbr_asset_tag = data['bbr_asset_tag']
    node_id = data['node_id']

    # Set up POP Mgmt VLAN
    vlan = self.create_mgmt_vlan (site, site_no)

    # Mgmt prefix
    prefix = self.create_mgmt_prefix (site, site_no, vlan)

    # Create rack
    rack = self.create_rack (site, rack_name, rack_units)

    # Create patch panel
    pp = self.create_patch_panel (site, rack, rack_name, panel_ports)

    self.create_and_connect_surges (site, rack, pp, pole_setup)

    # Create switch
    sw = self.setup_switch (site, rack, pp, panel_ports, vlan, site_no, sw_asset_tag)

    # Create backbone router
    bbr = self.setup_bbr (site, rack, vlan, site_no, node_id, bbr_asset_tag, sw)

As you can see, I split the code up into several methods, each handling a smaller part of the procedure. Mine are defined within the same class and file, but they could easily be moved into a separate library of procedures or parts thereof. The following sections cover the above methods in detail.

Creating the management VLAN and prefix

So what’s happening here? First we create a management VLAN for the given site. According to our policy, a management VLAN has the unique ID 3000 + site number, is called “Mgmt <site>”, and has the role “Mgmt”. Codified in Python this looks like the following:

def create_mgmt_vlan (self, site, site_no):
    vlan_id = 3000 + int (site_no)
    try:
        vlan = VLAN.objects.get (site = site, vid = vlan_id)
        self.log_info ("Mgmt vlan %s already present, carrying on." % vlan)
        return vlan
    except VLAN.DoesNotExist:
        pass

    vlan = VLAN (
        site = site,
        name = "Mgmt %s" % site.name,
        vid = vlan_id,
        role = Role.objects.get (name = 'Mgmt')
    )

    vlan.save ()
    self.log_success ("Created mgmt VLAN %s" % vlan)

    return vlan

Writing this method as an idempotent function helped while developing the whole script, as items that had already been created wouldn’t raise an error on the next run. The vlan object is returned by the method as it’s needed for creating the prefix and for setting up the ports of the switch. Creating the management prefix, which will be 172.30.<site number>.0/24 and is assigned to this VLAN, looks like this:

def create_mgmt_prefix (self, site, site_no, vlan):
    prefix_cidr = "172.30.%d.0/24" % site_no
    try:
        prefix = Prefix.objects.get (prefix = prefix_cidr)
        self.log_info ("Mgmt prefix %s already present, carrying on." % prefix)

        return prefix
    except Prefix.DoesNotExist:
        pass

    prefix = Prefix (
        site = site,
        prefix = prefix_cidr,
        vlan = vlan,
        role = Role.objects.get (name = 'Mgmt')
    )

    prefix.save ()
    self.log_success ("Created mgmt prefix %s" % prefix)

    return prefix
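Because both the VLAN ID and the prefix are derived from the site number, not every number yields a valid result: the site number ends up as the third octet of 172.30.<site number>.0/24, so it has to stay within 0 to 255 (which also keeps 3000 + site number well below the highest 802.1Q VLAN ID, 4094). The following is a minimal pure-Python sketch, independent of netbox, that derives both values and rejects unusable site numbers before anything is written. The helper name `mgmt_params` and the range check are assumptions for illustration, not part of the original script.

```python
import ipaddress

# Hypothetical helper (not part of the original script): derive the mgmt VLAN ID
# and the mgmt prefix from the site number, following the policy described above:
#   VLAN ID = 3000 + site number
#   prefix  = 172.30.<site number>.0/24
MGMT_VLAN_BASE = 3000


def mgmt_params(site_no):
    site_no = int(site_no)
    # The site number becomes the third octet of the prefix, so it must fit into 0..255.
    # This also keeps the VLAN ID at or below 3255, inside the valid 802.1Q range.
    if not 0 <= site_no <= 255:
        raise ValueError("site number %d does not fit into 172.30.<n>.0/24" % site_no)
    vid = MGMT_VLAN_BASE + site_no
    prefix = ipaddress.ip_network("172.30.%d.0/24" % site_no)
    return vid, prefix


vid, prefix = mgmt_params(42)
print(vid, prefix)  # 3042 172.30.42.0/24
```

Validating up front like this would let the script fail with a clear message instead of creating a VLAN and then failing on the prefix.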

Creating the rack

The rack is set up by create_rack (self, site, name, units), which creates a planned, 19″ wide, wall-mounted cabinet with the given name and number of units (our standard is 9 RU) and the role Backbone at the given site. The code can be found on github. As we are about to create a number of devices, some of them within the rack we just created, the rack object is returned as well.

Creating the patch panel and outdoor cabling

Creating the patch panel itself is similar: create_patch_panel (self, site, rack, rack_name, num_ports) creates a planned device of type and role Patchpanel in the given rack at the given site and mounts it in the top RU, facing the front of the rack (code). After the device is created, the given number of ports is added as front as well as rear ports of type 8P8C, with each front port mapped one-to-one to the rear port of the same number.

# Create front and rear ports
for n in range (1, int (num_ports) + 1):
    rear_port = RearPort (
        device = pp,
        name = str (n),
        type = PortTypeChoices.TYPE_8P8C,
        positions = 1
    )
    rear_port.save ()

    front_port = FrontPort (
        device = pp,
        name = str (n),
        type = PortTypeChoices.TYPE_8P8C,
        rear_port = rear_port,
        rear_port_position = 1,
    )
    front_port.save ()

Now this is getting even cooler: we can create ports in a loop and don’t have to add them one by one! Let’s connect outdoor surge protectors to the rear ports of the patch panel while we are at it. This code leverages the predefined device type for the surge protector, which comes with one front and one rear port.

[...]

pp_port = 1
for pole_config in pole_setup.split ():
    pole_no, num_surges = pole_config.split (':')

    for n in range (1, int (num_surges) + 1):
        # Create one surge protector per cable on this pole
        surge_name = "sp-%s-mast%s-%s" % (site.slug.lower (), pole_no, n)
        surge = Device (
            device_type = surge_type,
            device_role = surge_role,
            name = surge_name,
            status = DeviceStatusChoices.STATUS_PLANNED,
            site = site
        )

        surge.save ()

        # Link RearPort of SP to next free panel port
        cable = Cable (
            termination_a = RearPort.objects.get (device = pp, name = str (pp_port)),
            termination_b = RearPort.objects.get (device = surge, name = str (1)),
            status = CableStatusChoices.STATUS_PLANNED
        )

        cable.save ()
        self.log_success ("Created surge protector %s and linked it to patch panel port %s." % (surge, pp_port))

        # Every surge protector occupies its own patch panel port
        pp_port += 1

As a result, all rear ports of the patch panel should be connected to the rear port of an outdoor surge protector labeled Mast<num>-<num> (Mast being German for pole), so the mapping of the fixed cabling from the rack to the roof is clearly visible and documented. The result looks like the following (where the switch ports are already connected to the panel, too).
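Running the same allocation logic without touching the database can be handy for checking a pole setup string before submitting the form. Below is a minimal pure-Python sketch that reuses the naming scheme and the sequential port allocation of the loop above. The helper name `plan_surges` and the example slug are hypothetical, and the check against the number of panel ports is an addition: in the original loop, a missing panel port would only surface as a failing `RearPort.objects.get`.

```python
# Hypothetical helper (not part of the original script): turn the pole_setup
# form value ("<pole no>:<num_cables>" pairs separated by spaces) into a list of
# (surge protector name, patch panel port) tuples, allocating ports sequentially.
def plan_surges(site_slug, pole_setup, panel_ports):
    plan = []
    pp_port = 1
    for pole_config in pole_setup.split():
        pole_no, num_surges = pole_config.split(":")
        for n in range(1, int(num_surges) + 1):
            # Same naming scheme as the loop above: sp-<site slug>-mast<pole>-<n>
            name = "sp-%s-mast%s-%s" % (site_slug.lower(), pole_no, n)
            plan.append((name, pp_port))
            pp_port += 1
    # Added check: refuse plans that need more cables than the panel has ports.
    if len(plan) > panel_ports:
        raise ValueError("%d cables but only %d panel ports" % (len(plan), panel_ports))
    return plan


print(plan_surges("pad", "1:2 2:1", 4))
# [('sp-pad-mast1-1', 1), ('sp-pad-mast1-2', 2), ('sp-pad-mast2-1', 3)]
```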

Setting up the switch

The switch is also created from a predefined device type with all its interfaces etc., and the first num_ports interfaces are connected to the front ports of the patch panel. So far nothing new. According to our design, the last two copper ports are bundled into a LAG and configured as a trunk carrying all VLANs (“tagged all” in netbox speak). All unused ports are disabled. The management IP of the switch is always the mgmt prefix + 10, so it can be configured automatically as well. The code can be found here and the result looks like this:
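The port roles described above boil down to simple index arithmetic over the switch’s copper ports. The sketch below is a hypothetical, pure-Python illustration of that design rule, not the original setup code: the helper name `plan_switch`, the port names and the role labels are assumptions, it assumes switch interface n is cabled to panel front port n, and it ignores any non-copper (e.g. SFP) ports. It also derives the management IP as network address + 10, written with the prefix length the way netbox stores IP addresses.

```python
import ipaddress


# Hypothetical sketch (not the original setup code): assign a role to each
# copper port of the switch according to the design described above.
def plan_switch(copper_ports, panel_ports, mgmt_prefix):
    if panel_ports + 2 > len(copper_ports):
        raise ValueError("not enough copper ports for the panel ports plus the LAG")
    plan = {}
    for idx, port in enumerate(copper_ports):
        if idx < panel_ports:
            plan[port] = "panel"       # assumed cabled to patch panel front port idx + 1
        elif idx >= len(copper_ports) - 2:
            plan[port] = "lag-trunk"   # last two copper ports: LAG, tagged all
        else:
            plan[port] = "disabled"    # unused ports are disabled
    net = ipaddress.ip_network(mgmt_prefix)
    # Management IP: network address of the mgmt prefix + 10, with prefix length
    mgmt_ip = "%s/%d" % (net.network_address + 10, net.prefixlen)
    return plan, mgmt_ip


plan, mgmt_ip = plan_switch([str(n) for n in range(1, 13)], 4, "172.30.42.0/24")
print(mgmt_ip)  # 172.30.42.10/24
```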

Setting up the backbone router

The backbone router is created from a predefined device type as well; its first two ports are bundled into a LAG and connected to the designated switch ports. As every backbone router is a router, it also gets its loopback IPs configured on the lo interface, and as it will become the router for the management network of the POP, it gets an SVI for the management VLAN with the first IP of the management prefix. The result looks like this:
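The “first IP of the management prefix” rule can be expressed with Python’s standard ipaddress module. This is a minimal sketch, not the original code: the helper name `bbr_mgmt_svi_ip` is hypothetical, and it returns the address with its prefix length, as netbox expects for IP addresses. For an IPv4 /24, `hosts()` skips the network address, so the first usable IP is .1, which also makes it the natural default gateway for the switch’s management IP.

```python
import ipaddress


# Hypothetical helper (not part of the original script): the backbone router's
# SVI in the mgmt VLAN gets the first usable IP of the mgmt prefix.
def bbr_mgmt_svi_ip(mgmt_prefix):
    net = ipaddress.ip_network(mgmt_prefix)
    first_host = next(net.hosts())  # first usable host address (network address excluded)
    return "%s/%d" % (first_host, net.prefixlen)


print(bbr_mgmt_svi_ip("172.30.42.0/24"))  # 172.30.42.1/24
```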

Device types, roles, etc.

All our device types, roles, platforms, etc. are available within the NACL repository on github. I published some of them to the netbox devicetype-library on github, too. The Netonix WS-12-120-AC has already been merged 🙂

If you happen to use some hardware which is not already available in the netbox devicetype-library, please consider publishing your device types there, too.

Conclusions and future work

Investing three to four hours of engineering time in playing around with the possibilities of Netbox Scripts, resulting in 450 lines of Python code, made it possible to provision a whole Freifunk Hochstift backbone POP within 1.681 s. This could never be achieved by adding all devices to netbox manually. With the click of one button we created one VLAN and prefix, one rack and 11 devices, and connected everything which needs to be connected (with planned cables!).

When all devices, interfaces and IPs are provisioned in netbox, our automation (Salt Stack) will be used to deploy the configuration to the backbone router. Generating / pushing the configuration to the Netonix switches is on our agenda, too.

All of this could also have been done by an external tool in the language of your choice, using the netbox API to do basically the same things by calling API endpoints with similar names. With Netbox Scripts, however, I don’t have to write code to handle HTTP requests, manage API tokens, or expose the API to my ops team, and I still get a nice front end I can give to fellow operators, who don’t need to know all the gory details of how to create the puzzle pieces in netbox and put them all together. All they need to know are nine variables as of today.

Those could easily be reduced to six, as

the site number will become a custom attribute of the site
the rack name will mostly be R1 (although some exceptions exist)
the “standard issue rack” has 9 RU (although some exceptions exist, but they aren’t handled in the current code yet anyway)