Anycasted services with Debian, bird, anycast-healthchecker and Cisco Nexus 7000

A while ago we started getting alerts, that one of our Kerberos KDCs had problem with the Kerberos database replication. A little digging revealed, that the problems are caused by load spikes on the KDC which were the result of a burst of legitimate queries fired by some systems we didn’t have much control over. Additionally we found that the MIT Kerberos implementation queries all KDCs provided in the configuration file in sequential order, so the first KDC get’s nearly all queries. While thinking about load balancing solutions, quickly anycast came to mind, so we decided to set it up. Anycast leverages the Equal Cost Multipath Routing (ECMP)  capability of common routers to distribute traffic to multiple next-hops for the same destination.

The solution consists of three corner stones:

  1. anycast-healtchecker as a means to check service availability
  2. bird as a BGP speaker on the KDCs and route reflectors
  3. Data center routers (Cisco Nexus 7010) speaking BGP to the route reflectors

The topology is as follows:

Topology KDCs

Continue reading Anycasted services with Debian, bird, anycast-healthchecker and Cisco Nexus 7000

Beware of the details (and VLANs)

A friend today challenged me with this problem on an Ubuntu box:

auto eth2
iface eth2 inet static
    address 10.23.0.32
    netmask 31

auto eth2.210
iface eth2.210 inet static
    address 10.42.0.32
    netmask 31

root@box:~# ifup eth2.210
Cannot find device "eth2.210"
Failed to bring up eth2.210.

The first thing coming to mind is “package vlan missing?”, which it was. After installing it, it got more interesting:

root@box:~# ifup eth2.210
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
ifup: recursion detected for interface eth2 in parent-lock phase
ERROR: trying to add VLAN #210 to IF -:eth2:-  error: File exists
Cannot find device "eth2.210"
Failed to bring up eth2.210

A little poking around showed, that there’s no interface eth2.210 present in the system, but

root@box:~# cat /proc/net/vlan/config 
VLAN Dev name | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
rename10       | 210  | eth2

root@box:~# ip l
[...]
10: rename10@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether DE:AD:BE:EF:23:42 brd ff:ff:ff:ff:ff:ff
[...]

Deleting the renameNN interface an running ifup again, just created a renameNN+1 interface with kernel log entries like

[7366672.699018] rename14: renamed from eth2.210

I suspected some bugs in /etc/network/if-{,pre-}up.d/ but this even happend, when manually running

ip link add link eth2 name eth2.210 type vlan id 210

or

ip link add link eth2 name vlan210 type vlan id 210

Dafuq?

Continue reading Beware of the details (and VLANs)