A while ago we started getting alerts that one of our Kerberos KDCs had problems with Kerberos database replication. A little digging revealed that the problems were caused by load spikes on the KDC, which resulted from bursts of legitimate queries fired by some systems we didn’t have much control over. Additionally, we found that the MIT Kerberos implementation queries the KDCs listed in the configuration file in sequential order, so the first KDC gets nearly all queries. While we were thinking about load-balancing solutions, anycast quickly came to mind, so we decided to set it up. Anycast leverages the Equal-Cost Multipath (ECMP) routing capability of common routers to distribute traffic across multiple next hops for the same destination.
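To make the ECMP idea a bit more concrete, here is a minimal Python sketch of per-flow next-hop selection. It is only an illustration: real routers use their own, vendor-specific hash functions and inputs, and the function name `pick_next_hop` as well as all addresses below are made up for this example.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow (e.g. a 5-tuple) onto one of several equal-cost next hops.

    Hashing the flow keeps all packets of one flow on the same next hop,
    while different flows are spread across the available next hops.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical KDC addresses that all announce the same anycast service IP.
kdcs = ["10.0.1.11", "10.0.2.11", "10.0.3.11"]

# Hypothetical client flows: (src IP, src port, dst IP, dst port, protocol).
flows = [("192.0.2.10", 40000 + i, "198.51.100.88", 88, "udp") for i in range(1000)]

counts = {kdc: 0 for kdc in kdcs}
for flow in flows:
    counts[pick_next_hop(flow, kdcs)] += 1

print(counts)
```

With many distinct client flows, the queries end up roughly evenly distributed over the KDCs, instead of nearly all of them hitting the first KDC in the list.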
The solution consists of three cornerstones:
- anycast-healthchecker as a means to check service availability
- bird as a BGP speaker on the KDCs and route reflectors
- Data center routers (Cisco Nexus 7010) speaking BGP to the route reflectors
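The interplay between health checking and route announcement can be sketched roughly as follows. This is not how anycast-healthchecker is implemented, just the general idea under simplified assumptions: a service prefix should only be announced via BGP while its local check passes. The helper names `tcp_port_open` and `prefixes_to_announce` and the prefix are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def prefixes_to_announce(checks):
    """Return the prefixes whose health check currently passes.

    `checks` maps a prefix (string) to a zero-argument callable that
    returns True when the service behind that prefix is healthy.
    """
    return sorted(prefix for prefix, is_healthy in checks.items() if is_healthy())


# Hypothetical anycast service prefix, checked against the local KDC port.
checks = {"198.51.100.88/32": lambda: tcp_port_open("127.0.0.1", 88)}
print(prefixes_to_announce(checks))
```

When the check fails, the prefix drops out of the list, the BGP speaker withdraws the route, and the routers stop sending traffic to that KDC.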
The topology is as follows: