Advanced Linux Routing – Policy-based routing

Linux has been a first-class networking citizen for quite a long time now. Out of the box, every system running a Linux kernel has at least three routing tables and supports multiple mechanisms for advanced routing, ranging from policy-based routing (PBR) through VRFs (VRF-lite) to network namespaces (NetNS). Each of these provides different levels of separation and features, with PBR being the oldest one and VRFs the most recent addition (available since kernel 4.3).

This article is part of the Linux Routing series and provides an overview of policy-based routing (PBR) and its applications. The previous post about Linux Routing Fundamentals covers the basics and plumbing of Linux routing tables, what happens when an IP packet is sent from or through a Linux box, and how to figure out why it takes the path it does. It is a good read if you don't feel familiar with these topics yet. Posts about VRFs and network namespaces will follow.

Policy-Based Routing

Policy-based routing (PBR) is a pretty powerful feature of the Linux network stack and has been around for a long time: it has been available since Linux 2.2, released in 1999.

Commonly, a routing decision is made solely based on the destination IP address of a packet, which is generally sufficient. Where plain destination-based routing isn't enough, PBR comes in. It allows influencing the routing decision based on a number of criteria, including but not limited to the packet's source address, its source or destination port, the ingress interface the packet has been received on, the user who originated the packet, or an fwmark value, and thereby anything that can be matched by netfilter.

PBR works by specifying rules, which are evaluated in ascending order of their preference. PBR rules can be shown and manipulated using the ip rule command from the iproute2 package. See ip rule help or man ip-rule for the full grammar and an in-depth explanation of the possibilities.

On every Linux system there’s a default rule set containing rules for the local, main, and default tables – see the previous post on Linux Routing Fundamentals for more information about Linux routing tables and these three in particular. The following output shows the default Linux PBR ruleset:

$ ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default

The basic idea is that you match packets based on some of their characteristics and instruct the kernel to route them using a different routing table. This makes it possible to specify different routes for the same destination(s) depending on said characteristics.

Note: Unless a preference is specified explicitly, custom PBR rules are inserted before the main and default rules above (the local rule at preference 0 stays in front), so they can add special handling for desired situations. That also means that if no custom rule matches, or no matching route is found in the routing tables referenced by the custom PBR rules, Linux will continue evaluating further rules, including the default ones. In some situations you will therefore want to make sure a custom routing table contains a default route, even a blackhole one, so specific traffic does not reach the main table.

PBR Applications

Before the rise of DSL and FTTx, Internet connections were pretty expensive and bandwidth was scarce. Even today, setups with multiple uplinks that differ in bandwidth, latency, or other properties are still common, and you may want to steer different traffic over different links.

Such setups sometimes provide challenges, especially when you don't own IP space and must work with the IPs assigned by the ISP(s).

PBR by destination port

A simple example is routing web traffic over the cheap, high-bandwidth link with higher latency and using the more expensive one for everything else. Rules routing web traffic, identified by destination ports 80 and 443, using routing table 80 could look like this:

# ip rule add dport 80 table 80
# ip rule add dport 443 table 80

Routing table 80 then needs to exist, e.g. by adding routes to it, and should contain, for example, a default route over the cheap uplink. Assuming the cheap uplink is on eth1 and the next-hop IP for that route is 192.168.178.1, the route could be added like this (dev eth1 is optional):

# ip route add default via 192.168.178.1 dev eth1 table 80

Note that this only influences the egress path, meaning packets leaving our network with destination port 80 or 443. Responses carry the client's source port as their destination port, so they are routed as usual using the local or main table (unless the client's source port is forced to be 80 or 443 as well).

PBR by source address

Want to route traffic from one or more specific machines differently? Or, in a setup with two uplinks, want to make sure that traffic from the IP assigned by ISP A uses the uplink of ISP A?

You can make all traffic with a given source address/prefix use a separate routing table like this:

# ip rule add from 192.168.178.0/25 table 178

PBR by source interface

Similar to the above example, it might be handy to route all traffic coming in on a specific interface differently. This can be done as follows:

# ip rule add iif eth2 table 23

PBR drawbacks

As nice and versatile as policy-based routing is, it also has some drawbacks when used in production environments, especially when you don't control the traffic passing through.

When trying to separate traffic flows and route them via different paths, you can only match on certain attributes of the expected flows, and any ICMP errors that occur, e.g. fragmentation needed or destination unreachable, will most likely fail to match. As a result they tend to fall through, are routed by a later rule, commonly the one for the main table, and may end up on an interface or uplink where they should never go. In the worst case these packets could leak to the Internet although they should have stayed internal, and your ISP might raise an eyebrow or start asking questions.

In dual-stack networks, which are common these days, you need to remember to add rules for both address families and be sure to also match IPv6 link-local addresses, as some traffic might be local. Ideally, you create rules based on incoming interfaces when trying to separate things; however, that still might not catch the ICMP errors case above. In these cases VRFs might be what you're looking for.
