BGP Load Balancing

I’ve been working with a few different customers lately on BGP Internet routing, and one of the most common requests is BGP load balancing. BGP was designed with redundancy in mind, not load balancing. However, there are ways to build a load-balancing configuration that, although not perfect, works well. I explain the details below.

I’m making a few assumptions here in this example:

  • You are peering upstream via BGP with two different service providers
  • You are pulling full Internet tables via BGP
  • You have your own autonomous system (AS) number

Here is the topology we are using in this example:

Base Configuration

AS123 is “us”, AS4 and AS5 are our two upstream service providers, and AS6 is the “Internet”. We will be advertising our block 123.1.0.0/16 upstream and receiving a set of routes from the “Internet”, 7.0.0.0/8 through 12.0.0.0/8. The routes from R6 to R5 are prepended with a few additional AS path entries to simulate two upstream providers with different paths to the routes advertised by R6. This causes R3 and R1 to prefer the path through R2->R4 to reach the R6 routes. We’ve also prepended additional AS path entries on the routes from R3 to R5, again to simulate different paths to the Internet, which causes R6 to prefer the R6->R4 path for routes to the 123.1.0.0/16 network.

The R2-R4-R6 path is preferred in both directions between AS123 and AS6.

R2 BGP:

R2#sh bgp
BGP table version is 18, local router ID is 123.1.1.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 7.0.0.0          4.4.24.4                               0 4 6 i
*> 8.0.0.0          4.4.24.4                               0 4 6 i
*> 9.0.0.0          4.4.24.4                               0 4 6 i
*> 10.0.0.0         4.4.24.4                               0 4 6 i
*> 11.0.0.0         4.4.24.4                               0 4 6 i
*> 12.0.0.0         4.4.24.4                               0 4 6 i
*> 123.1.0.0/16     0.0.0.0                  0         32768 i
* i                 123.1.1.3                0    100      0 i
*>i123.1.2.0/24     123.1.1.1                0    100      0 i
*>i123.1.3.0/24     123.1.1.1                0    100      0 i

 

R3 BGP:

R3#sh bgp
BGP table version is 40, local router ID is 123.1.1.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i7.0.0.0          123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
*>i8.0.0.0          123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
*>i9.0.0.0          123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
*>i10.0.0.0         123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
*>i11.0.0.0         123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
*>i12.0.0.0         123.1.1.2                0    100      0 4 6 i
*                   5.5.35.5                               0 5 6 6 6 6 i
* i123.1.0.0/16     123.1.1.2                0    100      0 i
*>                  0.0.0.0                  0         32768 i
*>i123.1.2.0/24     123.1.1.1                0    100      0 i
*>i123.1.3.0/24     123.1.1.1                0    100      0 i

R6 BGP:

R6#sh bgp
BGP table version is 20, local router ID is 6.6.6.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 7.0.0.0          0.0.0.0                  0         32768 i
*> 8.0.0.0          0.0.0.0                  0         32768 i
*> 9.0.0.0          0.0.0.0                  0         32768 i
*> 10.0.0.0         0.0.0.0                  0         32768 i
*> 11.0.0.0         0.0.0.0                  0         32768 i
*> 12.0.0.0         0.0.0.0                  0         32768 i
*> 123.1.0.0/16     4.4.46.4                               0 4 123 i
*                   5.5.56.5                               0 5 123 123 123 123 i
*> 123.1.2.0/24     4.4.46.4                               0 4 123 i
*                   5.5.56.5                               0 5 123 123 123 123 i
*> 123.1.3.0/24     4.4.46.4                               0 4 123 i
*                   5.5.56.5                               0 5 123 123 123 123 i
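
The effect of the prepends in the tables above can be sketched in a few lines of Python. This is a simplified model of a single step of BGP best-path selection (shortest AS path wins when the earlier attributes are equal), not the full decision process. The function name and data layout are my own, for illustration only; the next hops and AS paths are copied from the R3 and R6 tables.

```python
# Simplified sketch: with weight, local preference, etc. equal, BGP
# prefers the route with the shortest AS path. Names are illustrative.

def shortest_as_path(routes):
    """Return the route whose AS path has the fewest entries."""
    return min(routes, key=lambda r: len(r["as_path"]))

# Paths R3 holds for 7.0.0.0/8 (iBGP via R2, eBGP via R5).
routes_on_r3 = [
    {"next_hop": "123.1.1.2", "as_path": [4, 6]},
    {"next_hop": "5.5.35.5", "as_path": [5, 6, 6, 6, 6]},
]

# Paths R6 holds for 123.1.0.0/16 (via R4 and via R5).
routes_on_r6 = [
    {"next_hop": "4.4.46.4", "as_path": [4, 123]},
    {"next_hop": "5.5.56.5", "as_path": [5, 123, 123, 123, 123]},
]

print(shortest_as_path(routes_on_r3)["next_hop"])  # 123.1.1.2
print(shortest_as_path(routes_on_r6)["next_hop"])  # 4.4.46.4
```

In both directions the unprepended path through R4 wins, which matches the ">" markers in the tables.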

All of the initial configuration files to get the results above can be found here.

At this point we have a functioning BGP network. If R2 fails, or the link from R2 to R4 fails, traffic from R1 will be routed through R3. However, it’s all or nothing: there is no load balancing or sharing. Real-world deployments are not much different. If you don’t manipulate your BGP configuration appropriately, you can end up with unequal use of your upstream provider bandwidth. The next section describes a few strategies for creating a more balanced BGP environment.

Load Balancing with BGP

Inbound Load Balancing

The first step is to provide a more balanced inbound BGP strategy. To do this we need to identify smaller subnets in our environment that we can advertise to our upstream BGP peers. In the network above we’ve created two smaller networks, 123.1.2.0/24 and 123.1.3.0/24. Suppose we advertise upstream as follows:

  • R2->R4
    • 123.1.2.0/24
    • 123.1.0.0/16
  • R3->R5
    • 123.1.3.0/24
    • 123.1.0.0/16

R6 will then have a route for 123.1.2.0/24 through R4 and a route for 123.1.3.0/24 through R5. If the host count is roughly equal between the two /24 networks, the inbound traffic from the “Internet” should be roughly the same on the R2->R4 link and the R3->R5 link.
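
This works because routers forward on the longest matching prefix, so a /24 always beats the covering /16. Here is a minimal Python sketch of that lookup, assuming R6 ends up with the three routes described above. The table contents and the lookup function are illustrative, and I am assuming the /16 stays on the R4 path.

```python
import ipaddress

# Routes R6 is expected to hold after the split advertisement
# (prefix -> next hop). The /16 next hop is an assumption.
ROUTES = {
    "123.1.0.0/16": "4.4.46.4",
    "123.1.2.0/24": "4.4.46.4",
    "123.1.3.0/24": "5.5.56.5",
}

def lookup(destination):
    """Return (prefix, next_hop) for the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        ipaddress.ip_network(p) for p in ROUTES
        if addr in ipaddress.ip_network(p)
    ]
    best = max(candidates, key=lambda n: n.prefixlen)  # longest match wins
    return str(best), ROUTES[str(best)]

print(lookup("123.1.2.1"))  # ('123.1.2.0/24', '4.4.46.4')
print(lookup("123.1.3.1"))  # ('123.1.3.0/24', '5.5.56.5')
print(lookup("123.1.9.1"))  # ('123.1.0.0/16', '4.4.46.4')
```

Hosts outside the two /24s still follow the /16, which is why the /16 is advertised to both providers as a fallback.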

In order to accomplish this on R2 we must prevent the 123.1.3.0/24 prefix from being advertised to R4. We can do this with a prefix list and a route map as follows:

ip prefix-list FILTERED seq 5 permit 123.1.3.0/24

route-map TO_R4 deny 10
 match ip address prefix-list FILTERED
route-map TO_R4 permit 20

router bgp 123
 address-family ipv4
  neighbor 4.4.24.4 route-map TO_R4 out

This denies any prefix in the FILTERED prefix list from being advertised. R2 will now advertise only the 123.1.2.0/24 and 123.1.0.0/16 prefixes.
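
If the route-map logic is unfamiliar, the following Python sketch models how I understand the two sequences to be evaluated: the first matching sequence decides, so "deny 10" drops whatever the prefix list matches and "permit 20" lets everything else through. The function and variable names are made up for illustration.

```python
# Rough model of the R2 outbound policy TO_R4. The prefix list has one
# exact entry; the route-map has "deny 10" (match FILTERED), "permit 20".

FILTERED = {"123.1.3.0/24"}  # ip prefix-list FILTERED

def to_r4_permits(prefix):
    """Return True if route-map TO_R4 would let this prefix be advertised."""
    if prefix in FILTERED:  # sequence 10: deny what the prefix list matches
        return False
    return True             # sequence 20: permit everything else

local_prefixes = ["123.1.0.0/16", "123.1.2.0/24", "123.1.3.0/24"]
advertised = [p for p in local_prefixes if to_r4_permits(p)]
print(advertised)  # ['123.1.0.0/16', '123.1.2.0/24']
```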

We must do the inverse on R3, filtering out the 123.1.2.0/24 prefix. Since we already placed a route-map on R3 in the previous section to prepend the AS path, we must replace that route-map. The config on R3 is as follows:

ip prefix-list FILTERED seq 5 permit 123.1.2.0/24

route-map TO_R5 deny 10
 match ip address prefix-list FILTERED
route-map TO_R5 permit 20

router bgp 123
 address-family ipv4
  neighbor 5.5.35.5 route-map TO_R5 out

Now we can confirm the results on R6:

R6#sh bgp
BGP table version is 27, local router ID is 6.6.6.6
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 7.0.0.0          0.0.0.0                  0         32768 i
*> 8.0.0.0          0.0.0.0                  0         32768 i
*> 9.0.0.0          0.0.0.0                  0         32768 i
*> 10.0.0.0         0.0.0.0                  0         32768 i
*> 11.0.0.0         0.0.0.0                  0         32768 i
*> 12.0.0.0         0.0.0.0                  0         32768 i
*  123.1.0.0/16     5.5.56.5                               0 5 123 i
*>                  4.4.46.4                               0 4 123 i
*> 123.1.2.0/24     4.4.46.4                               0 4 123 i
*> 123.1.3.0/24     5.5.56.5                               0 5 123 i

R6#sh ip route bgp
     123.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
B       123.1.2.0/24 [20/0] via 4.4.46.4, 00:21:21
B       123.1.3.0/24 [20/0] via 5.5.56.5, 00:06:37
B       123.1.0.0/16 [20/0] via 4.4.46.4, 00:21:21
R6#traceroute 123.1.2.1 so lo7

Type escape sequence to abort.
Tracing the route to 123.1.2.1

  1 4.4.46.4 28 msec 20 msec 24 msec
  2 4.4.24.2 20 msec 48 msec 28 msec
  3 123.1.12.1 [AS 123] 80 msec 48 msec * 

R6#traceroute 123.1.3.1 so lo7

Type escape sequence to abort.
Tracing the route to 123.1.3.1

  1 5.5.56.5 72 msec 60 msec 44 msec
  2 5.5.35.3 76 msec 40 msec 56 msec
  3 123.1.13.1 [AS 123] 60 msec 48 msec * 

As you can see, the “Internet” now sees more specific routes to the /24 prefixes, one towards R4 and the other towards R5. In a real deployment you would need to inventory all of your prefixes and, based on host count, decide which routers should advertise which routes to balance the incoming traffic.

Outbound Load Balancing

Outbound load balancing can be much trickier. Basically, we want traffic from R1 destined to the “Internet” to take different paths to get there. The approach I recommend is to set the local preference on incoming routes in a balanced manner. One way to do this is to set the local preference to 150 on incoming routes originated by odd-numbered ASes on R2, and to 150 on incoming routes originated by even-numbered ASes on R3.

Current routing table on R1:

R1#sh ip route bgp
B    7.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04
B    8.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04
B    9.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04
B    10.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04
B    11.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04
     123.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
B       123.1.0.0/16 [200/0] via 123.1.1.2, 00:10:04
B    12.0.0.0/8 [200/0] via 123.1.1.2, 00:10:04

Notice that, currently, all outbound traffic from R1 to the 7.0.0.0/8 through 12.0.0.0/8 prefixes (which stand in for AS7 through AS12) goes through R2 and out to R4 on the way to the Internet. We want to make this more distributed and balanced.

I’ve added statements in the route-maps on R6 to override the origin AS on the “Internet” routes:

route-map SET_AS_11 permit 10
 set origin egp 11
!
route-map SET_AS_10 permit 10
 set origin egp 10
!
route-map SET_AS_12 permit 10
 set origin egp 12
!
route-map SET_AS_7 permit 10
 set origin egp 7
!
route-map SET_AS_8 permit 10
 set origin egp 8
!
route-map SET_AS_9 permit 10
 set origin egp 9

router bgp 6
 address-family ipv4
  network 7.0.0.0 route-map SET_AS_7
  network 8.0.0.0 route-map SET_AS_8
  network 9.0.0.0 route-map SET_AS_9
  network 10.0.0.0 route-map SET_AS_10
  network 11.0.0.0 route-map SET_AS_11
  network 12.0.0.0 route-map SET_AS_12

You can then see the different origins on R3:

R3#sh bgp
BGP table version is 40, local router ID is 123.1.1.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i7.0.0.0          123.1.1.2                0    100      0 4 6 7 e
*>i8.0.0.0          123.1.1.2                0    100      0 4 6 8 e
*>i9.0.0.0          123.1.1.2                0    100      0 4 6 9 e
*>i10.0.0.0         123.1.1.2                0    100      0 4 6 10 e
*>i11.0.0.0         123.1.1.2                0    100      0 4 6 11 e
*>i12.0.0.0         123.1.1.2                0    100      0 4 6 12 e
* i123.1.0.0/16     123.1.1.2                0    100      0 i
*>                  0.0.0.0                  0         32768 i
*>i123.1.2.0/24     123.1.1.1                0    100      0 i
*>i123.1.3.0/24     123.1.1.1                0    100      0 i

The first step is to create the necessary as-path access-list statements on R2 and R3.

R2 – match prefixes from odd AS numbers

ip as-path access-list 10 permit [13579]$

R3 – match prefixes from even AS numbers

ip as-path access-list 10 permit [02468]$
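
The patterns key on the last digit of the AS path, which is the last digit of the originating AS. Here is a quick way to sanity-check that idea with Python's re module. Python's regex engine is not the same as the one in IOS, so treat this as a sketch of the logic rather than a test of the router syntax; the sample paths are taken from the tables in this post.

```python
import re

ODD = re.compile(r"[13579]$")   # AS path ends in an odd digit
EVEN = re.compile(r"[02468]$")  # AS path ends in an even digit

paths = ["4 6 7", "4 6 8", "5 6 6 6 6 10", "4 6 11"]
for path in paths:
    kind = "odd" if ODD.search(path) else "even"
    print(path, "->", kind)
```

Only the final digit matters, so origin AS 10 counts as even and origin AS 11 as odd.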

Next we create the route-map statements and apply them to the BGP peers:

R2

route-map SET_LOCAL_PREF permit 10
 match as-path 10
 set local-preference 150
route-map SET_LOCAL_PREF permit 20

router bgp 123
 !
 address-family ipv4
  neighbor 4.4.24.4 route-map SET_LOCAL_PREF in

R3

route-map SET_LOCAL_PREF permit 10
 match as-path 10
 set local-preference 150
route-map SET_LOCAL_PREF permit 20

router bgp 123
 !
 address-family ipv4
  neighbor 5.5.35.5 route-map SET_LOCAL_PREF in
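
Local preference is compared before AS path length, which is why this works even though the path through R5 is much longer. The Python sketch below is a deliberately reduced model of that ordering (only two of the many best-path steps), with illustrative names and with candidate routes written the way I would expect them to look once both route-maps are applied.

```python
def best_path(routes):
    """Highest local preference wins; shortest AS path breaks ties."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# 7.0.0.0/8 (odd origin): R2 raises it to 150, R3 leaves the default 100.
routes_for_7 = [
    {"via": "R2", "local_pref": 150, "as_path": [4, 6, 7]},
    {"via": "R3", "local_pref": 100, "as_path": [5, 6, 6, 6, 6, 7]},
]

# 8.0.0.0/8 (even origin): R3 raises it to 150, R2 leaves the default 100.
routes_for_8 = [
    {"via": "R2", "local_pref": 100, "as_path": [4, 6, 8]},
    {"via": "R3", "local_pref": 150, "as_path": [5, 6, 6, 6, 6, 8]},
]

print(best_path(routes_for_7)["via"])  # R2
print(best_path(routes_for_8)["via"])  # R3
```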

I also had to make sure that R2 and R3 were only advertising routes that originated in AS123 to AS4 and AS5. If this isn’t done, AS123 becomes a transit AS. In my lab, R5 was actually preferring the R3-R2-R4-R6 path to R6 and wasn’t advertising the “Internet” routes to R3 properly, because the prepends on the routes from R6 to R5 made the AS path through R3 shorter.

On R2:

ip prefix-list LOCAL_ROUTES permit 123.1.0.0/16 le 24

route-map TO_R4 permit 20
 match ip address prefix-list LOCAL_ROUTES

On R3:

ip prefix-list LOCAL_ROUTES permit 123.1.0.0/16 le 24

route-map TO_R5 permit 20
 match ip address prefix-list LOCAL_ROUTES
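
As I understand the prefix-list syntax, "123.1.0.0/16 le 24" matches any prefix inside 123.1.0.0/16 whose length is between 16 and 24. A small Python sketch of that check follows; the function name is hypothetical and the model ignores everything else a prefix list can do.

```python
import ipaddress

BASE = ipaddress.ip_network("123.1.0.0/16")

def matches_local_routes(prefix):
    """Model of 'ip prefix-list LOCAL_ROUTES permit 123.1.0.0/16 le 24'."""
    net = ipaddress.ip_network(prefix)
    # Must fall inside the /16 and be no longer than a /24.
    return net.subnet_of(BASE) and 16 <= net.prefixlen <= 24

for prefix in ["123.1.0.0/16", "123.1.2.0/24", "123.1.2.128/25", "7.0.0.0/8"]:
    print(prefix, matches_local_routes(prefix))
```

With this match on sequence 20, anything that is not one of our own prefixes falls through to the implicit deny at the end of the route-map, so the “Internet” routes are no longer passed between the two providers.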

After making these changes, the proper routes showed up on R5, and R5 properly advertised the “Internet” routes.

I had to clear the BGP peer sessions (clear ip bgp) to get things to settle down, but in the end R1 shows different routes for odd versus even AS-originated prefixes.

R1

R1#sh bgp               
BGP table version is 34, local router ID is 123.1.3.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i7.0.0.0          123.1.1.2                0    150      0 4 6 7 e
*>i8.0.0.0          123.1.1.3                0    150      0 5 6 6 6 6 8 e
*>i9.0.0.0          123.1.1.2                0    150      0 4 6 9 e
*>i10.0.0.0         123.1.1.3                0    150      0 5 6 6 6 6 10 e
*>i11.0.0.0         123.1.1.2                0    150      0 4 6 11 e
*>i12.0.0.0         123.1.1.3                0    150      0 5 6 6 6 6 12 e
*>i123.1.0.0/16     123.1.1.2                0    100      0 i
* i                 123.1.1.3                0    100      0 i
*> 123.1.2.0/24     0.0.0.0                  0         32768 i
*> 123.1.3.0/24     0.0.0.0                  0         32768 i
R1#sh ip route bgp
B    7.0.0.0/8 [200/0] via 123.1.1.2, 00:02:44
B    8.0.0.0/8 [200/0] via 123.1.1.3, 00:04:06
B    9.0.0.0/8 [200/0] via 123.1.1.2, 00:02:44
B    10.0.0.0/8 [200/0] via 123.1.1.3, 00:04:06
B    11.0.0.0/8 [200/0] via 123.1.1.2, 00:02:44
     123.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
B       123.1.0.0/16 [200/0] via 123.1.1.2, 00:02:02
B    12.0.0.0/8 [200/0] via 123.1.1.3, 00:04:06
R1#traceroute 7.7.7.7 so lo2

Type escape sequence to abort.
Tracing the route to 7.7.7.7

  1 123.1.12.2 64 msec 12 msec 12 msec
  2 4.4.24.4 16 msec 20 msec 44 msec
  3 4.4.46.6 24 msec 24 msec * 
R1#traceroute 8.8.8.8 so lo2

Type escape sequence to abort.
Tracing the route to 8.8.8.8

  1 123.1.13.3 8 msec 52 msec 20 msec
  2 5.5.35.5 44 msec 52 msec 44 msec
  3 5.5.56.6 56 msec 60 msec * 

As you can see, we have split up both ingress and egress traffic in AS123. This is all good in theory, of course, but in practice these strategies will likely need some tuning to get truly balanced traffic.

Here is a copy of the final configs.

 

UPDATE: I did see some activity on the NANOG mailing list regarding advertising longer prefixes such as /24, and I can definitely see their point. I used /24s in this example only because it was easy. It would make more sense to take the /16 in AS123 and split it into two /17 networks instead. If you have three or four upstream provider links, split it into four /18 networks, and so on. As a rule of thumb, advertise the shortest prefix lengths (the largest blocks) you can, to help keep the global BGP table size down.
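
For reference, here is how the /16 divides, worked out with Python's ipaddress module. This only shows the arithmetic; which half or quarter you advertise on which link is still up to you.

```python
import ipaddress

block = ipaddress.ip_network("123.1.0.0/16")

# Two upstream links: split into two /17s.
two_way = [str(n) for n in block.subnets(new_prefix=17)]

# Three or four upstream links: split into four /18s.
four_way = [str(n) for n in block.subnets(new_prefix=18)]

print(two_way)   # ['123.1.0.0/17', '123.1.128.0/17']
print(four_way)  # ['123.1.0.0/18', '123.1.64.0/18', '123.1.128.0/18', '123.1.192.0/18']
```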
