Re: [Idr] why has 4096 bytes limit on BGP messages size?

Curtis Villamizar <curtis@occnc.com> Fri, 15 June 2007 05:50 UTC

Message-Id: <200706150551.l5F5pY6W002152@sailbum.orleans.occnc.com>
To: Danny McPherson <danny@tcb.net>
From: Curtis Villamizar <curtis@occnc.com>
Subject: Re: [Idr] why has 4096 bytes limit on BGP messages size?
In-reply-to: Your message of "Thu, 14 Jun 2007 16:40:31 MDT." <75E42991-5683-422A-89AD-733A11CEF6EE@tcb.net>
Date: Fri, 15 Jun 2007 01:51:34 -0400
Cc: idr List Routing <idr@ietf.org>, Pekka Savola <pekkas@netcore.fi>

In message <75E42991-5683-422A-89AD-733A11CEF6EE@tcb.net>
Danny McPherson writes:
>  
>  
> On Jun 14, 2007, at 9:59 AM, Pekka Savola wrote:
> >
> > (The header fields already tell the length of the message, so  
> > allocating 4K buffer for each BGP message (when most messages are  
> > much smaller than 4K) is probably a waste in many cases.)
>  
> Pekka,
> Do you have some references to data supporting your comment here
> regarding most messages being smaller than 4K?
>  
> -danny


I looked into exactly this more than a decade ago.  It's not likely to
have changed much.

Most of the time you get an occasional BGP keepalive.  This is
uninteresting.

What is interesting is when a transient occurs.  You have to engineer
for the stress conditions, and for a BGP implementation a transient is
the stress condition.

At that time many peers sent a large number of prefixes with the same
AS path.  These got packed into multiple full BGP updates and a few
partial updates.  Then there were AS paths (and other attributes, but
mostly AS path) that had a smaller number of prefixes.
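
As a rough illustration of the packing described above, the sketch
below estimates how many IPv4 prefixes sharing one set of path
attributes fit in a single 4096-byte UPDATE.  The header and
length-field sizes are those of RFC 4271; the function name and the
40-byte attribute size used in the note afterwards are assumptions
chosen only for illustration, not measured values.

```python
import math

BGP_MAX_MESSAGE = 4096   # maximum BGP message size (RFC 4271)
BGP_HEADER = 19          # 16-byte marker + 2-byte length + 1-byte type
UPDATE_FIXED = 4         # 2-byte withdrawn-routes length + 2-byte attribute length


def prefixes_per_update(prefix_len: int, attr_bytes: int) -> int:
    """Estimate how many IPv4 NLRI entries fit in one full UPDATE.

    attr_bytes is the assumed total size of the shared path attributes
    (AS path, next hop, and so on); it is a hypothetical input.
    """
    # Each NLRI entry is one length octet plus just enough prefix octets.
    nlri_entry = 1 + math.ceil(prefix_len / 8)
    room = BGP_MAX_MESSAGE - BGP_HEADER - UPDATE_FIXED - attr_bytes
    return room // nlri_entry
```

With an assumed 40 bytes of attributes, prefixes_per_update(24, 40)
gives 1008, so 5000 /24 prefixes with the same AS path would go out as
four full updates and one partial update.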

If the BGP sender is efficient, then lots of small updates are packed
into a single write.  TCP will deliver this as a stream of TCP
segments making use of the full MSS.  On the receiving end, one big
read will pick up multiple BGP updates.  The BGP Adj-RIB-In processing
is done (which is very minimal) and a read is immediately reposted.
This way the flow of BGP updates goes quickly.  By the time the BGP
Adj-RIB-Out processing happens, most or all of the updates have been
transferred.  A possible slow step is getting the route changes jammed
into the forwarding cards.  This too has improved.
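
The receive side described above can be sketched as follows: one read
returns a chunk of the TCP stream, and the length field in each BGP
header is used to split out every complete message.  This is a minimal
sketch, not code from any real implementation; the function name
split_messages and the (type, body) tuple layout are assumptions.  The
19-byte header with an all-ones marker and the 4096-byte maximum are
from RFC 4271.

```python
import struct

BGP_HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type
BGP_MAX_LEN = 4096
MARKER = b"\xff" * 16        # marker field is all ones


def split_messages(buf: bytes):
    """Return (messages, leftover) for one chunk read from the TCP stream.

    messages is a list of (type, body) tuples for every complete BGP
    message in buf; leftover is the trailing partial message, which the
    caller prepends to the data from the next read.
    """
    messages = []
    offset = 0
    while len(buf) - offset >= BGP_HEADER_LEN:
        marker, length, msg_type = struct.unpack_from("!16sHB", buf, offset)
        if marker != MARKER or not BGP_HEADER_LEN <= length <= BGP_MAX_LEN:
            raise ValueError("malformed BGP header at offset %d" % offset)
        if len(buf) - offset < length:
            break  # the rest of this message arrives in a later read
        body = buf[offset + BGP_HEADER_LEN:offset + length]
        messages.append((msg_type, body))
        offset += length
    return messages, buf[offset:]
```

Because the messages are framed by their own length fields, the read
buffer does not have to line up with message boundaries, and nothing
forces one 4K buffer per message.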

So during a transient most of the reads are delivering either full BGP
updates or multiple partially filled updates.  If the read buffer is
larger than 4KB this is even more efficient as multiple full BGP
updates can be read in a single read.
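
To put a number on the read-buffer point, the sketch below counts how
many reads it takes to drain a backlog of queued updates for a given
buffer size.  It assumes the whole backlog is already sitting in the
socket buffer, which is the best case; io.BytesIO stands in for the
socket, and count_reads is a hypothetical helper.

```python
import io


def count_reads(stream_bytes: bytes, read_size: int) -> int:
    """Count the non-empty reads needed to drain stream_bytes."""
    src = io.BytesIO(stream_bytes)   # stand-in for a socket with data queued
    reads = 0
    while src.read(read_size):       # an empty result means nothing is left
        reads += 1
    return reads


backlog = bytes(10 * 4096)           # ten full 4096-byte updates
reads_4k = count_reads(backlog, 4096)
reads_64k = count_reads(backlog, 65536)
```

Here reads_4k is 10 and reads_64k is 1.  A real socket read returns
only what has arrived so far, so the actual saving depends on how fast
the sender fills the pipe.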

Curtis

