[Idr] AD-review comments on draft-ietf-idr-bgp-analysis-04.txt

Alex Zinin <zinin@psg.com> Thu, 15 April 2004 02:33 UTC

Received: from ietf-mx (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA12093 for <idr-archive@ietf.org>; Wed, 14 Apr 2004 22:33:41 -0400 (EDT)
Received: from ietf-mx ([132.151.6.1]) by ietf-mx with esmtp (Exim 4.12) id 1BDwhC-0000VP-00 for idr-archive@ietf.org; Wed, 14 Apr 2004 22:33:42 -0400
Received: from exim by ietf-mx with spam-scanned (Exim 4.12) id 1BDwgD-0000S8-00 for idr-archive@ietf.org; Wed, 14 Apr 2004 22:32:43 -0400
Received: from optimus.ietf.org ([132.151.1.19]) by ietf-mx with esmtp (Exim 4.12) id 1BDwfH-0000Pa-00; Wed, 14 Apr 2004 22:31:43 -0400
Received: from localhost.localdomain ([127.0.0.1] helo=www1.ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 1BDwbh-0007os-9i; Wed, 14 Apr 2004 22:28:01 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by optimus.ietf.org with esmtp (Exim 4.20) id 1BDwYb-0006yl-Nc for idr@optimus.ietf.org; Wed, 14 Apr 2004 22:24:49 -0400
Received: from ietf-mx (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id WAA11925 for <idr@ietf.org>; Wed, 14 Apr 2004 22:24:45 -0400 (EDT)
Received: from ietf-mx ([132.151.6.1]) by ietf-mx with esmtp (Exim 4.12) id 1BDwYY-00006s-00 for idr@ietf.org; Wed, 14 Apr 2004 22:24:46 -0400
Received: from exim by ietf-mx with spam-scanned (Exim 4.12) id 1BDwXd-00004U-00 for idr@ietf.org; Wed, 14 Apr 2004 22:23:51 -0400
Received: from psg.com ([147.28.0.62] ident=mailnull) by ietf-mx with esmtp (Exim 4.12) id 1BDwXA-000020-00 for idr@ietf.org; Wed, 14 Apr 2004 22:23:20 -0400
Received: from psg.com ([147.28.0.62] helo=[127.0.0.1]) by psg.com with esmtp (Exim 4.30; FreeBSD) id 1BDwXA-000Gc2-76 for idr@ietf.org; Thu, 15 Apr 2004 02:23:20 +0000
From: Alex Zinin <zinin@psg.com>
Reply-To: Alex Zinin <zinin@psg.com>
X-Priority: 3 (Normal)
Message-ID: <513361939.20040414192319@psg.com>
To: idr@ietf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Transfer-Encoding: 7bit
Subject: [Idr] AD-review comments on draft-ietf-idr-bgp-analysis-04.txt
Sender: idr-admin@ietf.org
Errors-To: idr-admin@ietf.org
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
List-Archive: <https://www1.ietf.org/mail-archive/working-groups/idr/>
Date: Wed, 14 Apr 2004 19:23:19 -0700
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on ietf-mx.ietf.org
X-Spam-Status: No, hits=0.8 required=5.0 tests=AWL,PRIORITY_NO_NAME autolearn=no version=2.60
Content-Transfer-Encoding: 7bit

Folks-

I have some technical comments regarding periods of stability and instability in
the Internet, BGP steady state, and the security considerations, plus several
editorial remarks.

See inline below, please.

> 2.1.  Key Features
...
>    One of the most important path attributes is the Autonomous System
>    Path, or AS_PATH.  AS reachability information traverses the
                        ^^
                        "As"

>    Internet, this information is augmented by the list of autonomous
>    systems that have been traversed thus far, forming the AS_PATH.  The
>    AS_PATH allows straightforward suppression of the looping of routing
>    information.  In addition, the AS_PATH serves as a powerful and
>    versatile mechanism for policy-based routing.
> 
>    BGP enhances the AS_PATH attribute to include sets of autonomous
>    systems as well as lists via the AS_SET attribute.  This extended
>    format allows generated aggregate routes to carry path information
>    from the more specific routes used to generate the aggregate.  It
>    should be noted however, that as of this writing, AS_SETs are rarely
>    used in the Internet [ROUTEVIEWS].
> 
> 
> 
> 2.2.  BGP Algorithms
> 
> 
>    BGP uses an algorithm that is neither a pure distance vector
>    algorithm or a pure link state algorithm.  It is instead a modified
>    distance vector algorithm referred to as a "Path Vector" algorithm
>    that uses path information to avoid traditional distance vector
>    problems.  Each route within BGP pairs destination with path
>    information to that destination.  Path information (also known as
>    AS_PATH information) is stored within the AS_PATH attribute in BGP.
>    This allows BGP to reconstruct large portions of overall topology
>    whenever required.

I've always been uncomfortable with documents saying that BGP reconstructs the
overall topology. It doesn't really do this like the link-state protocols do,
for example.

>    BGP uses an incremental update strategy in order to conserve
>    bandwidth and processing power.  That is, after initial exchange of
>    complete routing information, a pair of BGP routers exchanges only
>    changes to that information.  Such an incremental update design
>    requires reliable transport between a pair of BGP routers to function
>    correctly.  BGP solves this problem by using TCP for reliable
>    transport.

Should also note that use of TCP as the transport mechanism helps control
congestion and CPU utilization.

> 
>    In addition to incremental updates, BGP has added the concept of
>    route aggregation so that information about groups of networks may be
> 
> 
> 
> Meyer and Patel                                   Section 2.2.  [Page 5]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>    aggregated and sent as a single Network Layer Reachability (NLRI).

this doesn't read well. Neither "reachability" nor "information" are countable.
"Single prefix" instead?

>    Finally, note that BGP is a self-contained protocol.  That is, BGP
>    specifies how routing information is exchanged both between BGP
>    speakers in different autonomous systems, and between BGP speakers
>    within a single autonomous system.
> 
> 
> 
> 2.3.  BGP Finite State Machine (FSM)
> 
> 
>    The BGP FSM is a set of rules that are applied to a BGP speaker's set
>    of configured peers for the BGP operation.  A BGP implementation
>    requires that a BGP speaker must connect to and listen on TCP port
>    179 for accepting any new BGP connections from its peers.  The BGP
>    Finite State Machine, or FSM, must be initiated and maintained for
>    each new incoming and outgoing peer connections.  However, in steady
>    state operation, there will be only one BGP FSM per connection per
>    peer.
> 
>    There may exist a temporary period where in a BGP peer may have
>    separate incoming and outgoing connections resulting into two
>    different BGP FSMs for a peer (instead of one).  This can be resolved
>    following BGP connection collision rules defined in the [BGP4].
> 
>    Following are different states of BGP FSM for its peers:
> 
>    IDLE:           State when BGP peer refuses any incoming
>                    connections.
> 
>    CONNECT:        State in which BGP peer is waiting for
>                    its TCP connection to be completed.
> 
>    ACTIVE:         State in which BGP peer is trying to acquire a
>                    peer by listening and accepting TCP connection.
> 
>    OPENSENT:       BGP peer is waiting for OPEN message from its
>                    peer.
> 
>    OPENCONFIRM:    BGP peer is waiting for KEEPALIVE or NOTIFICATION
>                    message from its peer.
> 
>    ESTABLISHED:    BGP peer connection is established and exchanges
>                    UPDATE, NOTIFICATION, and KEEPALIVE messages with
>                    its peer.
> 
>    There are different BGP events that operate on above mentioned states
> 
> 
> 
> Meyer and Patel                                   Section 2.3.  [Page 6]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>    of BGP FSM for its peers.  These BGP events are used for initiating and
>    terminating peer connections.  They also assist BGP in identifying any
>    persistent peer connection oscillations and provide a mechanism
>    for controlling them.
> 
>    Following are different BGP events:
> 
>    Manual Start:           Manually start the peer connection.
> 
>    Manual Stop:            Manually stop the peer connection.
> 
>    Automatic Start:        Local system automatically starts the peer
>                            connection.
> 
>    Manual start with
>    passive TCP flag:       Local system administrator manually starts the
>                            peer connection with peer in passive mode.
> 
>    Automatic start
>    with passive TCP flag:  Local system administrator automatically starts
>                            the peer connection with peer in passive mode.
> 
>    Automatic start
>    with bgp_stop_flap
>    option set:             Local system administrator automatically starts
>                            the peer connection with peer oscillation
>                            damping enabled.
> 
>    Automatic start with
>    bgp_stop_flap option
>    set and passive TCP
>    establishment
>    option set:             Local system administrator automatically starts
>                            the peer connection with peer oscillation
>                            damping enabled and with peer in passive mode.
> 
>    Automatic stop:         Local system automatically stops the
>                            BGP connection.
> 
>    Both, Manual Start and Manual Stop are mandatory BGP events.  All
>    other events are optional.
> 

Ummmm... the above are "administrative" events only. There are other events in
the FSM, and most of other events are mandatory.

> 
> 
> 
> 
> 
> 
> 
> 
> Meyer and Patel                                   Section 2.3.  [Page 7]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
> 3.  BGP Capabilities
> 
> 
>    The BGP Capability mechanism [RFC2842] provides an easy and flexible
>    way to introduce new features within the protocol.  In particular,
>    the BGP capability mechanism allows peers to negotiate various
>    optional features during startup.  This allows the base BGP protocol
>    to contain only essential functionality, while at the same time
>    providing a flexible mechanism for signaling protocol extensions.
> 
> 
> 
> 4.  BGP Persistent Peer Oscillations
> 
> 
>    Ideally, whenever a BGP speaker detects an error in any peer
>    connection, it shuts down the peer and changes its FSM state to IDLE.
>    BGP speaker requires a Start event to re-initiate its idle peer
>    connection.  If the error remains persistent and BGP speaker
>    generates Start event automatically then it may result in persistent
>    peer flapping.  However, although peer oscillation is found to be
>    wide-spread in BGP implementations, methods for preventing persistent
>    peer oscillations are outside the scope of base BGP protocol
>    specification.
> 
> 
> 
> 5.  Implementation Guidelines
> 
> 
>    A robust BGP implementation is work conserving.  This means that if
>    the number of prefixes is bound, arbitrarily high levels of route
>    change can be tolerated with bounded impact on route convergence for
>    occasionally changes in generally stable routes.

"occasional"?

> 
>    A BGP implementation under high load conditions should empty as much
>    inbound routing updates from its input streams, processing only the
>    most recent route if the route for a given NLRI changes multiple
>    times.  TCP also provides blocking on the writes on the sender side.
>    A BGP implementation under load should expect blocks on write calls
>    and send only the most recent routes when sockets unblock rather than
>    sending entire history.
> 
>    A robust implementation of BGP should have the following
>    characteristics:
>          1.  It is able to operate in almost arbitrarily high levels
>              of route flap without loosing peerings (failing to send
>              keepalives) or loosing other protocol adjacencies as a
> 
> 
> 
> Meyer and Patel                                     Section 5.  [Page 8]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>              result of BGP load.
> 
>          2.  Instability of a subset of routes should not affect the
>              route advertisements or forwarding associated with the set
>              of stable routes.
> 
>          3.  High levels of instability and peers of different CPU speed
>              or load resulting in faster or slower processing of routes
>              should not cause instability and should have a bounded
>              impact on the convergence time for generally stable routes.
> 
>    Numerous robust BGP implementations exist.  Producing a robust
>    implementation is not a trivial matter but clearly achievable.
> 
> 
> 
> 
> 6.  BGP Performance characteristics and Scalability
> 
> 
>    In this section, we provide "order of magnitude" answers to the
>    questions of how much link bandwidth, router memory and router CPU
>    cycles the BGP protocol will consume under normal conditions.  In
>    particular, we will address the scalability of BGP and its
>    limitations.
> 
> 
> 
> 6.1.  Link bandwidth and CPU utilization
> 
> 
>    Immediately after the initial BGP connection setup, BGP peers
>    exchange complete set of routing information.  If we denote the total
>    number of routes in the Internet by N, the mean AS distance of the
>    Internet by M (distance at the level of an autonomous system,
>    expressed in terms of the number of autonomous systems), the total
>    number of unique AS paths by A, and assume that the networks are
>    uniformly distributed among the autonomous systems, then the worst
>    case amount of bandwidth consumed during the initial exchange between
>    a pair of BGP speakers is
> 
>            BW = O(N + (M * A))
> 
>    The following table illustrates the typical amount of bandwidth
>    consumed during the initial exchange between a pair of BGP speakers
>    based on the above assumptions (ignoring bandwidth consumed by the
>    BGP Header).  For purposes of the estimates here, we will calculate
>    BW = 4 * (N + (M * A)).
> 
> 
> 
> Meyer and Patel                                   Section 6.1.  [Page 9]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>     # NLRI       Mean AS Distance       # AS's     Bandwidth (MR)
>     ----------   ----------------       ------    ----------------
>     40,000       15                     400        184,000   bytes
>     100,000      10                     10,000     800,000   bytes
>     120,000      10                     15,000     1,080,000 bytes
>     140,000      15                     20,000     1,760,000 bytes
> 
>     [note that most of this bandwidth is consumed by the NLRI exchange]

Is the caption for column 3 correct? It says "# AS's", which reads as "number of
AS's" while it seems it should be the number of unique paths.
> 
>    BGP was created specifically to reduce the size of the set of NLRI
>    entries which have to be carried and exchanged by border routers.
>    The aggregation scheme, defined in RFC 1519 [RFC1519], describes the
>    provider-based aggregation scheme in use in today's Internet.
> 
>    Due to the advantages of advertising a few large aggregate blocks
>    instead of many smaller class-based individual networks, it is
>    difficult to estimate the actual reduction in bandwidth and
>    processing that BGP has provided over BGP-3.  If we simply enumerate
>    all aggregate blocks into their individual class-based networks, we
>    would not take into account "dead" space that has been reserved for
>    future expansion.  The best metric for determining the success of
>    BGP's aggregation is to sample the number NLRI entries in the
>    globally connected Internet today and compare it to projected growth
>    rates before BGP was deployed.
> 
>    At the time of this writing, the full set of exterior routes carried
>    by BGP is approximately 120,000 network entries [ROUTEVIEWS].
> 
> 
> 
> 6.1.1.  CPU utilization
> 
> 
>    An important and fundamental feature of BGP is that BGP's CPU
>    utilization depends only on the stability of the Internet.  If the
>    Internet is stable, then the only link bandwidth and router CPU
>    cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE
>    messages.  The KEEPALIVE messages are exchanged only between peers.
>    The suggested frequency of the exchange is 30 seconds.  The KEEPALIVE
>    messages are quite short (19 octets), and require virtually no
>    processing.  As a result, the bandwidth consumed by the KEEPALIVE
>    messages is about 5 bits/sec.  Operational experience confirms that
>    the overhead (in terms of bandwidth and CPU) associated with the
>    KEEPALIVE messages should be viewed as negligible.
> 
>    During periods of Internet instability, changes to the reachability
>    information are passed between routers in UPDATE messages.  The

While theoretically the text is correct and we could talk about periods of
stability and instability of the Internet, I wonder if this text is still
applicable from the practical perspective. I.e., the continuous churn that the
Internet BGP speakers experience ensures that they practically always have
something to process.

Probably instead of talking about periods of "Internet stability" and "Internet
instability" we could talk about something like "periods of stable state among
BGP speakers when they do no have new updates to communicate to each other".

>
> Meyer and Patel                                Section 6.1.1.  [Page 10]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>    greatest overhead per UPDATE message occurs when each UPDATE message
>    contains only a single network.  It should be pointed out that in
>    practice routing changes exhibit strong locality with respect to the
>    AS path.  That is, routes that change are likely to have common AS
>    path.  In this case, multiple networks can be grouped into a single
>    UPDATE message, thus significantly reducing the amount of bandwidth
>    required (see also Appendix F.1 of [BGP4]).
> 
>    Since in the steady state the link bandwidth and router CPU cycles
>    consumed by the BGP protocol are dependent only on the stability of
>    the Internet, it follows that BGP should have no scaling problems in
>    the areas of link bandwidth and router CPU utilization.

Not sure I follow here. What is meant by the "steady state" here? If it means
convergence on a stable topology after the initial exchange of updates, then why
talk about "stability of the Internet"? In other words, if the Internet is
unstable then can we say that the BGP speakers are in steady state? In the lack
of topology changes, it seems that the consumption should instead depend on the
number of peers (affects KEEPALIVE processing) and the number of persistently
oscillating routes potentially present in the network...

> This assumes
>    that as the Internet grows,  the overall stability of the inter-AS
>    connectivity of the Internet can be controlled.

please specify how it is assumed to be "controlled", e.g. through operational
practices.

> In particular, while
>    the size of the IPv4 Internet routing table is bounded by O(232 * M),

232 or 2^32?

>    (where M is a slow-moving function describing the AS
>    interconnectivity of the network),

slow-moving or slow-growing?

> no such bound can be formulated
>    for the dynamic properties (i.e., stability) of BGP.  Although, the
>    dynamic properties of the network cannot be quantitatively bounded,
>    they can be controlled within BGP.  Beyond certain changes in the
>    network, BGP can start to suppress such changes using BGP Route Flap
>    Damping [RFC2439], pacing of its route updates, or BGP would be
>    unable to keep up with the changes and force suppression of multiple
>    changes over very short periods by causing the BGP peer socket to
>    block on the sender.
> 
> 
> 
> 6.1.2.  Memory requirements
> 
> 
>    To quantify the worst case memory requirements for BGP, we denote the
>    total number of networks in the Internet by N, the mean AS distance
>    of the Internet by M (distance at the level of an autonomous system,
>    expressed in terms of the number of autonomous systems), the total
>    number of unique AS paths as A.  Then the worst case memory
>    requirements (MR) can be expressed as
> 
> 
>            MR = O(N + (M * A))
> 
> 
>    Since a mean AS distance M is a slow moving function of the
>    interconnectivity ("meshiness") of the Internet, for all practical
>    purposes the worst case router memory requirements are on the order
>    of the total number of networks in the Internet times the number of
>    peers the local system is peering with.  We expect that the total
>    number of networks in the Internet will grow much faster than the
> 
> 
> 
> Meyer and Patel                                Section 6.1.2.  [Page 11]
> 
> INTERNET-DRAFT             Expires: March 2004            September 2003
> 
> 
>    average number of peers per router.  As a result, BGP's memory
>    scaling properties are linearly related to the total number of
>    networks in the Internet.
> 
>    The following table illustrates typical memory requirements of a
>    router running BGP.  We denote average number of routes advertised by
>    each peer as N, the total number of unique AS paths as A, the mean AS
>    distance of the Internet as M (distance at the level of an autonomous
>    system, expressed in terms of the number of autonomous systems),
>    number of bytes required to store a route as R, and number of bytes
>    required to store one AS in an AS path as P.  It is assumed that each
>    network is encoded as four bytes, each AS is encoded as two bytes,
>    and each networks is reachable via some fraction of all of the peers
>    (# BGP peers/per net).  For purposes of the estimates here, we will
>    calculate MR = ((N * R) + (M * A) * P)
> 
> 
>      # Networks  Mean AS Distance # AS's # BGP peers/per net Memory Req (MR)
>      ----------  ---------------- ------ ------------------- --------------
>       100,000           20         3,000         20             1,040,000
>       100,000           20        15,000         20             1,040,000
>       120,000           10        15,000        100            75,000,000
>       140,000           15        20,000        100           116,000,000
> 
> 
>    In analyzing BGP's memory requirements, we focus on the size of the
>    forwarding table (and ignoring implementation details).  In
>    particular, we derive upper bounds for the size of the forwarding
>    table.

The above doesn't look like calculations for a forwarding table--the FIB doesn't
normally contact AS-PATH info or all possible paths to a destination.

>
> 11.  Security Considerations
> 
> 
>    This document presents an analysis of the BGP protocol and as such
>    presents no new security implications for BGP.

I'm afraid the IESG will expect to see some more here. Specifically, I would
suggest that the document talks about available security mechanisms, and the
analysis of how they affect processing overhead, BW, and scalability.
Certainly a pointer to the bgp-vuln document would be useful.

Alex


_______________________________________________
Idr mailing list
Idr@ietf.org
https://www1.ietf.org/mailman/listinfo/idr