Re: draft-ietf-idr-bgp4-18.txt
Yakov Rekhter <yakov@juniper.net> Fri, 01 November 2002 13:51 UTC
Received: from trapdoor.merit.edu (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id IAA06898 for <idr-archive@ietf.org>; Fri, 1 Nov 2002 08:51:20 -0500 (EST)
Received: by trapdoor.merit.edu (Postfix) id 905379122C; Fri, 1 Nov 2002 08:53:31 -0500 (EST)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 358ED91235; Fri, 1 Nov 2002 08:53:31 -0500 (EST)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id B52509122C for <idr@trapdoor.merit.edu>; Fri, 1 Nov 2002 08:53:23 -0500 (EST)
Received: by segue.merit.edu (Postfix) id 81CCE5DEF4; Fri, 1 Nov 2002 08:53:23 -0500 (EST)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id ACD5A5DE17 for <idr@merit.edu>; Fri, 1 Nov 2002 08:53:21 -0500 (EST)
Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id gA1DrIm43866; Fri, 1 Nov 2002 05:53:18 -0800 (PST) (envelope-from yakov@juniper.net)
Message-Id: <200211011353.gA1DrIm43866@merlot.juniper.net>
To: "Natale, Jonathan" <JNatale@celoxnetworks.com>
Cc: 'Parag Deshpande' <paragdeshpande@sdksoft.com>, idr@merit.edu
Subject: Re: draft-ietf-idr-bgp4-18.txt
In-Reply-To: Your message of "Fri, 01 Nov 2002 08:20:05 EST." <1117F7D44159934FB116E36F4ABF221B02C7C5F6@celox-ma1-ems1.celoxnetworks.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <58459.1036158797.1@juniper.net>
Date: Fri, 01 Nov 2002 05:53:18 -0800
From: Yakov Rekhter <yakov@juniper.net>
Sender: owner-idr@merit.edu
Precedence: bulk
Jonathan,

> This is obviously an "uncontrolled copy", but *I think* it is current.
> Also, refer to the "RE: BGP Base Draft - Issue List v1.5" email sent on
> Monday, October 28, 2002 7:00 PM for info on the proposed changes.
> I am assuming that this current version was removed because the
> new version is to be posted shortly.

In fact, I submitted the -18 version on Wednesday.

Yakov.

>
> > -----Original Message-----
> > From: Parag Deshpande [mailto:paragdeshpande@sdksoft.com]
> > Sent: Thursday, October 31, 2002 5:57 PM
> > To: idr@merit.edu
> > Cc: Susan Hares
> > Subject: draft-ietf-idr-bgp4-18.txt
> >
> > Hi,
> >
> > I am unable to locate the latest bgp draft on ietf site.
> > Where can I get it?
> > I would appreciate if someone could just mail it to me.
> >
> > Thanks,
> > Parag
>
> Content-Type: text/plain; name="draft-ietf-idr-bgp4-17.txt"
> Content-Disposition: attachment; filename="draft-ietf-idr-bgp4-17.txt"
>
> Network Working Group                                       Y. Rekhter
> INTERNET DRAFT                                        Juniper Networks
>                                                                  T. Li
>                                                Procket Networks, Inc.
>                                                                Editors
>
>
>                  A Border Gateway Protocol 4 (BGP-4)
>                     <draft-ietf-idr-bgp4-17.txt>
>
>
> Status of this Memo
>
>    This document is an Internet-Draft and is in full conformance with
>    all provisions of Section 10 of RFC2026.
>
>    Internet-Drafts are working documents of the Internet Engineering
>    Task Force (IETF), its areas, and its working groups.  Note that
>    other groups may also distribute working documents as Internet-
>    Drafts.
>
>    Internet-Drafts are draft documents valid for a maximum of six months
>    and may be updated, replaced, or obsoleted by other documents at any
>    time.
>    It is inappropriate to use Internet-Drafts as reference
>    material or to cite them other than as ``work in progress.''
>
>    The list of current Internet-Drafts can be accessed at
>    http://www.ietf.org/ietf/1id-abstracts.txt
>
>    The list of Internet-Draft Shadow Directories can be accessed at
>    http://www.ietf.org/shadow.html.
>
>
> 1. Acknowledgments
>
>    This document was originally published as RFC 1267 in October 1991,
>    jointly authored by Kirk Lougheed and Yakov Rekhter.
>
>    We would like to express our thanks to Guy Almes, Len Bosack, and
>    Jeffrey C. Honig for their contributions to the earlier version of
>    this document.
>
>    We would like to explicitly thank Bob Braden for the review of the
>    earlier version of this document as well as his constructive and
>    valuable comments.
>
>
> Expiration Date July 2002                                      [Page 1]
>
> RFC DRAFT                                                 January 2002
>
>
>    We would also like to thank Bob Hinden, Director for Routing of the
>    Internet Engineering Steering Group, and the team of reviewers he
>    assembled to review the earlier version (BGP-2) of this document.
>    This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia
>    Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted
>    with a strong combination of toughness, professionalism, and
>    courtesy.
>
>    This updated version of the document is the product of the IETF IDR
>    Working Group with Yakov Rekhter and Tony Li as editors.  Certain
>    sections of the document borrowed heavily from IDRP [7], which is the
>    OSI counterpart of BGP.  For this, credit should be given to the ANSI
>    X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger who was
>    the IDRP editor within that group.
>    We would also like to thank Enke Chen, Edward Crabbe, Mike Craren,
>    Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry Haskin, John
>    Krawczyk, David LeRoy, Dan Massey, Dan Pei, Mathew Richardson, John
>    Scudder, John Stewart III, Dave Thaler, Paul Traina, Russ White,
>    Curtis Villamizar, and Alex Zinin for their comments.
>
>    Many thanks to Sue Hares for her contributions to the document, and
>    especially for her work on the BGP Finite State Machine.
>
>    We would like to specially acknowledge numerous contributions by
>    Dennis Ferguson.
>
>
> 2. Introduction
>
>    The Border Gateway Protocol (BGP) is an inter-Autonomous System
>    routing protocol.  It is built on experience gained with EGP as
>    defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as
>    described in RFC 1092 [2] and RFC 1093 [3].
>
>    The primary function of a BGP speaking system is to exchange network
>    reachability information with other BGP systems.  This network
>    reachability information includes information on the list of
>    Autonomous Systems (ASs) that reachability information traverses.
>    This information is sufficient to construct a graph of AS
>    connectivity from which routing loops may be pruned and some policy
>    decisions at the AS level may be enforced.
>
>    BGP-4 provides a new set of mechanisms for supporting Classless
>    Inter-Domain Routing (CIDR) [8, 9].  These mechanisms include support
>    for advertising an IP prefix and eliminate the concept of network
>    "class" within BGP.  BGP-4 also introduces mechanisms which allow
>    aggregation of routes, including aggregation of AS paths.
>
>    To characterize the set of policy decisions that can be enforced
>    using BGP, one must focus on the rule that a BGP speaker advertises
>    to its peers (other BGP speakers which it communicates with) in
>    neighboring ASs only those routes that it itself uses.
>    This rule reflects the "hop-by-hop" routing paradigm generally used
>    throughout the current Internet.  Note that some policies cannot be
>    supported by the "hop-by-hop" routing paradigm and thus require
>    techniques such as source routing (aka explicit routing) to enforce.
>    For example, BGP does not enable one AS to send traffic to a
>    neighboring AS intending that the traffic take a different route
>    from that taken by traffic originating in the neighboring AS.  On the
>    other hand, BGP can support any policy conforming to the "hop-by-hop"
>    routing paradigm.  Since the current Internet uses only the
>    "hop-by-hop" inter-AS routing paradigm and since BGP can support any
>    policy that conforms to that paradigm, BGP is highly applicable as an
>    inter-AS routing protocol for the current Internet.
>
>    A more complete discussion of what policies can and cannot be
>    enforced with BGP is outside the scope of this document (but refer to
>    the companion document discussing BGP usage [5]).
>
>    BGP runs over a reliable transport protocol.  This eliminates the
>    need to implement explicit update fragmentation, retransmission,
>    acknowledgment, and sequencing.  Any authentication scheme used by the
>    transport protocol (e.g., RFC2385 [10]) may be used in addition to
>    BGP's own authentication mechanisms.  The error notification
>    mechanism used in BGP assumes that the transport protocol supports a
>    "graceful" close, i.e., that all outstanding data will be delivered
>    before the connection is closed.
>
>    BGP uses TCP [4] as its transport protocol.  TCP meets BGP's
>    transport requirements and is present in virtually all commercial
>    routers and hosts.  In the following descriptions the phrase
>    "transport protocol connection" can be understood to refer to a TCP
>    connection.  BGP uses TCP port 179 for establishing its connections.
>
>    This document uses the term `Autonomous System' (AS) throughout.
>    The classic definition of an Autonomous System is a set of routers
>    under a single technical administration, using an interior gateway
>    protocol and common metrics to determine how to route packets within
>    the AS, and using an exterior gateway protocol to determine how to
>    route packets to other ASs.  Since this classic definition was
>    developed, it has become common for a single AS to use several
>    interior gateway protocols and sometimes several sets of metrics
>    within an AS.  The use of the term Autonomous System here stresses
>    the fact that, even when multiple IGPs and metrics are used, the
>    administration of an AS appears to other ASs to have a single
>    coherent interior routing plan and presents a consistent picture of
>    what destinations are reachable through it.
>
>    The planned use of BGP in the Internet environment, including such
>    issues as topology, the interaction between BGP and IGPs, and the
>    enforcement of routing policy rules is presented in a companion
>    document [5].  This document is the first of a series of documents
>    planned to explore various aspects of BGP application.
>
>
> 3. Summary of Operation
>
>    Two systems form a transport protocol connection between one
>    another.  They exchange messages to open and confirm the connection
>    parameters.
>
>    The initial data flow is the portion of the BGP routing table that is
>    allowed by the export policy, called the Adj-RIBs-Out (see 3.2).
>    Incremental updates are sent as the routing tables change.  BGP does
>    not require periodic refresh of the routing table.  Therefore, a BGP
>    speaker must retain the current version of the routes advertised by
>    all of its peers for the duration of the connection.
>    If the implementation decides not to store the routes that have been
>    received from a peer, but have been filtered out according to
>    configured local policy, the BGP Route Refresh extension [12] may be
>    used to request the full set of routes from a peer without resetting
>    the BGP session when the local policy configuration changes.
>
>    KEEPALIVE messages may be sent periodically to ensure the liveness of
>    the connection.  NOTIFICATION messages are sent in response to errors
>    or special conditions.  If a connection encounters an error
>    condition, a NOTIFICATION message is sent and the connection is
>    closed.
>
>    The hosts executing the Border Gateway Protocol need not be routers.
>    A non-routing host could exchange routing information with routers
>    via EGP or even an interior routing protocol.  That non-routing host
>    could then use BGP to exchange routing information with a border
>    router in another Autonomous System.  The implications and
>    applications of this architecture are for further study.
>
>    Connections between BGP speakers of different ASs are referred to as
>    "external" links.  BGP connections between BGP speakers within the
>    same AS are referred to as "internal" links.  Similarly, a peer in a
>    different AS is referred to as an external peer, while a peer in the
>    same AS may be described as an internal peer.  Internal BGP and
>    external BGP are commonly abbreviated IBGP and EBGP.
>
>    If a particular AS has multiple BGP speakers and is providing transit
>    service for other ASs, then care must be taken to ensure a consistent
>    view of routing within the AS.  A consistent view of the interior
>    routes of the AS is provided by the interior routing protocol.  A
>    consistent view of the routes exterior to the AS can be provided by
>    having all BGP speakers within the AS maintain direct IBGP
>    connections with each other.
>    Alternatively, the interior routing protocol can pass BGP information
>    among routers within an AS, taking care not to lose BGP attributes
>    that will be needed by EBGP speakers if transit connectivity is being
>    provided.  For the purpose of discussion, it is assumed that BGP
>    information is passed within an AS using IBGP.  Care must be taken to
>    ensure that the interior routers have all been updated with transit
>    information before the EBGP speakers announce to other ASs that
>    transit service is being provided.
>
>
> 3.1 Routes: Advertisement and Storage
>
>    For the purpose of this protocol, a route is defined as a unit of
>    information that pairs a set of destinations with the attributes of a
>    path to those destinations.  The set of destinations are the systems
>    whose IP addresses are reported in the Network Layer Reachability
>    Information (NLRI) field and the path is the information reported in
>    the path attributes field of the same UPDATE message.
>
>    Routes are advertised between BGP speakers in UPDATE messages.
>
>    Routes are stored in the Routing Information Bases (RIBs): namely,
>    the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out.  Routes that will
>    be advertised to other BGP speakers must be present in the
>    Adj-RIBs-Out.  Routes that will be used by the local BGP speaker must
>    be present in the Loc-RIB, and the next hop for each of these routes
>    must be resolvable via the local BGP speaker's Routing Table.  Routes
>    that are received from other BGP speakers are present in the
>    Adj-RIBs-In.
>
>    If a BGP speaker chooses to advertise the route, it may add to or
>    modify the path attributes of the route before advertising it to a
>    peer.
>
>    BGP provides mechanisms by which a BGP speaker can inform its peer
>    that a previously advertised route is no longer available for use.
>    There are three methods by which a given BGP speaker can indicate
>    that a route has been withdrawn from service:
>
>       a) the IP prefix that expresses the destination for a previously
>       advertised route can be advertised in the WITHDRAWN ROUTES field
>       in the UPDATE message, thus marking the associated route as being
>       no longer available for use,
>
>       b) a replacement route with the same NLRI can be advertised, or
>
>       c) the BGP speaker - BGP speaker connection can be closed, which
>       implicitly removes from service all routes which the pair of
>       speakers had advertised to each other.
>
>
> 3.2 Routing Information Bases
>
>    The Routing Information Base (RIB) within a BGP speaker consists of
>    three distinct parts:
>
>       a) Adj-RIBs-In: The Adj-RIBs-In store routing information that
>       has been learned from inbound UPDATE messages.  Their contents
>       represent routes that are available as an input to the Decision
>       Process.
>
>       b) Loc-RIB: The Loc-RIB contains the local routing information
>       that the BGP speaker has selected by applying its local policies
>       to the routing information contained in its Adj-RIBs-In.
>
>       c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the
>       local BGP speaker has selected for advertisement to its peers.
>       The routing information stored in the Adj-RIBs-Out will be
>       carried in the local BGP speaker's UPDATE messages and advertised
>       to its peers.
>
>    In summary, the Adj-RIBs-In contain unprocessed routing information
>    that has been advertised to the local BGP speaker by its peers; the
>    Loc-RIB contains the routes that have been selected by the local BGP
>    speaker's Decision Process; and the Adj-RIBs-Out organize the routes
>    for advertisement to specific peers by means of the local speaker's
>    UPDATE messages.
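
A small Python sketch can illustrate the three ways of withdrawing a route (3.1) and the flow from the Adj-RIBs-In through the Loc-RIB to the Adj-RIBs-Out (3.2). It is not part of the draft. The names `Route` and `Speaker` are hypothetical. The preference rule (highest LOCAL_PREF, then shortest AS_PATH) is also an assumption, standing in for the full Decision Process. So is the export rule (never advertise a route back to the peer it was learned from), which stands in for configured policy. The three RIBs are kept as separate dictionaries only for readability; keeping three copies is just one of the choices the protocol allows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str             # NLRI, e.g. "10.0.0.0/8"
    peer: str               # peer the route was learned from
    as_path: tuple          # AS numbers, nearest AS first
    local_pref: int = 100   # illustrative default, an assumption of this sketch


@dataclass
class Speaker:
    peers: set
    adj_ribs_in: dict = field(default_factory=dict)   # peer -> {prefix: Route}
    loc_rib: dict = field(default_factory=dict)       # prefix -> selected Route
    adj_ribs_out: dict = field(default_factory=dict)  # peer -> {prefix: Route}

    def receive(self, route):
        # A route with the same NLRI from the same peer replaces the old one (method b).
        self.adj_ribs_in.setdefault(route.peer, {})[route.prefix] = route
        self._decide(route.prefix)

    def withdraw(self, peer, prefix):
        # Explicit withdrawal via the WITHDRAWN ROUTES field (method a).
        self.adj_ribs_in.get(peer, {}).pop(prefix, None)
        self._decide(prefix)

    def peer_down(self, peer):
        # Closing the connection implicitly withdraws everything learned from it (method c).
        self.peers.discard(peer)
        self.adj_ribs_out.pop(peer, None)
        for prefix in list(self.adj_ribs_in.pop(peer, {})):
            self._decide(prefix)

    def _decide(self, prefix):
        candidates = [rib[prefix] for rib in self.adj_ribs_in.values() if prefix in rib]
        if candidates:
            # Hypothetical preference: highest LOCAL_PREF, then shortest AS_PATH.
            self.loc_rib[prefix] = max(
                candidates, key=lambda r: (r.local_pref, -len(r.as_path)))
        else:
            self.loc_rib.pop(prefix, None)
        self._export(prefix)

    def _export(self, prefix):
        best = self.loc_rib.get(prefix)
        for peer in self.peers:
            out = self.adj_ribs_out.setdefault(peer, {})
            # Only Loc-RIB routes are advertised; as a hypothetical export policy,
            # a route is never sent back to the peer it was learned from.
            if best is not None and best.peer != peer:
                out[prefix] = best
            else:
                out.pop(prefix, None)
```

Suppose peers A and B both announce 10.0.0.0/8 and B's AS_PATH is shorter. B's route is then selected and placed in the Adj-RIBs-Out for A and C. If B withdraws it, A's route takes over. If the session with A is then closed, the prefix disappears from the Loc-RIB and from every remaining Adj-RIB-Out.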
>    Although the conceptual model distinguishes between Adj-RIBs-In,
>    Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an
>    implementation must maintain three separate copies of the routing
>    information.  The choice of implementation (for example, 3 copies of
>    the information vs 1 copy with pointers) is not constrained by the
>    protocol.
>
>    Routing information that the router uses to forward packets (or to
>    construct the forwarding table that is used for packet forwarding) is
>    maintained in the Routing Table.  The Routing Table accumulates routes
>    to directly connected networks, static routes, routes learned from
>    the IGP protocols, and routes learned from BGP.  Whether or not a
>    specific BGP route should be installed in the Routing Table, and
>    whether a BGP route should override a route to the same destination
>    installed by another source is a local policy decision, not specified
>    in this document.  Besides actual packet forwarding, the Routing Table
>    is used for resolution of the next-hop addresses specified in BGP
>    updates (see Section 9.1.2).
>
>
> 4. Message Formats
>
>    This section describes message formats used by BGP.
>
>    Messages are sent over a reliable transport protocol connection.  A
>    message is processed only after it is entirely received.  The maximum
>    message size is 4096 octets.  All implementations are required to
>    support this maximum message size.  The smallest message that may be
>    sent consists of a BGP header without a data portion, or 19 octets.
>
>
> 4.1 Message Header Format
>
>    Each message has a fixed-size header.  There may or may not be a data
>    portion following the header, depending on the message type.
>    The layout of these fields is shown below:
>
>        0                   1                   2                   3
>        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |                                                               |
>       +                                                               +
>       |                                                               |
>       +                                                               +
>       |                           Marker                              |
>       +                                                               +
>       |                                                               |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |          Length               |      Type     |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>
>    Marker:
>
>       This 16-octet field contains a value that the receiver of the
>       message can predict.  If the Type of the message is OPEN, or if
>       the OPEN message carries no Authentication Information (as an
>       Optional Parameter), then the Marker must be all ones.
>       Otherwise, the value of the marker can be predicted by a
>       computation specified as part of the authentication mechanism
>       (which is specified as part of the Authentication Information)
>       used.  The Marker can be used to detect loss of synchronization
>       between a pair of BGP peers, and to authenticate incoming BGP
>       messages.
>
>    Length:
>
>       This 2-octet unsigned integer indicates the total length of the
>       message, including the header, in octets.  Thus, e.g., it allows
>       one to locate in the transport-level stream the (Marker field
>       of the) next message.  The value of the Length field must always
>       be at least 19 and no greater than 4096, and may be further
>       constrained, depending on the message type.  No "padding" of
>       extra data after the message is allowed, so the Length field
>       must have the smallest value required given the rest of the
>       message.
>
>    Type:
>
>       This 1-octet unsigned integer indicates the type code of the
>       message.  The following type codes are defined:
>
>          1 - OPEN
>          2 - UPDATE
>          3 - NOTIFICATION
>          4 - KEEPALIVE
>
> 4.2 OPEN Message Format
>
>    After a transport protocol connection is established, the first
>    message sent by each side is an OPEN message.
>    If the OPEN message is acceptable, a KEEPALIVE message confirming the
>    OPEN is sent back.  Once the OPEN is confirmed, UPDATE, KEEPALIVE, and
>    NOTIFICATION messages may be exchanged.
>
>    In addition to the fixed-size BGP header, the OPEN message contains
>    the following fields:
>
>        0                   1                   2                   3
>        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>       +-+-+-+-+-+-+-+-+
>       |    Version    |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |     My Autonomous System      |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |           Hold Time           |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |                         BGP Identifier                        |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       | Opt Parm Len  |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>       |                                                               |
>       |             Optional Parameters (variable)                    |
>       |                                                               |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>
>    Version:
>
>       This 1-octet unsigned integer indicates the protocol version
>       number of the message.  The current BGP version number is 4.
>
>    My Autonomous System:
>
>       This 2-octet unsigned integer indicates the Autonomous System
>       number of the sender.
>
>    Hold Time:
>
>       This 2-octet unsigned integer indicates the number of seconds
>       that the sender proposes for the value of the Hold Timer.  Upon
>       receipt of an OPEN message, a BGP speaker MUST calculate the
>       value of the Hold Timer by using the smaller of its configured
>       Hold Time and the Hold Time received in the OPEN message.  The
>       Hold Time MUST be either zero or at least three seconds.  An
>       implementation may reject connections on the basis of the Hold
>       Time.  The calculated value indicates the maximum number of
>       seconds that may elapse between the receipt of successive
>       KEEPALIVE and/or UPDATE messages by the sender.
>
>    BGP Identifier:
>
>       This 4-octet unsigned integer indicates the BGP Identifier of
>       the sender.
>       A given BGP speaker sets the value of its BGP Identifier to an IP
>       address assigned to that BGP speaker.  The value of the BGP
>       Identifier is determined on startup and is the same for every
>       local interface and every BGP peer.
>
>    Optional Parameters Length:
>
>       This 1-octet unsigned integer indicates the total length of the
>       Optional Parameters field in octets.  If the value of this field
>       is zero, no Optional Parameters are present.
>
>    Optional Parameters:
>
>       This field may contain a list of optional parameters, where
>       each parameter is encoded as a <Parameter Type, Parameter
>       Length, Parameter Value> triplet.
>
>        0                   1
>        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
>       |  Parm. Type   | Parm. Length  |  Parameter Value (variable)
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
>
>       Parameter Type is a one octet field that unambiguously
>       identifies individual parameters.  Parameter Length is a one
>       octet field that contains the length of the Parameter Value
>       field in octets.  Parameter Value is a variable length field
>       that is interpreted according to the value of the Parameter
>       Type field.
>
>       This document defines the following Optional Parameters:
>
>          a) Authentication Information (Parameter Type 1):
>
>             This optional parameter may be used to authenticate a BGP
>             peer.  The Parameter Value field contains a 1-octet
>             Authentication Code followed by a variable length
>             Authentication Data.
>
>              0 1 2 3 4 5 6 7 8
>             +-+-+-+-+-+-+-+-+
>             |  Auth. Code   |
>             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>             |                                                     |
>             |              Authentication Data                    |
>             |                                                     |
>             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>             Authentication Code:
>
>                This 1-octet unsigned integer indicates the
>                authentication mechanism being used.
>                Whenever an authentication mechanism is specified for
>                use within BGP, three things must be included in the
>                specification:
>
>                - the value of the Authentication Code which indicates
>                  use of the mechanism,
>                - the form and meaning of the Authentication Data, and
>                - the algorithm for computing values of Marker fields.
>
>                Note that a separate authentication mechanism may be
>                used in establishing the transport level connection.
>
>             Authentication Data:
>
>                Authentication Data is a variable length field that is
>                interpreted according to the value of the
>                Authentication Code field.
>
>
>    The minimum length of the OPEN message is 29 octets (including
>    message header).
>
>
> 4.3 UPDATE Message Format
>
>
>    UPDATE messages are used to transfer routing information between BGP
>    peers.  The information in the UPDATE packet can be used to construct
>    a graph describing the relationships of the various Autonomous
>    Systems.  By applying rules to be discussed, routing information
>    loops and some other anomalies may be detected and removed from
>    inter-AS routing.
>
>    An UPDATE message is used to advertise feasible routes sharing common
>    path attributes to a peer, or to withdraw multiple unfeasible routes
>    from service (see 3.1).  An UPDATE message may simultaneously
>    advertise a feasible route and withdraw multiple unfeasible routes
>    from service.
>    The UPDATE message always includes the fixed-size BGP header, and
>    also includes the other fields as shown below (note, some of the
>    shown fields may not be present in every UPDATE message):
>
>       +-----------------------------------------------------+
>       |   Withdrawn Routes Length (2 octets)                |
>       +-----------------------------------------------------+
>       |   Withdrawn Routes (variable)                       |
>       +-----------------------------------------------------+
>       |   Total Path Attribute Length (2 octets)            |
>       +-----------------------------------------------------+
>       |   Path Attributes (variable)                        |
>       +-----------------------------------------------------+
>       |   Network Layer Reachability Information (variable) |
>       +-----------------------------------------------------+
>
>
>    Withdrawn Routes Length:
>
>       This 2-octet unsigned integer indicates the total length of the
>       Withdrawn Routes field in octets.  Its value must allow the
>       length of the Network Layer Reachability Information field to
>       be determined as specified below.
>
>       A value of 0 indicates that no routes are being withdrawn from
>       service, and that the WITHDRAWN ROUTES field is not present in
>       this UPDATE message.
>
>    Withdrawn Routes:
>
>       This is a variable length field that contains a list of IP
>       address prefixes for the routes that are being withdrawn from
>       service.  Each IP address prefix is encoded as a 2-tuple of the
>       form <length, prefix>, whose fields are described below:
>
>                +---------------------------+
>                |   Length (1 octet)        |
>                +---------------------------+
>                |   Prefix (variable)       |
>                +---------------------------+
>
>       The use and the meaning of these fields are as follows:
>
>       a) Length:
>
>          The Length field indicates the length in bits of the IP
>          address prefix.  A length of zero indicates a prefix that
>          matches all IP addresses (with prefix, itself, of zero
>          octets).
>
>       b) Prefix:
>
>          The Prefix field contains an IP address prefix followed by
>          enough trailing bits to make the end of the field fall on an
>          octet boundary.  Note that the value of trailing bits is
>          irrelevant.
>
>    Total Path Attribute Length:
>
>       This 2-octet unsigned integer indicates the total length of the
>       Path Attributes field in octets.  Its value must allow the
>       length of the Network Layer Reachability Information field to be
>       determined as specified below.
>
>       A value of 0 indicates that no Network Layer Reachability
>       Information field is present in this UPDATE message.
>
>    Path Attributes:
>
>       A variable length sequence of path attributes is present in
>       every UPDATE.  Each path attribute is a triple <attribute type,
>       attribute length, attribute value> of variable length.
>
>       Attribute Type is a two-octet field that consists of the
>       Attribute Flags octet followed by the Attribute Type Code
>       octet.
>
>              0                   1
>              0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>             |  Attr. Flags  |Attr. Type Code|
>             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>       The high-order bit (bit 0) of the Attribute Flags octet is the
>       Optional bit.  It defines whether the attribute is optional (if
>       set to 1) or well-known (if set to 0).
>
>       The second high-order bit (bit 1) of the Attribute Flags octet
>       is the Transitive bit.  It defines whether an optional attribute
>       is transitive (if set to 1) or non-transitive (if set to 0).
>       For well-known attributes, the Transitive bit must be set to 1.
>       (See Section 5 for a discussion of transitive attributes.)
>
>       The third high-order bit (bit 2) of the Attribute Flags octet
>       is the Partial bit.  It defines whether the information
>       contained in the optional transitive attribute is partial (if
>       set to 1) or complete (if set to 0).
>       For well-known attributes and for optional non-transitive
>       attributes the Partial bit must be set to 0.
>
>       The fourth high-order bit (bit 3) of the Attribute Flags octet
>       is the Extended Length bit.  It defines whether the Attribute
>       Length is one octet (if set to 0) or two octets (if set to 1).
>
>       The lower-order four bits of the Attribute Flags octet are
>       unused.  They must be zero when sent and must be ignored when
>       received.
>
>       The Attribute Type Code octet contains the Attribute Type Code.
>       Currently defined Attribute Type Codes are discussed in Section
>       5.
>
>       If the Extended Length bit of the Attribute Flags octet is set
>       to 0, the third octet of the Path Attribute contains the length
>       of the attribute data in octets.
>
>       If the Extended Length bit of the Attribute Flags octet is set
>       to 1, then the third and the fourth octets of the path
>       attribute contain the length of the attribute data in octets.
>
>       The remaining octets of the Path Attribute represent the
>       attribute value and are interpreted according to the Attribute
>       Flags and the Attribute Type Code.  The supported Attribute Type
>       Codes, their attribute values and uses are the following:
>
>          a) ORIGIN (Type Code 1):
>
>             ORIGIN is a well-known mandatory attribute that defines the
>             origin of the path information.  The data octet can assume
>             the following values:
>
>                Value      Meaning
>
>                0         IGP - Network Layer Reachability Information
>                             is interior to the originating AS
>
>                1         EGP - Network Layer Reachability Information
>                             learned via the EGP protocol
>
>                2         INCOMPLETE - Network Layer Reachability
>                             Information learned by some other means
>
>             Its usage is defined in 5.1.1.
>
>          b) AS_PATH (Type Code 2):
>
>             AS_PATH is a well-known mandatory attribute that is
>             composed of a sequence of AS path segments.
>             Each AS path segment is represented by a triple <path
>             segment type, path segment length, path segment value>.
>
>             The path segment type is a 1-octet long field with the
>             following values defined:
>
>                Value      Segment Type
>
>                1         AS_SET: unordered set of ASs a route in the
>                             UPDATE message has traversed
>
>                2         AS_SEQUENCE: ordered set of ASs a route in
>                             the UPDATE message has traversed
>
>             The path segment length is a 1-octet long field containing
>             the number of ASs in the path segment value field.
>
>             The path segment value field contains one or more AS
>             numbers, each encoded as a 2-octet long field.
>
>             Usage of this attribute is defined in 5.1.2.
>
>          c) NEXT_HOP (Type Code 3):
>
>             This is a well-known mandatory attribute that defines the IP
>             address of the border router that should be used as the next
>             hop to the destinations listed in the Network Layer
>             Reachability Information field of the UPDATE message.
>
>             Usage of this attribute is defined in 5.1.3.
>
>
>          d) MULTI_EXIT_DISC (Type Code 4):
>
>             This is an optional non-transitive attribute that is a four
>             octet non-negative integer.  The value of this attribute may
>             be used by a BGP speaker's decision process to discriminate
>             among multiple entry points to a neighboring autonomous
>             system.
>
>             Its usage is defined in 5.1.4.
>
>          e) LOCAL_PREF (Type Code 5):
>
>             LOCAL_PREF is a well-known attribute that is a four octet
>             non-negative integer.  A BGP speaker uses it to inform other
>             internal peers of the advertising speaker's degree of
>             preference for an advertised route.  Usage of this attribute
>             is described in 5.1.5.
>
>          f) ATOMIC_AGGREGATE (Type Code 6)
>
>             ATOMIC_AGGREGATE is a well-known discretionary attribute of
>             length 0.  Usage of this attribute is described in 5.1.6.
>
>          g) AGGREGATOR (Type Code 7):
>
>             AGGREGATOR is an optional transitive attribute of length 6.
>             The attribute contains the last AS number that formed the
>             aggregate route (encoded as 2 octets), followed by the IP
>             address of the BGP speaker that formed the aggregate route
>             (encoded as 4 octets). This should be the same address as
>             the one used for the BGP Identifier of the speaker. Usage
>             of this attribute is described in 5.1.7.
>
>       Network Layer Reachability Information:
>
>          This variable length field contains a list of IP address
>          prefixes. The length in octets of the Network Layer
>          Reachability Information is not encoded explicitly, but can be
>          calculated as:
>
>             UPDATE message Length - 23 - Total Path Attribute Length
>             - Withdrawn Routes Length
>
>          where UPDATE message Length is the value encoded in the
>          fixed-size BGP header, Total Path Attribute Length and
>          Withdrawn Routes Length are the values encoded in the variable
>          part of the UPDATE message, and 23 is the combined length of
>          the fixed-size BGP header, the Total Path Attribute Length
>          field and the Withdrawn Routes Length field.
>
>          Reachability information is encoded as one or more 2-tuples of
>          the form <length, prefix>, whose fields are described below:
>
>                   +---------------------------+
>                   |   Length (1 octet)        |
>                   +---------------------------+
>                   |   Prefix (variable)       |
>                   +---------------------------+
>
>          The use and the meaning of these fields are as follows:
>
>          a) Length:
>
>             The Length field indicates the length in bits of the IP
>             address prefix. A length of zero indicates a prefix that
>             matches all IP addresses (with prefix, itself, of zero
>             octets).
>
>          b) Prefix:
>
>             The Prefix field contains an IP address prefix followed by
>             enough trailing bits to make the end of the field fall on an
>             octet boundary. Note that the value of the trailing bits is
>             irrelevant.
>
>    The minimum length of the UPDATE message is 23 octets -- 19 octets
>    for the fixed header + 2 octets for the Withdrawn Routes Length + 2
>    octets for the Total Path Attribute Length (the value of Withdrawn
>    Routes Length is 0 and the value of Total Path Attribute Length is
>    0).
>
>    An UPDATE message can advertise at most one set of path attributes,
>    but multiple destinations, provided that the destinations share
>    these attributes. All path attributes contained in a given UPDATE
>    message apply to all destinations carried in the NLRI field of the
>    UPDATE message.
>
>    An UPDATE message can list multiple routes to be withdrawn from
>    service. Each such route is identified by its destination
>    (expressed as an IP prefix), which unambiguously identifies the
>    route in the context of the BGP speaker - BGP speaker connection to
>    which it has been previously advertised.
>
>    An UPDATE message might advertise only routes to be withdrawn from
>    service, in which case it will not include path attributes or
>    Network Layer Reachability Information. Conversely, it may
>    advertise only a feasible route, in which case the WITHDRAWN ROUTES
>    field need not be present.
>
>    An UPDATE message should not include the same address prefix in the
>    WITHDRAWN ROUTES and Network Layer Reachability Information fields;
>    however, a BGP speaker MUST be able to process UPDATE messages in
>    this form. A BGP speaker should treat an UPDATE message of this form
>    as if the WITHDRAWN ROUTES field doesn't contain the address prefix.
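The NLRI length formula and the <length, prefix> encoding quoted above can be illustrated with a short Python sketch. It assumes IPv4 prefixes only. The helper names (nlri_length, encode_prefix, decode_prefixes) are hypothetical and not taken from any implementation. The sketch performs only the length and prefix checks shown, not the rest of the UPDATE validation in Section 6.3.

```python
import ipaddress

HEADER_LEN = 19  # fixed-size BGP header (Marker, Length, Type)


def nlri_length(message_length: int, withdrawn_len: int, total_attr_len: int) -> int:
    """UPDATE Length - 23 - Total Path Attribute Length - Withdrawn Routes Length."""
    # 23 = 19-octet header + 2-octet Withdrawn Routes Length + 2-octet Total Path Attribute Length
    remaining = message_length - (HEADER_LEN + 2 + 2) - total_attr_len - withdrawn_len
    if remaining < 0:
        # Section 6.3 treats length fields that overrun the message as Malformed Attribute List.
        raise ValueError("Malformed Attribute List")
    return remaining


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix such as '10.1.0.0/16' as a <length, prefix> 2-tuple."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # fewest octets that hold prefixlen bits
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefixes(data: bytes) -> list[str]:
    """Decode a run of <length, prefix> 2-tuples (e.g. the NLRI field)."""
    prefixes = []
    i = 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        raw = data[i + 1:i + 1 + nbytes]
        if plen > 32 or len(raw) != nbytes:
            raise ValueError("Invalid Network Field")
        # Pad to 4 octets and clear the trailing bits, whose value is irrelevant.
        value = int.from_bytes(raw.ljust(4, b"\x00"), "big")
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        prefixes.append(f"{ipaddress.IPv4Address(value & mask)}/{plen}")
        i += 1 + nbytes
    return prefixes
```

Because only the first plen bits are significant, the octets 0x09 0x0a 0xff decode to 10.128.0.0/9 in this sketch. The trailing bits of 0xff are simply masked off.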
>
>
> 4.4 KEEPALIVE Message Format
>
>
>    BGP does not use any transport protocol-based keep-alive mechanism
>    to determine if peers are reachable. Instead, KEEPALIVE messages are
>    exchanged between peers often enough so as not to cause the Hold
>    Timer to expire. A reasonable maximum time between KEEPALIVE
>    messages would be one third of the Hold Time interval. KEEPALIVE
>    messages MUST NOT be sent more frequently than one per second. An
>    implementation MAY adjust the rate at which it sends KEEPALIVE
>    messages as a function of the Hold Time interval.
>
>    If the negotiated Hold Time interval is zero, then periodic
>    KEEPALIVE messages MUST NOT be sent.
>
>    A KEEPALIVE message consists of only the message header and has a
>    length of 19 octets.
>
>
> 4.5 NOTIFICATION Message Format
>
>
>    A NOTIFICATION message is sent when an error condition is detected.
>    The BGP connection is closed immediately after sending it.
>
>    In addition to the fixed-size BGP header, the NOTIFICATION message
>    contains the following fields:
>
>       0                   1                   2                   3
>       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>      | Error code    | Error subcode |   Data (variable)             |
>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>       Error Code:
>
>          This 1-octet unsigned integer indicates the type of
>          NOTIFICATION. The following Error Codes have been defined:
>
>             Error Code    Symbolic Name                  Reference
>
>                 1         Message Header Error           Section 6.1
>
>                 2         OPEN Message Error             Section 6.2
>
>                 3         UPDATE Message Error           Section 6.3
>
>                 4         Hold Timer Expired             Section 6.5
>
>                 5         Finite State Machine Error     Section 6.6
>
>                 6         Cease                          Section 6.7
>
>       Error subcode:
>
>          This 1-octet unsigned integer provides more specific
>          information about the nature of the reported error.
>          Each Error Code may have one or more Error Subcodes associated
>          with it. If no appropriate Error Subcode is defined, then a
>          zero (Unspecific) value is used for the Error Subcode field.
>
>          Message Header Error subcodes:
>
>               1  - Connection Not Synchronized.
>               2  - Bad Message Length.
>               3  - Bad Message Type.
>
>          OPEN Message Error subcodes:
>
>               1  - Unsupported Version Number.
>               2  - Bad Peer AS.
>               3  - Bad BGP Identifier.
>               4  - Unsupported Optional Parameter.
>               5  - Authentication Failure.
>               6  - Unacceptable Hold Time.
>
>          UPDATE Message Error subcodes:
>
>               1  - Malformed Attribute List.
>               2  - Unrecognized Well-known Attribute.
>               3  - Missing Well-known Attribute.
>               4  - Attribute Flags Error.
>               5  - Attribute Length Error.
>               6  - Invalid ORIGIN Attribute.
>               8  - Invalid NEXT_HOP Attribute.
>               9  - Optional Attribute Error.
>              10  - Invalid Network Field.
>              11  - Malformed AS_PATH.
>
>       Data:
>
>          This variable-length field is used to diagnose the reason for
>          the NOTIFICATION. The contents of the Data field depend upon
>          the Error Code and Error Subcode. See Section 6 below for more
>          details.
>
>          Note that the length of the Data field can be determined from
>          the message Length field by the formula:
>
>                  Message Length = 21 + Data Length
>
>    The minimum length of the NOTIFICATION message is 21 octets
>    (including message header).
>
>
> 5. Path Attributes
>
>
>    This section discusses the path attributes of the UPDATE message.
>
>    Path attributes fall into four separate categories:
>
>       1. Well-known mandatory.
>       2. Well-known discretionary.
>       3. Optional transitive.
>       4. Optional non-transitive.
>
>    Well-known attributes must be recognized by all BGP implementations.
>    Some of these attributes are mandatory and must be included in every
>    UPDATE message that contains NLRI.
>    Others are discretionary and may or may not be sent in a particular
>    UPDATE message.
>
>    All well-known attributes must be passed along (after proper
>    updating, if necessary) to other BGP peers.
>
>    In addition to well-known attributes, each path may contain one or
>    more optional attributes. It is not required or expected that all
>    BGP implementations support all optional attributes. The handling
>    of an unrecognized optional attribute is determined by the setting
>    of the Transitive bit in the attribute flags octet. Paths with
>    unrecognized transitive optional attributes should be accepted. If
>    a path with an unrecognized transitive optional attribute is
>    accepted and passed along to other BGP peers, then the unrecognized
>    transitive optional attribute of that path must be passed along
>    with the path to other BGP peers with the Partial bit in the
>    Attribute Flags octet set to 1. If a path with a recognized
>    transitive optional attribute is accepted and passed along to other
>    BGP peers and the Partial bit in the Attribute Flags octet is set to
>    1 by some previous AS, it is not set back to 0 by the current AS.
>    Unrecognized non-transitive optional attributes must be quietly
>    ignored and not passed along to other BGP peers.
>
>    New transitive optional attributes may be attached to the path by
>    the originator or by any other BGP speaker in the path. If they are
>    not attached by the originator, the Partial bit in the Attribute
>    Flags octet is set to 1. The rules for attaching new non-transitive
>    optional attributes will depend on the nature of the specific
>    attribute. The documentation of each new non-transitive optional
>    attribute will be expected to include such rules. (The description
>    of the MULTI_EXIT_DISC attribute gives an example.) All optional
>    attributes (both transitive and non-transitive) may be updated (if
>    appropriate) by BGP speakers in the path.
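The propagation rules for optional attributes in the quoted paragraphs come down to a decision on two flag bits. The Python sketch below applies that decision to the attributes of an accepted path. It assumes the usual flag layout, with Optional, Transitive and Partial as the high-order bits 0, 1 and 2 of the Attribute Flags octet. The PathAttribute type, the RECOGNIZED set and attributes_to_propagate are hypothetical. Recognized attributes are returned unchanged; their own per-attribute rules (for example MULTI_EXIT_DISC in 5.1.4) would still have to be applied separately.

```python
from dataclasses import dataclass, replace

OPTIONAL = 0x80    # bit 0 (high-order) of the Attribute Flags octet
TRANSITIVE = 0x40  # bit 1
PARTIAL = 0x20     # bit 2

# Hypothetical: the type codes this implementation recognizes (those listed in Section 4.3).
RECOGNIZED = {1, 2, 3, 4, 5, 6, 7}


@dataclass(frozen=True)
class PathAttribute:
    flags: int
    type_code: int
    value: bytes


def attributes_to_propagate(attrs: list[PathAttribute]) -> list[PathAttribute]:
    """Apply the Section 5 rules for unrecognized optional attributes of an accepted path."""
    out = []
    for attr in attrs:
        if not (attr.flags & OPTIONAL) or attr.type_code in RECOGNIZED:
            # Well-known and recognized optional attributes follow their own rules
            # (not modelled here); they are kept as-is, so a Partial bit already
            # set by a previous AS is never cleared.
            out.append(attr)
        elif attr.flags & TRANSITIVE:
            # Unrecognized optional transitive: pass along with Partial set to 1.
            out.append(replace(attr, flags=attr.flags | PARTIAL))
        # Unrecognized optional non-transitive: quietly ignored, not passed along.
    return out
```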
>
>    The sender of an UPDATE message should order path attributes within
>    the UPDATE message in ascending order of attribute type. The
>    receiver of an UPDATE message must be prepared to handle path
>    attributes within the UPDATE message that are out of order.
>
>    The same attribute cannot appear more than once within the Path
>    Attributes field of a particular UPDATE message.
>
>    The mandatory category refers to an attribute which must be present
>    in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE
>    message. Attributes classified as optional for the purpose of the
>    protocol extension mechanism may be purely discretionary, or
>    discretionary, required, or disallowed in certain contexts.
>
>         attribute          EBGP                    IBGP
>         ORIGIN             mandatory               mandatory
>         AS_PATH            mandatory               mandatory
>         NEXT_HOP           mandatory               mandatory
>         MULTI_EXIT_DISC    discretionary           discretionary
>         LOCAL_PREF         disallowed              required
>         ATOMIC_AGGREGATE   see sections 5.1.6 and 9.1.4
>         AGGREGATOR         discretionary           discretionary
>
>
> 5.1 Path Attribute Usage
>
>
>    The usage of each BGP path attribute is described in the following
>    clauses.
>
>
> 5.1.1 ORIGIN
>
>
>    ORIGIN is a well-known mandatory attribute. The ORIGIN attribute
>    shall be generated by the autonomous system that originates the
>    associated routing information. It shall be included in the UPDATE
>    messages of all BGP speakers that choose to propagate this
>    information to other BGP speakers.
>
>
> 5.1.2 AS_PATH
>
>
>    AS_PATH is a well-known mandatory attribute. This attribute
>    identifies the autonomous systems through which routing information
>    carried in this UPDATE message has passed. The components of this
>    list can be AS_SETs or AS_SEQUENCEs.
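To make the segment structure from Section 4.3 concrete, here is a minimal Python sketch. It decodes and encodes an AS_PATH attribute value as a list of (segment type, [AS numbers]) pairs, using the 2-octet AS numbers this draft specifies. The function names are hypothetical. Raising ValueError stands in for sending a NOTIFICATION with the Malformed AS_PATH subcode (Section 6.3).

```python
import struct

AS_SET = 1
AS_SEQUENCE = 2


def decode_as_path(data: bytes) -> list[tuple[int, list[int]]]:
    """Decode an AS_PATH attribute value into (segment type, AS numbers) pairs."""
    segments = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("Malformed AS_PATH")  # truncated segment header
        seg_type, seg_len = data[i], data[i + 1]
        if seg_type not in (AS_SET, AS_SEQUENCE) or seg_len == 0:
            raise ValueError("Malformed AS_PATH")  # unknown type, or no AS numbers
        end = i + 2 + 2 * seg_len  # each AS number is 2 octets
        if end > len(data):
            raise ValueError("Malformed AS_PATH")  # segment runs past the attribute
        asns = list(struct.unpack(f"!{seg_len}H", data[i + 2:end]))
        segments.append((seg_type, asns))
        i = end
    return segments


def encode_as_path(segments: list[tuple[int, list[int]]]) -> bytes:
    """Encode (segment type, AS numbers) pairs; struct rejects more than 255 ASs per segment."""
    out = bytearray()
    for seg_type, asns in segments:
        # B = 1-octet type, B = 1-octet length, H = 2-octet AS number, network byte order
        out += struct.pack(f"!BB{len(asns)}H", seg_type, len(asns), *asns)
    return bytes(out)
```

An empty attribute value decodes to an empty list, which is the empty AS_PATH that 5.1.2 requires for locally originated routes sent to internal peers.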
>
>    When a BGP speaker propagates a route which it has learned from
>    another BGP speaker's UPDATE message, it shall modify the route's
>    AS_PATH attribute based on the location of the BGP speaker to which
>    the route will be sent:
>
>       a) When a given BGP speaker advertises the route to an internal
>          peer, the advertising speaker shall not modify the AS_PATH
>          attribute associated with the route.
>
>       b) When a given BGP speaker advertises the route to an external
>          peer, then the advertising speaker shall update the AS_PATH
>          attribute as follows:
>
>          1) if the first path segment of the AS_PATH is of type
>             AS_SEQUENCE, the local system shall prepend its own AS
>             number as the last element of the sequence (put it in the
>             leftmost position). If the act of prepending will cause an
>             overflow in the AS_PATH segment, i.e. more than 255
>             elements, it shall be legal to prepend a new segment of
>             type AS_SEQUENCE and prepend its own AS number to this new
>             segment.
>
>          2) if the first path segment of the AS_PATH is of type
>             AS_SET, the local system shall prepend a new path segment
>             of type AS_SEQUENCE to the AS_PATH, including its own AS
>             number in that segment.
>
>    When a BGP speaker originates a route then:
>
>       a) the originating speaker shall include its own AS number in a
>          path segment of type AS_SEQUENCE in the AS_PATH attribute of
>          all UPDATE messages sent to an external peer. (In this case,
>          the AS number of the originating speaker's autonomous system
>          will be the only entry in the path segment, and this path
>          segment will be the only segment in the AS_PATH attribute).
>
>       b) the originating speaker shall include an empty AS_PATH
>          attribute in all UPDATE messages sent to internal peers. (An
>          empty AS_PATH attribute is one whose length field contains
>          the value zero).
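The propagation and origination rules quoted above can be sketched in Python as follows. The AS_PATH is held as a list of (segment type, [AS numbers]) pairs with the leftmost segment first. The function update_as_path and its parameters are hypothetical. The sketch treats an empty AS_PATH the same way as case b) 2), prepending a new AS_SEQUENCE. With an empty input, that choice also yields the two origination results: a single AS_SEQUENCE for an external peer and an empty AS_PATH for an internal peer. The sketch does not model prepending more than one instance of the local AS.

```python
AS_SET = 1
AS_SEQUENCE = 2
MAX_SEGMENT_LEN = 255  # the path segment length field is one octet

Segment = tuple[int, list[int]]


def update_as_path(as_path: list[Segment], local_as: int, to_external: bool) -> list[Segment]:
    """Return the AS_PATH to advertise, following the rules of 5.1.2 (sketch)."""
    if not to_external:
        # a) Internal peer: the AS_PATH is not modified.
        return as_path
    if as_path and as_path[0][0] == AS_SEQUENCE and len(as_path[0][1]) < MAX_SEGMENT_LEN:
        # b) 1) Put the local AS in the leftmost position of the first AS_SEQUENCE.
        seg_type, asns = as_path[0]
        return [(seg_type, [local_as] + asns)] + as_path[1:]
    # b) 1) overflow, b) 2) first segment is an AS_SET, or an empty AS_PATH:
    # prepend a new AS_SEQUENCE segment holding only the local AS.
    return [(AS_SEQUENCE, [local_as])] + as_path
```

On overflow, the sketch always prepends a new segment once the first AS_SEQUENCE already holds 255 ASs. The draft makes this permissible rather than mandatory.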
>
>    Whenever the modification of the AS_PATH attribute calls for
>    including or prepending the AS number of the local system, the
>    local system may include/prepend more than one instance of its own
>    AS number in the AS_PATH attribute. This is controlled via local
>    configuration.
>
>
> 5.1.3 NEXT_HOP
>
>
>    The NEXT_HOP path attribute defines the IP address of the border
>    router that should be used as the next hop to the destinations
>    listed in the UPDATE message. The NEXT_HOP attribute is calculated
>    as follows.
>
>       1) When sending a message to an internal peer, the BGP speaker
>          should not modify the NEXT_HOP attribute, unless it has been
>          explicitly configured to announce its own IP address as the
>          NEXT_HOP.
>
>       2) When sending a message to an external peer X, and the peer is
>          one IP hop away from the speaker:
>
>          - If the route being announced was learned from an internal
>            peer or is locally originated, the BGP speaker can use for
>            the NEXT_HOP attribute an interface address of the internal
>            peer router (or the internal router) through which the
>            announced network is reachable for the speaker, provided
>            that peer X shares a common subnet with this address. This
>            is a form of "third party" NEXT_HOP attribute.
>
>          - If the route being announced was learned from an external
>            peer, the speaker can use in the NEXT_HOP attribute an IP
>            address of any adjacent router (known from the received
>            NEXT_HOP attribute) that the speaker itself uses for local
>            route calculation, provided that peer X shares a common
>            subnet with this address. This is a second form of "third
>            party" NEXT_HOP attribute.
>
>          - If the external peer to which the route is being advertised
>            shares a common subnet with one of the announcing router's
>            own interfaces, the router may use the IP address
>            associated with such an interface in the NEXT_HOP
>            attribute.
>            This is known as a "first party" NEXT_HOP attribute.
>
>          - By default (if none of the above conditions apply), the BGP
>            speaker should use in the NEXT_HOP attribute the IP address
>            of the interface that the speaker uses to establish the BGP
>            session to peer X.
>
>       3) When sending a message to an external peer X, and the peer is
>          multiple IP hops away from the speaker (aka "multihop EBGP"):
>
>          - The speaker may be configured to propagate the NEXT_HOP
>            attribute. In this case when advertising a route that the
>            speaker learned from one of its peers, the NEXT_HOP
>            attribute of the advertised route is exactly the same as the
>            NEXT_HOP attribute of the learned route (the speaker just
>            doesn't modify the NEXT_HOP attribute).
>
>          - By default, the BGP speaker should use in the NEXT_HOP
>            attribute the IP address of the interface that the speaker
>            uses to establish the BGP session to peer X.
>
>    Normally the NEXT_HOP attribute is chosen such that the shortest
>    available path will be taken. A BGP speaker must be able to support
>    disabling advertisement of third party NEXT_HOP attributes to
>    handle imperfectly bridged media.
>
>    A BGP speaker must never advertise an address of a peer to that
>    peer as a NEXT_HOP, for a route that the speaker is originating. A
>    BGP speaker must never install a route with itself as the next hop.
>
>    The NEXT_HOP attribute is used by the BGP speaker to determine the
>    actual outbound interface and immediate next-hop address that
>    should be used to forward transit packets to the associated
>    destinations. The immediate next-hop address is determined by
>    performing a recursive route lookup operation for the IP address
>    in the NEXT_HOP attribute using the contents of the Routing Table
>    (see Section 9.1.2.2). The resolving route will always specify the
>    outbound interface.
>    If the resolving route specifies the next-hop address, this address
>    should be used as the immediate next-hop address for packet
>    forwarding. If the address in the NEXT_HOP attribute is directly
>    resolved through a route to an attached subnet (such a route will
>    not specify the next-hop address), the outbound interface should be
>    taken from the resolving route and the address in the NEXT_HOP
>    attribute should be used as the immediate next-hop address.
>
>
> 5.1.4 MULTI_EXIT_DISC
>
>
>    The MULTI_EXIT_DISC attribute may be used on external (inter-AS)
>    links to discriminate among multiple exit or entry points to the
>    same neighboring AS. The value of the MULTI_EXIT_DISC attribute is
>    a four-octet unsigned number which is called a metric. All other
>    factors being equal, the exit point with the lower metric should be
>    preferred. If received over external links, the MULTI_EXIT_DISC
>    attribute MAY be propagated over internal links to other BGP
>    speakers within the same AS. The MULTI_EXIT_DISC attribute received
>    from a neighboring AS MUST NOT be propagated to other neighboring
>    ASs.
>
>    A BGP speaker MUST implement a mechanism based on local
>    configuration which allows the MULTI_EXIT_DISC attribute to be
>    removed from a route. This MAY be done prior to determining the
>    degree of preference of the route and performing route selection
>    (decision process phases 1 and 2).
>
>    An implementation MAY also (based on local configuration) alter the
>    value of the MULTI_EXIT_DISC attribute received over an external
>    link. If it does so, it shall do so prior to determining the degree
>    of preference of the route and performing route selection (decision
>    process phases 1 and 2).
>
>
> 5.1.5 LOCAL_PREF
>
>
>    LOCAL_PREF is a well-known attribute that SHALL be included in all
>    UPDATE messages that a given BGP speaker sends to the other internal
>    peers.
>    A BGP speaker SHALL calculate the degree of preference for each
>    external route based on the locally configured policy, and include
>    the degree of preference when advertising a route to its internal
>    peers. The higher degree of preference MUST be preferred. A BGP
>    speaker shall use the degree of preference learned via LOCAL_PREF in
>    its decision process (see section 9.1.1).
>
>    A BGP speaker MUST NOT include this attribute in UPDATE messages
>    that it sends to external peers, except for the case of BGP
>    Confederations [13]. If it is contained in an UPDATE message that is
>    received from an external peer, then this attribute MUST be ignored
>    by the receiving speaker, except for the case of BGP Confederations
>    [13].
>
>
> 5.1.6 ATOMIC_AGGREGATE
>
>
>    ATOMIC_AGGREGATE is a well-known discretionary attribute.
>
>    When a router aggregates several routes for the purpose of
>    advertisement to a particular peer, and the AS_PATH of the
>    aggregated route excludes at least some of the AS numbers present in
>    the AS_PATH of the routes that are aggregated, the aggregated route,
>    when advertised to the peer, MUST include the ATOMIC_AGGREGATE
>    attribute.
>
>    A BGP speaker that receives a route with the ATOMIC_AGGREGATE
>    attribute MUST NOT remove the attribute from the route when
>    propagating it to other speakers.
>
>    A BGP speaker that receives a route with the ATOMIC_AGGREGATE
>    attribute MUST NOT make any NLRI of that route more specific (as
>    defined in 9.1.4) when advertising this route to other BGP speakers.
>    A BGP speaker that receives a route with the ATOMIC_AGGREGATE
>    attribute needs to be cognizant of the fact that the actual path to
>    destinations, as specified in the NLRI of the route, while having
>    the loop-free property, may not be the path specified in the AS_PATH
>    attribute of the route.
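The MUST in the second paragraph of 5.1.6 reduces to a set comparison. If any AS number in the AS_PATH of an aggregated route is missing from the aggregate's AS_PATH, the aggregate carries ATOMIC_AGGREGATE. Here is a minimal Python sketch of that check. It uses the (segment type, [AS numbers]) representation of AS_PATH, and the function names are hypothetical.

```python
AS_SET = 1
AS_SEQUENCE = 2

Segment = tuple[int, list[int]]


def as_numbers(as_path: list[Segment]) -> set[int]:
    """All AS numbers appearing in any AS_SET or AS_SEQUENCE segment."""
    return {asn for _seg_type, asns in as_path for asn in asns}


def needs_atomic_aggregate(aggregate_path: list[Segment],
                           component_paths: list[list[Segment]]) -> bool:
    """True if the aggregate's AS_PATH omits an AS number of some aggregated route."""
    present = as_numbers(aggregate_path)
    return any(not as_numbers(path) <= present for path in component_paths)
```

Two cases illustrate the check. An aggregate that keeps only AS 100 from components [100, 200] and [100, 300] needs the attribute. An aggregate that also lists 200 and 300 in an AS_SET omits nothing, so the sketch returns False.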
>
>
> 5.1.7 AGGREGATOR
>
>
>    AGGREGATOR is an optional transitive attribute which may be included
>    in updates which are formed by aggregation (see Section 9.2.2.2). A
>    BGP speaker which performs route aggregation may add the AGGREGATOR
>    attribute which shall contain its own AS number and IP address. The
>    IP address should be the same as the BGP Identifier of the speaker.
>
>
> 6. BGP Error Handling.
>
>
>    This section describes actions to be taken when errors are detected
>    while processing BGP messages.
>
>    When any of the conditions described here are detected, a
>    NOTIFICATION message with the indicated Error Code, Error Subcode,
>    and Data fields is sent, and the BGP connection is closed. If no
>    Error Subcode is specified, then a zero must be used.
>
>    The phrase "the BGP connection is closed" means that the transport
>    protocol connection has been closed, the associated Adj-RIB-In has
>    been cleared, and that all resources for that BGP connection have
>    been deallocated. Entries in the Loc-RIB associated with the remote
>    peer are marked as invalid. The fact that the routes have become
>    invalid is passed to other BGP peers before the routes are deleted
>    from the system.
>
>    Unless specified explicitly, the Data field of the NOTIFICATION
>    message that is sent to indicate an error is empty.
>
>
> 6.1 Message Header error handling.
>
>
>    All errors detected while processing the Message Header are
>    indicated by sending the NOTIFICATION message with Error Code
>    Message Header Error. The Error Subcode elaborates on the specific
>    nature of the error.
>
>    The expected value of the Marker field of the message header is all
>    ones if the message type is OPEN.
>    The expected value of the Marker field for all other types of BGP
>    messages is determined based on the presence of the Authentication
>    Information Optional Parameter in the BGP OPEN message and the
>    actual authentication mechanism (if the Authentication Information
>    in the BGP OPEN message is present). The Marker field should be all
>    ones if the OPEN message carried no authentication information. If
>    the Marker field of the message header is not the expected one, then
>    a synchronization error has occurred and the Error Subcode is set to
>    Connection Not Synchronized.
>
>    If the Length field of the message header is less than 19 or greater
>    than 4096, or if the Length field of an OPEN message is less than
>    the minimum length of the OPEN message, or if the Length field of an
>    UPDATE message is less than the minimum length of the UPDATE
>    message, or if the Length field of a KEEPALIVE message is not equal
>    to 19, or if the Length field of a NOTIFICATION message is less than
>    the minimum length of the NOTIFICATION message, then the Error
>    Subcode is set to Bad Message Length. The Data field contains the
>    erroneous Length field.
>
>    If the Type field of the message header is not recognized, then the
>    Error Subcode is set to Bad Message Type. The Data field contains
>    the erroneous Type field.
>
>
> 6.2 OPEN message error handling.
>
>
>    All errors detected while processing the OPEN message are indicated
>    by sending the NOTIFICATION message with Error Code OPEN Message
>    Error. The Error Subcode elaborates on the specific nature of the
>    error.
>
>    If the version number contained in the Version field of the received
>    OPEN message is not supported, then the Error Subcode is set to
>    Unsupported Version Number.
>    The Data field is a 2-octet unsigned integer, which indicates the
>    largest locally supported version number less than the version the
>    remote BGP peer bid (as indicated in the received OPEN message), or
>    if the smallest locally supported version number is greater than
>    the version the remote BGP peer bid, then the smallest locally
>    supported version number.
>
>    If the Autonomous System field of the OPEN message is unacceptable,
>    then the Error Subcode is set to Bad Peer AS. The determination of
>    acceptable Autonomous System numbers is outside the scope of this
>    protocol.
>
>    If the Hold Time field of the OPEN message is unacceptable, then the
>    Error Subcode MUST be set to Unacceptable Hold Time. An
>    implementation MUST reject Hold Time values of one or two seconds.
>    An implementation MAY reject any proposed Hold Time. An
>    implementation which accepts a Hold Time MUST use the negotiated
>    value for the Hold Time.
>
>    If the BGP Identifier field of the OPEN message is syntactically
>    incorrect, then the Error Subcode is set to Bad BGP Identifier.
>    Syntactic correctness means that the BGP Identifier field represents
>    a valid IP host address.
>
>    If one of the Optional Parameters in the OPEN message is not
>    recognized, then the Error Subcode is set to Unsupported Optional
>    Parameter.
>
>    If one of the Optional Parameters in the OPEN message is recognized,
>    but is malformed, then the Error Subcode is set to 0 (Unspecific).
>
>    If the OPEN message carries Authentication Information (as an
>    Optional Parameter), then the corresponding authentication procedure
>    is invoked. If the authentication procedure (based on Authentication
>    Code and Authentication Data) fails, then the Error Subcode is set
>    to Authentication Failure.
>
>
> 6.3 UPDATE message error handling.
>
>
>    All errors detected while processing the UPDATE message are
>    indicated by sending the NOTIFICATION message with Error Code UPDATE
>    Message Error. The Error Subcode elaborates on the specific nature
>    of the error.
>
>    Error checking of an UPDATE message begins by examining the path
>    attributes. If the Withdrawn Routes Length or Total Attribute Length
>    is too large (i.e., if Withdrawn Routes Length + Total Attribute
>    Length + 23 exceeds the message Length), then the Error Subcode is
>    set to Malformed Attribute List.
>
>    If any recognized attribute has Attribute Flags that conflict with
>    the Attribute Type Code, then the Error Subcode is set to Attribute
>    Flags Error. The Data field contains the erroneous attribute (type,
>    length and value).
>
>    If any recognized attribute has an Attribute Length that conflicts
>    with the expected length (based on the attribute type code), then
>    the Error Subcode is set to Attribute Length Error. The Data field
>    contains the erroneous attribute (type, length and value).
>
>    If any of the mandatory well-known attributes are not present, then
>    the Error Subcode is set to Missing Well-known Attribute. The Data
>    field contains the Attribute Type Code of the missing well-known
>    attribute.
>
>    If any of the mandatory well-known attributes are not recognized,
>    then the Error Subcode is set to Unrecognized Well-known Attribute.
>    The Data field contains the unrecognized attribute (type, length
>    and value).
>
>    If the ORIGIN attribute has an undefined value, then the Error
>    Subcode is set to Invalid ORIGIN Attribute. The Data field contains
>    the unrecognized attribute (type, length and value).
>
>    If the NEXT_HOP attribute field is syntactically incorrect, then the
>    Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field
>    contains the incorrect attribute (type, length and value).
>    Syntactic correctness means that the NEXT_HOP attribute represents a
>    valid IP host address. Semantic correctness applies only to the
>    external BGP links, and only when the sender and the receiving
>    speaker are one IP hop away from each other. To be semantically
>    correct, the IP address in the NEXT_HOP must not be the IP address
>    of the receiving speaker, and the NEXT_HOP IP address must either be
>    the sender's IP address (used to establish the BGP session), or the
>    interface associated with the NEXT_HOP IP address must share a
>    common subnet with the receiving BGP speaker. If the NEXT_HOP
>    attribute is semantically incorrect, the error should be logged,
>    and the route should be ignored. In this case, no NOTIFICATION
>    message should be sent.
>
>    The AS_PATH attribute is checked for syntactic correctness. If the
>    path is syntactically incorrect, then the Error Subcode is set to
>    Malformed AS_PATH.
>
>    The information carried by the AS_PATH attribute is checked for AS
>    loops. AS loop detection is done by scanning the full AS path (as
>    specified in the AS_PATH attribute), and checking that the
>    autonomous system number of the local system does not appear in the
>    AS path. If the local autonomous system number appears in the AS
>    path, the route may be stored in the Adj-RIB-In, but unless the
>    router is configured to accept routes with its own autonomous system
>    in the AS path, the route shall not be passed to the BGP Decision
>    Process. Operations of a router that is configured to accept routes
>    with its own autonomous system number in the AS path are outside the
>    scope of this document.
>
>    If an optional attribute is recognized, then the value of this
>    attribute is checked. If an error is detected, the attribute is
>    discarded, and the Error Subcode is set to Optional Attribute Error.
>    The Data field contains the attribute (type, length and value).
>
>    If any attribute appears more than once in the UPDATE message, then
>    the Error Subcode is set to Malformed Attribute List.
>
>    The NLRI field in the UPDATE message is checked for syntactic
>    validity. If the field is syntactically incorrect, then the Error
>    Subcode is set to Invalid Network Field.
>
>    If a prefix in the NLRI field is semantically incorrect (e.g., an
>    unexpected multicast IP address), an error should be logged
>    locally, and the prefix should be ignored.
>
>    An UPDATE message that contains correct path attributes, but no
>    NLRI, shall be treated as a valid UPDATE message.
>
>
> 6.4 NOTIFICATION message error handling.
>
>
>    If a peer sends a NOTIFICATION message, and there is an error in
>    that message, there is unfortunately no means of reporting this
>    error via a subsequent NOTIFICATION message. Any such error, such as
>    an unrecognized Error Code or Error Subcode, should be noticed,
>    logged locally, and brought to the attention of the administration
>    of the peer. The means to do this, however, lies outside the scope
>    of this document.
>
>
> 6.5 Hold Timer Expired error handling.
>
>
>    If a system does not receive successive KEEPALIVE and/or UPDATE
>    and/or NOTIFICATION messages within the period specified in the
>    Hold Time field of the OPEN message, then the NOTIFICATION message
>    with Hold Timer Expired Error Code must be sent and the BGP
>    connection closed.
>
>
> 6.6 Finite State Machine error handling.
>
>
>    Any error detected by the BGP Finite State Machine (e.g., receipt of
>    an unexpected event) is indicated by sending the NOTIFICATION
>    message with Error Code Finite State Machine Error.
>
>
> 6.7 Cease.
>
>    In the absence of any fatal errors (that are indicated in this
>    section), a BGP peer may choose at any given time to close its BGP
>    connection by sending the NOTIFICATION message with Error Code
>    Cease. However, the Cease NOTIFICATION message must not be used when
>    a fatal error indicated by this section does exist.
>
>    A BGP speaker may support the ability to impose a (locally
>    configured) upper bound on the number of address prefixes the
>    speaker is willing to accept from a neighbor. When the upper bound
>    is reached, the speaker (under control of local configuration) may
>    either (a) discard new address prefixes from the neighbor, or (b)
>    terminate the BGP peering with the neighbor. If the BGP speaker
>    decides to terminate its peering with a neighbor because the number
>    of address prefixes received from the neighbor exceeds the locally
>    configured upper bound, then the speaker must send to the neighbor a
>    NOTIFICATION message with the Error Code Cease.
>
>
> 6.8 Connection collision detection.
>
>    If a pair of BGP speakers try simultaneously to establish a BGP
>    connection to each other, then two parallel connections between this
>    pair of speakers might well be formed. If the source IP address used
>    by one of these connections is the same as the destination IP
>    address used by the other, and the destination IP address used by
>    the first connection is the same as the source IP address used by
>    the other, we refer to this situation as connection collision.
>    Clearly, in the presence of connection collision, one of these
>    connections must be closed.
>
>    Based on the value of the BGP Identifier, a convention is
>    established for detecting which BGP connection is to be preserved
>    when a collision does occur.
>    The convention is to compare the BGP Identifiers of the peers
>    involved in the collision and to retain only the connection
>    initiated by the BGP speaker with the higher-valued BGP Identifier.
>
>    Upon receipt of an OPEN message, the local system must examine all
>    of its connections that are in the OpenConfirm state. A BGP speaker
>    may also examine connections in an OpenSent state if it knows the
>    BGP Identifier of the peer by means outside of the protocol. If
>    among these connections there is a connection to a remote BGP
>    speaker whose BGP Identifier equals the one in the OPEN message, and
>    this connection collides with the connection over which the OPEN
>    message is received, then the local system performs the following
>    collision resolution procedure:
>
>       1. The BGP Identifier of the local system is compared to the BGP
>          Identifier of the remote system (as specified in the OPEN
>          message).
>
>       2. If the value of the local BGP Identifier is less than the
>          remote one, the local system closes the BGP connection that
>          already exists (the one that is already in the OpenConfirm
>          state), and accepts the BGP connection initiated by the
>          remote system.
>
>       3. Otherwise, the local system closes the newly created BGP
>          connection (the one associated with the newly received OPEN
>          message), and continues to use the existing one (the one that
>          is already in the OpenConfirm state).
>
>    Comparing BGP Identifiers is done by treating them as (4-octet long)
>    unsigned integers.
>
>    Unless allowed via configuration, a connection collision with an
>    existing BGP connection that is in the Established state causes the
>    newly created connection to be closed.
>
>    Note that a connection collision cannot be detected with connections
>    that are in the Idle, Connect, or Active states.
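The collision resolution procedure above can be sketched in Python. This is a minimal illustration, not code from the draft: the helper names `bgp_id_value` and `connection_to_close` are hypothetical, BGP Identifiers are assumed to be written in dotted-quad form, and the Established-state and configuration exceptions are not modeled. The point it shows is that Identifiers are compared as 4-octet unsigned integers, not as strings.

```python
import ipaddress


def bgp_id_value(bgp_id: str) -> int:
    """Interpret a dotted-quad BGP Identifier as a 4-octet unsigned integer."""
    return int(ipaddress.IPv4Address(bgp_id))


def connection_to_close(local_id: str, remote_id: str) -> str:
    """Decide which of two colliding connections the local system closes.

    Returns "existing" for the connection already in OpenConfirm, or
    "new" for the connection on which the OPEN message just arrived.
    Hypothetical helper; Established-state collisions are not modeled.
    """
    if bgp_id_value(local_id) < bgp_id_value(remote_id):
        # Step 2: the remote Identifier is higher, so keep the
        # connection the remote system initiated.
        return "existing"
    # Step 3: otherwise keep the existing connection.
    return "new"


# Numeric comparison matters: as strings "9.0.0.1" > "10.0.0.1",
# but as unsigned integers 9.0.0.1 < 10.0.0.1.
print(connection_to_close("9.0.0.1", "10.0.0.1"))  # existing
```

Equal Identifiers fall through to step 3, matching the "Otherwise" wording of the procedure.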
>
>    Closing the BGP connection (that results from the collision
>    resolution procedure) is accomplished by sending the NOTIFICATION
>    message with the Error Code Cease.
>
>
> 7. BGP Version Negotiation.
>
>    BGP speakers may negotiate the version of the protocol by making
>    multiple attempts to open a BGP connection, starting with the
>    highest version number each supports. If an open attempt fails with
>    an Error Code OPEN Message Error, and an Error Subcode Unsupported
>    Version Number, then the BGP speaker has available the version
>    number it tried, the version number its peer tried, the version
>    number passed by its peer in the NOTIFICATION message, and the
>    version numbers that it supports. If the two peers do support one or
>    more common versions, then this will allow them to rapidly determine
>    the highest common version. In order to support BGP version
>    negotiation, future versions of BGP must retain the format of the
>    OPEN and NOTIFICATION messages.
>
>
> 8. BGP Finite State Machine.
>
>    This section specifies BGP operation in terms of a Finite State
>    Machine (FSM). Following is a brief summary and overview of BGP
>    operations by state as determined by this FSM.
>
>    Initially BGP is in the Idle state.
>
>    Idle state:
>
>       A manual start event is a start event initiated by an operator.
>       An automatic start event is a start event generated by the
>       system.
>
>       In this state BGP refuses all incoming BGP connections. No
>       resources are allocated to the peer. In response to a Start event
>       (manual or automatic), the local system:
>
>         - initializes all BGP resources,
>
>         - starts the ConnectRetry timer,
>
>         - initiates a transport connection to the other BGP peer,
>
>         - listens for a connection that may be initiated by the remote
>           BGP peer, and
>
>         - changes its state to Connect.
>
>       The exact value of the ConnectRetry timer is a local matter, but
>       it should be sufficiently large to allow TCP initialization.
>
>       Any other event received in the Idle state is ignored.
>
>    IdleHold state:
>
>       The IdleHold state keeps the system in "Idle" mode until a certain
>       time period has passed or an operator intervenes to manually
>       restart the connection. This "IdleHold timeout" prevents
>       persistent flapping of a BGP peering session.
>
>       Upon entering the IdleHold state, if the IdleHoldTimer exceeds the
>       local limit, the "Keep Idle" flag is set.
>
>       Upon receiving a manual start, the local system:
>
>         - clears the IdleHoldTimer,
>
>         - clears the "Keep Idle" flag,
>
>         - initializes all BGP resources,
>
>         - starts the ConnectRetry timer,
>
>         - initiates a transport connection to the other BGP peer,
>
>         - listens for a connection that may be initiated by the remote
>           BGP peer, and
>
>         - changes its state to Connect.
>
>       Upon receiving an IdleHoldTimer expired event, the local system
>       checks whether the "Keep Idle" flag is set. If the "Keep Idle"
>       flag is set, the system stays in the IdleHold state.
>
>       If the "Keep Idle" flag is not set, the local system:
>
>         - clears the IdleHoldTimer, and
>
>         - transitions the state to Idle.
>
>       Getting out of the IdleHold state requires either operator
>       intervention via a manual start or the IdleHoldTimer to expire
>       with the "Keep Idle" flag clear.
>
>       Any other event received in the IdleHold state is ignored.
>
>    Connect State:
>
>       In this state, BGP is waiting for the transport protocol
>       connection to be completed.
>
>       If the transport connection succeeds, the local system:
>
>         - clears the ConnectRetry timer,
>
>         - completes initialization,
>
>         - sends an OPEN message to its peer,
>
>         - sets the Hold Timer to a large value, and
>
>         - changes its state to OpenSent.
>
>       A Hold Timer value of 4 minutes is suggested.
>
>       If the transport protocol connection fails (e.g., retransmission
>       timeout), the local system:
>
>         - restarts the ConnectRetry timer,
>
>         - continues to listen for a connection that may be initiated by
>           the remote BGP peer, and
>
>         - changes its state to Active.
>
>       In response to the ConnectRetry timer expired event, the local
>       system:
>
>         - restarts the ConnectRetry timer,
>
>         - initiates a transport connection to the other BGP peer,
>
>         - continues to listen for a connection that may be initiated by
>           the remote BGP peer, and
>
>         - stays in the Connect state.
>
>       The Start event (manual or automatic) is ignored in the Connect
>       state.
>
>       In response to any other event (initiated by the system or
>       operator), the local system:
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>    Active State:
>
>       In this state BGP is trying to acquire a peer by listening for and
>       accepting a transport protocol connection.
>
>       If the transport connection succeeds, the local system:
>
>         - clears the ConnectRetry timer,
>
>         - completes the initialization,
>
>         - sends the OPEN message to its peer,
>
>         - sets its Hold Timer to a large value, and
>
>         - changes its state to OpenSent.
>
>       A Hold Timer value of 4 minutes is suggested.
>
>       In response to the ConnectRetry timer expired event, the local
>       system:
>
>         - restarts the ConnectRetry timer,
>
>         - initiates a transport connection to the other BGP peer,
>
>         - continues to listen for a connection that may be initiated by
>           the remote BGP peer, and
>
>         - changes its state to Connect.
>
>       If the local system does not allow BGP connections with
>       unconfigured peers, then the local system:
>
>         - rejects connections from IP addresses that are not configured
>           peers, and
>
>         - remains in the Active state.
>
>       The Start events (initiated by the system or operator) are ignored
>       in the Active state.
>
>       In response to any other event (initiated by the system or
>       operator), the local system:
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>    OpenSent State:
>
>       In this state BGP waits for an OPEN message from its peer. When an
>       OPEN message is received, all fields are checked for correctness.
>       If the BGP message header checking or OPEN message checking
>       detects an error (see Section 6.2), or a connection collision (see
>       Section 6.8) is detected, the local system:
>
>         - sends a NOTIFICATION message,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       If there are no errors in the OPEN message, the local system:
>
>         - sends a KEEPALIVE message,
>
>         - sets a KeepAlive timer (via the text below),
>
>         - sets the Hold Timer according to the negotiated value (see
>           Section 4.2), and
>
>         - sets the state to OpenConfirm.
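The OPEN-acceptance steps above, together with the Hold Time and peer-type rules in the paragraph that follows, can be sketched as a small Python function. This is a hedged sketch: `accept_open` and `OpenResult` are hypothetical names, and two rules are assumptions taken from other parts of the BGP specification rather than from this section, namely that the negotiated Hold Time is the smaller of the configured and received values, and that the KeepAlive interval is one third of the Hold Time.

```python
from dataclasses import dataclass


@dataclass
class OpenResult:
    hold_time: int            # negotiated Hold Time, in seconds
    keepalive_time: int       # 0 means the KeepAlive timer is not started
    hold_timer_running: bool  # False when the negotiated Hold Time is zero
    connection_type: str      # "internal" or "external"


def accept_open(local_as: int, configured_hold: int,
                peer_as: int, received_hold: int) -> OpenResult:
    """Derive timer settings and peer type from an error-free OPEN.

    Assumed (not stated in this section): Hold Time = min(configured,
    received) and KeepAlive interval = Hold Time / 3.
    """
    hold = min(configured_hold, received_hold)
    if hold == 0:
        # A zero Hold Time means neither the Hold Timer nor the
        # KeepAlive timer is started.
        keepalive, running = 0, False
    else:
        keepalive, running = hold // 3, True
    # Same AS number as the local system: "internal"; else "external".
    kind = "internal" if peer_as == local_as else "external"
    return OpenResult(hold, keepalive, running, kind)
```

For example, `accept_open(65001, 90, 65002, 180)` yields a 90-second Hold Time, a 30-second KeepAlive interval, and an external connection.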
>
>       If the negotiated Hold Time value is zero, then the Hold Timer and
>       KeepAlive timer are not started. If the value of the Autonomous
>       System field is the same as the local Autonomous System number,
>       then the connection is an "internal" connection; otherwise, it is
>       an "external" connection. (This will impact UPDATE processing as
>       described below.)
>
>       If a disconnect NOTIFICATION is received from the underlying
>       transport protocol, the local system:
>
>         - closes the BGP connection,
>
>         - restarts the ConnectRetry timer,
>
>         - continues to listen for a connection that may be initiated by
>           the remote BGP peer, and
>
>         - goes into the Active state.
>
>       If the Hold Timer expires, the local system:
>
>         - sends a NOTIFICATION message with Error Code Hold Timer
>           Expired,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       The Start event (manual or automatic) is ignored in the OpenSent
>       state.
>
>       If a NOTIFICATION message is received with a version error, the
>       local system:
>
>         - closes the transport connection,
>
>         - releases BGP resources,
>
>         - sets ConnectRetryCnt to 0,
>
>         - sets the ConnectRetry timer to 0, and
>
>         - transitions to the Idle state.
>
>       If any other NOTIFICATION is received, the local system:
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
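Most error paths in this FSM repeat the same action list, ending with an exponential IdleHoldTimer back-off of 2**(ConnectRetryCnt)*60 seconds. The Python sketch below factors that list into one function and shows the OpenSent distinction between version-error NOTIFICATIONs (reset to Idle) and all others (back off into IdleHold). It is illustrative only: `Session`, `error_exit`, `on_notification`, and the `IDLE_HOLD_LIMIT` value are hypothetical, since the draft leaves the "Keep Idle" limit to local configuration; dropping TCP and releasing resources are left out.

```python
from dataclasses import dataclass

# Hypothetical local limit that sets the "Keep Idle" flag; the draft
# leaves this value to local configuration.
IDLE_HOLD_LIMIT = 3600  # seconds


@dataclass
class Session:
    state: str = "OpenSent"
    connect_retry_cnt: int = 0
    idle_hold_timer: int = 0      # seconds
    connect_retry_timer: int = 0  # seconds
    keep_idle: bool = False


def error_exit(s: Session) -> None:
    """The action list repeated throughout Section 8 for most errors."""
    s.idle_hold_timer = 2 ** s.connect_retry_cnt * 60
    s.connect_retry_cnt += 1
    s.connect_retry_timer = 0
    # Dropping the TCP connection and releasing resources are omitted.
    s.state = "IdleHold"
    # On entering IdleHold, set "Keep Idle" if the timer exceeds the limit.
    if s.idle_hold_timer > IDLE_HOLD_LIMIT:
        s.keep_idle = True


def on_notification(s: Session, version_error: bool) -> None:
    """OpenSent handling of a received NOTIFICATION message."""
    if version_error:
        # Version errors reset the counters and go straight to Idle.
        s.connect_retry_cnt = 0
        s.connect_retry_timer = 0
        s.state = "Idle"
    else:
        error_exit(s)
```

With this formula the hold-down doubles on each consecutive failure (60, 120, 240, ... seconds), so a persistently failing peer soon crosses the local limit and stays idle until an operator restarts it.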
>
>       In response to any other event, the local system:
>
>         - sends the NOTIFICATION message with Error Code Finite State
>           Machine Error,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>    OpenConfirm State:
>
>       In this state BGP waits for a KEEPALIVE or NOTIFICATION message.
>
>       If the local system receives a KEEPALIVE message, it changes its
>       state to Established.
>
>       If the Hold Timer expires before a KEEPALIVE message is received,
>       the local system:
>
>         - sends the NOTIFICATION message with the Error Code Hold Timer
>           Expired,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       If the local system receives a NOTIFICATION message or receives a
>       disconnect NOTIFICATION from the underlying transport protocol,
>       the local system:
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       In response to the Stop event initiated by the system, the local
>       system:
>
>         - sends the NOTIFICATION message with Cease,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       In response to a Stop event initiated by the operator, the local
>       system:
>
>         - sends the NOTIFICATION message with Cease,
>
>         - releases all BGP resources,
>
>         - sets ConnectRetryCnt to zero,
>
>         - sets the ConnectRetry timer to zero, and
>
>         - transitions to the Idle state.
>
>       The Start event is ignored in the OpenConfirm state.
>
>       In response to any other event, the local system:
>
>         - sends a NOTIFICATION with a code of Finite State Machine
>           Error,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>    Established State:
>
>       In the Established state BGP can exchange UPDATE, NOTIFICATION,
>       and KEEPALIVE messages with its peer.
>
>       If the local system receives an UPDATE or KEEPALIVE message, it
>       restarts its Hold Timer, if the negotiated Hold Time value is
>       non-zero.
>
>       If the local system receives a NOTIFICATION message or a
>       disconnect from the underlying transport protocol, it:
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       If the local system receives an UPDATE message, and the UPDATE
>       message error handling procedure (see Section 6.3) detects an
>       error, the local system:
>
>         - sends a NOTIFICATION message with UPDATE Message Error,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
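The Established-state timer rules (the Hold Timer restart on a received UPDATE or KEEPALIVE, stated above, and the expiry and KeepAlive handling given in the paragraphs that follow) can be sketched as deadline bookkeeping. This hypothetical `EstablishedTimers` class takes the current time as a plain number of seconds so it can be exercised without a real clock; it only reports the action that is due and does not perform the IdleHold transition.

```python
from typing import Optional


class EstablishedTimers:
    """Hold Timer and KeepAlive timer bookkeeping (hypothetical sketch).

    Times are seconds supplied by the caller; a negotiated Hold Time of
    zero disables both timers.
    """

    def __init__(self, hold_time: int, keepalive_time: int, now: float):
        self.hold_time = hold_time
        self.keepalive_time = keepalive_time
        self.hold_deadline: Optional[float] = None
        self.keepalive_deadline: Optional[float] = None
        if hold_time != 0:
            self.hold_deadline = now + hold_time
            self.keepalive_deadline = now + keepalive_time

    def message_received(self, kind: str, now: float) -> None:
        # Receiving an UPDATE or KEEPALIVE restarts the Hold Timer.
        if self.hold_time != 0 and kind in ("UPDATE", "KEEPALIVE"):
            self.hold_deadline = now + self.hold_time

    def message_sent(self, kind: str, now: float) -> None:
        # Sending a KEEPALIVE or UPDATE restarts the KeepAlive timer.
        if self.hold_time != 0 and kind in ("UPDATE", "KEEPALIVE"):
            self.keepalive_deadline = now + self.keepalive_time

    def poll(self, now: float) -> Optional[str]:
        """Return the action due at time `now`, if any."""
        if self.hold_deadline is not None and now >= self.hold_deadline:
            return "send NOTIFICATION (Hold Timer Expired)"
        if (self.keepalive_deadline is not None
                and now >= self.keepalive_deadline):
            self.message_sent("KEEPALIVE", now)  # restarts KeepAlive timer
            return "send KEEPALIVE"
        return None
```

Passing time in explicitly keeps the rules separate from any particular event loop or timer API.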
>
>       If the Hold Timer expires, the local system:
>
>         - sends a NOTIFICATION message with Error Code Hold Timer
>           Expired,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources, and
>
>         - goes to the IdleHold state.
>
>       If the KeepAlive timer expires, the local system sends a KEEPALIVE
>       message and restarts its KeepAlive timer, unless the negotiated
>       Hold Time value is zero.
>
>       Each time the local system sends a KEEPALIVE or UPDATE message, it
>       restarts its KeepAlive timer, unless the negotiated Hold Time
>       value is zero.
>
>       In response to the Stop event initiated by the system
>       (automatic), the local system:
>
>         - sends a NOTIFICATION with Cease,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources,
>
>         - goes to the IdleHold state, and
>
>         - deletes all routes.
>
>       An example of an automatic Stop event is the local system
>       automatically disconnecting a peer because the number of prefixes
>       received from that peer exceeds the configured limit.
>
>       In response to a Stop event initiated by an operator, the local
>       system:
>
>         - releases all resources (including deleting all routes),
>
>         - sets ConnectRetryCnt to zero (0),
>
>         - sets the ConnectRetry timer to zero (0), and
>
>         - transitions to the Idle state.
>
>       The Start event is ignored in the Established state.
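The two Established-state Stop events differ mainly in back-off: an automatic Stop sends Cease and backs off into IdleHold, while an operator Stop resets the counters and returns to Idle. The sketch below combines this with the prefix limit of Section 6.7, which is the draft's own example of an automatic Stop. `Peer`, `stop`, and `receive_prefix` are hypothetical names, routes are reduced to prefix strings, and the ConnectRetry timer and TCP handling are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    max_prefixes: int          # locally configured upper bound (Sec. 6.7)
    terminate_on_limit: bool   # option (b) terminate, else (a) discard
    state: str = "Established"
    connect_retry_cnt: int = 0
    idle_hold_timer: int = 0
    routes: set = field(default_factory=set)
    sent: list = field(default_factory=list)


def stop(peer: Peer, by_operator: bool) -> None:
    peer.routes.clear()  # both Stop variants delete all routes
    if by_operator:
        # Operator Stop: counters reset, straight to Idle.
        peer.connect_retry_cnt = 0
        peer.state = "Idle"
    else:
        # Automatic Stop: send Cease and back off into IdleHold.
        peer.sent.append("NOTIFICATION Cease")
        peer.idle_hold_timer = 2 ** peer.connect_retry_cnt * 60
        peer.connect_retry_cnt += 1
        peer.state = "IdleHold"


def receive_prefix(peer: Peer, prefix: str) -> None:
    if prefix in peer.routes or len(peer.routes) < peer.max_prefixes:
        peer.routes.add(prefix)  # new prefix, or a replacement
    elif peer.terminate_on_limit:
        stop(peer, by_operator=False)  # an automatic Stop event
    # otherwise: option (a), the new prefix is silently discarded
```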
>
>       In response to any other event, the local system:
>
>         - sends a NOTIFICATION message with Error Code Finite State
>           Machine Error,
>
>         - sets IdleHoldTimer = 2**(ConnectRetryCnt)*60,
>
>         - increments ConnectRetryCnt by 1,
>
>         - sets the ConnectRetry timer to zero,
>
>         - drops the TCP connection,
>
>         - releases all BGP resources,
>
>         - goes to the IdleHold state, and
>
>         - deletes all routes.
>
>
> 9. UPDATE Message Handling
>
>    An UPDATE message may be received only in the Established state.
>    When an UPDATE message is received, each field is checked for
>    validity as specified in Section 6.3.
>
>    If an optional non-transitive attribute is unrecognized, it is
>    quietly ignored. If an optional transitive attribute is
>    unrecognized, the Partial bit (the third high-order bit) in the
>    attribute flags octet is set to 1, and the attribute is retained for
>    propagation to other BGP speakers.
>
>    If an optional attribute is recognized, and has a valid value, then,
>    depending on the type of the optional attribute, it is processed
>    locally, retained, and updated, if necessary, for possible
>    propagation to other BGP speakers.
>
>    If the UPDATE message contains a non-empty WITHDRAWN ROUTES field,
>    the previously advertised routes whose destinations (expressed as IP
>    prefixes) are contained in this field shall be removed from the
>    Adj-RIB-In. This BGP speaker shall run its Decision Process since
>    the previously advertised route is no longer available for use.
>
>    If the UPDATE message contains a feasible route, the Adj-RIB-In will
>    be updated with this route as follows: if the NLRI of the new route
>    is identical to the one of the route currently stored in the
>    Adj-RIB-In, then the new route shall replace the older route in the
>    Adj-RIB-In, thus implicitly withdrawing the older route from service.
>    Otherwise, if the Adj-RIB-In has no route with NLRI identical to the
>    new route, the new route shall be placed in the Adj-RIB-In.
>
>    Once the BGP speaker updates the Adj-RIB-In, the speaker shall run
>    its Decision Process.
>
>
> 9.1 Decision Process
>
>    The Decision Process selects routes for subsequent advertisement by
>    applying the policies in the local Policy Information Base (PIB) to
>    the routes stored in its Adj-RIBs-In. The output of the Decision
>    Process is the set of routes that will be advertised to all peers;
>    the selected routes will be stored in the local speaker's
>    Adj-RIB-Out.
>
>    The selection process is formalized by defining a function that
>    takes the attributes of a given route as an argument and returns
>    either (a) a non-negative integer denoting the degree of preference
>    for the route, or (b) a value denoting that this route is ineligible
>    to be installed in the Loc-RIB and will be excluded from the next
>    phase of route selection.
>
>    The function that calculates the degree of preference for a given
>    route shall not use as its inputs any of the following: the
>    existence of other routes, the non-existence of other routes, or the
>    path attributes of other routes. Route selection then consists of
>    individual application of the degree of preference function to each
>    feasible route, followed by the choice of the one with the highest
>    degree of preference.
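The formalization above (a per-route preference function that returns either a non-negative integer or an "ineligible" marker, followed by picking the highest value) maps directly onto a higher-order function. In this Python sketch `None` stands for "ineligible", routes are plain dictionaries, and `select_route` and `example_policy` are hypothetical; the policy shown is purely illustrative. Ties are resolved here by keeping the first maximum, whereas the draft resolves them with the Phase 2 tie-breaking rules.

```python
from typing import Callable, Optional, Sequence

Route = dict  # hypothetical: attribute name -> value
Preference = Callable[[Route], Optional[int]]  # None means "ineligible"


def select_route(candidates: Sequence[Route],
                 pref: Preference) -> Optional[Route]:
    """Apply `pref` to each route on its own, then keep the highest."""
    best, best_pref = None, -1
    for route in candidates:
        p = pref(route)  # sees only this route's attributes
        if p is None:
            continue  # ineligible: excluded from the next phase
        if p > best_pref:
            best, best_pref = route, p
    return best


def example_policy(route: Route) -> Optional[int]:
    """Hypothetical local policy: reject AS 64666, prefer short paths."""
    if 64666 in route["as_path"]:
        return None
    return max(0, 1000 - 10 * len(route["as_path"]))
```

Because `pref` receives a single route, the constraint that the degree of preference must not depend on other routes is enforced by the function signature itself.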
>
>    The Decision Process operates on routes contained in the Adj-RIB-In,
>    and is responsible for:
>
>       - selection of routes to be used locally by the speaker,
>
>       - selection of routes to be advertised to other BGP peers, and
>
>       - route aggregation and route information reduction.
>
>    The Decision Process takes place in three distinct phases, each
>    triggered by a different event:
>
>       a) Phase 1 is responsible for calculating the degree of preference
>          for each route received from a peer.
>
>       b) Phase 2 is invoked on completion of Phase 1. It is responsible
>          for choosing the best route out of all those available for
>          each distinct destination, and for installing each chosen
>          route into the Loc-RIB.
>
>       c) Phase 3 is invoked after the Loc-RIB has been modified. It is
>          responsible for disseminating routes in the Loc-RIB to each
>          peer, according to the policies contained in the PIB. Route
>          aggregation and information reduction can optionally be
>          performed within this phase.
>
>
> 9.1.1 Phase 1: Calculation of Degree of Preference
>
>    The Phase 1 decision function shall be invoked whenever the local
>    BGP speaker receives from a peer an UPDATE message that advertises a
>    new route, a replacement route, or withdrawn routes.
>
>    The Phase 1 decision function is a separate process which completes
>    when it has no further work to do.
>
>    The Phase 1 decision function shall lock an Adj-RIB-In prior to
>    operating on any route contained within it, and shall unlock it
>    after operating on all new or unfeasible routes contained within it.
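The UPDATE handling at the start of Section 9 (attribute retention, explicit withdrawal, and implicit replacement on identical NLRI), combined with the locking requirement just stated, might be sketched as follows. This assumes the Section 6.3 checks have already passed. The class `AdjRibIn` and its method are hypothetical; the flag values 0x80 (Optional), 0x40 (Transitive) and 0x20 (Partial, the "third high-order bit") are the standard BGP attribute flag bits, and a `threading.Lock` stands in for whatever locking an implementation actually uses.

```python
import threading

OPTIONAL, TRANSITIVE, PARTIAL = 0x80, 0x40, 0x20  # attribute flag bits


class AdjRibIn:
    """Per-peer Adj-RIB-In keyed by prefix (hypothetical structure)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.routes = {}  # prefix -> {type_code: (flags, value)}

    def apply_update(self, withdrawn, nlri, attrs, known_types):
        """Apply one UPDATE; return True if the Decision Process must run."""
        kept = {}
        for type_code, (flags, value) in attrs.items():
            if type_code in known_types:
                kept[type_code] = (flags, value)
            elif flags & OPTIONAL and flags & TRANSITIVE:
                # Unrecognized optional transitive: set Partial, retain.
                kept[type_code] = (flags | PARTIAL, value)
            # Unrecognized optional non-transitive: quietly ignored.
        with self.lock:  # lock before touching routes, unlock when done
            for prefix in withdrawn:
                self.routes.pop(prefix, None)
            for prefix in nlri:
                # Identical NLRI replaces the older route (implicit withdraw).
                self.routes[prefix] = kept
        return bool(withdrawn or nlri)
```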
>
>    For each newly received or replacement feasible route, the local BGP
>    speaker shall determine a degree of preference as follows:
>
>       If the route is learned from an internal peer, either the value of
>       the LOCAL_PREF attribute shall be taken as the degree of
>       preference, or the local system may compute the degree of
>       preference of the route based on preconfigured policy
>       information. Note that the latter (computing the degree of
>       preference based on preconfigured policy information) may result
>       in the formation of persistent routing loops.
>
>       If the route is learned from an external peer, then the local BGP
>       speaker computes the degree of preference based on preconfigured
>       policy information. If the return value indicates that the route
>       is ineligible, the route may not serve as an input to the next
>       phase of route selection; otherwise the return value is used as
>       the LOCAL_PREF value in any IBGP readvertisement.
>
>       The exact nature of this policy information and the computation
>       involved is a local matter.
>
>
> 9.1.2 Phase 2: Route Selection
>
>    The Phase 2 decision function shall be invoked on completion of
>    Phase 1. The Phase 2 function is a separate process which completes
>    when it has no further work to do. The Phase 2 process shall consider
>    all routes that are eligible in the Adj-RIBs-In.
>
>    The Phase 2 decision function shall be blocked from running while
>    the Phase 3 decision function is in process. The Phase 2 function
>    shall lock all Adj-RIBs-In prior to commencing its function, and
>    shall unlock them on completion.
>
>    If the NEXT_HOP attribute of a BGP route depicts an address that is
>    not resolvable, or that would become unresolvable if the route were
>    installed in the routing table, the BGP route should be excluded
>    from the Phase 2 decision function.
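The Phase 1 rules for internal and external peers, and the Phase 2 input filter just described, can be sketched as two small Python functions. This is one possible local choice, not the only one the draft allows: for internal peers the sketch takes LOCAL_PREF directly rather than recomputing it from policy. The names `phase1_preference` and `phase2_inputs`, the dictionary keys, and the `resolvable` callback are hypothetical, and the "would become unresolvable once installed" case is not modeled.

```python
from typing import Callable, Iterable, List, Optional


def phase1_preference(route: dict, from_internal: bool,
                      policy: Callable[[dict], Optional[int]]
                      ) -> Optional[int]:
    """Degree of preference per Section 9.1.1 (one local choice shown)."""
    if from_internal:
        # Internal peer: take LOCAL_PREF as the degree of preference.
        return route["local_pref"]
    pref = policy(route)
    if pref is not None:
        # External peer: the policy result becomes the LOCAL_PREF used
        # in any IBGP readvertisement.
        route["local_pref"] = pref
    return pref


def phase2_inputs(routes: Iterable[dict],
                  resolvable: Callable[[str], bool]) -> List[dict]:
    """Keep only eligible routes whose NEXT_HOP currently resolves."""
    return [r for r in routes
            if r.get("local_pref") is not None
            and resolvable(r["next_hop"])]
```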
>
>    It is critical that routers within an AS do not make conflicting
>    decisions regarding route selection that would cause forwarding
>    loops to occur.
>
>    For each set of destinations for which a feasible route exists in
>    the Adj-RIBs-In, the local BGP speaker shall identify the route that:
>
>       a) has the highest degree of preference of any route to the same
>          set of destinations, or
>
>       b) is the only route to that destination, or
>
>       c) is selected as a result of the Phase 2 tie breaking rules
>          specified in 9.1.2.2.
>
>    The local speaker SHALL then install that route in the Loc-RIB,
>    replacing any route to the same destination that is currently being
>    held in the Loc-RIB. If the new BGP route is installed in the
>    Routing Table (as a result of the local policy decision), care must
>    be taken to ensure that invalid BGP routes to the same destination
>    are removed from the Routing Table. Whether or not the new route
>    replaces an already existing non-BGP route in the routing table
>    depends on the policy configured on the BGP speaker.
>
>    The local speaker MUST determine the immediate next hop to the
>    address depicted by the NEXT_HOP attribute of the selected route by
>    performing a best matching route lookup in the Routing Table and
>    selecting one of the possible paths (if multiple best paths to the
>    same prefix are available). If the route to the address depicted by
>    the NEXT_HOP attribute changes such that the immediate next hop or
>    the IGP cost to the NEXT_HOP (if the NEXT_HOP is resolved through an
>    IGP route) changes, route selection should be recalculated as
>    specified above.
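The best-matching lookup of the NEXT_HOP can be sketched as a recursive longest-prefix match that stops at a route naming an outbound interface. As a side effect, it also rejects the mutually recursive routes discussed in Section 9.1.2.1, because revisiting a route during resolution is treated as a failure. The Routing Table layout, `longest_match`, and `resolve` are hypothetical; the sketch picks a single path (no multipath) and does not check interface state.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical Routing Table: prefix -> (gateway, interface, igp_cost).
# IGP and connected routes name an interface (connected ones have no
# gateway); BGP routes name only a gateway and must be resolved again.
Table = Dict[str, Tuple[Optional[str], Optional[str], int]]


def longest_match(table: Table, addr: str) -> Optional[str]:
    """Best matching (longest) prefix in `table` that covers `addr`."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in table if ip in ipaddress.ip_network(p)]
    return max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen,
               default=None)


def resolve(table: Table, next_hop: str) -> Optional[Tuple[str, str, int]]:
    """Return (immediate next hop, interface, IGP cost), or None."""
    seen = set()
    addr = next_hop
    while True:
        prefix = longest_match(table, addr)
        if prefix is None or prefix in seen:
            return None  # no matching route, or a recursion loop
        seen.add(prefix)
        gateway, interface, cost = table[prefix]
        if interface is not None:
            # Connected route (no gateway) or IGP route via a gateway.
            return (gateway or addr, interface, cost)
        if gateway is None:
            return None  # malformed entry: neither gateway nor interface
        addr = gateway  # only the longest match is followed, recursively
```

The returned IGP cost is the value that the interior-cost tie-breaker needs, and a change in either the returned next hop or the cost is the trigger for recalculating route selection.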
>
>    Notice that even though BGP routes do not have to be installed in the
>    Routing Table with the immediate next hop(s), implementations must
>    take care that before any packets are forwarded along a BGP route,
>    its associated NEXT_HOP address is resolved to the immediate
>    (directly connected) next-hop address and this address (or multiple
>    addresses) is finally used for actual packet forwarding.
>
>    Unresolvable routes SHALL be removed from the Loc-RIB and the routing
>    table. However, corresponding unresolvable routes SHOULD be kept in
>    the Adj-RIBs-In.
>
>
> 9.1.2.1 Route Resolvability Condition
>
>    As indicated in Section 9.1.2, BGP routers should exclude
>    unresolvable routes from the Phase 2 decision. This ensures that only
>    valid routes are installed in Loc-RIB and the Routing Table.
>
>    The route resolvability condition is defined as follows.
>
>       1. A route Rte1, referencing only the intermediate network
>          address, is considered resolvable if the Routing Table contains
>          at least one resolvable route Rte2 that matches Rte1's
>          intermediate network address and is not recursively resolved
>          (directly or indirectly) through Rte1. If multiple matching
>          routes are available, only the longest matching route should
>          be considered.
>
>       2. Routes referencing interfaces (with or without intermediate
>          addresses) are considered resolvable if the state of the
>          referenced interface is up and IP processing is enabled on
>          this interface.
>
>    BGP routes do not refer to interfaces, but can be resolved through
>    the routes in the Routing Table that can be of both types. IGP routes
>    and routes to directly connected networks are expected to specify
>    the outbound interface.
>
>    Note that a BGP route is considered unresolvable not only in
>    situations where the router's Routing Table contains no route
>    matching the BGP route's NEXT_HOP.
>    Mutually recursive routes (routes resolving each other or
>    themselves) also fail the resolvability check.
>
>    It is also important that implementations do not consider feasible
>    routes that would become unresolvable if they were installed in the
>    Routing Table, even if their NEXT_HOPs are resolvable using the
>    current contents of the Routing Table (an example of such routes
>    would be mutually recursive routes). This check ensures that a BGP
>    speaker does not install in the Routing Table routes that will be
>    removed and not used by the speaker. Therefore, in addition to local
>    Routing Table stability, this check also improves the behavior of
>    the protocol in the network.
>
>    Whenever a BGP speaker identifies a route that fails the
>    resolvability check because of mutual recursion, an error message
>    should be logged.
>
>
> 9.1.2.2 Breaking Ties (Phase 2)
>
>    In its Adj-RIBs-In a BGP speaker may have several routes to the same
>    destination that have the same degree of preference. The local
>    speaker can select only one of these routes for inclusion in the
>    associated Loc-RIB. The local speaker considers all routes with the
>    same degree of preference, both those received from internal peers
>    and those received from external peers.
>
>    The following tie-breaking procedure assumes that for each candidate
>    route all the BGP speakers within an autonomous system can ascertain
>    the cost of a path (interior distance) to the address depicted by
>    the NEXT_HOP attribute of the route, and follow the same route
>    selection algorithm.
>
>    The tie-breaking algorithm begins by considering all equally
>    preferable routes to the same destination, and then selects routes
>    to be removed from consideration. The algorithm terminates as soon
>    as only one route remains in consideration. The criteria must be
>    applied in the order specified.
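The criteria that follow (a through g), applied in order, can be condensed into a single Python function. This is a sketch under stated assumptions, not an implementation the draft prescribes. The `Candidate` fields are hypothetical and already pre-computed, including the AS_PATH counting rules of step (a). A lower IGP cost is taken to be "better". Step (e) is skipped when any candidate has no determinable cost. All addresses are assumed to belong to the same address family. ORIGIN uses the usual encoding 0 = IGP, 1 = EGP, 2 = INCOMPLETE.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    as_path_len: int         # AS_SET counts as 1; confed segments excluded
    origin: int              # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: Optional[int]       # None if MULTI_EXIT_DISC is absent
    neighbor_as: int
    from_external: bool
    igp_cost: Optional[int]  # None if no cost can be determined
    bgp_id: str              # BGP Identifier of the advertising speaker
    neighbor_addr: str


def keep_min(routes: List[Candidate], key: Callable) -> List[Candidate]:
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def _med(r: Candidate) -> int:
    return 0 if r.med is None else r.med  # missing MED = lowest value


def break_ties(routes: List[Candidate]) -> Candidate:
    rs = keep_min(routes, lambda r: r.as_path_len)                # a)
    rs = keep_min(rs, lambda r: r.origin)                         # b)
    rs = [m for m in rs                                           # c)
          if not any(n.neighbor_as == m.neighbor_as
                     and _med(n) < _med(m) for n in rs)]
    if any(r.from_external for r in rs):                          # d)
        rs = [r for r in rs if r.from_external]
    if all(r.igp_cost is not None for r in rs):                   # e)
        rs = keep_min(rs, lambda r: r.igp_cost)
    rs = keep_min(rs, lambda r: int(ipaddress.IPv4Address(r.bgp_id)))  # f)
    rs = keep_min(rs, lambda r: ipaddress.ip_address(r.neighbor_addr))  # g)
    return rs[0]
```

Step (c) compares MEDs only within the same neighboring AS. As a result, a route with a high MED from one AS can survive alongside a route with a low MED from another AS and be decided by a later step.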
>
>    Several of the criteria are described using pseudo-code. Note that
>    the pseudo-code shown was chosen for clarity, not efficiency. It is
>    not intended to specify any particular implementation. BGP
>    implementations MAY use any algorithm which produces the same
>    results as those described here.
>
>       a) Remove from consideration all routes which are not tied for
>          having the smallest number of AS numbers present in their
>          AS_PATH attributes. Note that when counting this number, an
>          AS_SET counts as 1, no matter how many ASs are in the set, and
>          that, if the implementation supports [13], then AS numbers
>          present in segments of type AS_CONFED_SEQUENCE or
>          AS_CONFED_SET are not included in the count of AS numbers
>          present in the AS_PATH.
>
>       b) Remove from consideration all routes which are not tied for
>          having the lowest Origin number in their Origin attribute.
>
>       c) Remove from consideration routes with less-preferred
>          MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
>          between routes learned from the same neighboring AS. Routes
>          which do not have the MULTI_EXIT_DISC attribute are considered
>          to have the lowest possible MULTI_EXIT_DISC value.
>
>          This is also described in the following procedure:
>
>            for m = all routes still under consideration
>                for n = all routes still under consideration
>                    if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
>                        remove route m from consideration
>
>          In the pseudo-code above, MED(n) is a function which returns
>          the value of route n's MULTI_EXIT_DISC attribute. If route n
>          has no MULTI_EXIT_DISC attribute, the function returns the
>          lowest possible MULTI_EXIT_DISC value, i.e. 0.
>
>          Similarly, neighborAS(n) is a function which returns the
>          neighbor AS from which the route was received.

   d) If at least one of the candidate routes was received from an
      external peer in a neighboring autonomous system, remove from
      consideration all routes which were received from internal peers.

   e) Remove from consideration any routes with less-preferred
      interior cost. The interior cost of a route is determined by
      calculating the metric to the next hop for the route using the
      Routing Table. If the next hop for a route is reachable, but no
      cost can be determined, then this step should be skipped
      (equivalently, consider all routes to have equal costs).

      This is also described in the following procedure:

         for m = all routes still under consideration
             for n = all routes still under consideration
                 if (cost(n) is better than cost(m))
                     remove m from consideration

      In the pseudo-code above, cost(n) is a function which returns the
      cost of the path (interior distance) to the address given in the
      NEXT_HOP attribute of the route.

   f) Remove from consideration all routes other than the route that
      was advertised by the BGP speaker whose BGP Identifier has the
      lowest value.

   g) Prefer the route received from the lowest neighbor address.


9.1.3 Phase 3: Route Dissemination

   The Phase 3 decision function shall be invoked on completion of
   Phase 2, or when any of the following events occur:

   a) when routes in the Loc-RIB to local destinations have changed

   b) when locally generated routes learned by means outside of BGP
      have changed

   c) when a new BGP speaker - BGP speaker connection has been
      established

   The Phase 3 function is a separate process which completes when it
   has no further work to do. The Phase 3 Routing Decision function
   shall be blocked from running while the Phase 2 decision function is
   in process.

   All routes in the Loc-RIB shall be processed into Adj-RIBs-Out
   according to configured policy. This policy may exclude a route in
   the Loc-RIB from being installed in a particular Adj-RIB-Out. A
   route shall not be installed in the Adj-RIB-Out unless the
   destination and NEXT_HOP described by this route may be forwarded
   appropriately by the Routing Table. If a route in the Loc-RIB is
   excluded from a particular Adj-RIB-Out, the previously advertised
   route in that Adj-RIB-Out must be withdrawn from service by means of
   an UPDATE message (see 9.2).

   Route aggregation and information reduction techniques (see 9.2.2.1)
   may optionally be applied.

   When the updating of the Adj-RIBs-Out and the Routing Table is
   complete, the local BGP speaker shall run the Update-Send process of
   9.2.


9.1.4 Overlapping Routes

   A BGP speaker may transmit routes with overlapping Network Layer
   Reachability Information (NLRI) to another BGP speaker. NLRI overlap
   occurs when a set of destinations is identified in multiple,
   non-matching routes. Since BGP encodes NLRI using IP prefixes,
   overlap will always exhibit subset relationships. A route describing
   a smaller set of destinations (a longer prefix) is said to be more
   specific than a route describing a larger set of destinations (a
   shorter prefix); similarly, a route describing a larger set of
   destinations (a shorter prefix) is said to be less specific than a
   route describing a smaller set of destinations (a longer prefix).
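
   The subset relationship between overlapping prefixes can be
   illustrated with Python's standard ipaddress module. This is a
   sketch only; compare_specificity is a hypothetical helper, not part
   of any BGP implementation.

```python
from ipaddress import IPv4Network, IPv6Network
from typing import Optional, Union

Prefix = Union[IPv4Network, IPv6Network]


def compare_specificity(a: Prefix, b: Prefix) -> Optional[str]:
    """Classify how two NLRI prefixes relate.

    Returns "same", "more" (a is more specific than b), "less" (a is
    less specific than b), or None when the prefixes do not overlap.
    """
    if a.version != b.version:
        return None                  # IPv4 and IPv6 prefixes never overlap
    if a == b:
        return "same"
    if a.subnet_of(b):               # a lies inside b: a has the longer prefix
        return "more"
    if b.subnet_of(a):               # b lies inside a: a has the shorter prefix
        return "less"
    # Two prefixes either nest or are disjoint; partial overlap is impossible.
    assert not a.overlaps(b)
    return None
```

   For example, 192.0.2.0/25 compared with 192.0.2.0/24 yields "more",
   while 192.0.2.0/24 and 198.51.100.0/24 do not overlap at all.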

   The precedence relationship effectively decomposes less specific
   routes into two parts:

   -  a set of destinations described only by the less specific route,
      and

   -  a set of destinations described by the overlap of the less
      specific and the more specific routes

   When overlapping routes are present in the same Adj-RIB-In, the more
   specific route shall take precedence, in order from most specific to
   least specific.

   The set of destinations described by the overlap represents a
   portion of the less specific route that is feasible, but is not
   currently in use. If a more specific route is later withdrawn, the
   set of destinations described by the overlap will still be reachable
   using the less specific route.

   If a BGP speaker receives overlapping routes, the Decision Process
   MUST consider both routes based on the configured acceptance policy.
   If both a less and a more specific route are accepted, then the
   Decision Process MUST either install both the less and the more
   specific routes or it MUST aggregate the two routes and install the
   aggregated route, provided that both routes have the same value of
   the NEXT_HOP attribute.

   If a BGP speaker chooses to aggregate, then it MUST add the
   ATOMIC_AGGREGATE attribute to the route. A route that carries the
   ATOMIC_AGGREGATE attribute cannot be de-aggregated. That is, the
   NLRI of this route cannot be made more specific. Forwarding along
   such a route does not guarantee that IP packets will actually
   traverse only ASs listed in the AS_PATH attribute of the route.


9.2 Update-Send Process

   The Update-Send process is responsible for advertising UPDATE
   messages to all peers. For example, it distributes the routes chosen
   by the Decision Process to other BGP speakers which may be located in
   either the same autonomous system or a neighboring autonomous system.

   When a BGP speaker receives an UPDATE message from an internal peer,
   the receiving BGP speaker shall not re-distribute the routing
   information contained in that UPDATE message to other internal
   peers, unless the speaker acts as a BGP Route Reflector [11].

   As part of Phase 3 of the route selection process, the BGP speaker
   has updated its Adj-RIBs-Out. All newly installed routes and all
   newly unfeasible routes for which there is no replacement route
   shall be advertised to its peers by means of an UPDATE message.

   A BGP speaker should not advertise a given feasible BGP route from
   its Adj-RIB-Out if it would produce an UPDATE message containing the
   same BGP route as was previously advertised.

   Any routes in the Loc-RIB marked as unfeasible shall be removed.
   Changes to the reachable destinations within its own autonomous
   system shall also be advertised in an UPDATE message.


9.2.1 Controlling Routing Traffic Overhead

   The BGP protocol constrains the amount of routing traffic (that is,
   UPDATE messages) in order to limit both the link bandwidth needed to
   advertise UPDATE messages and the processing power needed by the
   Decision Process to digest the information contained in the UPDATE
   messages.


9.2.1.1 Frequency of Route Advertisement

   The parameter MinRouteAdvertisementInterval determines the minimum
   amount of time that must elapse between advertisements of routes to
   a particular destination from a single BGP speaker. This rate
   limiting procedure applies on a per-destination basis, although the
   value of MinRouteAdvertisementInterval is set on a per BGP peer
   basis.

   Two UPDATE messages sent from a single BGP speaker that advertise
   feasible routes to some common set of destinations received from
   external peers must be separated by at least
   MinRouteAdvertisementInterval.
   Clearly, this can only be achieved precisely by keeping a separate
   timer for each common set of destinations. This would be unwarranted
   overhead. Any technique which ensures that the interval between two
   UPDATE messages sent from a single BGP speaker that advertise
   feasible routes to some common set of destinations received from
   external peers will be at least MinRouteAdvertisementInterval, and
   which also ensures a constant upper bound on that interval, is
   acceptable.

   Since fast convergence is needed within an autonomous system, this
   procedure does not apply to routes received from other internal
   peers. To avoid long-lived black holes, the procedure does not apply
   to the explicit withdrawal of unfeasible routes (that is, routes
   whose destinations (expressed as IP prefixes) are listed in the
   WITHDRAWN ROUTES field of an UPDATE message).

   This procedure does not limit the rate of route selection, but only
   the rate of route advertisement. If new routes are selected multiple
   times while awaiting the expiration of MinRouteAdvertisementInterval,
   the last route selected shall be advertised at the end of
   MinRouteAdvertisementInterval.


9.2.1.2 Frequency of Route Origination

   The parameter MinASOriginationInterval determines the minimum amount
   of time that must elapse between successive advertisements of UPDATE
   messages that report changes within the advertising BGP speaker's
   own autonomous system.


9.2.1.3 Jitter

   To minimize the likelihood that the distribution of BGP messages by
   a given BGP speaker will contain peaks, jitter should be applied to
   the timers associated with MinASOriginationInterval, Keepalive, and
   MinRouteAdvertisementInterval.
   A given BGP speaker shall apply the same jitter to each of these
   quantities regardless of the destinations to which the updates are
   being sent; that is, jitter will not be applied on a "per peer"
   basis.

   The amount of jitter to be introduced shall be determined by
   multiplying the base value of the appropriate timer by a random
   factor which is uniformly distributed in the range from 0.75 to 1.0.


9.2.2 Efficient Organization of Routing Information

   Having selected the routing information which it will advertise, a
   BGP speaker may avail itself of several methods to organize this
   information in an efficient manner.


9.2.2.1 Information Reduction

   Information reduction may imply a reduction in granularity of policy
   control: after information is collapsed, the same policies will
   apply to all destinations and paths in the equivalence class.

   The Decision Process may optionally reduce the amount of information
   that it will place in the Adj-RIBs-Out by any of the following
   methods:

   a) Network Layer Reachability Information (NLRI):

      Destination IP addresses can be represented as IP address
      prefixes. In cases where there is a correspondence between the
      address structure and the systems under control of an autonomous
      system administrator, it will be possible to reduce the size of
      the NLRI carried in the UPDATE messages.

   b) AS_PATHs:

      AS path information can be represented as ordered AS_SEQUENCEs or
      unordered AS_SETs. AS_SETs are used in the route aggregation
      algorithm described in 9.2.2.2. They reduce the size of the
      AS_PATH information by listing each AS number only once,
      regardless of how many times it may have appeared in multiple
      AS_PATHs that were aggregated.

      An AS_SET implies that the destinations listed in the NLRI can be
      reached through paths that traverse at least some of the
      constituent autonomous systems. AS_SETs provide sufficient
      information to avoid routing information looping; however, their
      use may prune potentially feasible paths, since such paths are no
      longer listed individually in the form of AS_SEQUENCEs. In
      practice this is not likely to be a problem, since once an IP
      packet arrives at the edge of a group of autonomous systems, the
      BGP speaker at that point is likely to have more detailed path
      information and can distinguish individual paths to destinations.


9.2.2.2 Aggregating Routing Information

   Aggregation is the process of combining the characteristics of
   several different routes in such a way that a single route can be
   advertised. Aggregation can occur as part of the decision process to
   reduce the amount of routing information that will be placed in the
   Adj-RIBs-Out.

   Aggregation reduces the amount of information that a BGP speaker
   must store and exchange with other BGP speakers. Routes can be
   aggregated by applying the following procedure separately to path
   attributes of like type and to the Network Layer Reachability
   Information.

   Routes that have the following attributes shall not be aggregated
   unless the corresponding attributes of each route are identical:
   MULTI_EXIT_DISC, NEXT_HOP.

   If the aggregation occurs as part of the update process, routes with
   different NEXT_HOP values can be aggregated when announced through
   an external BGP session.

   Path attributes that have different type codes cannot be aggregated
   together.
   Path attributes of the same type code may be aggregated, according
   to the following rules:

      ORIGIN attribute: If at least one route among routes that are
      aggregated has ORIGIN with the value INCOMPLETE, then the
      aggregated route must have the ORIGIN attribute with the value
      INCOMPLETE. Otherwise, if at least one route among routes that
      are aggregated has ORIGIN with the value EGP, then the aggregated
      route must have the ORIGIN attribute with the value EGP. In all
      other cases the value of the ORIGIN attribute of the aggregated
      route is IGP.

      AS_PATH attribute: If routes to be aggregated have identical
      AS_PATH attributes, then the aggregated route has the same
      AS_PATH attribute as each individual route.

      For the purpose of aggregating AS_PATH attributes we model each
      AS within the AS_PATH attribute as a tuple <type, value>, where
      "type" identifies a type of the path segment the AS belongs to
      (e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number. If the
      routes to be aggregated have different AS_PATH attributes, then
      the aggregated AS_PATH attribute shall satisfy all of the
      following conditions:

         - all tuples of type AS_SEQUENCE in the aggregated AS_PATH
           shall appear in all of the AS_PATHs in the initial set of
           routes to be aggregated.

         - all tuples of type AS_SET in the aggregated AS_PATH shall
           appear in at least one of the AS_PATHs in the initial set
           (they may appear as either AS_SET or AS_SEQUENCE types).

         - for any tuple X of type AS_SEQUENCE in the aggregated
           AS_PATH which precedes tuple Y in the aggregated AS_PATH, X
           precedes Y in each AS_PATH in the initial set which contains
           Y, regardless of the type of Y.

         - No tuple of type AS_SET with the same value shall appear
           more than once in the aggregated AS_PATH.

         - Multiple tuples of type AS_SEQUENCE with the same value may
           appear in the aggregated AS_PATH only when adjacent to
           another tuple of the same type and value.

      An implementation may choose any algorithm which conforms to
      these rules. At a minimum, a conformant implementation shall be
      able to perform the following algorithm that meets all of the
      above conditions:

         - determine the longest leading sequence of tuples (as defined
           above) common to all the AS_PATH attributes of the routes to
           be aggregated. Make this sequence the leading sequence of
           the aggregated AS_PATH attribute.

         - set the type of the rest of the tuples from the AS_PATH
           attributes of the routes to be aggregated to AS_SET, and
           append them to the aggregated AS_PATH attribute.

         - if the aggregated AS_PATH has more than one tuple with the
           same value (regardless of the tuple's type), eliminate all
           but one such tuple by deleting tuples of the type AS_SET
           from the aggregated AS_PATH attribute.

      Appendix 6, section 6.8 presents another algorithm that satisfies
      the conditions and allows for more complex policy configurations.

      ATOMIC_AGGREGATE: If at least one of the routes to be aggregated
      has the ATOMIC_AGGREGATE path attribute, then the aggregated
      route shall have this attribute as well.

      AGGREGATOR: All AGGREGATOR attributes of all routes to be
      aggregated should be ignored. The BGP speaker performing the
      route aggregation may attach a new AGGREGATOR attribute (see
      Section 5.1.7).


9.3 Route Selection Criteria

   Generally speaking, additional rules for comparing routes among
   several alternatives are outside the scope of this document. There
   are two exceptions:

   - If the local AS appears in the AS path of the new route being
     considered, then that new route cannot be viewed as better than
     any other route (provided that the speaker is configured to accept
     such routes). If such a route were ever used, a routing loop could
     result (see Section 6.3).

   - In order to achieve successful distributed operation, only routes
     with a likelihood of stability can be chosen. Thus, an AS must
     avoid using unstable routes, and it must not make rapid
     spontaneous changes to its choice of route. Quantifying the terms
     "unstable" and "rapid" in the previous sentence will require
     experience, but the principle is clear.

   Care must be taken to ensure that BGP speakers in the same AS do not
   make inconsistent decisions.


9.4 Originating BGP routes

   A BGP speaker may originate BGP routes by injecting routing
   information acquired by some other means (e.g. via an IGP) into BGP.
   A BGP speaker that originates BGP routes shall assign the degree of
   preference to these routes by passing them through the Decision
   Process (see Section 9.1). These routes may also be distributed to
   other BGP speakers within the local AS as part of the update process
   (see Section 9.2). The decision whether to distribute non-BGP
   acquired routes within an AS via BGP or not depends on the
   environment within the AS (e.g. type of IGP) and should be
   controlled via configuration.


Appendix 1. Comparison with RFC1771

   There are numerous editorial changes (too many to list here).

   The following lists the technical changes:

   Changes to reflect the usage of such features as TCP MD5 [10], BGP
   Route Reflectors [11], BGP Confederations [13], and BGP Route
   Refresh [12].

   Clarification on the use of the BGP Identifier in the AGGREGATOR
   attribute.

   Procedures for imposing an upper bound on the number of prefixes
   that a BGP speaker would accept from a peer.

   The ability of a BGP speaker to include more than one instance of
   its own AS in the AS_PATH attribute for the purpose of inter-AS
   traffic engineering.

   Clarifications on the various types of NEXT_HOPs.

   Clarifications to the use of the ATOMIC_AGGREGATE attribute.

   The relationship between the immediate next hop and the next hop as
   specified in the NEXT_HOP path attribute.

   Clarifications on the tie-breaking procedures.


Appendix 2. Comparison with RFC1267

   All the changes listed in Appendix 1, plus the following.

   BGP-4 is capable of operating in an environment where a set of
   reachable destinations may be expressed via a single IP prefix. The
   concept of network classes, or subnetting, is foreign to BGP-4. To
   accommodate these capabilities BGP-4 changes semantics and encoding
   associated with the AS_PATH attribute. New text has been added to
   define semantics associated with IP prefixes. These abilities allow
   BGP-4 to support the proposed supernetting scheme [9].

   To simplify configuration, this version introduces a new attribute,
   LOCAL_PREF, that facilitates route selection procedures.

   The INTER_AS_METRIC attribute has been renamed MULTI_EXIT_DISC. A
   new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that
   certain aggregates are not de-aggregated. Another new attribute,
   AGGREGATOR, can be added to aggregate routes in order to advertise
   which AS and which BGP speaker within that AS caused the
   aggregation.

   To ensure that Hold Timers are symmetric, the Hold Time is now
   negotiated on a per-connection basis. Hold Times of zero are now
   supported.


Appendix 3. Comparison with RFC 1163

   All of the changes listed in Appendices 1 and 2, plus the following.

   To detect and recover from BGP connection collision, a new field
   (BGP Identifier) has been added to the OPEN message. New text
   (Section 6.8) has been added to specify the procedure for detecting
   and recovering from collision.

   The new document no longer restricts the border router that is
   passed in the NEXT_HOP path attribute to be part of the same
   Autonomous System as the BGP Speaker.

   The new document optimizes and simplifies the exchange of
   information about previously reachable routes.


Appendix 4. Comparison with RFC 1105

   All of the changes listed in Appendices 1, 2 and 3, plus the
   following.

   Minor changes to the RFC1105 Finite State Machine were necessary to
   accommodate the TCP user interface provided by 4.3 BSD.

   The notion of Up/Down/Horizontal relations present in RFC1105 has
   been removed from the protocol.

   The changes in the message format from RFC1105 are as follows:

      1. The Hold Time field has been removed from the BGP header and
         added to the OPEN message.

      2. The version field has been removed from the BGP header and
         added to the OPEN message.

      3. The Link Type field has been removed from the OPEN message.

      4. The OPEN CONFIRM message has been eliminated and replaced with
         implicit confirmation provided by the KEEPALIVE message.

      5. The format of the UPDATE message has been changed
         significantly. New fields were added to the UPDATE message to
         support multiple path attributes.

      6. The Marker field has been expanded and its role broadened to
         support authentication.

   Note that BGP as specified in RFC 1105 is quite often referred to as
   BGP-1, BGP as specified in RFC 1163 as BGP-2, BGP as specified in
   RFC1267 as BGP-3, and BGP as specified in this document as BGP-4.


Appendix 5. TCP options that may be used with BGP

   If a local system TCP user interface supports the TCP PUSH function,
   then each BGP message should be transmitted with the PUSH flag set.
   Setting the PUSH flag forces BGP messages to be transmitted promptly
   to the receiver.

   If a local system TCP user interface supports setting precedence for
   the TCP connection, then the BGP transport connection should be
   opened with precedence set to the Internetwork Control (110) value
   (see also [6]).

   A local system may protect its BGP sessions by using the TCP MD5
   Signature Option [10].


Appendix 6. Implementation Recommendations

   This section presents some implementation recommendations.


6.1 Multiple Networks Per Message

   The BGP protocol allows for multiple address prefixes with the same
   path attributes to be specified in one message. Making use of this
   capability is highly recommended. With one address prefix per
   message there is a substantial increase in overhead in the receiver.
   Not only does the system overhead increase due to the reception of
   multiple messages, but the overhead of scanning the routing table
   for updates to BGP peers and other routing protocols (and sending
   the associated messages) is incurred multiple times as well.

   One method of building messages containing many address prefixes per
   path attribute set from a routing table that is not organized on a
   per path attribute set basis is to build many messages as the
   routing table is scanned. As each address prefix is processed, a
   message for the associated set of path attributes is allocated, if
   it does not exist, and the new address prefix is added to it. If
   such a message exists, the new address prefix is just appended to
   it. If the message lacks the space to hold the new address prefix,
   it is transmitted, a new message is allocated, and the new address
   prefix is inserted into the new message. When the entire routing
   table has been scanned, all allocated messages are sent and their
   resources released.
   Maximum compression is achieved when all the destinations covered by
   the address prefixes share a common set of path attributes, making
   it possible to send many address prefixes in one 4096-byte message.

   When peering with a BGP implementation that does not compress
   multiple address prefixes into one message, it may be necessary to
   take steps to reduce the overhead from the flood of data received
   when a peer is acquired or a significant network topology change
   occurs. One method of doing this is to limit the rate of updates.
   This will eliminate the redundant scanning of the routing table to
   provide flash updates for BGP peers and other routing protocols. A
   disadvantage of this approach is that it increases the propagation
   latency of routing information. By choosing a minimum flash update
   interval that is not much greater than the time it takes to process
   the multiple messages, this latency should be minimized. A better
   method would be to read all received messages before sending
   updates.


6.2 Processing Messages on a Stream Protocol

   BGP uses TCP as a transport mechanism. Due to the stream nature of
   TCP, not all the data for a received message necessarily arrives at
   the same time. This can make it difficult to process the data as
   messages, especially on systems such as BSD Unix where it is not
   possible to determine how much data has been received but not yet
   processed.

   One method that can be used in this situation is to first try to
   read just the message header. For the KEEPALIVE message type, this
   is a complete message; for other message types, the header should
   first be verified, in particular the total length. If all checks are
   successful, the specified length, minus the size of the message
   header, is the amount of data left to read.
   To avoid "hanging" the routing information process while trying to
   read from a peer, an implementation could set up a message buffer
   (4096 bytes) per peer and fill it with data as available until a
   complete message has been received.


6.3 Reducing route flapping

   To avoid excessive route flapping, a BGP speaker which needs to
   withdraw a destination and send an update about a more specific or
   less specific route SHOULD combine them into the same UPDATE
   message.


6.4 BGP Timers

   BGP employs five timers: ConnectRetry, Hold Time, KeepAlive,
   MinASOriginationInterval, and MinRouteAdvertisementInterval. The
   suggested value for the ConnectRetry timer is 120 seconds. The
   suggested value for the Hold Time is 90 seconds. The suggested value
   for the KeepAlive timer is 1/3 of the Hold Time. The suggested value
   for the MinASOriginationInterval is 15 seconds. The suggested value
   for the MinRouteAdvertisementInterval is 30 seconds.

   An implementation of BGP MUST allow the Hold Time timer to be
   configurable, and MAY allow the other timers to be configurable.


6.5 Path attribute ordering

   Implementations which combine update messages as described above in
   6.1 may prefer to see all path attributes presented in a known
   order. This permits them to quickly identify sets of attributes from
   different update messages which are semantically identical. To
   facilitate this, it is a useful optimization to order the path
   attributes according to type code. This optimization is entirely
   optional.


6.6 AS_SET sorting

   Another useful optimization that can be done to simplify this
   situation is to sort the AS numbers found in an AS_SET. This
   optimization is entirely optional.
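
   The recommendations of 6.1, 6.5 and 6.6 combine naturally, and the
   following Python sketch shows one way to do so. It groups prefixes
   by a canonical attribute set while scanning a table, and starts a
   new message when the current one would exceed 4096 octets. It is
   illustrative only: the (type code, value) attribute list, the
   ("SET" or "SEQ", [AS numbers]) segment encoding of AS_PATH, and the
   fixed attrs_octets size are assumptions of the sketch, and withdrawn
   routes are ignored.

```python
from ipaddress import IPv4Network
from typing import Dict, Iterable, List, Tuple

MAX_MESSAGE = 4096  # maximum BGP message size, in octets
HEADER = 19         # marker (16) + length (2) + type (1)
AS_PATH = 2         # AS_PATH path attribute type code

Attrs = List[Tuple[int, object]]  # (type code, value) in arrival order


def canonical_key(attrs: Attrs) -> Tuple:
    """Order attributes by type code (6.5) and sort AS_SET members (6.6).

    Sketch assumption: an AS_PATH value is a list of ("SET" | "SEQ",
    [AS numbers]) segments; all other attribute values are hashable.
    """
    items = []
    for code, value in sorted(attrs, key=lambda a: a[0]):
        if code == AS_PATH:
            value = tuple((kind, tuple(sorted(asns)) if kind == "SET" else tuple(asns))
                          for kind, asns in value)
        items.append((code, value))
    return tuple(items)


def nlri_octets(prefix: IPv4Network) -> int:
    # One length octet plus the minimum number of octets holding the prefix.
    return 1 + (prefix.prefixlen + 7) // 8


def pack_updates(routes: Iterable[Tuple[IPv4Network, Attrs]],
                 attrs_octets: int) -> List[Tuple[Tuple, List[IPv4Network]]]:
    """Build one message per attribute set while scanning the table (6.1).

    attrs_octets is a hypothetical fixed encoded size of the path
    attributes; a real encoder would compute it per attribute set.
    """
    # 4 = the two 2-octet length fields (withdrawn routes, path attributes).
    budget = MAX_MESSAGE - HEADER - 4 - attrs_octets
    pending: Dict[Tuple, Tuple[List[IPv4Network], int]] = {}
    sent = []
    for prefix, attrs in routes:
        key = canonical_key(attrs)
        prefixes, used = pending.get(key, ([], 0))
        size = nlri_octets(prefix)
        if prefixes and used + size > budget:
            sent.append((key, prefixes))      # message full: transmit it
            prefixes, used = [], 0            # and allocate a new one
        prefixes.append(prefix)
        pending[key] = (prefixes, used + size)
    # After the scan, send every message that is still allocated.
    sent.extend((key, p) for key, (p, _) in pending.items())
    return sent
```

   Because the key is canonical, two UPDATEs that list the same
   attributes in a different order, or the same AS_SET members in a
   different order, land in the same message.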

6.7 Control over version negotiation

   Since BGP-4 is capable of carrying aggregated routes which cannot be
   properly represented in BGP-3, an implementation which supports
   BGP-4 and another BGP version should provide the capability to only
   speak BGP-4 on a per-peer basis.


6.8 Complex AS_PATH aggregation

   An implementation which chooses to provide a path aggregation
   algorithm which retains significant amounts of path information may
   wish to use the following procedure:

      For the purpose of aggregating AS_PATH attributes of two routes,
      we model each AS as a tuple <type, value>, where "type"
      identifies a type of the path segment the AS belongs to (e.g.
      AS_SEQUENCE, AS_SET), and "value" is the AS number. Two ASs are
      said to be the same if their corresponding <type, value> tuples
      are the same.

      The algorithm to aggregate two AS_PATH attributes works as
      follows:

      a) Identify the same ASs (as defined above) within each AS_PATH
         attribute that are in the same relative order within both
         AS_PATH attributes. Two ASs, X and Y, are said to be in the
         same order if either:

            - X precedes Y in both AS_PATH attributes, or
            - Y precedes X in both AS_PATH attributes.

      b) The aggregated AS_PATH attribute consists of ASs identified in
         (a) in exactly the same order as they appear in the AS_PATH
         attributes to be aggregated. If two consecutive ASs identified
         in (a) do not immediately follow each other in both of the
         AS_PATH attributes to be aggregated, then the intervening ASs
         (ASs that are between the two consecutive ASs that are the
         same) in both attributes are combined into an AS_SET path
         segment that consists of the intervening ASs from both AS_PATH
         attributes; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.
         If two consecutive ASs identified in (a) immediately follow
         each other in one attribute, but not in the other, then the
         intervening ASs of the latter are combined into an AS_SET path
         segment; this segment is then placed in between the two
         consecutive ASs identified in (a) of the aggregated attribute.

      If, as a result of the above procedure, a given AS number appears
      more than once within the aggregated AS_PATH attribute, all but
      the last instance (rightmost occurrence) of that AS number should
      be removed from the aggregated AS_PATH attribute.


Security Considerations

   BGP supports the ability to authenticate BGP messages by using BGP
   authentication. The authentication could be done on a per peer
   basis. In addition, BGP supports the ability to authenticate its
   data stream by using [10]. This authentication could be done on a
   per peer basis. Finally, BGP could also use IPSec to authenticate
   its data stream. Among the mechanisms mentioned in this paragraph,
   [10] is the most widely deployed.


References

   [1] Mills, D., "Exterior Gateway Protocol Formal Specification",
       RFC 904, April 1984.

   [2] Rekhter, Y., "EGP and Policy Based Routing in the New NSFNET
       Backbone", RFC 1092, February 1989.

   [3] Braun, H-W., "The NSFNET Routing Architecture", RFC 1093,
       February 1989.

   [4] Postel, J., "Transmission Control Protocol - DARPA Internet
       Program Protocol Specification", RFC 793, September 1981.

   [5] Rekhter, Y., and P. Gross, "Application of the Border Gateway
       Protocol in the Internet", RFC 1772, March 1995.

   [6] Postel, J., "Internet Protocol - DARPA Internet Program Protocol
       Specification", RFC 791, September 1981.

   [7] "Information Processing Systems - Telecommunications and
       Information Exchange between Systems - Protocol for Exchange of
       Inter-domain Routeing Information among Intermediate Systems to
       Support Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993.

   [8] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
       Inter-Domain Routing (CIDR): an Address Assignment and
       Aggregation Strategy", RFC 1519, September 1993.

   [9] Rekhter, Y., and T. Li, "An Architecture for IP Address
       Allocation with CIDR", RFC 1518, September 1993.

   [10] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
        Signature Option", RFC 2385, August 1998.

   [11] Bates, T., Chandra, R., and E. Chen, "BGP Route Reflection - An
        Alternative to Full Mesh IBGP", RFC 2796, April 2000.

   [12] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
        September 2000.

   [13] Traina, P., McPherson, D., and J. Scudder, "Autonomous System
        Confederations for BGP", RFC 3065, February 2001.


Editors' Addresses

   Yakov Rekhter
   Juniper Networks
   1194 N. Mathilda Avenue
   Sunnyvale, CA 94089
   Email: yakov@juniper.net

   Tony Li
   Procket Networks
   1100 Cadillac Ct.
   Milpitas, CA 95035
   Email: tli@procket.com