IDR Agenda Items for Vienna
Yakov Rekhter <yakov@juniper.net> Tue, 27 May 2003 16:11 UTC
Received: from ietf-mx (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id MAA18390 for <idr-archive@ietf.org>; Tue, 27 May 2003 12:11:25 -0400 (EDT)
Received: from ietf-mx ([132.151.6.1]) by ietf-mx with esmtp (Exim 4.12) id 19Kh1N-0005on-00 for idr-archive@ietf.org; Tue, 27 May 2003 12:09:53 -0400
Received: from trapdoor.merit.edu ([198.108.1.26] ident=postfix) by ietf-mx with esmtp (Exim 4.12) id 19Kh1L-0005oD-00 for idr-archive@ietf.org; Tue, 27 May 2003 12:09:52 -0400
Received: by trapdoor.merit.edu (Postfix) id 546EA91217; Tue, 27 May 2003 12:10:46 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 2C2C691218; Tue, 27 May 2003 12:10:46 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id DF7E891217 for <idr@trapdoor.merit.edu>; Tue, 27 May 2003 12:10:44 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id C4B655DEB0; Tue, 27 May 2003 12:10:44 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id 47B675DE2C for <idr@merit.edu>; Tue, 27 May 2003 12:10:44 -0400 (EDT)
Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h4RGAhu21177; Tue, 27 May 2003 09:10:43 -0700 (PDT) (envelope-from yakov@juniper.net)
Message-Id: <200305271610.h4RGAhu21177@merlot.juniper.net>
To: idr@merit.edu
Cc: skh@nexthop.com
Subject: IDR Agenda Items for Vienna
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <96682.1054051843.1@juniper.net>
Date: Tue, 27 May 2003 09:10:43 -0700
From: Yakov Rekhter <yakov@juniper.net>
Sender: owner-idr@merit.edu
Precedence: bulk
Folks,

It's about time to start thinking about agenda items for the Vienna IETF. Please forward any IDR agenda items you might have to me and Sue.

Sue & Yakov.

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id LAA25304 for <idr-archive@nic.merit.edu>; Tue, 27 May 2003 11:16:51 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id 35AA391207; Tue, 27 May 2003 11:16:34 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id EB56191213; Tue, 27 May 2003 11:16:33 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 61EC891207 for <idr@trapdoor.merit.edu>; Tue, 27 May 2003 11:16:31 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 43A7C5DE16; Tue, 27 May 2003 11:16:31 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id 7A0E65DDF9 for <idr@merit.edu>; Tue, 27 May 2003 11:16:30 -0400 (EDT)
Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h4RFGTu17871; Tue, 27 May 2003 08:16:29 -0700 (PDT) (envelope-from yakov@juniper.net)
Message-Id: <200305271516.h4RFGTu17871@merlot.juniper.net>
To: Alex Zinin <zinin@psg.com>
Cc: idr@merit.edu, rtg-dir@ietf.org
Subject: Re: AD-review comments on draft-ietf-idr-bgp4-20
In-Reply-To: Your message of "Mon, 05 May 2003 16:38:15 PDT." <177177649135.20030505163815@psg.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <77283.1054048589.1@juniper.net>
Date: Tue, 27 May 2003 08:16:29 -0700
From: Yakov Rekhter <yakov@juniper.net>
Sender: owner-idr@merit.edu
Precedence: bulk

Alex,

> Folks,
>
> Please find below my AD-review comments. Hopefully they will help
> improve the document. I tried to consult Andrew's list as much as
> possible, but do feel free to point out if something has already been
> discussed and agreed upon.
>
> Thanks go to Yakov for kicking me often enough ;)
> --
> Alex Zinin
>
> Some nits:
> - run it by a spelling checker, please
> - disable hyphenation if possible
> - include boilerplates for IPR notice, Copyright notice

Sure.

>
> General comment:
>
> in some places I highlighted the fact that required behavior is not
> described using the 2119 language, so it is not clear if a MUST or
> SHOULD or MAY is applicable. I am sure I've missed some more places
> like this. I'd like to ask the editors to go through the doc and
> check this.

Sure.

> > Status of this Memo
> >
> >
> ...
> > The list of Internet-Draft Shadow Directories can be accessed at
> > http://www.ietf.org/shadow.html.
> >
> > Specification of Requirements
>
> Nit: move Abstract here. Move requirements after the Acks.

Ok.

> > Abstract
>
> Should the Abstract say that this spec covers IPv4 only?

Sure.

> > 3. Summary of Operation
> ...
> > This document uses the term `Autonomous System' (AS) throughout. The
> > classic definition of an Autonomous System is a set of routers under
> > a single technical administration, using an interior gateway protocol
> > (IGP) and common metrics to determine how to route packets within the
> > AS, and using an inter-AS routing protocol to determine how to route
> > packets to other ASs. Since this classic definition was developed, it
> > has become common for a single AS to use several IGPs and sometimes
> > several sets of metrics within an AS. The use of the term Autonomous
> > System here stresses the fact that, even when multiple IGPs and met-
> > rics are used, the administration of an AS appears to other ASs to
> > have a single coherent interior routing plan and presents a consis-
> > tent picture of what destinations are reachable through it.
>
> Ed: Since 'AS' has been defined before, do we need to repeat the
> definition here?

The definition section before presents a *summary* of the definitions used in the document.
I think that the text reads fine as is, so I would prefer not to change it.

> ...
> > peer in the same AS is referred to as an internal peer. Internal BGP
> > and external BGP are commonly abbreviated IBGP and EBGP.
>
> Ed: These two have been defined before too

See my previous comment.

> ...
> > Care must be taken to
> > ensure that the interior routers have all been updated with transit
> > information before the BGP speakers announce to other ASs that tran-
> > sit service is being provided.
>
> What does the last sentence really mean from the implementation
> perspective? It used to mean the BGP/IGP synchronization check. Now
> that iBGP everywhere is assumed, how do we check this condition?

In the absence of any objections by June 10 I suggest to take this sentence out.

> > This document specifies the base behavior of the BGP protocol. This
> > behavior can and is modified by extention specifications. When the
> Ed: "extension"

Sure.

> > protocol is extended the new behavior is fully documented in the
> > extention specifications.
> Ed: "extension"

Sure.

>
> > 3.1 Routes: Advertisement and Storage
> >
> > For the purpose of this protocol, a route is defined as a unit of
> > information that pairs a set of destinations with the attributes of a
> > path to those destinations. The set of destinations are systems whose
> > IP addresses are contained in one IP address prefix carried in the
> > Network Layer Reachability Information (NLRI) field of an UPDATE mes-
> > sage, and the path is the information reported in the path attributes
> > field of the same UPDATE message.
> Ed: Repeated definition again

See above.

> ...
> > If a BGP speaker chooses to advertise the route, it MAY add to or
> > modify the path attributes of the route before advertising it to a
> > peer.
>
> The intent here is to say that it's ok to modify the attribute set of
> a previously received route when it's announced further.
> The way it
> reads though is that self-originated routes are also within the
> context and MAY sounds like you don't have to add attributes when
> announcing those.

I will replace "If a BGP speaker chooses to advertise the route" with "If a BGP speaker chooses to advertise a previously received route".

>
> ...
>
> > Changing attribute of a route is accomplished by advertising a
> > replacement route. The replacement route carries new (changed)
> > attributes and has the same NLRI as the original route.
>
> "same NLRI" implies the same prefix, but not the NLRI field, which can
> be different (containing other routes), should the use of this term be
> normalized throughout the document?

I will replace "the same NLRI" with "the same address prefix".

>
> > 4.2 OPEN Message Format
> >
> > After a TCP is established, the first message sent by each side is an
>
> "TCP connection"

ok.

> > 5. Path Attributes
> ...
> > If a path with recognized transitive optional attribute is accepted
> > and passed along to other BGP peers and the Partial bit in the
> > Attribute Flags octet is set to 1 by some previous AS, it is not
>
> 'MUST NOT' here?

Sure.

> > set
> > back to 0 by the current AS. Unrecognized non-transitive optional
> > attributes MUST be quietly ignored and not passed along to other BGP
> > peers.
> ...
> > The same attribute (attribute with the same type) can not appear more
> > than once within the Path Attributes field of a particular UPDATE
> > message.
>
> What should an implementation do if this happens?

See section 6.3:

   If any attribute appears more than once in the UPDATE message, then the Error Subcode is set to Malformed Attribute List.

> > The mandatory category refers to an attribute which MUST be present
> > in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE
>
> Ed: "if the NLRI field is contained" instead?

No, as the NLRI field is always present in the UPDATE message (although if NLRI is not present, then the NLRI field is empty).
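
For readers following along, the section 6.3 rule cited above (an attribute type appearing more than once is a Malformed Attribute List error) could be sketched roughly as below. This is only an illustration: the function name and the idea of passing a flat list of attribute type codes are assumptions made here, not anything from the draft or from an implementation.

```python
from collections import Counter

# Attribute type codes as assigned in the BGP-4 spec.
ORIGIN, AS_PATH, NEXT_HOP = 1, 2, 3


def duplicate_attribute_types(attr_type_codes):
    """Return the sorted type codes that occur more than once.

    attr_type_codes is a hypothetical, already-parsed list of the
    attribute type codes found in one UPDATE message, in wire order.
    A non-empty result corresponds to the Malformed Attribute List case.
    """
    counts = Counter(attr_type_codes)
    return sorted(code for code, n in counts.items() if n > 1)


if __name__ == "__main__":
    # AS_PATH appears twice here, so the UPDATE would be rejected.
    print(duplicate_attribute_types([ORIGIN, AS_PATH, NEXT_HOP, AS_PATH]))
```

A real parser would run this check while walking the Path Attributes field and would then send the NOTIFICATION; the sketch only covers the detection step.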
>
> > 5.1.2 AS_PATH
> ...
> > b) When a given BGP speaker advertises the route to an external
> > peer, then the advertising speaker updates the AS_PATH attribute
> > as follows:
> >
> > 1) if the first path segment of the AS_PATH is of type
> > AS_SEQUENCE, the local system prepends its own AS number as the
> > last element of the sequence (put it in the leftmost position).
>
> 'Leftmost position'... isn't this still open for interpretation? How
> about wording this relative to the position of the octets in the
> protocol message?

I'll replace "the leftmost position" with "the leftmost position with respect to the position of octets in the protocol message".

> > If the act of prepending will cause an overflow in the AS_PATH
> > segment, i.e. more than 255 ASs, it is legal to prepend a new
> > segment of type AS_SEQUENCE and prepend its own AS number to
> > this new segment.
>
> What's the recommended behavior here?

"it is legal to prepend" really means "it SHOULD prepend". In the absence of any objections by June 10 I'll update the text.

> >
> > 5.1.4 MULTI_EXIT_DISC
> >
> >
> > The MULTI_EXIT_DISC is an optional non-transitive attribute which is
> > intended to be used on external (inter-AS) links to discriminate
> > among multiple exit or entry points to the same neighboring AS. The
> > value of the MULTI_EXIT_DISC attribute is a four octet unsigned num-
> > ber which is called a metric. All other factors being equal, the exit
> > point with lower metric SHOULD be preferred. If received over EBGP,
> > the MULTI_EXIT_DISC attribute MAY be propagated over IBGP to other
> > BGP speakers within the same AS. The MULTI_EXIT_DISC attribute
>
> seems that a reference to 9.1.2.2 is due here, as using MED in local
> route calculation and not propagating it further is dangerous

Sure.

> > received from a neighboring AS MUST NOT be propagated to other neigh-
> > boring ASs.
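
As an aside on the 5.1.2 exchange above, the prepend rule and its overflow case could be sketched as follows. The list-of-tuples representation and the function name are assumptions for illustration only; index 0 stands for the leftmost position with respect to the octets in the protocol message. Handling of an empty path or a leading AS_SET (start a new AS_SEQUENCE segment) follows the BGP-4 spec rather than the text quoted in this message.

```python
# Path segment type codes from the BGP-4 spec.
AS_SET = 1
AS_SEQUENCE = 2

# The segment length is a one-octet count, so at most 255 ASs fit.
MAX_SEGMENT_LEN = 255


def prepend_local_as(as_path, local_as):
    """Return a new AS_PATH with local_as prepended.

    as_path is a hypothetical representation: a list of
    (segment_type, [as_numbers]) tuples, leftmost segment first.
    """
    if as_path:
        seg_type, asns = as_path[0]
        if seg_type == AS_SEQUENCE and len(asns) < MAX_SEGMENT_LEN:
            # Room left: put our AS in the leftmost position.
            return [(seg_type, [local_as] + list(asns))] + list(as_path[1:])
    # Empty path, leading AS_SET, or a full leading AS_SEQUENCE:
    # prepend a new AS_SEQUENCE segment holding only our AS.
    return [(AS_SEQUENCE, [local_as])] + list(as_path)


if __name__ == "__main__":
    print(prepend_local_as([(AS_SEQUENCE, [65001, 65002])], 65000))
```

The input is left unmodified so the same received path can be reused when advertising to several external peers.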
> >
> > A BGP speaker MUST IMPLEMENT a mechanism based on local configuration
>                      ^^^^^^^^^lower-case

Sure.

> > which allows the MULTI_EXIT_DISC attribute to be removed from a
> > route. This MAY be done prior to determining the degree of preference
>
> what's the recommended behavior here?

What the text is saying is that a BGP speaker optionally (MAY) remove MED from a route. If the speaker does this, then this *has to* happen prior to determining the degree of preference for the route. So, what "This MAY" refers to is the fact that removing MED is optional. To clarify I would replace "This MAY be done" with "Removal of the MULTI_EXIT_DISC attribute MAY be done".

> > of the route and performing route selection (decision process phases
> > 1 and 2).
> >
> > An implementation MAY also (based on local configuration) alter the
> > value of the MULTI_EXIT_DISC attribute received over EBGP. This MAY
> > be done prior to determining the degree of preference of the route
>
> what's the recommended behavior here?

The same as the previous comment.

> > 5.1.5 LOCAL_PREF
> ...
> > A BGP speaker SHALL calculate the degree of preference for
> > each external route based on the locally configured policy, and
>
> Should we be more honest here and say that the implementation must
> allow the admin to SET the degree of preference through the local
> policy to influence the best-path selection process, i.e., I don't
> think any implementation really *calculates* it.

Please see my answer to your comment on 9.1.1

> > 5.1.6 ATOMIC_AGGREGATE
> ...
> > A BGP speaker that receives a route with the ATOMIC_AGGREGATE
> > attribute MUST NOT make any NLRI of that route more specific (as
> > defined in 9.1.4) when advertising this route to other BGP speakers.
>
> Since deaggregation is not described in this document, do we need this
> para?

I would prefer to keep the current text, as to make sure that an implementation wouldn't do deaggregation.
> > A BGP speaker that receives a route with the ATOMIC_AGGREGATE
> > attribute needs to be cognizant of the fact that the actual path to
> > destinations, as specified in the NLRI of the route, while having the
> > loop-free property, may not be the path specified in the AS_PATH
> > attribute of the route.
>
> What does this really mean from the implementation perspective?

This is mostly FYI. It has to do with the user of BGP...

> > 5.1.7 AGGREGATOR
> >
> >
> > AGGREGATOR is an optional transitive attribute which MAY be included
> > in updates which are formed by aggregation (see Section 9.2.2.2). A
> > BGP speaker which performs route aggregation MAY add the AGGREGATOR
>
> What's the recommended behavior here? Include or not, and under what
> circumstances?

The spec doesn't provide any recommendation on this, as it is optional.

> > 6. BGP Error Handling.
> ...
> > The phrase "the BGP connection is closed" means that the TCP connec-
> > tion has been closed, the associated Adj-RIB-In has been cleared, and
> > that all resources for that BGP connection have been deallocated.
> > Entries in the Loc-RIB associated with the remote peer are marked as
> > invalid. The fact that the routes have become invalid is passed to
> > other BGP peers before the routes are deleted from the system.
>
> What does "the fact is passed" mean? Should we instead say that local
> route recalculation happens and peers are sent either updated best
> routes or withdrawals?

How about the following replacement for the last sentence:

   The local system recalculates its best routes for the destinations of the routes marked as invalid, and advertises to its peers either withdraws for the routes marked as invalid, or the new best routes before the invalid routes are deleted from the system.

> > 6.2 OPEN message error handling.
> ...
> > If the Autonomous System field of the OPEN message is unacceptable,
> > then the Error Subcode is set to Bad Peer AS.
> > The determination of
> > acceptable Autonomous System numbers is outside the scope of this
> > protocol.
>
> Shouldn't we say that configuration based detection should be
> supported, i.e., when remote-as is configured for the peer?

No.

> ...
> > If the BGP Identifier field of the OPEN message is syntactically
> > incorrect, then the Error Subcode is set to Bad BGP Identifier. Syn-
> > tactic correctness means that the BGP Identifier field represents a
> > valid IP host address.
>
> Is "valid IP host address" defined somewhere, btw?

Certainly not in this document. Perhaps for clarity I'll add "unicast" in front of "IP host address".

> > 6.3 UPDATE message error handling.
> >
> >
> > All errors detected while processing the UPDATE message are indicated
> > by sending the NOTIFICATION message with Error Code UPDATE Message
> > Error. The error subcode elaborates on the specific nature of the
> > error.
>
> "are indicated..." is this a MUST, SHOULD, or MAY?

MUST.

> ...
> > If the ORIGIN attribute has an undefined value, then the Error Sub-
> > code is set to Invalid Origin Attribute. The Data field contains the
> > unrecognized attribute (type, length and value).
>
> Curious: do we really have to drop a session on this condition? Given
> that the attribute was syntactically correct and the TLV was not
> broken, so the stream is still in sync and we can move on? Of course,
> if this is what current implementations do, we have no other choice.

In the current spec all the errors are fatal. Including errors in the ORIGIN attribute.

> ...
> > If the UPDATE message is received from an external peer, the local
> > system MAY check whether the leftmost AS in the AS_PATH attribute is
>
> Same comment about 'leftmost'... Maybe we should define this somewhere
> in the beginning of the spec?

I will replace "the leftmost AS" with "the leftmost AS with respect to the position of octets in the protocol message".

> ...
> > The NLRI field in the UPDATE message is checked for syntactic valid-
> > ity. If the field is syntactically incorrect, then the Error Subcode
> > is set to Invalid Network Field.
>
> Should we give more data on what syntactic validity means in this case
> so people behave consistently?

As Curtis suggested a while ago:

   If the document is unclear to the well qualified reader (one possessing a thorough understanding of foundations of this work, including IP routing, TCP, TCP programming, and the referenced documents) then the document may need to be changed to improve clarity.

The case you mentioned above suggests that the reader is not well qualified.

> > 6.7 Cease.
> ...
> > If the BGP speaker decides to terminate its BGP
> > connection with a neighbor because the number of address prefixes
> > received from the neighbor exceeds the locally configured upper
> > bound, then the speaker MUST send to the neighbor a NOTIFICATION mes-
> > sage with the Error Code Cease.
>
> Should we also say that when the peer decides to discard incoming
> prefixes, this event should be logged locally?

In the absence of any objections by June 10 I'll add the following to the text:

   The speaker MAY also log this locally.

> > 9. UPDATE Message Handling
> >
> >
> > An UPDATE message may be received only in the Established state.
>
> What if it is received in another state?

It is an error. To make this clear I'll add the following to the text:

   Receiving an UPDATE message in any other state is an error.

> ...
> > 9.1 Decision Process
> >
> >
> > The Decision Process selects routes for subsequent advertisement by
> > applying the policies in the local Policy Information Base (PIB) to
> > the routes stored in its Adj-RIBs-In. The output of the Decision Pro-
> > cess is the set of routes that will be advertised to peers; the
> > selected routes will be stored in the local speaker's Adj-RIB-Out
> RIB-Out or RIBs-out (plural)?

Plural.

> > according to policy.
> >
> > The selection process is formalized by defining a function that takes
> > the attribute of a given route as an argument and returns either (a)
> > a non-negative integer denoting the degree of preference for the
> > route, or (b) a value denoting that this route is ineligible to be
> > installed in LocRib and will be excluded from the next phase of route
>
> Loc-RIB

Ok.

> > selection.
> ...
> > The Decision Process operates on routes contained in the Adj-RIB-In,
> Adj-RIBs-In (plural) ?

Plural.

> > and is responsible for:
>
> > 9.1.1 Phase 1: Calculation of Degree of Preference
> ...
> > If the route is learned from an external peer, then the local BGP
> > speaker computes the degree of preference based on preconfigured
> > policy information. If the return value indicates that the route
> > is ineligible, the route MAY NOT serve as an input to the next
> > phase of route selection; otherwise the return value is used as
> > the LOCAL_PREF value in any IBGP readvertisement.
>
> So, AFAIK, the major implementations do not follow this step
> (calculating the degree of preference, and then announcing). Instead,
> implementations allow setting the LOCAL_PREF value locally, which is
> taken into consideration during the best path selection, and is also
> reannounced further.

It is important to keep in mind that the whole section on the BGP decision process does *not* mean that an implementation must implement it precisely as it is described in the spec, as long as the implementation support the described functionality and its externally visible behavior is the same. With this in mind how about if I'll add the following:

   The BGP Decision Process in this document is conceptual and do not have to be implemented precisely as described here, as long as the implementations support the described functionality and their externally visible behavior is the same.

> Also "is used" is not specific enough. Is it SHOULD or MUST?

MUST.

> > 9.1.2 Phase 2: Route Selection
> ...
> > If the AS_PATH attribute of a BGP route contains an AS loop, the BGP
> > route should be excluded from the Phase 2 decision function. AS loop
> > detection is done by scanning the full AS path (as specified in the
> > AS_PATH attribute), and checking that the autonomous system number of
> > the local system does not appear in the AS path. Operations of a BGP
> > speaker that is configured to accept routes with its own autonomous
> > system number in the AS path are outside the scope of this document.
>
> If we're checking for an AS loop here (in Phase 2) as opposed to
> during the UPDATE message sanity checking, the route is already
> received and accepted in the peer's Adj-RIB-In. Those implementations
> I know don't even install such routes in the RIB...

This is the text that the WG agreed on (see e-mail from John Scudder on Mon, 02 Dec 2002 10:54:45 EST). Also, see my response to your previous comment.

> > 9.1.2.2 Breaking Ties (Phase 2)
> ...
> > Similarly, neighborAS(n) is a function which returns the neighbor
> > AS from which the route was received. If the route is learned via
> > IBGP, and the other IBGP speaker didn't originate the route, it is
> > the neighbor AS from which the other IBGP speaker learned the
> > route. If the route is learned via IBGP, and the other IBGP
> > speaker originated the route, it is the local AS.
>
> What if the route is locally originated?

Breaking ties has to do with the routes received from other BGP speakers, not with the routes locally originated.

> > 9.1.4 Overlapping Routes
> ...
> > When overlapping routes are present in the same Adj-RIB-In, the more
> > specific route takes precedence, in order from more specific to least
> > specific.
> >
> Doesn't this happen at the packet forwarding stage?

Yes, it does. But only if both routes are present in the FIB. I also think that this sentence isn't needed, so in the absence of any objections by June 10 I propose to remove it.
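
To make the 9.1.2 loop check quoted earlier in this message concrete, here is a minimal sketch of "scan the full AS path and check that the local AS number does not appear". The path representation (a list of (segment_type, [as_numbers]) tuples) and the function name are assumptions for illustration; where in the pipeline an implementation runs the check is exactly the point under discussion above and is not settled by this sketch.

```python
def has_as_loop(as_path, local_as):
    """Return True if local_as appears anywhere in the AS path.

    as_path is a hypothetical list of (segment_type, [as_numbers])
    tuples. Every segment is scanned, whatever its type, since the
    quoted text calls for scanning the full AS path.
    """
    return any(local_as in asns for _seg_type, asns in as_path)


if __name__ == "__main__":
    path = [(2, [65001, 65002]), (1, [65010, 65011])]
    print(has_as_loop(path, 65010))
    print(has_as_loop(path, 65000))
```

A speaker configured to accept routes carrying its own AS number would simply skip this check, which the quoted text places outside the scope of the document.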
> > The set of destinations described by the overlap represents a portion
> > of the less specific route that is feasible, but is not currently in
> > use. If a more specific route is later withdrawn, the set of desti-
> > nations described by the overlap will still be reachable using the
> > less specific route.
> >
> > If a BGP speaker receives overlapping routes, the Decision Process
> > MUST consider both routes based on the configured acceptance policy.
> > If both a less and a more specific route are accepted, then the Deci-
> > sion Process MUST either install both the less and the more specific
>
> Install where?

In Loc-RIB. I'll insert "in Loc-RIB" to make this clear.

> > routes or it MUST aggregate the two routes and install the aggregated
> > route, provided that both routes have the same value of the NEXT_HOP
> > attribute.
>
> anyone really does the latter?

Will find this from the implementation report.

> > If a BGP speaker chooses to aggregate, then it SHOULD either include
> > all AS used to form the aggreagate in an AS_SET or add the
> > ATOMIC_AGGREGATE attribute to the route. This attribute is now pri-
> > marily informational. With the elimination of IP routing protocols
> > that do not support classless routing and the elimination of router
> > and host implementations that do not support classless routing, there
> > is no longer a need to deaggregate. Routes SHOULD NOT be de-aggre-
> > gated. A route that carries ATOMIC_AGGREGATE attribute in particular
> > MUST NOT be de-aggregated. That is, the NLRI of this route can not be
> > made more specific. Forwarding along such a route does not guarantee
> > that IP packets will actually traverse only ASs listed in the AS_PATH
> > attribute of the route.
>
> Since we don't do deaggregation any more, should we remove the
> discussion about it completely and indicate in the "changes" section
> that deaggregation has been deprecated?

As I said before, I would prefer to keep the text on de-aggregation in.
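
For illustration, the "can not be made more specific" constraint discussed above reduces to a prefix containment test. The sketch below uses Python's standard `ipaddress` module; the function name and the string-based prefix arguments are assumptions made here, and the example is limited to IPv4 prefixes.

```python
import ipaddress


def is_more_specific(candidate, original):
    """Return True if candidate is a strictly more specific prefix
    covered by original (for example 10.1.0.0/16 inside 10.0.0.0/8).

    Both arguments are prefix strings. Advertising such a candidate
    for a route that carries ATOMIC_AGGREGATE would be the
    de-aggregation that the quoted text forbids.
    """
    cand = ipaddress.ip_network(candidate)
    orig = ipaddress.ip_network(original)
    return cand != orig and cand.subnet_of(orig)


if __name__ == "__main__":
    print(is_more_specific("10.1.0.0/16", "10.0.0.0/8"))
    print(is_more_specific("10.0.0.0/8", "10.0.0.0/8"))
```

The same containment test also identifies the overlapping routes of section 9.1.4, where one prefix is covered by another.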
> > 9.2 Update-Send Process
> ...
> > When a BGP speaker receives an UPDATE message from an internal peer,
> > the receiving BGP speaker SHALL NOT re-distribute the routing infor-
> > mation contained in that UPDATE message to other internal peers,
> > unless the speaker acts as a BGP Route Reflector [RFC2796].
>
> Suggest to put "unless..." in brackets () to make it more apparent
> that this is not a normative ref.

Ok.

> > 9.2.1.1 Frequency of Route Advertisement
> > Since fast convergence is needed within an autonomous system, either
> > (a) the MinRouteAdvertisementInterval used for internal peers SHOULD
> > be shorter than the MinRouteAdvertisementInterval used for external
> > peers, or (b) the procedure describe in this section SHOULD NOT apply
> > for routes sent to internal peers.
>
> It sounded like MinRouteAdvertisementInterval was an architectural
> constant, but now it sounds like either this is a timer that can be
> assigned different settings or there are two constants:
> MinRouteAdvIntIBGP and MinRouteAdvIntEBGP.

There is a timer (MinRouteAdvertisementInterval) that can be assigned different settings.

> > 9.2.2.2 Aggregating Routing Information
> >
>
> Hmmm... I expected to see in this section some text talking about when
> and how an aggregate would be announced, i.e., when an aggregate
> prefix is configured, and more specific routes are present, the
> aggregate is announced, when no specifics are left--withdraw the
> aggregate. I haven't found anything on this topic...

That is outside the scope of the *protocol* spec. See rfc1519 for more on this.

> > 9.3 Route Selection Criteria
> >
> > Generally speaking, additional rules for comparing routes among sev-
> > eral alternatives are outside the scope of this document.
> > There are
> > two exceptions:
> >
> > - If the local AS appears in the AS path of the new route being
> > considered, then that new route can not be viewed as better than
> > any other route (provided that the speaker is configured to accept
> > such routes). If such a route were ever used, a routing loop could
> > result.
> >
> > - In order to achieve successful distributed operation, only
> > routes with a likelihood of stability can be chosen. Thus, an AS
> > SHOULD avoid using unstable routes, and it SHOULD NOT make rapid
> > spontaneous changes to its choice of route. Quantifying the terms
> > "unstable" and "rapid" in the previous sentence will require expe-
> > rience, but the principle is clear.
>
> Where does this (the second one) fit within and how does this affect
> the route selection criteria?

Routes that flap often can be "penalized" (e.g., route dampening). I'll add a pointer to the route dampening spec here.

> > Care must be taken to ensure that BGP speakers in the same AS do not
> > make inconsistent decisions.
>
> How?

By means outside of the protocol. How about if I'll just remove this sentence?

> What does this mean for the implementor?
>
> > 9.4 Originating BGP routes
> >
> > A BGP speaker may originate BGP routes by injecting routing informa-
> > tion acquired by some other means (e.g. via an IGP) into BGP. A BGP
> > speaker that originates BGP routes assigns the degree of preference
> >
>
> "assigns the degree of preference"... how do the implementations
> really do that?

E.g., via CLI. I'll add "(e.g., via CLI)" after "assigns the degree of preference".

> > 10 BGP Timers
> ...
> > The suggested default value for the MinRouteAdvertisementInterval is
> > 30 seconds.
>
> This was described as a parameter, not a timer. Further, it was
> earlier suggested that it should be shorter for iBGP than it is for
> eBGP. I'd expect the document to specify the recommended value for
> both.

This is for eBGP.
For iBGP the suggested value is 5 secs (I'll add this to the draft).

> > IANA Considerations
> ...
> > All extensions to this protocol, including new message types and Path
> > Attributes MUST only be made using the Standards Action process
> > defined in [RFC2434].
>
> This section should include the description of each registry that
> needs to be created (if needed) and maintained by IANA, as well as the
> allocation policy that is in the text already.

Sure.

Yakov.

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id QAA13684 for <idr-archive@nic.merit.edu>; Fri, 23 May 2003 16:52:03 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id 8912791250; Fri, 23 May 2003 16:51:35 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 5CBDD91251; Fri, 23 May 2003 16:51:35 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 6106F91250 for <idr@trapdoor.merit.edu>; Fri, 23 May 2003 16:51:34 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 3F3995DE37; Fri, 23 May 2003 16:51:34 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from dog.tcb.net (dog.tcb.net [64.78.150.133]) by segue.merit.edu (Postfix) with ESMTP id 1B30D5DE29 for <idr@merit.edu>; Fri, 23 May 2003 16:51:34 -0400 (EDT)
Received: from [192.168.1.39] (vdsl-151-118-3-177.dnvr.uswest.net [151.118.3.177]) by dog.tcb.net (Postfix) with ESMTP id 722DC2029E for <idr@merit.edu>; Fri, 23 May 2003 14:56:55 -0600 (MDT)
User-Agent: Microsoft-Entourage/10.0.0.1309
Date: Fri, 23 May 2003 14:51:16 -0600
Subject: Re: EBGP - Setting Nexthop
From: Danny McPherson <danny@tcb.net>
To: <idr@merit.edu>
Message-ID: <BAF3E5E4.6501%danny@tcb.net>
In-Reply-To: <006c01c32168$9e56a620$cbc8c8c8@sdksoft.com>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"

On 5/23/03 2:19 PM, "Parag Deshpande" <paragdeshpande@sdksoft.com> wrote:

> Thanks Danny,
>
> Then what I get is that it really doesn't matter what the default
> behavior is since knobs (policies) can be used to modify NEXT_HOP
> as needed. (?)

Correct. Both on the transmit and receive (e.g., enforce next-hop)
sides.

-danny

Date: Fri, 23 May 2003 15:19:30 -0500
Message-ID: <006c01c32168$9e56a620$cbc8c8c8@sdksoft.com>
From: "Parag Deshpande" <paragdeshpande@sdksoft.com>
To: "Danny McPherson" <danny@tcb.net>, idr@merit.edu
Subject: Re: EBGP - Setting Nexthop

Thanks Danny,

Then what I get is that it really doesn't matter what the default
behavior is since knobs (policies) can be used to modify NEXT_HOP
as needed. (?)

Parag

> > Hi,
> >
> > I have a doubt regarding setting of nexthop value in the following
> > scenario:
> >
> > Router R has 2 ebgp peers A and B, all on same subnet S1.
> > A - Sends a prefix to R with nexthop N1 where N1 = A.
> > R - Installs and then forwards the prefix to B with N1 = ?
> >
> > In this case should R set N1 = A or N1 = R (on S1).
>
> It's a matter of policy, really. For instance, perhaps A and B don't peer
> directly and A doesn't want to accept packets directly from B, so setting a
> third party NEXT_HOP may break things (e.g., Link Layer filtering is
> implemented or no Layer 2 connection exists directly between A and B, even
> though A, B & R share a common subnet) or violate some policy (In a previous
> job we peered with a network at a multi-access exchange point -- purely out
> of goodwill. They began sending us lots of traffic and after some
> investigation we realized they were selling transit across the local
> exchange point via readvertising our routes to their transit customers and
> preserving the NEXT_HOP, such that their customers were sending traffic
> directly to us -- they never touched the outbound traffic! Needless to say,
> MAC-Layer filtering was deployed shortly thereafter).
>
> On the other hand, perhaps they're all in agreement that this is a fine
> thing in order to optimize the forwarding path AND connectivity exists such
> that B can send traffic directly to A -- so it makes sense for R to preserve
> the NEXT_HOP.
>
> > I saw a major vendor setting it to R.
> > Is that a preferred practice?
> > If yes why?
>
> Again, it's all a matter of policy, and all the "major vendors" I'm familiar
> with provide the knobs to set it pretty much however you prefer, though I
> have seen some variances in default behaviors.
>
> -danny

Date: Fri, 23 May 2003 13:03:27 -0600
Subject: Re: EBGP - Setting Nexthop
From: Danny McPherson <danny@tcb.net>
To: <idr@merit.edu>
Message-ID: <BAF3CC9F.64AB%danny@tcb.net>
In-Reply-To: <001501c3214a$c0613b40$cbc8c8c8@sdksoft.com>

On 5/23/03 10:45 AM, "Parag Deshpande" <paragdeshpande@sdksoft.com> wrote:

> Hi,
>
> I have a doubt regarding setting of nexthop value in the following
> scenario:
>
> Router R has 2 ebgp peers A and B, all on same subnet S1.
> A - Sends a prefix to R with nexthop N1 where N1 = A.
> R - Installs and then forwards the prefix to B with N1 = ?
>
> In this case should R set N1 = A or N1 = R (on S1).

It's a matter of policy, really. For instance, perhaps A and B don't peer
directly and A doesn't want to accept packets directly from B, so setting a
third party NEXT_HOP may break things (e.g., Link Layer filtering is
implemented or no Layer 2 connection exists directly between A and B, even
though A, B & R share a common subnet) or violate some policy (In a previous
job we peered with a network at a multi-access exchange point -- purely out
of goodwill. They began sending us lots of traffic and after some
investigation we realized they were selling transit across the local
exchange point via readvertising our routes to their transit customers and
preserving the NEXT_HOP, such that their customers were sending traffic
directly to us -- they never touched the outbound traffic! Needless to say,
MAC-Layer filtering was deployed shortly thereafter).

On the other hand, perhaps they're all in agreement that this is a fine
thing in order to optimize the forwarding path AND connectivity exists such
that B can send traffic directly to A -- so it makes sense for R to preserve
the NEXT_HOP.

> I saw a major vendor setting it to R. Is that a preferred practice?
> If yes why?

Again, it's all a matter of policy, and all the "major vendors" I'm familiar
with provide the knobs to set it pretty much however you prefer, though I
have seen some variances in default behaviors.
-danny

Date: Fri, 23 May 2003 11:45:38 -0500
Message-ID: <001501c3214a$c0613b40$cbc8c8c8@sdksoft.com>
From: "Parag Deshpande" <paragdeshpande@sdksoft.com>
To: idr@merit.edu
Subject: EBGP - Setting Nexthop

Hi,

I have a doubt regarding setting of nexthop value in the following
scenario:

Router R has 2 ebgp peers A and
B, all on same subnet S1.
A - Sends a prefix to R with nexthop N1 where N1 = A.
R - Installs and then forwards the prefix to B with N1 = ?

In this case should R set N1 = A or N1 = R (on S1).

I saw a major vendor setting it to R. Is that a preferred practice?
If yes why?

Reference:

5.1.3 NEXT_HOP

2) When sending a message to an external peer X, and the peer is
   one IP hop away from the speaker:
   ........
   - Otherwise, if the route being announced was learned from an
     external peer, the speaker can use in the NEXT_HOP attribute
     an IP address of any adjacent router (known from the received
     NEXT_HOP attribute) that the speaker itself uses for local
     route calculation, provided that peer X shares a common subnet
     with this address. This is a second form of "third party"
     NEXT_HOP attribute.
     >> N1 = A

   - Otherwise, if the external peer to which the route is being
     advertised shares a common subnet with one of the interfaces
     of the announcing BGP speaker, the speaker MAY use the IP
     address associated with such an interface in the NEXT_HOP
     attribute. This is known as a "first party" NEXT_HOP attribute.
     >> N1 = R
   .....
Thanks
Parag

Message-Id: <200305211925.PAA04849@ietf.org>
To: IETF-Announce: ;
Cc: RFC Editor <rfc-editor@isi.edu>, Internet Architecture Board <iab@iab.org>, idr@merit.edu
From: The IESG <iesg-secretary@ietf.org>
Subject: Document Action: Security Requirements for Keys used with the TCP MD5 Signature Option to Informational
Date: Wed, 21 May 2003 15:25:14 -0400

The IESG has approved the Internet-Draft 'Security Requirements for Keys
used with the TCP MD5 Signature Option' <draft-ietf-idr-md5-keys-00.txt>
as an Informational RFC. This document is the product of the Inter-Domain
Routing Working Group. The IESG contact persons are Bill Fenner and
Alex Zinin.

RFC Editor Note:

Please change the title to "Key Management Considerations for the
TCP MD5 Signature Option".
Please change the following:

In section 3, the first bullet:

OLD:
o Key lengths SHOULD be between 12 and 24 bytes, with larger keys
  having effectively zero cost when compared to shorter keys.
NEW:
o Key lengths SHOULD be between 12 and 24 bytes, with larger keys
  having effectively zero additional computational cost when
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  compared to shorter keys.

In section 5, first paragraph:

OLD:
this option may have lifetimes on the order of months. It would seem
prudent, then, to choose a *minimum* key length that guarantees that
key-guessing runtimes are some reasonable [3-5??] multiple of the
key-change interval under best-case (for the attacker) practical
NEW:
this option may have lifetimes on the order of months. It would seem
prudent, then, to choose a minimum key length that guarantees that
                           ^^^^^^^ (remove emphasis)
key-guessing runtimes are some small multiple of the key-change
                               ^^^^^^^^^^^^^^
interval under best-case (for the attacker) practical

In section 6, first paragraph:

OLD:
that the reasonable upper-bound for software-based attack performance
is 1.0e13 MD5 operations per second, then the *minimum* required key
entropy is approximately 68 bits. It is reasonable to round this
NEW:
that the reasonable upper-bound for software-based attack performance
is 1.0e13 MD5 operations per second, then the minimum required key
                                              ^^^^^^^ (remove emphasis)
entropy is approximately 68 bits.
It is reasonable to round this

Date: Tue, 20 May 2003 14:55:57 -0400
From: Jeffrey Haas <jhaas@nexthop.com>
To: idr@merit.edu
Cc: rtg-dir@ietf.org
Subject: [ruwhite@cisco.com: Re: Comments on BGP Draft 20.....]
Message-ID: <20030520145557.G16646@nexthop.com>

Yakov,

----- Forwarded message from Russ White <ruwhite@cisco.com> -----

Date: Tue, 20 May 2003 14:33:38 -0400 (EDT)
From: Russ White <ruwhite@cisco.com>
To: Jeffrey Haas <jhaas@nexthop.com>
Subject: Re: Comments on BGP Draft 20.....
Reply-To: Russ White <riw@cisco.com>

Yeah, this sounds better.... :-)

Russ

On Tue, 20 May 2003, Jeffrey Haas wrote:

> [off-list]
>
> Howzabout:
> The primary function of a BGP speaking system is to exchange network
> reachability information with other BGP systems. This network
> reachability information includes information on the list of Autonomous
> Systems (ASs) that reachability information traverses. This
> information is sufficient to construct a graph of AS connectivity
> + for this reachability
> from which
> routing loops may be pruned and some policy decisions at the AS level
> may be enforced.
>
> --
> Jeff Haas
> NextHop Technologies
>

__________________________________
riw@cisco.com CCIE <>< Grace Alone

----- End forwarded message -----

--
Jeff Haas
NextHop Technologies

Date: Tue, 20 May 2003 14:28:28 -0400
From: Jeffrey Haas <jhaas@nexthop.com>
To: Yakov Rekhter <yakov@juniper.net>
Cc: Russ White <riw@cisco.com>, idr@merit.edu, rtg-dir@ietf.org
Subject: Re: Comments on BGP Draft 20.....
Message-ID: <20030520142828.E16646@nexthop.com>
In-Reply-To: <200305201810.h4KIAbu27841@merlot.juniper.net>; from yakov@juniper.net on Tue, May 20, 2003 at 11:10:37AM -0700

On Tue, May 20, 2003 at 11:10:37AM -0700, Yakov Rekhter wrote:
> > Okay:
> >
> > The Loc-RIB contains routes which are installed in the local routing table
> > and used for forwarding packets received, based on the destination address,
> > by the router.
>
> In the absence of any objections within a week I'll put this in the text.

Except:

: Whether or not the new BGP route replaces an existing
: non-BGP route in the Routing Table depends on the policy configured
: on the BGP speaker.

I think the existing text is fine. The gotcha is one has to read
ahead a bit in the document to find the bit I just quoted.

> > Its value MUST NOT be changed by any other speaker.
>
> In the absence of any objections within a week I'll put this in the text.

Good grief.

I refer all parties concerned back to the thread titled
"Re: issue 32.1", specifically the consensus mail from Andrew with
message-id: <20020927115144.F13901@demiurge.exodus.net

Short summary:
1. You shouldn't change it.
2. People *do* change it, and do so for policy reasons.
3. You shouldn't change it, but since people do, we're going to
   tell you that you shouldn't and thus imply that you can if you
   really want to. :-)

My own preference was to *not* change it and MUST would be fine with me,
but consensus was previously reached.

> Yakov.
--
Jeff Haas
NextHop Technologies

Message-Id: <200305201810.h4KIAbu27841@merlot.juniper.net>
To: Russ White <riw@cisco.com>
Cc: idr@merit.edu, rtg-dir@ietf.org
Subject: Re: Comments on BGP Draft 20.....
In-Reply-To: Your message of "Tue, 20 May 2003 13:44:16 EDT." <Pine.OSX.4.51.0305201307370.23356@dhcp-64-102-48-215.cisco.com>
Date: Tue, 20 May 2003 11:10:37 -0700
From: Yakov Rekhter <yakov@juniper.net>

Russ,

> > > This information is sufficient to construct a graph of the AS connectivity
> > > from which routing loops may be pruned and some policy decisions at the AS
> > > level may be enforced.
> > >
> > > UPDATE Message Format:
> > >
> > > The information in the UPDATE message can be used to construct a graph
> > > describing the relationships of the various Autonomous Systems.
> > >
> > > In both cases this is true, I suppose, but in neither case does this
> > > really describe what the AS Path is used for, right?
> >
> > Wrong. As the abstract states quite clearly that this information is used
> > to prune routing loops and make some policy decisions at the AS level.
>
> Yes, but it's not used for inscribing a graph of interconnectivity between
> the AS' in the internetwork, is it? It can be used for that, I suppose, so
> the text is fine, but it's not used for that--I think that's what's
> confusing about it.
>
> > > 3.2 Routing Information Bases
> > >
> > > b) Loc-RIB....
> > >
> > > I think it might be useful to state the contents of the Loc-RIB are
> > > actually installed in the local routing table, and thus used for forwarding
> > > packets on this router. I don't see anyplace this connection is made
> > > explicit, it seems more like it's implicit throughout the doc.
> >
> > Please propose the text.
>
> Okay:
>
> The Loc-RIB contains routes which are installed in the local routing table
> and used for forwarding packets received, based on the destination address,
> by the router.

In the absence of any objections within a week I'll put this in the text.

> > > Network Layer Reachability Information
> > >
> > > "An UPDATE message can list multiple routes to be withdrawn...."
> > >
> > > Actually, we don't withdraw routes, we withdraw prefixes, right? The next
> > > paragraph shows this confusion, by talking about routes without attributes,
> > > but routes are prefixes combined with attributes, so.... They aren't
> > > routes, they're prefixes. You remove routes based on withdrawn prefixes, I
> > > think.
> >
> > We withdraw routes.
> > The way BGP withdraws routes is by advertising
> > the NLRI field of these routes in the Withdrawn Routes field of
> > the UPDATE message. And that is precisely what the text said:
> >
> > An UPDATE message can list multiple routes to be withdrawn from service.
> > Each such route is identified by its destination (expressed as an IP
> > prefix), which unambiguously identifies the route in the context of the
> > BGP speaker - BGP speaker connection to which it has been previously
> > advertised.
>
> Hmmm... So, if you receive an update with no attributes, just prefixes in
> the withdrawn section, you won't consider that a withdraw, and remove the
> routes you have from the sending peer from the local tables?
>
> A route without the attributes is a prefix. :-)
>
> It depends on whether you are thinking of it in terms of what you're
> sending, or what you're causing on the receiver.
>
> > > 5.1.1 ORIGIN
> > >
> > > "Its value SHOULD NOT be changed by any other speaker."
> > >
> > > I really think this should be "MUST NOT." I can't think of any reason it
> > > wouldn't be, except in the case of aggregation, and that case could be
> > > mentioned here as the only known exception (?).
> >
> > Please propose the text.
>
> Its value MUST NOT be changed by any other speaker.

In the absence of any objections within a week I'll put this in the text.

Yakov.
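
The withdrawal semantics quoted in the message above (a withdrawn route is named only by its prefix, and that prefix is unambiguous within one speaker-to-speaker session) can be sketched roughly as follows. This is an illustration only, not text from the draft or code from any implementation: the class name AdjRibIn, its methods, and the peer labels are all hypothetical, and attributes are reduced to a plain dict.

```python
from ipaddress import ip_network


class AdjRibIn:
    """Hypothetical per-speaker store of routes learned from peers.

    A route is a prefix plus attributes; it is keyed by (peer, prefix),
    so the prefix alone identifies it within one peer's session.
    """

    def __init__(self):
        self.routes = {}  # (peer, prefix) -> attributes dict

    def announce(self, peer, prefix, attributes):
        # A later announcement of the same prefix from the same peer
        # replaces the earlier route.
        self.routes[(peer, ip_network(prefix))] = dict(attributes)

    def withdraw(self, peer, prefixes):
        # The Withdrawn Routes field carries prefixes only; each prefix
        # selects the route previously advertised by this peer.
        removed = []
        for prefix in prefixes:
            key = (peer, ip_network(prefix))
            if self.routes.pop(key, None) is not None:
                removed.append(key[1])
        return removed
```

Under these assumptions, withdrawing 10.0.0.0/8 on peer A's session removes only A's route for that prefix; a route for the same prefix learned from peer B is untouched, which is one way to read both the "we withdraw routes" and the "we withdraw prefixes" positions in the thread.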

Date: Tue, 20 May 2003 14:08:18 -0400 (EDT)
From: Russ White <ruwhite@cisco.com>
Reply-To: Russ White <riw@cisco.com>
To: Jeffrey Haas <jhaas@nexthop.com>
Cc: Yakov Rekhter <yakov@juniper.net>, idr@merit.edu, rtg-dir@ietf.org
Subject: Re: Comments on BGP Draft 20.....
In-Reply-To: <20030520140512.D16646@nexthop.com>
Message-ID: <Pine.OSX.4.51.0305201406430.8886@dhcp-64-102-60-183.cisco.com>

> > Yes, but it's not used for inscribing a graph of interconnectivity between
> > the AS' in the internetwork, is it? It can be used for that, I suppose, so
> > the text is fine, but it's not used for that--I think that's what's
> > confusing about it.
>
> Perhaps to elaborate on Russ's point, the AS Path gives us the
> graph for this prefix. Even with a collection of a bunch of routes,
> we're not guaranteed to have the Internet's AS graph.
>
> The text is a *little* vague in this context, but I can't think
> of better wording.

I agree--I've been trying to come up with a better way of wording it, but I
can't think of one. I'd say it's too much of a nit to worry about it.
:-)

Russ

__________________________________
riw@cisco.com CCIE <>< Grace Alone

Date: Tue, 20 May 2003 14:05:12 -0400
From: Jeffrey Haas <jhaas@nexthop.com>
To: Russ White <riw@cisco.com>
Cc: Yakov Rekhter <yakov@juniper.net>, idr@merit.edu, rtg-dir@ietf.org
Subject: Re: Comments on BGP Draft 20.....
Message-ID: <20030520140512.D16646@nexthop.com>
In-Reply-To: <Pine.OSX.4.51.0305201307370.23356@dhcp-64-102-48-215.cisco.com>; from ruwhite@cisco.com on Tue, May 20, 2003 at 01:44:16PM -0400

On Tue, May 20, 2003 at 01:44:16PM -0400, Russ White wrote:
> Yes, but it's not used for inscribing a graph of interconnectivity between
> the AS' in the internetwork, is it? It can be used for that, I suppose, so
> the text is fine, but it's not used for that--I think that's what's
> confusing about it.

Perhaps to elaborate on Russ's point, the AS Path gives us the
graph for this prefix. Even with a collection of a bunch of routes,
we're not guaranteed to have the Internet's AS graph.

The text is a *little* vague in this context, but I can't think
of better wording.
--
Jeff Haas
NextHop Technologies

Date: Tue, 20 May 2003 13:44:16 -0400 (EDT)
From: Russ White <ruwhite@cisco.com>
Reply-To: Russ White <riw@cisco.com>
To: Yakov Rekhter <yakov@juniper.net>
Cc: idr@merit.edu, rtg-dir@ietf.org
Subject: Re: Comments on BGP Draft 20.....
In-Reply-To: <200305201635.h4KGZ7u19549@merlot.juniper.net>
Message-ID: <Pine.OSX.4.51.0305201307370.23356@dhcp-64-102-48-215.cisco.com>
References: <200305201635.h4KGZ7u19549@merlot.juniper.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-idr@merit.edu
Precedence: bulk

> > This information is sufficient to construct a graph of the AS connectivity
> > from which routing loops may be pruned and some policy decisions at the AS
> > level may be enforced.
> >
> > UPDATE Message Format:
> >
> > The information in the UPDATE message can be used to construct a graph
> > describing the relationships of the various Autonomous Systems.
> >
> > In both cases this is true, I suppose, but in neither case does this really
> > describe what the AS Path is used for, right?
>
> Wrong. The abstract states quite clearly that this information is used
> to prune routing loops and make some policy decisions at the AS level.

Yes, but it's not used for inscribing a graph of interconnectivity between the ASes in the internetwork, is it? It can be used for that, I suppose, so the text is fine, but it's not used for that--I think that's what's confusing about it.

> > 3.2 Routing Information Bases
> >
> > b) Loc-RIB....
> >
> > I think it might be useful to state the contents of the Loc-RIB are
> > actually installed in the local routing table, and thus used for forwarding
> > packets on this router. I don't see anyplace this connection is made
> > explicit, it seems more like it's implicit throughout the doc.
>
> Please propose the text.

Okay:

The Loc-RIB contains routes which are installed in the local routing table and used for forwarding packets received, based on the destination address, by the router.

> > Network Layer Reachability Information
> >
> > "An UPDATE message can list multiple routes to be withdrawn...."
> >
> > Actually, we don't withdraw routes, we withdraw prefixes, right? The next
> > paragraph shows this confusion, by talking about routes without attributes,
> > but routes are prefixes combined with attributes, so.... They aren't
> > routes, they're prefixes. You remove routes based on withdrawn prefixes, I
> > think.
>
> We withdraw routes. The way BGP withdraws routes is by advertising
> the NLRI field of these routes in the Withdrawn Routes field of
> the UPDATE message. And that is precisely what the text said:
>
> An UPDATE message can list multiple routes to be withdrawn from service.
> Each such route is identified by its destination (expressed as an IP
> prefix), which unambiguously identifies the route in the context of the
> BGP speaker - BGP speaker connection to which it has been previously
> advertised.

Hmmm... So, if you receive an update with no attributes, just prefixes in the withdrawn section, you won't consider that a withdraw, and remove the routes you have from the sending peer from the local tables? A route without the attributes is a prefix. :-) It depends on whether you are thinking of it in terms of what you're sending, or what you're causing on the receiver.

> > 5.1.1 ORIGIN
> >
> > "Its value SHOULD NOT be changed by any other speaker."
> >
> > I really think this should be "MUST NOT." I can't think of any reason it
> > wouldn't be, except in the case of aggregation, and that case could be
> > mentioned here as the only known exception (?).
>
> Please propose the text.

Its value MUST NOT be changed by any other speaker.

:-) Russ __________________________________ riw@cisco.com CCIE <>< Grace Alone Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id MAA08614 for <idr-archive@nic.merit.edu>; Tue, 20 May 2003 12:35:49 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id DBB3D91261; Tue, 20 May 2003 12:35:26 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id A119491262; Tue, 20 May 2003 12:35:26 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 238C691261 for <idr@trapdoor.merit.edu>; Tue, 20 May 2003 12:35:25 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 07FD85DE63; Tue, 20 May 2003 12:35:25 -0400 (EDT) Delivered-To: idr@merit.edu Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id 7DDDF5DE62 for <idr@merit.edu>; Tue, 20 May 2003 12:35:24 -0400 (EDT) Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h4KGZ7u19549; Tue, 20 May 2003 09:35:07 -0700 (PDT) (envelope-from yakov@juniper.net) Message-Id: <200305201635.h4KGZ7u19549@merlot.juniper.net> To: Russ White <riw@cisco.com> Cc: idr@merit.edu, rtg-dir@ietf.org Subject: Re: Comments on BGP Draft 20..... In-Reply-To: Your message of "Fri, 09 May 2003 10:13:14 EDT." <Pine.WNT.4.53.0305090945390.2372@russpc> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <26545.1053448507.1@juniper.net> Date: Tue, 20 May 2003 09:35:07 -0700 From: Yakov Rekhter <yakov@juniper.net> Sender: owner-idr@merit.edu Precedence: bulk Russ, > > Some of these are going to echo Alex's comments, but that's okay, I think. > Mostly just nits.... > > :-) Thanks for the comments. My response is in-line... 
>
> Russ
>
> __________________________________
> riw@cisco.com CCIE <>< Grace Alone
>
> -----
>
> Abstract:
>
> This information is sufficient to construct a graph of the AS connectivity
> from which routing loops may be pruned and some policy decisions at the AS
> level may be enforced.
>
> UPDATE Message Format:
>
> The information in the UPDATE message can be used to construct a graph
> describing the relationships of the various Autonomous Systems.
>
> In both cases this is true, I suppose, but in neither case does this really
> describe what the AS Path is used for, right?

Wrong. The abstract states quite clearly that this information is used to prune routing loops and make some policy decisions at the AS level.

> I would think we'd want to
> describe it less in terms of a "graph of the connectivity in the
> internetwork," and more in terms of "a graph of the path through Autonomous
> Systems used to reach the destination advertised." It could be confusing,
> since there isn't anyplace where we discuss building a graph of
> interconnectivity between the Autonomous Systems....
>
> -----
>
> Forwarding Paradigm:
>
> This document uses the term "Autonomous System" (AS) throughout....
>
> This entire paragraph is a repeat--I'd leave it just in the definitions.

The definition section is supposed to have a *summary* of the definitions used in the spec.

> -----
>
> Forwarding Paradigms:
>
> The initial data flow....
>
> This paragraph has two different thoughts in it, one about incremental
> updates, and the other about keeping data that you've received. It seems
> like just putting a return after "as the routing tables change."

The two are related, as the reason for keeping updates you've received is that the exchange of information is based on incremental updates.

> -----
>
> Forwarding Paradigms:
>
> The paragraph starting "KEEPALIVE messages" should, I think, be moved up
> above the section on route exchange. I don't know why, it just seems less
> like it's jumping all over the place that way.

Disagree.

> -----
>
> 3.1 Routes: Advertisement and Storage
>
> It almost seems like the section about The initial data flow should maybe
> be put entirely under this section someplace (?).
>
> The first paragraph in this section is really a definition of a route vs a
> prefix, and should probably be in the definitions.

See above.

> The paragraph "Changing attribute of a route...." needs a "the," or
> attribute needs an "s."

Ok.

> -----
>
> 3.2 Routing Information Bases
>
> b) Loc-RIB....
>
> I think it might be useful to state the contents of the Loc-RIB are
> actually installed in the local routing table, and thus used for forwarding
> packets on this router. I don't see anyplace this connection is made
> explicit, it seems more like it's implicit throughout the doc.

Please propose the text.

> -----
>
> Page 18, a) LOCAL_PREF
>
> "....to inform other peers...." should be "....to inform its other
> peers...."

Sure.

> -----
>
> Network Layer Reachability Information
>
> "This variable length field contains a list of IP address prefixes."
>
> I think we can kill "address" here.

Sure.

>
> a) Length
>
> "The Length field indicates...." The sentence can start with
> "Indicates..."

I prefer to keep the current text.

>
> b) Prefix
>
> "The Prefix field indicates...." The sentence can start with
> "Indicates...."

Ditto.

>
> -----
>
> Network Layer Reachability Information
>
> "An UPDATE message can list multiple routes to be withdrawn...."
>
> Actually, we don't withdraw routes, we withdraw prefixes, right? The next
> paragraph shows this confusion, by talking about routes without attributes,
> but routes are prefixes combined with attributes, so.... They aren't
> routes, they're prefixes. You remove routes based on withdrawn prefixes, I
> think.

We withdraw routes. The way BGP withdraws routes is by advertising the NLRI field of these routes in the Withdrawn Routes field of the UPDATE message. And that is precisely what the text said:

An UPDATE message can list multiple routes to be withdrawn from service. Each such route is identified by its destination (expressed as an IP prefix), which unambiguously identifies the route in the context of the BGP speaker - BGP speaker connection to which it has been previously advertised.

>
> ------
>
> 5. Path Attributes
>
> "Well-known attributes MUST be recognized by all BGP implementations."
>
> This sentence, as strange as it may sound, implies it's the attribute's
> fault if the BGP implementation doesn't recognize it, that it's up to the
> attribute definers to, in some way, make certain that BGP implementations
> will recognize it. I think it should probably be worded the other way
> 'round:
>
> "BGP implementations MUST recognize all well-known attributes."

Sure.

> -----
>
> 5. Path Attributes
>
> "All well-known attributes MUST be passed along (after proper updating, if
> necessary) to other BGP peers."
>
> This just seems a little rough. Maybe this:
>
> "Once a BGP peer has updated any well-known attributes, it MUST pass these
> attributes in any updates it transmits to its peers."

Sure.

>
> -----
>
> 5.1.1 ORIGIN
>
> "Its value SHOULD NOT be changed by any other speaker."
>
> I really think this should be "MUST NOT." I can't think of any reason it
> wouldn't be, except in the case of aggregation, and that case could be
> mentioned here as the only known exception (?).

Please propose the text.

Yakov.

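[Archive note] The exchange above turns on whether an UPDATE withdraws "routes" or "prefixes". A minimal sketch of the reading in the quoted draft text follows: a route is a prefix plus attributes, and a withdrawn prefix identifies the route previously advertised on the same peer session. The names Route, AdjRibIn and apply_update are hypothetical, chosen only for illustration; they come from neither the draft nor any implementation, and the per-peer table is an assumption about how a receiver might store what it has learned.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """A route: a destination prefix combined with path attributes."""
    prefix: str
    as_path: tuple
    origin: str = "IGP"


@dataclass
class AdjRibIn:
    """Routes learned from ONE peer session, keyed by prefix (hypothetical)."""
    routes: dict = field(default_factory=dict)

    def apply_update(self, withdrawn=(), announced=()):
        # Withdrawn entries carry only prefixes. Within this one peer
        # session, a prefix is enough to identify the route that was
        # previously advertised, so the whole route is removed.
        removed = [self.routes.pop(p) for p in withdrawn if p in self.routes]
        # An announcement for a prefix already present replaces the old route.
        for route in announced:
            self.routes[route.prefix] = route
        return removed


rib = AdjRibIn()
rib.apply_update(announced=[Route("192.0.2.0/24", (65001, 65002))])
# The second prefix was never advertised by this peer, so it is ignored.
removed = rib.apply_update(withdrawn=["192.0.2.0/24", "198.51.100.0/24"])
```

Under these assumptions both descriptions hold at once: the message carries bare prefixes, and the effect on the receiver is the removal of complete routes learned from that peer.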
Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id AAA19599 for <idr-archive@nic.merit.edu>; Mon, 19 May 2003 00:05:08 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id AC52F91208; Mon, 19 May 2003 00:04:49 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 6DD379124E; Mon, 19 May 2003 00:04:49 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 4FDDB91208 for <idr@trapdoor.merit.edu>; Mon, 19 May 2003 00:03:16 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 0A6665DE35; Mon, 19 May 2003 00:03:16 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from mailout3.samsung.com (u33.gpu114.samsung.co.kr [203.254.224.33]) by segue.merit.edu (Postfix) with ESMTP id 5012F5DE27 for <idr@merit.edu>; Mon, 19 May 2003 00:03:15 -0400 (EDT)
Received: from custom-daemon.mailout3.samsung.com by mailout3.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) id <0HF4007018LCDZ@mailout3.samsung.com> for idr@merit.edu; Mon, 19 May 2003 13:03:12 +0900 (KST)
Received: from ep_mmp1 (localhost [127.0.0.1]) by mailout3.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) with ESMTP id <0HF4004HD8LBTT@mailout3.samsung.com> for idr@merit.edu; Mon, 19 May 2003 13:03:12 +0900 (KST)
Received: from Manav ([107.108.3.180]) by mmp1.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) with ESMTPA id <0HF400CE58LAZE@mmp1.samsung.com> for idr@merit.edu; Mon, 19 May 2003 13:03:11 +0900 (KST)
Date: Mon, 19 May 2003 09:29:21 +0530
From: Manav Bhatia <manav@samsung.com>
Subject: Re: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt
To: Mareline Sheldon <marelines@yahoo.com>
Cc: idr@merit.edu
Reply-To: Manav Bhatia <manav@samsung.com>
Message-id: <00b701c31dbb$082729a0$b4036c6b@sisodomain.com>
MIME-version: 1.0
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
X-Mailer: Microsoft Outlook Express 6.00.2800.1158
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT
X-Priority: 3
X-MSMail-priority: Normal
References: <20030518042424.71655.qmail@web20310.mail.yahoo.com>
Sender: owner-idr@merit.edu
Precedence: bulk

Mareline,

The design implicitly takes care of the scenario you explain, though I confess that I have not been really clear on this in this version of the draft.

To advertise two ECMP routes with different attributes we will use two UPDATEs, each sent with a blank ECMP_NEXT_HOP attribute (the number of next hops will be kept as zero). A receiver that gets such an UPDATE will know that, since the ECMP_NEXT_HOP attribute is present, the route needs to be added to what it already has.

I guess the following text needs to be added in the draft.

The receiver SHOULD NOT remove any previous route and SHOULD add the route received with an ECMP_NEXT_HOP attribute rather than replace the previous routes.

When advertising more than one ECMP hop with identical attributes the sender SHOULD send a single update with multiple hops listed in the ECMP_NEXT_HOP attribute.

When advertising more than one ECMP hop which do not have identical attributes multiple BGP updates MUST be sent with the ECMP_NEXT_HOP attribute included to suppress route replacement.

But a more important question is whether we need this kind of mechanism in BGP or not. We already have multiple drafts proposed which strive to achieve similar goals using different techniques and mechanisms. I guess one motivation is to allow interoperability between different vendors when advertising multiple BGP paths of the same preference.

Once we're through with the above discussion, we can sit down and look into the nitty-gritty of each proposal.

Regards, Manav ----- Original Message ----- From: "Mareline Sheldon" <marelines@yahoo.com> To: "Manav Bhatia" <manav@samsung.com> Cc: <idr@merit.edu> Sent: Sunday, May 18, 2003 9:54 AM Subject: Re: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > Manav, > Can we using this draft advertise two ECMP routes of equal preference but say, with different > AS Paths. As far as i could gather you use one additional attribute to describe equal cost > routes using the same path attributes. This way you can definitely advertise routes with all > the same attributes. But what happens say when one of my path attributes are different. Eg. AS > PATH 112 123 in one attribute and AS PATH 564 232 in the other? > > Can this be done here? > > Regards, > Mareline S. > > --- Manav Bhatia <manav@samsung.com> wrote: > > Hi, > > Please look into this new Internet draft which is available from the > > on-line Internet-Drafts directories. > > > > I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > > To: IETF-Announce: ; > > Subject: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > > From: Internet-Drafts@ietf.org > > Date: Thu, 15 May 2003 07:19:32 -0400 > > Reply-to: Internet-Drafts@ietf.org > > Sender: owner-ietf-announce@ietf.org > > > > A New Internet-Draft is available > > > > Title : Advertising Equal Cost Multi-Path (ECMP) routes in BGP > > Author(s) : M. Bhatia > > Filename : draft-bhatia-ecmp-routes-in-bgp-00.txt > > Pages : 7 > > Date : 2003-5-14 > > > > This document describes an extensible mechanism that will allow a BGP > > [BGP4] speaker to advertise equal cost multi-path (ECMP) routes for a > > destination to its peers without changing the semantics of the UPDATE > > message. > > > > A new BGP attribute is introduced that will be used to advertise the > > multiple next hops for the feasible and the un-feasible ECMP BGP routes to > > the remote peers. 
> > > > The mechanisms described in this document are applicable to all routers, > > both those with the ability to inject multiple routing entries in their > > forwarding table and those without (although the latter need not implement > > some extensions described in this document). > > > > A URL for this Internet-Draft is: > > http://www.ietf.org/internet-drafts/draft-bhatia-ecmp-routes-in-bgp-00.txt > > > > > > To remove yourself from the IETF Announcement list, send a message to > > ietf-announce-request with the word unsubscribe in the body of the message. > > > > Internet-Drafts are also available by anonymous FTP. Login with the > > username "anonymous" and a password of your e-mail address. After logging > > in, type "cd internet-drafts" and then > > "get draft-bhatia-ecmp-routes-in-bgp-00.txt". > > > > > > > > > __________________________________ > Do you Yahoo!? > The New Yahoo! Search - Faster. Easier. Bingo. > http://search.yahoo.com > Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id AAA10768 for <idr-archive@nic.merit.edu>; Sun, 18 May 2003 00:24:46 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id 0132F91246; Sun, 18 May 2003 00:24:27 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id BABDA91248; Sun, 18 May 2003 00:24:26 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 9886791246 for <idr@trapdoor.merit.edu>; Sun, 18 May 2003 00:24:25 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 713DF5E40A; Sun, 18 May 2003 00:24:25 -0400 (EDT) Delivered-To: idr@merit.edu Received: from web20310.mail.yahoo.com (web20310.mail.yahoo.com [216.136.226.91]) by segue.merit.edu (Postfix) with SMTP id 065695E406 for <idr@merit.edu>; Sun, 18 May 2003 00:24:25 -0400 (EDT) Message-ID: 
<20030518042424.71655.qmail@web20310.mail.yahoo.com> Received: from [219.65.142.150] by web20310.mail.yahoo.com via HTTP; Sat, 17 May 2003 21:24:24 PDT Date: Sat, 17 May 2003 21:24:24 -0700 (PDT) From: Mareline Sheldon <marelines@yahoo.com> Subject: Re: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt To: Manav Bhatia <manav@samsung.com> Cc: idr@merit.edu In-Reply-To: <068e01c31ae3$20cfe620$b4036c6b@sisodomain.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-idr@merit.edu Precedence: bulk Manav, Can we using this draft advertise two ECMP routes of equal preference but say, with different AS Paths. As far as i could gather you use one additional attribute to describe equal cost routes using the same path attributes. This way you can definitely advertise routes with all the same attributes. But what happens say when one of my path attributes are different. Eg. AS PATH 112 123 in one attribute and AS PATH 564 232 in the other? Can this be done here? Regards, Mareline S. --- Manav Bhatia <manav@samsung.com> wrote: > Hi, > Please look into this new Internet draft which is available from the > on-line Internet-Drafts directories. > > I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > To: IETF-Announce: ; > Subject: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > From: Internet-Drafts@ietf.org > Date: Thu, 15 May 2003 07:19:32 -0400 > Reply-to: Internet-Drafts@ietf.org > Sender: owner-ietf-announce@ietf.org > > A New Internet-Draft is available > > Title : Advertising Equal Cost Multi-Path (ECMP) routes in BGP > Author(s) : M. Bhatia > Filename : draft-bhatia-ecmp-routes-in-bgp-00.txt > Pages : 7 > Date : 2003-5-14 > > This document describes an extensible mechanism that will allow a BGP > [BGP4] speaker to advertise equal cost multi-path (ECMP) routes for a > destination to its peers without changing the semantics of the UPDATE > message. 
> > A new BGP attribute is introduced that will be used to advertise the > multiple next hops for the feasible and the un-feasible ECMP BGP routes to > the remote peers. > > The mechanisms described in this document are applicable to all routers, > both those with the ability to inject multiple routing entries in their > forwarding table and those without (although the latter need not implement > some extensions described in this document). > > A URL for this Internet-Draft is: > http://www.ietf.org/internet-drafts/draft-bhatia-ecmp-routes-in-bgp-00.txt > > > To remove yourself from the IETF Announcement list, send a message to > ietf-announce-request with the word unsubscribe in the body of the message. > > Internet-Drafts are also available by anonymous FTP. Login with the > username "anonymous" and a password of your e-mail address. After logging > in, type "cd internet-drafts" and then > "get draft-bhatia-ecmp-routes-in-bgp-00.txt". > > > __________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo. 
http://search.yahoo.com Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id MAA28927 for <idr-archive@nic.merit.edu>; Thu, 15 May 2003 12:54:06 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id 1AC7B9130E; Thu, 15 May 2003 12:53:00 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id F226B9134F; Thu, 15 May 2003 12:52:53 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id C8CC79135D for <idr@trapdoor.merit.edu>; Thu, 15 May 2003 12:52:29 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 7F3E85DF5C; Thu, 15 May 2003 12:52:29 -0400 (EDT) Delivered-To: idr@merit.edu Received: from workhorse.fictitious.org (workhorse.fictitious.org [209.150.1.230]) by segue.merit.edu (Postfix) with ESMTP id AC0F05DF60 for <idr@merit.edu>; Thu, 15 May 2003 12:52:28 -0400 (EDT) Received: from workhorse.fictitious.org (localhost.fictitious.org [127.0.0.1]) by workhorse.fictitious.org (8.9.3/8.9.3) with ESMTP id MAA85953; Thu, 15 May 2003 12:50:41 -0400 (EDT) (envelope-from curtis@workhorse.fictitious.org) Message-Id: <200305151650.MAA85953@workhorse.fictitious.org> To: Mireille Shammas <mireille.shammas@Alcatel.com> Cc: Manav Bhatia <manav@samsung.com>, idr@merit.edu Reply-To: curtis@fictitious.org Subject: Re: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt In-reply-to: Your message of "Thu, 15 May 2003 10:38:44 EDT." <3EC3A674.7142D847@alcatel.com> Date: Thu, 15 May 2003 12:50:40 -0400 From: Curtis Villamizar <curtis@fictitious.org> Sender: owner-idr@merit.edu Precedence: bulk In message <3EC3A674.7142D847@alcatel.com>, Mireille Shammas writes: > Hi Manav, > In this draft you don't mention anything about ECMP in a BGP/MPLS VPN network > and more precisely ECMP between PE and CE where EBGP is used. 
Note that IBGP is
> used to relay the MPLS label information between PEs. I think this will totally
> depend on how you choose the labels on a local PE to advertise to a remote PE
> to carry the VPN traffic back to the CE. Sample scenario below:
>
> |   |-------ebgp session--------|
> |   |                           |
> | CE|-------ebgp session--------| PE1| =======IBGP/MPLS===========|PE2| .......
>
> |   |-------ebgp session--------|
> |   |                           |
>
> I am mainly interested in what will happen on PE1, and how to balance
> VPN/labelled traffic coming from PE1 towards CE.
> Thanks
> Mireille

The MPLS LSP ends at PE1. PE2 doesn't care what happens at the other end of the LSP. PE1 is free to do the load split exactly as it does now without telling PE2 that there is more than one next hop.

The more interesting (difficult) cases are where PE1 has EBGP peers that advertise routes with different but equal preference attributes and it doesn't matter at all whether MPLS is in use.

The really difficult (and common) case is where equal cost routes are advertised by two border routers into the IBGP mesh and a router internal to the mesh splits among the two routers. This router does not advertise anything to BGP about this route and therefore cannot advertise it as a multipath.

Curtis Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id KAA25054 for <idr-archive@nic.merit.edu>; Thu, 15 May 2003 10:39:11 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id D0A54912BF; Thu, 15 May 2003 10:38:48 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 9C050912C0; Thu, 15 May 2003 10:38:48 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 6399A912BF for <idr@trapdoor.merit.edu>; Thu, 15 May 2003 10:38:47 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 4EB7E5DF04; Thu, 15 May 2003 10:38:47 -0400 (EDT) Delivered-To: idr@merit.edu Received: from kanmx2.ca.alcatel.com (kanfw1.ottawa.alcatel.ca [192.75.23.69]) by segue.merit.edu (Postfix) with SMTP id B43155DEE1 for <idr@merit.edu>; Thu, 15 May 2003 10:38:46 -0400 (EDT) Received: (qmail 22762 invoked from network); 15 May 2003 14:41:14 -0000 Received: from unknown (HELO alcatel.com) (138.120.105.202) by kanmx2.ca.alcatel.com with SMTP; 15 May 2003 14:41:14 -0000 Message-ID: <3EC3A674.7142D847@alcatel.com> Date: Thu, 15 May 2003 10:38:44 -0400 From: Mireille Shammas <mireille.shammas@alcatel.com> Organization: Alcatel CID X-Mailer: Mozilla 4.79 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Manav Bhatia <manav@samsung.com> Cc: idr@merit.edu Subject: Re: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt References: <068e01c31ae3$20cfe620$b4036c6b@sisodomain.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-idr@merit.edu Precedence: bulk Hi Manav, In this draft you don't mention anything about ECMP in a BGP/MPLS VPN network and more precisely ECMP between PE and CE where EBGP is used. Note that IBGP is used to relay the MPLS label information between PEs. 
I think this will totally depend on how you choose the labels on a local PE to advertise to a remote PE to carry the VPN traffic back to the CE. Sample scenario below: | |-------ebgp session--------| | | | | CE|-------ebgp session--------| PE1| =======IBGP/MPLS===========|PE2| ....... | |-------ebgp session--------| | | | I am mainly interested in what will happen on PE1, and how to balance VPN/labelled traffic coming from PE1 towards CE . Thanks Mireille Manav Bhatia wrote: > Hi, > Please look into this new Internet draft which is available from the > on-line Internet-Drafts directories. > > I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > To: IETF-Announce: ; > Subject: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt > From: Internet-Drafts@ietf.org > Date: Thu, 15 May 2003 07:19:32 -0400 > Reply-to: Internet-Drafts@ietf.org > Sender: owner-ietf-announce@ietf.org > > A New Internet-Draft is available > > Title : Advertising Equal Cost Multi-Path (ECMP) routes in BGP > Author(s) : M. Bhatia > Filename : draft-bhatia-ecmp-routes-in-bgp-00.txt > Pages : 7 > Date : 2003-5-14 > > This document describes an extensible mechanism that will allow a BGP > [BGP4] speaker to advertise equal cost multi-path (ECMP) routes for a > destination to its peers without changing the semantics of the UPDATE > message. > > A new BGP attribute is introduced that will be used to advertise the > multiple next hops for the feasible and the un-feasible ECMP BGP routes to > the remote peers. > > The mechanisms described in this document are applicable to all routers, > both those with the ability to inject multiple routing entries in their > forwarding table and those without (although the latter need not implement > some extensions described in this document). 
> > A URL for this Internet-Draft is: > http://www.ietf.org/internet-drafts/draft-bhatia-ecmp-routes-in-bgp-00.txt > > To remove yourself from the IETF Announcement list, send a message to > ietf-announce-request with the word unsubscribe in the body of the message. > > Internet-Drafts are also available by anonymous FTP. Login with the > username "anonymous" and a password of your e-mail address. After logging > in, type "cd internet-drafts" and then > "get draft-bhatia-ecmp-routes-in-bgp-00.txt". Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id JAA21781 for <idr-archive@nic.merit.edu>; Thu, 15 May 2003 09:13:05 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id A7480912BA; Thu, 15 May 2003 09:12:36 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 74A00912BB; Thu, 15 May 2003 09:12:36 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 27F3A912BA for <idr@trapdoor.merit.edu>; Thu, 15 May 2003 09:12:35 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 150305DECD; Thu, 15 May 2003 09:12:35 -0400 (EDT) Delivered-To: idr@merit.edu Received: from mailout1.samsung.com (u24.gpu114.samsung.co.kr [203.254.224.24]) by segue.merit.edu (Postfix) with ESMTP id BD3755DECC for <idr@merit.edu>; Thu, 15 May 2003 09:12:34 -0400 (EDT) Received: from custom-daemon.mailout1.samsung.com by mailout1.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) id <0HEX00801JCWG2@mailout1.samsung.com> for idr@merit.edu; Thu, 15 May 2003 22:12:32 +0900 (KST) Received: from ep_mmp2 (localhost [127.0.0.1]) by mailout1.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) with ESMTP id <0HEX00J5RJCVYY@mailout1.samsung.com> for idr@merit.edu; Thu, 15 May 2003 22:12:32 +0900 (KST) Received: from Manav 
([107.108.3.180]) by mmp2.samsung.com (iPlanet Messaging Server 5.2 HotFix 1.05 (built Nov 6 2002)) with ESMTPA id <0HEX001NUJCUKJ@mmp2.samsung.com> for idr@merit.edu; Thu, 15 May 2003 22:12:31 +0900 (KST) Date: Thu, 15 May 2003 18:38:48 +0530 From: Manav Bhatia <manav@samsung.com> Subject: FW: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt To: idr@merit.edu Reply-To: Manav Bhatia <manav@samsung.com> Message-id: <068e01c31ae3$20cfe620$b4036c6b@sisodomain.com> MIME-version: 1.0 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-Mailer: Microsoft Outlook Express 6.00.2800.1158 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT X-Priority: 3 X-MSMail-priority: Normal Sender: owner-idr@merit.edu Precedence: bulk Hi, Please look into this new Internet draft which is available from the on-line Internet-Drafts directories. I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt To: IETF-Announce: ; Subject: I-D ACTION:draft-bhatia-ecmp-routes-in-bgp-00.txt From: Internet-Drafts@ietf.org Date: Thu, 15 May 2003 07:19:32 -0400 Reply-to: Internet-Drafts@ietf.org Sender: owner-ietf-announce@ietf.org A New Internet-Draft is available Title : Advertising Equal Cost Multi-Path (ECMP) routes in BGP Author(s) : M. Bhatia Filename : draft-bhatia-ecmp-routes-in-bgp-00.txt Pages : 7 Date : 2003-5-14 This document describes an extensible mechanism that will allow a BGP [BGP4] speaker to advertise equal cost multi-path (ECMP) routes for a destination to its peers without changing the semantics of the UPDATE message. A new BGP attribute is introduced that will be used to advertise the multiple next hops for the feasible and the un-feasible ECMP BGP routes to the remote peers. The mechanisms described in this document are applicable to all routers, both those with the ability to inject multiple routing entries in their forwarding table and those without (although the latter need not implement some extensions described in this document). 
A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-bhatia-ecmp-routes-in-bgp-00.txt To remove yourself from the IETF Announcement list, send a message to ietf-announce-request with the word unsubscribe in the body of the message. Internet-Drafts are also available by anonymous FTP. Login with the username "anonymous" and a password of your e-mail address. After logging in, type "cd internet-drafts" and then "get draft-bhatia-ecmp-routes-in-bgp-00.txt". Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id SAA26035 for <idr-archive@nic.merit.edu>; Wed, 14 May 2003 18:10:08 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id 9C2C2912B2; Wed, 14 May 2003 18:07:57 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 40D40912B3; Wed, 14 May 2003 18:07:57 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 24D03912B2 for <idr@trapdoor.merit.edu>; Wed, 14 May 2003 18:07:49 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 0AC665E241; Wed, 14 May 2003 18:07:49 -0400 (EDT) Delivered-To: idr@merit.edu Received: from workhorse.fictitious.org (workhorse.fictitious.org [209.150.1.230]) by segue.merit.edu (Postfix) with ESMTP id BD8E75E23D for <idr@merit.edu>; Wed, 14 May 2003 18:07:47 -0400 (EDT) Received: from workhorse.fictitious.org (localhost.fictitious.org [127.0.0.1]) by workhorse.fictitious.org (8.9.3/8.9.3) with ESMTP id SAA82275; Wed, 14 May 2003 18:06:09 -0400 (EDT) (envelope-from curtis@workhorse.fictitious.org) Message-Id: <200305142206.SAA82275@workhorse.fictitious.org> To: David Meyer <dmm@maoz.com> Cc: Curtis Villamizar <curtis@fictitious.org>, idr@merit.edu Reply-To: curtis@fictitious.org Subject: Re: draft-ietf-idr-bgp-analysis-03.txt In-reply-to: Your message of "Wed, 14 May 2003 13:43:55 
PDT." <20030514134355.A26408@maoz.com> Date: Wed, 14 May 2003 18:06:08 -0400 From: Curtis Villamizar <curtis@fictitious.org> Sender: owner-idr@merit.edu Precedence: bulk In message <20030514134355.A26408@maoz.com>, David Meyer writes: > > Curtis, > > These are great comments. One point > > >> ------------ > >> > >> The following statement is incorrect: > >> > >> Finally, since the dynamic properties of the network cannot be > >> quantitatively bounded, stability must be addressed via heuristics > >> such as BGP Route Flap Damping [RFC2439]. Due to the nature of BGP, > >> such damping should be viewed as a matter local to an autonomous > >> system matter (see also Appendix F.2 of [BGP4]). > >> > >> The amount of change is inherently bounded in BGP (as I described > >> above). BGP Route Flap Damping was initially proposed for two > >> reasons, 1) to protect a specific commercial implementation that was > >> not sufficiently robust, 2) to improve convergence of stable routes. > >> BGP Route Flap Damping is not necessary to bound the amount of change > >> in BPG routing. > > Yes, but route flap dampening is just a heuristic that we > use because the dynamics can't be bounded; that's what I > was after. Do you disagree with this? > > Dave I made a comment on that later in my earlier note but I'll provide more detail. There were two initial motivations for BGP Route Flap Damping. One was a certain BGP implementation wasn't robust when we first started but was much more robust by the time the RFC came out. Initially if you pushed hard enough it would fall over. The second reason was even after implementations out there were all quite robust, convergence for stable routes was a lot slower than we'd like if there were a lot of unstable routes. [A third reason was a certain router would drop packets when it had a route cache implementation but we'd all rather forget about that design error.] Anyway these reasons motivated the idea and kept it going. 
BGP is inherently work conserving, meaning that above some amount of incoming change for a given number of prefixes, the amount of work and the amount of outgoing change is either bounded by the number of flapping prefixes or constant above some number of prefixes (saturated CPU). This amount of work and outgoing (advertised out) churn is bounded but still quite high.

BGP Route Flap Damping was deployed in recognition of the fact that a very small percentage of the prefixes, representing an even smaller part of the reachable address space, were contributing most of the churn and the vast majority of prefixes were quite stable. That was the practical reason that providers were willing to turn on BGP Route Flap Damping and have to deal with the occasional headache of clearing the history for prefixes that they'd rather not get damped (or where the problem was known to have been fixed - like when some other NOC called).

[ As a side note, BGP Route Flap Damping was never implemented correctly. It should take the AS path into consideration and only damp an unstable path if another existed that was quite stable, not damp the prefix. It was known that per prefix damping would cause problems so in the spec, the AS path as part of the key field was not optional. If it had been implemented correctly we would not have the problems with BGP Route Flap Damping that we have experienced. ]

My argument amounts to 1) route churn IS bounded, however 2) BGP Route Flap Damping exists because a) the bound is uncomfortably high, b) there used to be some broken implementations in use that were not sufficiently robust, c) a small percentage of prefixes contributed most of the churn, d) getting rid of that small percentage was perceived as good for the Internet at large (the majority of stable prefixes and reachable destinations).
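The side note about keying damping on the AS path rather than on the prefix alone can be illustrated with a small sketch. It assumes a figure-of-merit scheme in the general style of RFC 2439 (a penalty that grows per flap and decays exponentially); the class name, method names, and all threshold values below are hypothetical and chosen only for illustration, and the sketch leaves out the "only damp if a stable alternative exists" condition.

```python
import math


class FlapDamper:
    """Toy flap damping keyed by (prefix, AS path), not by prefix alone.

    All parameter values are illustrative assumptions, not taken from
    any specification or implementation.
    """

    def __init__(self, penalty_per_flap=1000.0, suppress=2000.0,
                 reuse=750.0, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress          # penalty at which a path is damped
        self.reuse = reuse                # penalty below which it is reused
        self.decay = math.log(2) / half_life
        # (prefix, as_path) -> (penalty, last_update_time, suppressed)
        self.state = {}

    def _decayed(self, key, now):
        """Return the penalty decayed to time `now` and the suppressed flag."""
        penalty, last, suppressed = self.state.get(key, (0.0, now, False))
        penalty *= math.exp(-self.decay * (now - last))
        return penalty, suppressed

    def flap(self, prefix, as_path, now):
        """Record one flap of this particular path to the prefix."""
        key = (prefix, tuple(as_path))
        penalty, suppressed = self._decayed(key, now)
        penalty += self.penalty_per_flap
        if penalty >= self.suppress:
            suppressed = True
        self.state[key] = (penalty, now, suppressed)

    def is_suppressed(self, prefix, as_path, now):
        """True if this path is currently damped; other paths are unaffected."""
        key = (prefix, tuple(as_path))
        if key not in self.state:
            return False
        penalty, suppressed = self._decayed(key, now)
        if suppressed and penalty < self.reuse:
            suppressed = False
        self.state[key] = (penalty, now, suppressed)
        return suppressed
```

With this keying, three quick flaps of one path to a prefix suppress that path, while a second, stable path to the same prefix stays usable, which is the behaviour the side note argues for.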
Curtis

Message-ID: <3EC279B0.1010107@cisco.com>
Date: Wed, 14 May 2003 10:15:28 -0700
From: Keyur Patel <keyupate@cisco.com>
To: curtis@fictitious.org
Cc: idr@merit.edu, David Meyer <dmm@maoz.com>
Subject: Re: draft-ietf-idr-bgp-analysis-03.txt
References: <200305141408.KAA78532@workhorse.fictitious.org>

Curtis: Thanks for your comments. We will incorporate them in next revision.
-Keyur Curtis Villamizar wrote: >David, Keyur, > >I have some suggestions for improvement to this draft ( BGP-4 Protocol >Analysis <draft-ietf-idr-bgp-analysis-03.txt>). See comments inline >below. Feel free to take what you consider valid and worth changing. > >Curtis > > >In "1. Introduction" you should mention that BGP4 was the first to >support CIDR and due to their lack of support for CIDR versions 1-3 >are considered obsolete and unusable in today's Internet. > >------------ > >Somewhere in key features it should be mentioned that BGP makes the >assumption that packets are routed from source towards destination >independent of the source. A good place for this would be near the >statement "BGP does not make any assumptions about intra-autonomous >system routing protocols deployed within the various autonomous >systems". Or refer to the statement in the beginning of >"7. Applicability". > >------------ > >In the following paragraph, why don't we just say that this algorithm >is referred to as a "Path Vector" algorithm. > > BGP uses an algorithm that is neither a pure distance vector > algorithm or a pure link state algorithm. It is instead a modified > distance vector algorithm that uses path information to avoid > traditional distance vector problems. Each route within BGP pairs > destination with path information to that destination. Path > information (also known as AS_PATH information) is stored within the > AS_PATH attribute in BGP. This allows BGP to reconstruct large > portions of overall topology whenever required. > >------------ > >Alex probably made you put in some FSM stuff (don't read too much into >the wording here). I don't think it belongs in this document. That >is clearly something for the protocol spec. > >------------ > >In the section "4. BGP Persistent Peer Oscillations" or in a nearby >section (preferable) it should be mentioned that BGP is work >conserving. Here is some suggested text: > > A robust BGP implementation is work conserving. 
This means that if > the number of prefixes is bound, arbitrarily high levels of route > change can be tolerated with bounded impact on route convergence > for occasionaly changes in generally stable routes. > > A BGP implementation under high load conditions should empty as > much inbound routing updates from its input streams, processing > only the most recent route if the route for a given NLRI changes > multiple times. TCP also provides blocking on the writes on the > sender side. A BGP implementation under load should expect blocks > on write calls and send only the most recent routes when sockets > unblock rather than sending entire history. > > A robust implemention of BGP should have the following > characteristics: > > 1. It is able to operate in almost arbitrarily high levels of > route flap without loosing peerings (failing to send > keepalives) or loosing other protocol adjacencies as a > result of BGP load. > > 2. Instability of a subset of routes should not affect the > route advertisements or forwarding associated with the set > of stable routes. > > 3. High levels of instability and peers of different CPU speed > or load resulting in faster or slower processing of routes > should not cause instability and should have a bounded > impact on the convergence time for generally stable routes. > > Numerous robust BGP implementations exist. Producing a robust > implementation is not a trivial matter but clearly acheivable. > >------------ > >I find the following paragraph problematic without further >explanation. > > It is important to note that BGP does not require all the routers > within an autonomous system to participate in the BGP protocol. In > particular, only the border routers that provide connectivity between > the local autonomous system and their adjacent autonomous systems > need participate in BGP. The ability to constrain the set of BGP > speakers is one way to address scaling issues. 
> >Either you need to default to the borders and exit at any border or >you need some mechanism to tunnel between border routers for a pure >transport network. I favor removing the above paragraph. Tunnelling >to remove BGP is out of scope. Default routing to reach any arbitrary >border need not be mentioned. Things which actually do improve >scaling within an AS are RR and confeds. > >------------ > >Section "5.1 Link bandwidth and CPU utilization" may still be overly >simplistic and as a result may be incorrect. See comments below. > > >------------ > >In terms of bandwidth, the number of unique AS paths in practice is a >small number compared to the number of NLRI. Since many NLRI are >packed in a single update with the AS path included only once, in >practice the number of NLRI completely dominates the amount of >bandwidth consumed. > >The MR = 4 * (N + (M * A)) may be inaccurate for the reasons I gave in >the prior paragraph. The M*A may drasticly understate the impact of >the unique AS paths. Instead of defining A as the number of AS, A >could be defined as the number of unique AS paths with M*A then being >average AS path length times number of unique AS paths. > >Also why is both memory and bandwidth represented as MR? Wouldn't BW >be a better variable name for bandwidth? > >The O(C * M) thing in the next paragraph is also invalid above some >value of C but for different reasons. Above some value of C, either >the sender will begin pacing it sending of updates on its own >(suppressing multiple changes over very short periods) or the receiver >will be unable to keep up with the rate of change and force >suppression of multiple changes over very short periods by causing the >BGP socket to block on the sender. > >------------ > >The following statement is incorrect: > > Finally, since the dynamic properties of the network cannot be > quantitatively bounded, stability must be addressed via heuristics > such as BGP Route Flap Damping [RFC2439]. 
Due to the nature of BGP, > such damping should be viewed as a matter local to an autonomous > system matter (see also Appendix F.2 of [BGP4]). > >The amount of change is inherently bounded in BGP (as I described >above). BGP Route Flap Damping was initially proposed for two >reasons, 1) to protect a specific commercial implementation that was >not sufficiently robust, 2) to improve convergence of stable routes. >BGP Route Flap Damping is not necessary to bound the amount of change >in BPG routing. > >------------ > >We can drop the following comparison to a historic protocol: > > It may also be instructive to compare bandwidth and CPU requirements > of BGP with the Exterior Gateway Protocol (EGP). While with BGP the > complete information is exchanged only at the connection > establishment time, with EGP the complete information is exchanged > periodically (usually every 3 minutes). Note that both for BGP and > for EGP the amount of information exchanged is roughly on the order > of the number of networks reachable via a peer that sends the > information. Therefore, even if one assumes extreme instabilities of > BGP, its worst case behavior will be the same as the steady state > behavior of its predecessor, EGP. > > Operational experience with BGP showed that the incremental update > approach employed by BGP provides qualitative improvement in both > bandwidth and CPU utilization when compared with complete periodic > updates used by EGP (see also presentation by Dennis Ferguson at the > Twentieth IETF, March 11-15, 1991, St. Louis). > >We should drop other references to EGP. > >------------ > >In "5.1.2. Memory requirements", the MR = O((N + M * A) * K) the same >comment applies regarding the M*A term. A should be unique AS paths, >not number of AS and is not multiplied by K. In practice the K term >is small because it is the number of peers sending full routing, which >is generally much less than the worst case number of peers. 
Large >providers who carry full routing typically send each other only their >customer routes to avoid providing free transit to each other. This >reduces the impact of K. > >------------ > >We can drop: > > It is interesting to note that prior to the introduction of BGP in > the NSFNET Backbone, memory requirements on the NSFNET Backbone > routers running EGP were on the order of O(N *K). > >------------ > >In the MR = ((N*4) + (M*A)*2) * K, we can make this quite accurate by >defining N as the average number of routes advertised by each peer and >A as the number of unique AS paths, moving (M*A)*2) outside of the *K >and changing N*4 to N*R and (M*A)*2) to (M*A)*P) where R is the number >of bytes required to store a route and P is the number of bytes needed >to store one AS in an AS path. If K is small, then some overhead such >as the patricia trie storage figures into R and if K is large, the >data structures may be linked lists off the patricia trie. Claiming >that a route can be stored in 4 bytes is rather naive. > >The N*R*K term in practice dominates over the (M*A)*P term. If we >conservatively estimate a route as taking 16 bytes (large K allowing >patricia trie overhead to be ignored, indices or pointers to unique AS >Path or other attributes, etc). If we wanted to include a term to >make things accurate for small K we could add U*X where U is the >number of unique NRLI and X is the overhead per unique NLRI (and I ran >out of useful single letters). Typically X is greater than R. > >------------ > > Interestingly, in his review of the BGP protocol for the BGP review > committee in March of 1990, Paul Tsuchiya noted that "BGP does not > scale well. > >Paul was wrong. It does scale well and that's why it is being used. >BGP is a solution that scales as well as the problem allows and no >better. > >------------ > >In "10. Security Considerations" we can reference the separate >security analysis document. Or maybe not. 
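The revised memory estimate suggested in the comments above can be written out as a small calculation. This is only a sketch of the proposed form MR = N*R*K + (M*A)*P + U*X; the function name and the sample values for N, K, M and A are hypothetical, while R = 16 bytes is the conservative per-route figure from the comments and P = 2 bytes per AS follows the draft's original *2 factor.

```python
def bgp_memory_estimate(n, k, r, m, a, p, u=0, x=0):
    """Rough BGP memory estimate in bytes.

    n: average number of routes advertised by each peer
    k: number of peers sending full routing
    r: bytes needed to store one route
    m: average AS path length
    a: number of unique AS paths
    p: bytes needed to store one AS in an AS path
    u: number of unique NLRI (optional small-K correction)
    x: overhead in bytes per unique NLRI (optional small-K correction)
    """
    routes = n * r * k        # per-peer route storage, scales with K
    as_paths = m * a * p      # unique AS paths, stored once, outside *K
    per_nlri = u * x          # per-prefix overhead such as trie nodes
    return routes + as_paths + per_nlri


# Hypothetical figures: 120,000 routes from each of 10 peers,
# 20,000 unique AS paths with an average length of 4.
total = bgp_memory_estimate(n=120_000, k=10, r=16, m=4, a=20_000, p=2)
print(total)  # 19360000
```

With these made-up inputs the N*R*K term contributes 19,200,000 bytes and the AS path term only 160,000, which matches the remark that N*R*K dominates in practice.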
>
>
>
>

Message-Id: <200305141508.LAA79394@workhorse.fictitious.org>
To: "Joris Dobbelsteen" <joris.dobbelsteen@mail.com>
Cc: "'IDR WG (E-mail)'" <idr@merit.edu>
Reply-To: curtis@fictitious.org
Subject: Re: Issue 19) Security Considerations
In-reply-to: Your message of "Mon, 14 Apr 2003 23:32:56 +0200." <001201c302cd$6ba36f10$0d0ca8c0@joris2k.local>
Date: Wed, 14 May 2003 11:08:04 -0400
From: Curtis Villamizar <curtis@fictitious.org>

In message <001201c302cd$6ba36f10$0d0ca8c0@joris2k.local>, "Joris Dobbelsteen" writes:
> Curtis, it was not my intention to upset you in any way. Still it looks
> like I succeeded in doing that....... Any way, we do need some feedback
> from practical deployments.
I intended to respond to this message after clearing a few things up, but it got deferred for much longer than it should have. No offense taken. None intended either. > My personal consideration would be not to put it in a chapter 4.2 but > get it throughout the entire draft. That would be fine. > Packet filters provide the good protection for IBGP sessions. These are > harder/impractical for EBGP(?). > Internal attacks and hacks on the network channels are not considered > practical. The reason that packet filters are hard to get deployed on EBGP is the attack can come from your peer and you have no control over your peer's router, except to request that filters be put in place. If an internal router is compromised there are many attacks possible in addition to BGP. The same applies to an internal compromise to other critical infrastructure machines such as NMS. > IPSec is not recommended because at least three reasons: > TCP-MD5 is already widely implemented and deployed. > Complication in network management, because packet filters need to be > adapted for IPSec traffic (port 500 - IKE). > IPSec has higher processing demands that TCP-MD5 does, lowing the > barrier for a successful DoS attack. Hardware IPSec implementations are > considered impractical, due to the cost or performance. > > This brings up your statement that IPSec covers the ports. This is not > true, when encryption (ESP) is not used. The use of intigrity (AH) > will not hide the ports, it works sorta TCP-MD5, if you want... > I also did never consider using ESP, since it was not desired and it > would be the way to 'promote' DoS attacks. > Unfortunally I don't have any insight on IPSec hardware, maybe some else > can give some insight on this. > > I suppose this closes the IPSec/TCP-MD5 issue, leaving TCP-MD5 to the > current best option. Use of AH is an option. It offers little benefit over TCP-MD5. It would be better if there was a way for ESP to expose the ports. 
> Still this leaves EBGP open for consideration.
>
> The use of BGP TTL Security Hack (BTSH) might be a very simple and
> effective way of protecting the external routers from attacks,
> especially DoS attacks. Malicious data can be prevented using TCP-MD5.
> The disadvantage is that it is not deployed widely, but only requires
> routers to send their traffic with an initial TTL of 255. I don't
> know if this is currently possible to achieve: whether current
> implementations can be set to send with an initial TTL of 255. It is
> not needed for both end-points to support BTSH, although desired.
> <draft-gill-btsh-01.txt>

It is simple. It is easy. It works. This TTL check can be done in hardware in many (most?) routers.

Another mitigation of attack is to build routers such that the BGP TCP (and optionally the whole Adj-In and much of the AdjOut handling) is done on the line card, minimally the TCP-MD5 authentication. Some routers also include internal hardware queues that can be used for SFQ handling of traffic to the line card from outside. If an attack occurs on an EBGP peer, the attack only affects peers served by that line card and if SFQ is available may only affect that peer. If so, a peer that does not implement (or enable) filtering or BTSH affects only its own connectivity.

> I couldn't find any information about dynamic 4-tuple filtering
> (google turns up BGP dynamic capabilities), unfortunately.

Vijay Gill gave a presentation at NANOG. Once a session is established, a 4-tuple filter (src/dst addr+port) is installed for that BGP peering, giving that traffic priority over other 4-tuples. A DoS attack on the integrity check then impacts the ability to get to the established state but does not impact established connections.

> So what else can be done to prevent a EBGP session from attacks,
> especially DoS attacks? Data insertion/manipulations can be guarded
> against using TCP-MD5 (or similar).
Traffic-based DoS which overwhelms the integrity check is the hardest problem we currently face. It is solved for IBGP by adequate filtering and keeping the routers or other infrastructure machines from being compromised. BTSH solves it for EBGP unless the peer is compromised. Limiting the impact of a full bandwidth attack from a peer is something that a limited set of routers may be capable of.

> - Joris

Sorry for the delay. I deferred a response to this and then it got lost/buried in my inbox.

Curtis

> >-----Original Message-----
> >From: owner-idr@merit.edu [mailto:owner-idr@merit.edu]On Behalf Of
> >Curtis Villamizar
> >Sent: Thursday, 3 April 2003 16:36
> >To: Joris Dobbelsteen
> >Cc: 'IDR WG (E-mail)'
> >Subject: Re: Issue 19) Security Considerations
> >
> >In message <001b01c2f52e$48505480$0d0ca8c0@joris2k.local>,
> >"Joris Dobbelsteen"
> >writes:
> >> [snip]
> >
> >Joris,
> >
> >Quite frankly I'm outraged at such comments. Only by looking at
> >theoretical security issues and ignoring reality (not that the two
> >don't highly but imperfectly overlap) can you come to the conclusion
> >that IPSEC is needed and in its current form is viable as a security
> >solution for BGP.
> >
> >I think it's about time we injected some reality into
> >draft-murphy-bgp-vuln-02.txt.
> >
> >I've added a practical considerations section. I stuck it in as 4.2.
> >
> >Comments are welcome, particularly comments from people actually
> >running BGP networks or building BGP routers used by ISPs.
> >
> >I did not mention advanced filtering works-in-progress or proposals
> >such as BTSH or dynamic 4-tuple EBGP filtering since these are not yet
> >implemented or deployed afaik. [aside: I strongly believe that BTSH
> >will prove to be a viable way to protect EBGP and a preferable replacement
> >for current filtering which some older TTM (time-to-market) line cards
> >still in use are unable to support.]
> > > >I should also note that the filtering best practices are far from > >universally deployed and in some cases are difficult to fully deploy > >due to residual use of TTM line cards unable to support filtering. > > > >Note that IPSEC with port numbers exposed would be a viable security > >solution. It would still be a greater computational burden than > >TCP-MD5 and still might be less preferred by ISPs for that reason for > >some architectures. This change to IPSEC would at least yield two > >viable options and might encourage implementation and deployment of > >IPSEC as a security solution for BGP. > > > >Curtis > > > > > >--- draft-murphy-bgp-vuln-02.txt Wed Mar 5 21:00:00 2003 > >+++ draft-murphy-bgp-vuln-02.txt++ Thu Apr 3 09:18:12 2003 > >@@ -149,6 +149,7 @@ > > 3.2.2.2 Timer events > >.............................................. 16 > > 4 Security Considerations > >......................................... 16 > > 4.1 Residual Risk > >................................................. 16 > >+4.2 Practical Considerations > >...................................... 16 > > 5 References > >...................................................... 17 > > 6 Author's Address > >................................................ 18 > > > >@@ -901,6 +902,79 @@ > > Filtering is in use near some customer attachment points, but is not > > effective near the Internet center. The other mechanisms are still > > controversial and are not yet in common use. > >+ > >+4.2 Practical Considerations > >+ > >+The primary usage of BGP is as a means to provide reachability > >+information to Autonomous Systems (AS) and to distribute external > >+reachability internally within an AS. BGP is the routing protocol > >+used to distribute global routing information in the Internet. BGP is > >+therefore used by all major Internet Service Providers (ISP) and many > >+smaller providers and other organizations. 
> >+ > >+The role which BGP plays in the Internet puts BGP implementations in > >+unique conditions and places unique security requirements on BGP. BGP > >+is operated over interprovider interfaces in which traffic levels push > >+the state of the art in specialized packet forwarding hardware and > >+exceed the performace capabilities of hardware implementation of > >+decryption by many decimal orders of magnitude. > >+ > >+ISP networks must be and are under tight control. The only viable > >+means to protect the network elements from Denial of Service (DoS) > >+attacks under such conditions are packet based filtering techniques > >+based on relatively simple inspections of packets. > >+ > >+To protect Internal BGP (IBGP) sessions, filters are applied at all > >+borders to an ISP network which remove all traffic destined for > >+addresses of network elements internal addresses (typically contained > >+within a single prefix) and the BGP port number (179). Packets from > >+within an ISP are not forwarded from an internal interface to the BGP > >+speaker's address on which External BGP (EBGP) sessions are supported, > >+or to a peer's EBGP address if the BGP port number is found. With > >+appropriate consideration in router design, in the event of failure of > >+a BGP peer to provide the equivalent filtering the risk of compromise > >+can be limited to the peering session on which filtering is not > >+performed by the peer or the interface or line card on which the > >+peering is supported. There is substantial motivation and little > >+effort for ISPs to maintain such filters. > >+ > >+Being composed entirely of specialized network equipment, under strict > >+control of the ISP, the ISP network is not subject to attacks from > >+within than enterprise networks are with more generalized computing > >+systems and staff less carefully trained in the area of secure > >+procedures. ... > > For me personally, the above sentence is a little hard to understand. 
> You mean that ??? > The Internal BGP routers (or specialized network equipment) that is under > strict control of the ISP is not subject to attacks, other than those that > are common in enterprise networks. These networks have staff that is > less carefully trained in the area of security procedures. > > >+ ........... Monitoring of traffic from within requires either > >+compromise of relatively physically secure and carefully administered > >+network elements or monitoring physical media. Injection of traffic > >+requires either compromise of network elements or intercept and > >+replacement of traffic on physical media. > >+ > >+The difficulty of compromise of network elements and of undetected > >+tapping into physical media carrying extremely high volumes of traffic > >+is much greater than the difficulty of injecting sufficient traffic > >+from outside a network to effect a DoS attack. As a result, the > >+ability to packet filter on the basis of port numbers far exceeds the > >+need to cryptographic strength in encapsulation. > >+ > >+These practical considerations yield the situation in which TCP-MD5, > >+though cryptographic weak, far better serves ISP security needs than > >+the cryptographicly much stronger IPSEC which makes packet filtering > >+infeasible. > >+ > >+Use of BGP in smaller networks yields similar requirements. The > >+capability of a single workstation with high speed interface to > >+generate false traffic far exceeds the capability of software based > >+decryption or appropriately priced cryptographic hardware. From a > >+practical standpoint, these networks are also better served by > >+appropriate administrative care, filtering, and TCP-MD5 than by IPSEC. > >+ > >+This situation is likely to persist unless either cryptographic > >+hardware becomes many orders of magnitude faster and cheaper or IPSEC > >+supports an ability to leave IP port numbers exposed. This > >+requirement has been made known to the IPSEC WG. 
> >+
>
> See above, using integrity only (AH), leaves TCP (not IP) port numbers
> readable for everyone. This security is rather an IPSec configuration
> option.
>
> >+Until such time as IPSEC is modified, there is little choice but to
> >+mandate TCP-MD5 implementation and recommend TCP-MD5 usage for BGP and
> >+discourage IPSEC usage for BGP.
> >
> > 5. References
> >
> >
>

Message-Id: <200305141408.KAA78532@workhorse.fictitious.org>
To: idr@merit.edu
Cc: curtis@fictitious.org
Reply-To: curtis@fictitious.org
Subject: draft-ietf-idr-bgp-analysis-03.txt
Date: Wed, 14 May 2003 10:08:40 -0400
From: Curtis Villamizar <curtis@fictitious.org>

David, Keyur,

I have some suggestions for improvement to this draft ( BGP-4
Protocol Analysis <draft-ietf-idr-bgp-analysis-03.txt>). See comments
inline below. Feel free to take what you consider valid and worth
changing.

Curtis

In "1. Introduction" you should mention that BGP4 was the first version
to support CIDR and that, due to their lack of support for CIDR,
versions 1-3 are considered obsolete and unusable in today's Internet.

------------

Somewhere in key features it should be mentioned that BGP makes the
assumption that packets are routed from source towards destination
independent of the source. A good place for this would be near the
statement "BGP does not make any assumptions about intra-autonomous
system routing protocols deployed within the various autonomous
systems". Or refer to the statement in the beginning of
"7. Applicability".

------------

In the following paragraph, why don't we just say that this algorithm is
referred to as a "Path Vector" algorithm?

   BGP uses an algorithm that is neither a pure distance vector
   algorithm or a pure link state algorithm. It is instead a modified
   distance vector algorithm that uses path information to avoid
   traditional distance vector problems. Each route within BGP pairs
   destination with path information to that destination. Path
   information (also known as AS_PATH information) is stored within the
   AS_PATH attribute in BGP. This allows BGP to reconstruct large
   portions of overall topology whenever required.

------------

Alex probably made you put in some FSM stuff (don't read too much into
the wording here). I don't think it belongs in this document. That is
clearly something for the protocol spec.

------------

In the section "4. BGP Persistent Peer Oscillations" or in a nearby
section (preferable) it should be mentioned that BGP is work conserving.
Here is some suggested text:

   A robust BGP implementation is work conserving. This means that if
   the number of prefixes is bounded, arbitrarily high levels of route
   change can be tolerated with bounded impact on route convergence for
   occasional changes in generally stable routes.

   A BGP implementation under high load conditions should empty as many
   inbound routing updates as possible from its input streams,
   processing only the most recent route if the route for a given NLRI
   changes multiple times. TCP also provides blocking on the writes on
   the sender side. A BGP implementation under load should expect
   blocks on write calls and send only the most recent routes when
   sockets unblock rather than sending the entire history.

   A robust implementation of BGP should have the following
   characteristics:

   1. It is able to operate in almost arbitrarily high levels of route
      flap without losing peerings (failing to send keepalives) or
      losing other protocol adjacencies as a result of BGP load.

   2. Instability of a subset of routes should not affect the route
      advertisements or forwarding associated with the set of stable
      routes.

   3. High levels of instability and peers of different CPU speed or
      load resulting in faster or slower processing of routes should
      not cause instability and should have a bounded impact on the
      convergence time for generally stable routes.

   Numerous robust BGP implementations exist. Producing a robust
   implementation is not a trivial matter but is clearly achievable.

------------

I find the following paragraph problematic without further explanation.

   It is important to note that BGP does not require all the routers
   within an autonomous system to participate in the BGP protocol. In
   particular, only the border routers that provide connectivity
   between the local autonomous system and their adjacent autonomous
   systems need participate in BGP. The ability to constrain the set of
   BGP speakers is one way to address scaling issues.

Either you need to default to the borders and exit at any border or you
need some mechanism to tunnel between border routers for a pure
transport network.

I favor removing the above paragraph. Tunnelling to remove BGP is out of
scope. Default routing to reach any arbitrary border need not be
mentioned. Things which actually do improve scaling within an AS are RR
and confeds.

------------

Section "5.1 Link bandwidth and CPU utilization" may still be overly
simplistic and as a result may be incorrect. See comments below.

------------

In terms of bandwidth, the number of unique AS paths in practice is a
small number compared to the number of NLRI. Since many NLRI are packed
in a single update with the AS path included only once, in practice the
number of NLRI completely dominates the amount of bandwidth consumed.

The MR = 4 * (N + (M * A)) may be inaccurate for the reasons I gave in
the prior paragraph. The M*A may drastically understate the impact of
the unique AS paths. Instead of defining A as the number of AS, A could
be defined as the number of unique AS paths with M*A then being average
AS path length times number of unique AS paths.

Also why are both memory and bandwidth represented as MR? Wouldn't BW be
a better variable name for bandwidth?

The O(C * M) thing in the next paragraph is also invalid above some
value of C but for different reasons. Above some value of C, either the
sender will begin pacing its sending of updates on its own (suppressing
multiple changes over very short periods) or the receiver will be unable
to keep up with the rate of change and force suppression of multiple
changes over very short periods by causing the BGP socket to block on
the sender.

------------

The following statement is incorrect:

   Finally, since the dynamic properties of the network cannot be
   quantitatively bounded, stability must be addressed via heuristics
   such as BGP Route Flap Damping [RFC2439]. Due to the nature of BGP,
   such damping should be viewed as a matter local to an autonomous
   system matter (see also Appendix F.2 of [BGP4]).

The amount of change is inherently bounded in BGP (as I described
above). BGP Route Flap Damping was initially proposed for two reasons:
1) to protect a specific commercial implementation that was not
sufficiently robust, and 2) to improve convergence of stable routes.
BGP Route Flap Damping is not necessary to bound the amount of change in
BGP routing.

------------

We can drop the following comparison to a historic protocol:

   It may also be instructive to compare bandwidth and CPU requirements
   of BGP with the Exterior Gateway Protocol (EGP). While with BGP the
   complete information is exchanged only at the connection
   establishment time, with EGP the complete information is exchanged
   periodically (usually every 3 minutes). Note that both for BGP and
   for EGP the amount of information exchanged is roughly on the order
   of the number of networks reachable via a peer that sends the
   information. Therefore, even if one assumes extreme instabilities of
   BGP, its worst case behavior will be the same as the steady state
   behavior of its predecessor, EGP.

   Operational experience with BGP showed that the incremental update
   approach employed by BGP provides qualitative improvement in both
   bandwidth and CPU utilization when compared with complete periodic
   updates used by EGP (see also presentation by Dennis Ferguson at the
   Twentieth IETF, March 11-15, 1991, St. Louis).

We should drop other references to EGP.

------------

In "5.1.2. Memory requirements", for MR = O((N + M * A) * K), the same
comment applies regarding the M*A term. A should be unique AS paths, not
number of AS, and is not multiplied by K.

In practice the K term is small because it is the number of peers
sending full routing, which is generally much less than the worst case
number of peers. Large providers who carry full routing typically send
each other only their customer routes to avoid providing free transit to
each other. This reduces the impact of K.

------------

We can drop:

   It is interesting to note that prior to the introduction of BGP in
   the NSFNET Backbone, memory requirements on the NSFNET Backbone
   routers running EGP were on the order of O(N *K).

------------

In the MR = ((N*4) + (M*A)*2) * K, we can make this quite accurate by
defining N as the average number of routes advertised by each peer and A
as the number of unique AS paths, moving (M*A)*2 outside of the *K, and
changing N*4 to N*R and (M*A)*2 to (M*A)*P, where R is the number of
bytes required to store a route and P is the number of bytes needed to
store one AS in an AS path. If K is small, then some overhead such as
the patricia trie storage figures into R and if K is large, the data
structures may be linked lists off the patricia trie. Claiming that a
route can be stored in 4 bytes is rather naive. The N*R*K term in
practice dominates over the (M*A)*P term. If we conservatively estimate
a route as taking 16 bytes (large K allowing patricia trie overhead to
be ignored, indices or pointers to unique AS Path or other attributes,
etc). If we wanted to include a term to make things accurate for small K
we could add U*X where U is the number of unique NLRI and X is the
overhead per unique NLRI (and I ran out of useful single letters).
Typically X is greater than R.

------------

   Interestingly, in his review of the BGP protocol for the BGP review
   committee in March of 1990, Paul Tsuchiya noted that "BGP does not
   scale well.

Paul was wrong. It does scale well and that's why it is being used. BGP
is a solution that scales as well as the problem allows and no better.

------------

In "10. Security Considerations" we can reference the separate security
analysis document. Or maybe not.
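
------------

As a rough illustration of the revised memory estimate discussed in the
comments on "5.1.2. Memory requirements", here is a minimal sketch in
Python. It assumes the form MR = N*R*K + (M*A)*P + U*X built from the
terms described in this message. The function name and all sample
figures are hypothetical, apart from R = 16 bytes (the conservative
per-route estimate mentioned in this message) and P = 2 bytes (taken
from the "*2" in the draft's formula). It is not text proposed for the
draft.

```python
def estimate_bgp_memory(n, k, r, m, a, p, u=0, x=0):
    """Return (route_term, path_term, nlri_term, total) in bytes.

    n: average number of routes advertised by each peer
    k: number of peers sending full routing
    r: bytes required to store one route
    m: average AS path length
    a: number of unique AS paths (not number of ASes)
    p: bytes needed to store one AS in an AS path
    u: number of unique NLRI (optional small-K correction)
    x: overhead in bytes per unique NLRI (optional small-K correction)
    """
    route_term = n * r * k   # per-peer route storage, scales with K
    path_term = m * a * p    # unique AS paths are stored once, outside *K
    nlri_term = u * x        # per-prefix overhead such as trie nodes
    return route_term, path_term, nlri_term, route_term + path_term + nlri_term


# Hypothetical figures: 120,000 routes from each of 4 full-routing peers,
# 20,000 unique AS paths averaging 4 ASes each.
routes, paths, nlri, total = estimate_bgp_memory(
    n=120_000, k=4, r=16, m=4, a=20_000, p=2
)
print(routes, paths, nlri, total)
```

With these made-up inputs the N*R*K term is far larger than the
(M*A)*P term, which is the point made in this message about which term
dominates in practice.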
Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id HAA04703 for <idr-archive@nic.merit.edu>; Wed, 14 May 2003 07:24:15 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id C64D591210; Wed, 14 May 2003 07:23:47 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 87C0291211; Wed, 14 May 2003 07:23:47 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 4D1A391210 for <idr@trapdoor.merit.edu>; Wed, 14 May 2003 07:23:46 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 356D35E05E; Wed, 14 May 2003 07:23:46 -0400 (EDT) Delivered-To: idr@merit.edu Received: from ietf.org (odin.ietf.org [132.151.1.176]) by segue.merit.edu (Postfix) with ESMTP id B505E5DEDD for <idr@merit.edu>; Wed, 14 May 2003 07:23:45 -0400 (EDT) Received: from CNRI.Reston.VA.US (localhost [127.0.0.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id HAA12720; Wed, 14 May 2003 07:20:42 -0400 (EDT) Message-Id: <200305141120.HAA12720@ietf.org> Mime-Version: 1.0 Content-Type: Multipart/Mixed; Boundary="NextPart" To: IETF-Announce: ; Cc: idr@merit.edu From: Internet-Drafts@ietf.org Reply-To: Internet-Drafts@ietf.org Subject: I-D ACTION:draft-ietf-idr-bgp-analysis-03.txt Date: Wed, 14 May 2003 07:20:42 -0400 Sender: owner-idr@merit.edu Precedence: bulk --NextPart A New Internet-Draft is available from the on-line Internet-Drafts directories. This draft is a work item of the Inter-Domain Routing Working Group of the IETF. Title : BGP-4 Protocol Analysis Author(s) : D. Meyer, K. Patel Filename : draft-ietf-idr-bgp-analysis-03.txt Pages : 19 Date : 2003-5-13 The purpose of this report is to document how the requirements for advancing a routing protocol from Draft Standard to full Standard have been satisfied by Border Gateway Protocol version 4 (BGP-4). 
This report satisfies the requirement for'the second report', as described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill the requirement, this report augments RFC 1774 [RFC1774] and summarizes the key features of BGP protocol, and analyzes the protocol with respect to scaling and performance. A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-ietf-idr-bgp-analysis-03.txt To remove yourself from the IETF Announcement list, send a message to ietf-announce-request with the word unsubscribe in the body of the message. Internet-Drafts are also available by anonymous FTP. Login with the username "anonymous" and a password of your e-mail address. After logging in, type "cd internet-drafts" and then "get draft-ietf-idr-bgp-analysis-03.txt". A list of Internet-Drafts directories can be found in http://www.ietf.org/shadow.html or ftp://ftp.ietf.org/ietf/1shadow-sites.txt Internet-Drafts can also be obtained by e-mail. Send a message to: mailserv@ietf.org. In the body type: "FILE /internet-drafts/draft-ietf-idr-bgp-analysis-03.txt". NOTE: The mail server at ietf.org can return the document in MIME-encoded form by using the "mpack" utility. To use this feature, insert the command "ENCODING mime" before the "FILE" command. To decode the response(s), you will need "munpack" or a MIME-compliant mail reader. Different MIME-compliant mail readers exhibit different behavior, especially when dealing with "multipart" MIME messages (i.e. documents which have been split up into multiple messages), so check your local documentation on how to manipulate these messages. Below is the data which will enable a MIME compliant mail reader implementation to automatically retrieve the ASCII version of the Internet-Draft. 
--NextPart Content-Type: Multipart/Alternative; Boundary="OtherAccess" --OtherAccess Content-Type: Message/External-body; access-type="mail-server"; server="mailserv@ietf.org" Content-Type: text/plain Content-ID: <2003-5-13133436.I-D@ietf.org> ENCODING mime FILE /internet-drafts/draft-ietf-idr-bgp-analysis-03.txt --OtherAccess Content-Type: Message/External-body; name="draft-ietf-idr-bgp-analysis-03.txt"; site="ftp.ietf.org"; access-type="anon-ftp"; directory="internet-drafts" Content-Type: text/plain Content-ID: <2003-5-13133436.I-D@ietf.org> --OtherAccess-- --NextPart-- Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id DAA08041 for <idr-archive@nic.merit.edu>; Tue, 13 May 2003 03:26:33 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id E89CA9126F; Tue, 13 May 2003 03:26:07 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id B00B391270; Tue, 13 May 2003 03:26:06 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 81A689126F for <idr@trapdoor.merit.edu>; Tue, 13 May 2003 03:26:05 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 5CC7A5E067; Tue, 13 May 2003 03:26:05 -0400 (EDT) Delivered-To: idr@merit.edu Received: from sj-core-2.cisco.com (sj-core-2.cisco.com [171.71.177.254]) by segue.merit.edu (Postfix) with ESMTP id 0B95F5E063 for <idr@merit.edu>; Tue, 13 May 2003 03:26:05 -0400 (EDT) Received: from cisco.com (router.cisco.com [171.69.182.20]) by sj-core-2.cisco.com (8.12.6/8.12.6) with ESMTP id h4D7Q2iQ010777; Tue, 13 May 2003 00:26:02 -0700 (PDT) Received: from [193.0.9.150] (ssh-ams-1.cisco.com [144.254.74.55]) by cisco.com (8.8.8/2.6/Cisco List Logging/8.8.8) with ESMTP id DAA15367; Tue, 13 May 2003 03:26:00 -0400 (EDT) Mime-Version: 1.0 X-Sender: jgs@router Message-Id: 
<p05210602bae64b4a3ff7@[193.0.9.150]> In-Reply-To: <20030512142546.E5895@nexthop.com> References: <20030512095637.B5895@nexthop.com> <200305121808.h4CI8fH9022253@rtp-core-1.cisco.com> <20030512142546.E5895@nexthop.com> Date: Tue, 13 May 2003 09:15:20 +0200 To: Jeffrey Haas <jhaas@nexthop.com> From: "John G. Scudder" <jgs@cisco.com> Subject: Re: On BGP and VPLS Cc: Eric Rosen <erosen@cisco.com>, idr@merit.edu Content-Type: text/plain; charset="us-ascii" ; format="flowed" Sender: owner-idr@merit.edu Precedence: bulk At 2:25 PM -0400 5/12/03, Jeffrey Haas wrote: >If you flood more than one type >of reachability, you get to make hard choices such as which reachability >do you want to converge faster and if you start exceeding the resource >bounds of your router, which information do you toss? OK, but this is orthogonal to whether you carry said information in BGP or some other protocol, since they're all sharing the same resources. By the way, I think this use of the term "flooding" is rather unfortunate since it already has a well understood meaning in the routing protocol context, and it suggests link-state. 
--John Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id OAA20556 for <idr-archive@nic.merit.edu>; Mon, 12 May 2003 14:27:26 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id 0A3E59125E; Mon, 12 May 2003 14:27:04 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id C5D0E9125F; Mon, 12 May 2003 14:27:03 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id B1D059125E for <idr@trapdoor.merit.edu>; Mon, 12 May 2003 14:27:02 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 8CF735DE20; Mon, 12 May 2003 14:27:02 -0400 (EDT) Delivered-To: idr@merit.edu Received: from presque.nexthop.com (dns.nexthop.com [65.247.36.216]) by segue.merit.edu (Postfix) with ESMTP id 5D5425DE1D for <idr@merit.edu>; Mon, 12 May 2003 14:27:02 -0400 (EDT) Received: (from root@localhost) by presque.nexthop.com (8.12.8/8.11.1) id h4CIPtIf039930; Mon, 12 May 2003 14:25:55 -0400 (EDT) (envelope-from jhaas@jhaas.nexthop.com) Received: from jhaas.nexthop.com (jhaas.nexthop.com [65.247.36.31]) by presque.nexthop.com (8.12.8/8.12.8) with ESMTP id h4CIPpWB039922; Mon, 12 May 2003 14:25:51 -0400 (EDT) (envelope-from jhaas@jhaas.nexthop.com) Received: (from jhaas@localhost) by jhaas.nexthop.com (8.11.3nb1/8.11.3) id h4CIPk209204; Mon, 12 May 2003 14:25:46 -0400 (EDT) Date: Mon, 12 May 2003 14:25:46 -0400 From: Jeffrey Haas <jhaas@nexthop.com> To: Eric Rosen <erosen@cisco.com> Cc: idr@merit.edu Subject: Re: On BGP and VPLS Message-ID: <20030512142546.E5895@nexthop.com> References: <20030512095637.B5895@nexthop.com> <200305121808.h4CI8fH9022253@rtp-core-1.cisco.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200305121808.h4CI8fH9022253@rtp-core-1.cisco.com>; from 
erosen@cisco.com on Mon, May 12, 2003 at 02:08:41PM -0400 X-Virus-Scanned: by AMaViS perl-11 Sender: owner-idr@merit.edu Precedence: bulk

Aside from noting that SCTP would be a fine way to do the multiple
streams, I think all of my issues have relatively little to do with
technical issues and are really implementation/operational ones.

If it's reachability you're flooding, and the information changes on a
hop-by-hop basis, BGP is fine. If you flood more than one type of
reachability, you get to make hard choices such as which reachability do
you want to converge faster and, if you start exceeding the resource
bounds of your router, which information do you toss?

The more that you cram into one package, the more rope you give people.
Feedback from several network operators often makes me think we've
already given them too many chain-lengths of rope. :-)

On Mon, May 12, 2003 at 02:08:41PM -0400, Eric Rosen wrote:
> I think a vendor would be unlikely to "start from the spec". A more likely
> implementation strategy would be to allow BGP connections on both the old
> port and the new port. Capability advertisement or ORF or something would
> be used to choose which kind of info gets forwarded on which BGP
> connections.
>
> Of course, one day someone would notice that this might require multiple
> parallel TCP connections where a single one might do just as well. So the
> suggestion would probably be made that, as an optimization, one could use a
> single port, but encode the different kinds of data as NLRI of different
> address families.
>
> So saying "use a different port" doesn't really make the problem go away.
-- Jeff Haas NextHop Technologies Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id OAA20454 for <idr-archive@nic.merit.edu>; Mon, 12 May 2003 14:09:08 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id D70AC9125D; Mon, 12 May 2003 14:08:45 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id A48909125E; Mon, 12 May 2003 14:08:45 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id A2D2A9125D for <idr@trapdoor.merit.edu>; Mon, 12 May 2003 14:08:44 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 7C7365DE0E; Mon, 12 May 2003 14:08:44 -0400 (EDT) Delivered-To: idr@merit.edu Received: from rtp-core-1.cisco.com (rtp-core-1.cisco.com [64.102.124.12]) by segue.merit.edu (Postfix) with ESMTP id 17BFB5DE0B for <idr@merit.edu>; Mon, 12 May 2003 14:08:44 -0400 (EDT) Received: from cisco.com (erosen-u10.cisco.com [161.44.70.36]) by rtp-core-1.cisco.com (8.12.6/8.12.6) with ESMTP id h4CI8fH9022253; Mon, 12 May 2003 14:08:41 -0400 (EDT) Message-Id: <200305121808.h4CI8fH9022253@rtp-core-1.cisco.com> To: Jeffrey Haas <jhaas@nexthop.com> Cc: Pedro Roque Marques <roque@juniper.net>, idr@merit.edu Subject: Re: On BGP and VPLS In-reply-to: Your message of Mon, 12 May 2003 09:56:37 -0400. 
<20030512095637.B5895@nexthop.com> Reply-To: erosen@cisco.com User-Agent: EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory=F2mae?=) APEL/10.3 Emacs/21.3 (sparc-sun-solaris2.8) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Date: Mon, 12 May 2003 14:08:41 -0400 From: Eric Rosen <erosen@cisco.com> Sender: owner-idr@merit.edu Precedence: bulk Jeff> But it'd be nice for everyone that wants to "co-opt just another Jeff> little piece of BGP because it works" that maybe you should start from Jeff> the spec, get your own TCP port and flood the stuff in parallel. I think a vendor would be unlikely to "start from the spec". A more likely implementation strategy would be to allow BGP connections on both the old port and the new port. Capability advertisement or ORF or something would be used to choose which kind of info gets forwarded on which BGP connections. Of course, one day someone would notice that this might require multiple parallel TCP connections where a single one might do just as well. So the suggestion would probably be made that, as an optimization, one could use a single port, but encode the different kinds of data as NLRI of different address families. So saying "use a different port" doesn't really make the problem go away. 
Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id JAA18153 for <idr-archive@nic.merit.edu>; Mon, 12 May 2003 09:57:10 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id 7BCA591236; Mon, 12 May 2003 09:56:50 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 4351D9124A; Mon, 12 May 2003 09:56:50 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 3F61291236 for <idr@trapdoor.merit.edu>; Mon, 12 May 2003 09:56:49 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 27BC75E552; Mon, 12 May 2003 09:56:49 -0400 (EDT) Delivered-To: idr@merit.edu Received: from presque.nexthop.com (dns.nexthop.com [65.247.36.216]) by segue.merit.edu (Postfix) with ESMTP id E8CB05E56C for <idr@merit.edu>; Mon, 12 May 2003 09:56:48 -0400 (EDT) Received: (from root@localhost) by presque.nexthop.com (8.12.8/8.11.1) id h4CDulVi032409; Mon, 12 May 2003 09:56:47 -0400 (EDT) (envelope-from jhaas@jhaas.nexthop.com) Received: from jhaas.nexthop.com (jhaas.nexthop.com [65.247.36.31]) by presque.nexthop.com (8.12.8/8.12.8) with ESMTP id h4CDugWB032394; Mon, 12 May 2003 09:56:42 -0400 (EDT) (envelope-from jhaas@jhaas.nexthop.com) Received: (from jhaas@localhost) by jhaas.nexthop.com (8.11.3nb1/8.11.3) id h4CDubW06114; Mon, 12 May 2003 09:56:37 -0400 (EDT) Date: Mon, 12 May 2003 09:56:37 -0400 From: Jeffrey Haas <jhaas@nexthop.com> To: Pedro Roque Marques <roque@juniper.net> Cc: idr@merit.edu Subject: Re: On BGP and VPLS Message-ID: <20030512095637.B5895@nexthop.com> References: <200305072030.h47KUOC64557@roque-bsd.juniper.net> <200305072106.RAA19411@workhorse.fictitious.org> <200305072115.h47LFdh64632@roque-bsd.juniper.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: 
<200305072115.h47LFdh64632@roque-bsd.juniper.net>; from roque@juniper.net on Wed, May 07, 2003 at 02:15:39PM -0700 X-Virus-Scanned: by AMaViS perl-11 Sender: owner-idr@merit.edu Precedence: bulk On Wed, May 07, 2003 at 02:15:39PM -0700, Pedro Roque Marques wrote: > My argument is that the flooding algorithm in the BGP spec is > applicable to other NLRI-types. You could flood it using NNTP too. It doesn't mean its the One True Answer. I think that is most of Alex's point. > And that the document can be, and has > been sucesfully used to do so. i.e. it is still a coherent document > when you interpret it in the context of a different NLRI-type. There comes a time when you're just distributing too much stuff in one protocol and take too many chances at destabilizing what sort of works well. I'm not saying that VPLS is like this, having never read the specs. But it'd be nice for everyone that wants to "co-opt just another little piece of BGP because it works" that maybe you should start from the spec, get your own TCP port and flood the stuff in parallel. Same kind of basket, just different basket. Hopefully fewer broken eggs. Now I know what the DNS folk feel like. > Pedro. 
-- Jeff Haas NextHop Technologies Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id KAA02786 for <idr-archive@nic.merit.edu>; Fri, 9 May 2003 10:14:26 -0400 (EDT) Received: by trapdoor.merit.edu (Postfix) id ADBD99121A; Fri, 9 May 2003 10:14:02 -0400 (EDT) Delivered-To: idr-outgoing@trapdoor.merit.edu Received: by trapdoor.merit.edu (Postfix, from userid 56) id 715279121B; Fri, 9 May 2003 10:14:02 -0400 (EDT) Delivered-To: idr@trapdoor.merit.edu Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 966A39121A for <idr@trapdoor.merit.edu>; Fri, 9 May 2003 10:14:00 -0400 (EDT) Received: by segue.merit.edu (Postfix) id 7B3E55E0E3; Fri, 9 May 2003 10:14:00 -0400 (EDT) Delivered-To: idr@merit.edu Received: from rtp-core-2.cisco.com (rtp-core-2.cisco.com [64.102.124.13]) by segue.merit.edu (Postfix) with ESMTP id 9D98F5E0DE for <idr@merit.edu>; Fri, 9 May 2003 10:13:55 -0400 (EDT) Received: from cisco.com (uzura.cisco.com [64.102.17.77]) by rtp-core-2.cisco.com (8.12.6/8.12.6) with ESMTP id h49EDNNE019286; Fri, 9 May 2003 10:13:23 -0400 (EDT) Received: from russpc (rtp-vpn1-91.cisco.com [10.82.224.91]) by cisco.com (8.8.8/2.6/Cisco List Logging/8.8.8) with ESMTP id KAA11527; Fri, 9 May 2003 10:13:22 -0400 (EDT) Date: Fri, 9 May 2003 10:13:14 -0400 (Eastern Daylight Time) From: Russ White <ruwhite@cisco.com> Reply-To: Russ White <riw@cisco.com> To: idr@merit.edu Cc: rtg-dir@ietf.org Subject: Comments on BGP Draft 20..... Message-ID: <Pine.WNT.4.53.0305090945390.2372@russpc> X-X-Sender: ruwhite@uzura.cisco.com MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-idr@merit.edu Precedence: bulk Some of these are going to echo Alex's comments, but that's okay, I think. Mostly just nits.... 
:-)

Russ

__________________________________
riw@cisco.com CCIE <>< Grace Alone

-----

Abstract:

   This information is sufficient to construct a graph of the AS
   connectivity from which routing loops may be pruned and some policy
   decisions at the AS level may be enforced.

UPDATE Message Format:

   The information in the UPDATE message can be used to construct a
   graph describing the relationships of the various Autonomous Systems.

In both cases this is true, I suppose, but in neither case does this
really describe what the AS Path is used for, right? I would think we'd
want to describe it less in terms of a "graph of the connectivity in the
internetwork," and more in terms of "a graph of the path through
Autonomous Systems used to reach the destination advertised." It could
be confusing, since there isn't anyplace where we discuss building a
graph of interconnectivity between the Autonomous Systems....

-----

Forwarding Paradigm:

   This document uses the term "Autonomous System" (AS) throughout....

This entire paragraph is a repeat--I'd leave it just in the definitions.

-----

Forwarding Paradigms:

   The initial data flow....

This paragraph has two different thoughts in it, one about incremental
updates, and the other about keeping data that you've received. It seems
like just putting a return after "as the routing tables change." would
fix it.

-----

Forwarding Paradigms:

The paragraph starting "KEEPALIVE messages" should, I think, be moved up
above the section on route exchange. I don't know why, it just seems
less like it's jumping all over the place that way.

-----

3.1 Routes: Advertisement and Storage

It almost seems like the section about "The initial data flow" should
maybe be put entirely under this section someplace (?).

The first paragraph in this section is really a definition of a route vs
a prefix, and should probably be in the definitions.

The paragraph "Changing attribute of a route...." needs a "the," or
attribute needs an "s."

-----

3.2 Routing Information Bases

b) Loc-RIB....

I think it might be useful to state that the contents of the Loc-RIB are
actually installed in the local routing table, and thus used for
forwarding packets on this router. I don't see anyplace this connection
is made explicit, it seems more like it's implicit throughout the doc.

-----

Page 18, a) LOCAL_PREF

"....to inform other peers...." should be "....to inform its other
peers...."

-----

Network Layer Reachability Information

"This variable length field contains a list of IP address prefixes."

I think we can kill "address" here.

a) Length

"The Length field indicates...."

The sentence can start with "Indicates...."

b) Prefix

"The Prefix field indicates...."

The sentence can start with "Indicates...."

-----

Network Layer Reachability Information

"An UPDATE message can list multiple routes to be withdrawn...."

Actually, we don't withdraw routes, we withdraw prefixes, right? The
next paragraph shows this confusion, by talking about routes without
attributes, but routes are prefixes combined with attributes, so....
They aren't routes, they're prefixes. You remove routes based on
withdrawn prefixes, I think.

-----

5. Path Attributes

"Well-known attributes MUST be recognized by all BGP implementations."

This sentence, as strange as it may sound, implies it's the attribute's
fault if the BGP implementation doesn't recognize it, that it's up to
the attribute definers to, in some way, make certain that BGP
implementations will recognize it. I think it should probably be worded
the other way 'round:

"BGP implementations MUST recognize all well-known attributes."

-----

5. Path Attributes

"All well-known attributes MUST be passed along (after proper updating,
if necessary) to other BGP peers."

This just seems a little rough. Maybe this:

"Once a BGP peer has updated any well-known attributes, it MUST pass
these attributes in any updates it transmits to its peers."

-----

5.1.1 ORIGIN

"Its value SHOULD NOT be changed by any other speaker."

I really think this should be "MUST NOT."
I can't think of any reason it wouldn't be, except in the case of aggregation, and that case could be mentioned here as the only known exception (?).

Message-Id: <200305072145.RAA19938@workhorse.fictitious.org>
To: Pedro Roque Marques <roque@juniper.net>
Cc: curtis@fictitious.org, ppvpn@nortelnetworks.com, idr@merit.edu
Reply-To: curtis@fictitious.org
Subject: Re: On BGP and VPLS
In-reply-to: Your message of "Wed, 07 May 2003 14:15:39 PDT." <200305072115.h47LFdh64632@roque-bsd.juniper.net>
Date: Wed, 07 May 2003 17:45:48 -0400
From: Curtis Villamizar <curtis@fictitious.org>

In message <200305072115.h47LFdh64632@roque-bsd.juniper.net>, Pedro Roque Marques writes:
> Curtis Villamizar writes:
>
> > Pedro,
>
> > This is the BGP4 base spec. The interpretation of the BGP
> > advertisements are IP prefix aka NLRI for which routes are
> > advertised and not general keys mapping to general records.
>
> > If some extension of BGP4 such as VPN or PW makes some other use of
> > BGP flooding then that's fine but need not be reflected in the base
> > document.
>
> > The changes that you are only vaguely specifying (s/route/record/g
> > s/IP prefix/key/g) doesn't at all pertain to the use of BGP as
> > defined in the base spec, hasn't been interpreted as being in
> > conflict with any existing document, and I don't think this is a
> > productive discussion during last call of a very key document.
>
> Curtis,
>
> In no way i was suggesting that we change the base spec.
>
> My argument is that the flooding algorithm in the BGP spec is
> applicable to other NLRI-types. And that the document can be, and has
> been successfully used to do so. i.e. it is still a coherent document
> when you interpret it in the context of a different NLRI-type.
>
> regards,
> Pedro.

So you are not suggesting a change at all to BGP4. If so you don't need to involve the IDR WG in a semantic discussion.

If the issue is whether to use BGP4 for distribution of VPLS information and the objection were along the lines of scaling, or some other technical matter then that was not at all clear.

Unless I missed something, IDR was added to the Cc mid discussion. If so, maybe you should tell us what draft you are discussing and be clear about what the issue is.
Curtis

Message-Id: <200305072120.RAA19579@workhorse.fictitious.org>
To: Pedro Roque Marques <roque@juniper.net>
Cc: Alex Zinin <zinin@psg.com>, ppvpn@nortelnetworks.com, idr@merit.edu
Reply-To: curtis@fictitious.org
Subject: Re: On BGP and VPLS
In-reply-to: Your message of "Wed, 07 May 2003 13:58:58 PDT." <200305072058.h47Kww664593@roque-bsd.juniper.net>
Date: Wed, 07 May 2003 17:20:25 -0400
From: Curtis Villamizar <curtis@fictitious.org>

In message <200305072058.h47Kww664593@roque-bsd.juniper.net>, Pedro Roque Marques writes:
> Pedro Roque Marques writes:
>
> >>> The way i see it there is a high likelihood of this turning into
> >>> a "Yes, it is" "No, it isn't" discussion. And I'd really like to
> >>> avoid that.
>
> >> Agreed.
>
> Following up on my own e-mail... i don't think it is in any way
> productive to continue the discussion towards this path.
>
> Let's try to turn the discussion around to the positive side:
>
> Ignore VPLS for now.
>
> Let's define a problem:
>
> 1. A database consisting of entries (key, attr) needs to be propagated
>    across routers of the same domain and across different
>    administrative domains.
>
> 2. A given key may be originated by more than one of the
>    participating systems.
>
> 3. A key advertised via a given member of a domain depends on
>    reachability to that advertiser.
>
> Task at hand is to find a solution to the problem above.
>
> Pedro.

Pedro,

You are welcome to consider BGP for this key distribution even if the BGP spec does not match the terminology you are looking for. The terminology would not be the deciding factor. If there are technical problems with whatever you propose, that is a separate matter.

Now please drop IDR from the Cc and go about doing the ppvpn work, whether it is VPLS or selecting a mechanism for your hypothetical key/attr distribution.
Curtis

Date: Wed, 7 May 2003 14:15:39 -0700 (PDT)
Message-Id: <200305072115.h47LFdh64632@roque-bsd.juniper.net>
From: Pedro Roque Marques <roque@juniper.net>
To: curtis@fictitious.org
Cc: ppvpn@nortelnetworks.com, idr@merit.edu
Subject: Re: On BGP and VPLS
In-Reply-To: <200305072106.RAA19411@workhorse.fictitious.org>
References: <200305072030.h47KUOC64557@roque-bsd.juniper.net> <200305072106.RAA19411@workhorse.fictitious.org>

Curtis Villamizar writes:

> Pedro,

> This is the BGP4 base spec. The interpretation of the BGP
> advertisements are IP prefix aka NLRI for which routes are
> advertised and not general keys mapping to general records.

> If some extension of BGP4 such as VPN or PW makes some other use of
> BGP flooding then that's fine but need not be reflected in the base
> document.

> The changes that you are only vaguely specifying (s/route/record/g
> s/IP prefix/key/g) doesn't at all pertain to the use of BGP as
> defined in the base spec, hasn't been interpreted as being in
> conflict with any existing document, and I don't think this is a
> productive discussion during last call of a very key document.

Curtis,

In no way i was suggesting that we change the base spec.

My argument is that the flooding algorithm in the BGP spec is applicable to other NLRI-types. And that the document can be, and has been successfully used to do so. i.e. it is still a coherent document when you interpret it in the context of a different NLRI-type.

regards,
Pedro.

Message-Id: <200305072106.RAA19411@workhorse.fictitious.org>
To: Pedro Roque Marques <roque@juniper.net>
Cc: Alex Zinin <zinin@psg.com>, ppvpn@nortelnetworks.com, idr@merit.edu
Reply-To: curtis@fictitious.org
Subject: Re: On BGP and VPLS
In-reply-to: Your message of "Wed, 07 May 2003 13:30:24 PDT." <200305072030.h47KUOC64557@roque-bsd.juniper.net>
Date: Wed, 07 May 2003 17:06:04 -0400
From: Curtis Villamizar <curtis@fictitious.org>

Pedro,

This is the BGP4 base spec. The interpretation of the BGP advertisements are IP prefix aka NLRI for which routes are advertised and not general keys mapping to general records.

If some extension of BGP4 such as VPN or PW makes some other use of BGP flooding then that's fine but need not be reflected in the base document.

The changes that you are only vaguely specifying (s/route/record/g s/IP prefix/key/g) doesn't at all pertain to the use of BGP as defined in the base spec, hasn't been interpreted as being in conflict with any existing document, and I don't think this is a productive discussion during last call of a very key document.

Curtis

Date: Wed, 7 May 2003 13:58:58 -0700 (PDT)
Message-Id: <200305072058.h47Kww664593@roque-bsd.juniper.net>
From: Pedro Roque Marques <roque@juniper.net>
To: Alex Zinin <zinin@psg.com>
Cc: ppvpn@nortelnetworks.com, idr@merit.edu
Subject: Re: On BGP and VPLS
In-Reply-To: <200305072030.h47KUOC64557@roque-bsd.juniper.net>
References: <51133594448.20030502191439@psg.com> <200305030805.h4385Kd51107@roque-bsd.juniper.net> <6857870813.20030507112815@psg.com> <200305072030.h47KUOC64557@roque-bsd.juniper.net>

Pedro Roque Marques writes:

>>> The way i see it there is a high likelihood of this turning into
>>> a "Yes, it is" "No, it isn't" discussion. And I'd really like to
>>> avoid that.

>> Agreed.

Following up on my own e-mail... i don't think it is in any way productive to continue the discussion towards this path.

Let's try to turn the discussion around to the positive side:

Ignore VPLS for now.

Let's define a problem:

1. A database consisting of entries (key, attr) needs to be propagated across routers of the same domain and across different administrative domains.

2. A given key may be originated by more than one of the participating systems.

3. A key advertised via a given member of a domain depends on reachability to that advertiser.

Task at hand is to find a solution to the problem above.

Pedro.
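
[Editorial illustration, not part of the original message.] The three-point problem statement above can be pictured with a small data structure. This is only a minimal sketch under stated assumptions: the class `KeyDatabase`, its method names, and the advertiser names `PE1` and `PE2` are hypothetical and do not come from the thread or from any BGP implementation. It models points 2 and 3 (several originators per key, and usability tied to reachability of the advertiser); propagation across domains (point 1) is not modeled.

```python
from collections import defaultdict
from typing import Dict, Set


class KeyDatabase:
    """Toy (key, attr) database; every name here is hypothetical."""

    def __init__(self) -> None:
        # key -> {advertiser: attr}. Several advertisers may originate
        # the same key (point 2 of the problem statement).
        self._entries: Dict[str, Dict[str, str]] = defaultdict(dict)

    def advertise(self, key: str, advertiser: str, attr: str) -> None:
        """Record that `advertiser` originates `key` with attribute `attr`."""
        self._entries[key][advertiser] = attr

    def withdraw(self, key: str, advertiser: str) -> None:
        """Remove the entry for `key` from `advertiser`, if present."""
        self._entries.get(key, {}).pop(advertiser, None)

    def usable(self, key: str, reachable: Set[str]) -> Dict[str, str]:
        """Entries for `key` whose advertiser is currently reachable (point 3)."""
        return {
            adv: attr
            for adv, attr in self._entries.get(key, {}).items()
            if adv in reachable
        }


db = KeyDatabase()
db.advertise("k1", "PE1", "attr-a")
db.advertise("k1", "PE2", "attr-b")
print(db.usable("k1", reachable={"PE2"}))  # {'PE2': 'attr-b'}
```

Keeping one entry per advertiser, rather than one per key, is what lets the database fall back to another originator when reachability to the first is lost.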

Date: Wed, 7 May 2003 13:30:24 -0700 (PDT)
Message-Id: <200305072030.h47KUOC64557@roque-bsd.juniper.net>
From: Pedro Roque Marques <roque@juniper.net>
To: Alex Zinin <zinin@psg.com>
Cc: ppvpn@nortelnetworks.com, idr@merit.edu
Subject: Re: On BGP and VPLS
In-Reply-To: <6857870813.20030507112815@psg.com>
References: <51133594448.20030502191439@psg.com> <200305030805.h4385Kd51107@roque-bsd.juniper.net> <6857870813.20030507112815@psg.com>

Alex Zinin writes:

> Pedro,
> Sorry for the delay. I found your message in my IDR folder
> just recently. I'll limit my involvement to this message, I think I
> will have said enough for my opinion to be heard :)

Alex,

>> The way i see it there is a high likelihood of this turning into
>> a "Yes, it is" "No, it isn't" discussion. And I'd really like to
>> avoid that.

> Agreed.

It seems to me that we immediately got into this dead-lock point. Allow me to make a further attempt to clarify my point of view.

>> This point seems to be predicated in the statement that "BGP uses
>> the NLRI field to carry IP reachability"...

> Yes, plus that "BGP is an IP routing protocol."

That is one possible application of BGP. It does not invalidate the possibility of other applications.

Let me rephrase my point:

It is possible to extract from the BGP specification a common set of functionality that is independent of 'IP routing'. Such common set of functionality consists of the ability to flood databases of information across multiple domains, in a distributed fashion.

Since this problem reoccurs often in networking, several other applications have made use of this commonality to avoid reinventing the wheel.

The 'common functionality' in question is not so much what is documented in the BGP specification but the algorithms required to implement it.

> This is where we disagree in the first place.
> There's a fine line between a database distribution flooding
> algorithm and a path-vector routing algorithm. BGP is clearly not
> the former.

"Yes, it is" :-) Give me an argument and i'll try a more productive response.

>> As an exercise, if we take the existing spec and do:
>> s/route/record/g s/IP prefix/key/g
>> Do we still have a document that makes sense.. ?

> I'm not sure :)

>> Except for the vague bits about aggregation, about which BGP itself
>> does little about, i would contend that the result would be pretty
>> much the same.

> You are forgetting the parts about Loc-RIB, routing and forwarding
> tables, next-hop, etc. In any case, such a beast would cease to be
> an IP routing protocol, but would still perform best path selection
> to an opaque key in the Internet. If you need a database
> distribution mechanism as you described above, you don't need the
> path vector behavior, nor per-peer state for each key, nor
> next-hops. I.e., you don't need what BGP does.

You repeat this argument throughout your e-mail. And i'm completely missing your point:

The path-vector algorithm is essential to the flooding of the BGP database information. Without it flooding would eternally loop.

path-vector as in:
 o as-path and cluster-list loop detection
 o iBGP doesn't advertise iBGP

This is what makes BGP flooding work. None of this information is used for Loc-rib purposes.

>>> 2. Distribution of information [...]

>> P routers do not carry 2547 routing information.

> Unless the SPs want to use the existing RR infrastructure.

All the SPs i've worked with explicitly do not want to do this.

>> Not really... i can advertise the same key from multiple sources in
>> L2VPNs also. All policy mechanisms do work... igp distance, etc. It
>> is just the semantics once the path is selected that are different.
>> As an example think working and protect PE for a given emulated
>> circuit (or lan).

> I think you might have missed my point here. Though a BGP speaker
> will receive the same key from multiple sources, and will select the
> best path to it, in the VPLS case, it is not interested in the
> _best_ path, it is interested in only receiving the key, regardless
> of where it comes from.

That is factually incorrect. I gave you an example in the original e-mail to illustrate the point. The same 'key' may be advertised from more than one source. BGP needs to select the best-path in any such occurrence.

> So RR and eBGP are the examples where a given BGP speaker would
> receive extra copies of information.
> As I replied to Mark:
> Well, in the IP/BGP case, it is not necessarily the same info,
> as path attributes are likely to be different and the BGP speaker is
> interested in selecting the best among them while preserving others
> as the back-up. In the VPLS case, we don't need the best, just one
> copy is sufficient.

That is incorrect.

PE1 advertises 'key' preference 100, label 10000 - this is working circuit.
PE2 advertises 'key' preference 200, label 10001 - this is protect circuit.

It works just like IP routing. All the preference mechanisms that you use in IP routing can be used here. IGP failure to PE1 for instance, causes a remote system to automatically reroute.

> This does not change the fact that information that BGP as an IP
> routing protocol distributes is aggregateable.

That doesn't mean that BGP does aggregate anything either. BGP doesn't specify any algorithm for aggregation itself. What BGP does address is the interaction between possible aggregation and its loop detection mechanism.

> Also note that route
> aggregation rules are part of the routing protocol specification and
> definitely depend on the protocol behavior (true for BGP, OSPF, and
> ISIS), so it is not a completely distinct notion, though some
> implementations decouple the two.

BGP does not specify any rules for what routes should be aggregated or into what they should be aggregated. It cannot since these are operational decisions. The statement above is false as far as i can understand it.

>>> 5. Coupling of VPLS and BGP SW
>>> a) Lesser BGP code stability--bugs in the VPLS part of the
>>> code

>> You have no basis to conclude that.

> I do :)

Please present a justification then. We are back to "no, it isn't" / "yes, it is" mode.

> The fact is that pieces of code in routing protocol implementations
> are not only statically related via the function call tree, but also
> dynamically and indirectly... but I will stop right here, because
> we'll inevitably get into implementation specifics...

This is what i call FUD... "Uncertainty" not being backed by any argument. The way i see it this is a central piece of your (IESG) original "concern" statement.

>>> b) Potential dynamic effects--since with a BGP-based approach,

>> I'm sorry but this is just FUD.

> I hope people don't think about potential interference between large
> distributed systems as FUD.

You are wording it just that way: "potential interference between large distributed systems". No facts, no specific points. Just vague allegations.

If the above is "fair game" then i want to ask the IESG to use the same "potential interference" criterion to, say, IPv6... IPv6 not only involves changes to BGP to support this NLRI but changes to all hosts and pretty much all protocols and applications. It can "dynamically and indirectly" cause interference too. Perhaps IPv6 is a threat to the internet stability ? Since i don't intend to run IPv6 in the foreseeable future, standardization of IPv6 is going to cause 'interference' with my workstation software, which i would rather not deal with.

> I tend to look at this more broadly--putting VPLS functionality
> in BGP increases the chances of interference. Some consider this a
> strictly implementation-specific issue. I think that whether or not
> VPLS-specific functionality is sufficiently decoupled from base BGP
> is an implementation aspect; while increased risk of interference is
> an architectural one.

I believe that this completely confuses architecture with implementation. This is not an architectural consideration. This is a vague aspersion on the competency of BGP implementors.

Your argument taken to the extreme would result in forbidding all and any standardization of new software mechanisms.

Pedro.
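
[Editorial illustration, not part of the original message.] The loop-prevention rules named in the message above ("as-path and cluster-list loop detection" and "iBGP doesn't advertise iBGP") can be sketched in a few lines. This is a simplified sketch under stated assumptions, not a BGP implementation: the `Update` structure and the function names `accept` and `may_advertise` are hypothetical, and route reflection is modeled only through the cluster-list check on receipt. A real route reflector relaxes the iBGP-to-iBGP rule, which is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Update:
    """A received advertisement for one key (hypothetical structure)."""
    key: str
    as_path: List[int]  # ASes the update has already traversed
    cluster_list: List[str] = field(default_factory=list)  # reflector clusters traversed
    learned_via_ibgp: bool = False


def accept(update: Update, local_as: int,
           local_cluster_id: Optional[str] = None) -> bool:
    """Return False when accepting the update would create a flooding loop."""
    if local_as in update.as_path:
        return False  # as-path loop detection
    if local_cluster_id is not None and local_cluster_id in update.cluster_list:
        return False  # cluster-list loop detection
    return True


def may_advertise(update: Update, peer_is_ibgp: bool) -> bool:
    """A plain iBGP speaker does not re-advertise iBGP-learned updates to iBGP peers."""
    return not (update.learned_via_ibgp and peer_is_ibgp)


u = Update(key="k1", as_path=[65001, 65002])
print(accept(u, local_as=65003))  # True: local AS not yet in the path
print(accept(u, local_as=65002))  # False: the update has looped back
```

Note that neither check looks at what the key means, which is the point being argued in the message: the rules operate on the path information attached to an entry, not on the entry's semantics.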

Date: Wed, 7 May 2003 11:28:15 -0700
From: Alex Zinin <zinin@psg.com>
Reply-To: Alex Zinin <zinin@psg.com>
Message-ID: <6857870813.20030507112815@psg.com>
To: Pedro Roque Marques <roque@juniper.net>
Cc: ppvpn@nortelnetworks.com, idr@merit.edu
Subject: Re: On BGP and VPLS
In-Reply-To: <200305030805.h4385Kd51107@roque-bsd.juniper.net>
References: <51133594448.20030502191439@psg.com> <200305030805.h4385Kd51107@roque-bsd.juniper.net>

Pedro,

Sorry for the delay. I found your message in my IDR folder just recently. I'll limit my involvement to this message, I think I will have said enough for my opinion to be heard :)

> This point seems to be predicated in the statement that "BGP uses the
> NLRI field to carry IP reachability"...

Yes, plus that "BGP is an IP routing protocol."

> It opens up a sort of philosophical discussion on BGP. This is of
> course a highly subjective topic which is hard to quantify or to prove
> by logical terms.

> Allow me to present my personal view.

> BGP is a particular implementation of an algorithm that performs non
> looping database flooding distribution. That algorithm consists mostly
> of the path vector (used both in ebgp and route reflection) plus route
> advertisement rules. This is the publicly specified part of the beast.

This is where we disagree in the first place. There's a fine line between a database distribution flooding algorithm and a path-vector routing algorithm. BGP is clearly not the former.

> However that ends up being about 10% of the database exchange
> algorithm. Each implementation uses distinct algorithms to do the real
> heavy lifting: the advertisement of database updates to its peers,
> given that each peer is allowed to flow control and that the amount
> of information to be distributed is typically non-trivial compared to
> the resources of the system.

> None of the functions above actually do depend on the format of your
> database records. As long as there is a primary key associated with
> each record. Modern implementations, given that they are required to
> handle 3/4 different types of records w/ different keys (ipv4, ipv6,
> 2547, 2547-for-ipv6, etc) will tend to treat these keys just as
> database systems do: as a bit string without any semantics associated
> w/ it.

> Note also that the number of distinct tables exchanged in a 2547
> implementation may be in the thousands. So segregation of which record
> belongs to which table is necessarily a solved problem in practice.

> There is one part of BGP that however interacts w/ the semantics of
> the particular database being exchanged: route selection from the
> Loc-RIB.

> The Loc-RIB is by definition where BGP interacts w/ remaining users of
> the database and it includes rules that are system specific.

> As an exercise, if we take the existing spec and do:
> s/route/record/g
> s/IP prefix/key/g
> Do we still have a document that makes sense.. ?

I'm not sure :)

> Except for the vague bits about aggregation, about which BGP itself
> does little about, i would contend that the result would be pretty
> much the same.

You are forgetting the parts about Loc-RIB, routing and forwarding tables, next-hop, etc. In any case, such a beast would cease to be an IP routing protocol, but would still perform best path selection to an opaque key in the Internet. If you need a database distribution mechanism as you described above, you don't need the path vector behavior, nor per-peer state for each key, nor next-hops. I.e., you don't need what BGP does.

> 2547 which you cite is a particularly good example, imho. A 2547 NLRI
> ends up being used to create IP reachability information, but while it
> is a safi 128 record, it is not IP reachability and it is not treated
> as such.

2547 augments _IP prefixes_ with route distinguishers only to make sure that the prefix is unique when used as the key in the RIB. The rest of the reachability/prefix semantics are preserved.

>> 2. Distribution of information

> That is not the case w/ 2547. PE routers typically have interest in
> only a subset of the routing information. They tend to do inbound
> filtering in current network deployments but one can also do outbound
> filtering in the RRs via either extended-community ORF or subsequent
> improvements to ORF (draft-marques-ppvpn-rt-contrain).

Note that I purposely compared the approach in the document with the base BGP, not 2547.

> P routers do not carry 2547 routing information.

Unless the SPs want to use the existing RR infrastructure.

> RR in VPN deployments are typically not in the topology. My
> understanding of the P-router term is that it is a transit node that
> does not have VPN information.

Agree, they don't have to. The text should have been "More than that, route reflectors (in some cases implemented on P routers) end up..."

> Not really... i can advertise the same key from multiple sources in
> L2VPNs also. All policy mechanisms do work... igp distance, etc. It is
> just the semantics once the path is selected that are different.
> As an example think working and protect PE for a given emulated
> circuit (or lan).

I think you might have missed my point here. Though a BGP speaker will receive the same key from multiple sources, and will select the best path to it, in the VPLS case, it is not interested in the _best_ path, it is interested in only receiving the key, regardless of where it comes from.

> I don't know which model you have in mind but in a typical VPN
> deployment scenario (l3 or l2/vpls/etc) a PE has 2 peering sessions to
> a RR outside of the topology. The second copy of the information is
> there for redundancy...

> If a full mesh were used, only 1 copy would be present.

So RR and eBGP are the examples where a given BGP speaker would receive extra copies of information. As I replied to Mark:

  Well, in the IP/BGP case, it is not necessarily the same info, as path attributes are likely to be different and the BGP speaker is interested in selecting the best among them while preserving others as the back-up. In the VPLS case, we don't need the best, just one copy is sufficient.

This is not necessarily an issue per se, rather an interesting observation exposing the transport nature of this particular proposed application of BGP.
We have similar properties (more than one copy received) in the flooding algorithm in IGPs, but we admit that flooding is a transport component very specific to IGPs (where every node needs all info, btw), and we don't keep track of where we receive PDUs from as we do in BGP. >> 3. Aggregation of information for large-scale operation ... > To give you an example, in JunOS aggregation is implemented as a > separate routing protocol... if i'm not mistaken the model is lifted > from 'gated'. Clearly the idea that aggregation may be a distinct > component from BGP has been around for a while. This does not change the fact that information that BGP as an IP routing protocol distributes is aggregateable. Also note that route aggregation rules are part of the routing protocol specification and definitely depend on the protocol behavior (true for BGP, OSPF, and ISIS), so it is not a completely distinct notion, though some implementations decouple the two. > VPLS doesn't really need aggregation although it does use an IGP :-) > PE to PE connectivity is performed indepently from the 'forwarding > distinguisher' advertisement (i.e. the inner label). Any or multiple > routing and singaling protocols may be used for this > functionality. Only the information exterior to the SP network > (service attachements) is carried through BGP. This part should go into the thread on "Info Summarization" where one of the questions is how we can limit the amount of state/information that a given participating node will have to maintain. I'd like to again highlight the difference between aggregateable semantics of NLRI contents in BGP when used as IP routing protocols, and non-aggregateable semantics of it in the proposal, which means that mechanisms different from those existing in current BGP practices would need to be used to limit the amount of maintained info. 
>> The above gives me a very uncomfortable feeling that the proposal
>> is stretching BGP to perform functions it was not designed for.

> Any successful protocol will be used for means other than it was
> designed for. That is usually a sign that the designers got something
> right.

I was being mild.

>> 4. Backwards compatibility and SW upgrade requirements

> That is not an issue as we've seen above. The deployment model is
> different from what you assume.

As before, unless the SPs want to use the existing RR infrastructure.

>> 5. Coupling of VPLS and BGP SW
>> a) Lesser BGP code stability--bugs in the VPLS part of the code

> You have no basis to conclude that.

I do :)

> Any modern BGP implementation worth its salt consists of
> AF-independent code + AF-specific code. The fact is that you can
> implement VPLS without touching the AF-independent code.

The fact is that pieces of code in routing protocol implementations are not only statically related via the function call tree, but also dynamically and indirectly... but I will stop right here, because we'll inevitably get into implementation specifics...

>> b) Potential dynamic effects--since with a BGP-based approach,

> I'm sorry but this is just FUD.

I hope people don't think about potential interference between large distributed systems as FUD.

> All router implementations do have some level of resource sharing
> between completely unrelated features. In some of them, all
> functionality shares all resources.

Agreed, though I was talking about tighter coupling when Inet BGP and VPLS BGP are in the same process/thread (which is very likely the case).

As I said in my answer to Mark: I tend to look at this more broadly--putting VPLS functionality in BGP increases the chances of interference. Some consider this a strictly implementation-specific issue.
I think that whether or not VPLS-specific functionality is sufficiently decoupled from base BGP is an implementation aspect, while the increased risk of interference is an architectural one. Again, I'll stop here too.

>> My recommendation would be for the WG to consider these points.

> The way i see it there is a high likelihood of this turning into an
> "Yes, it is" "No, it isn't" discussion. And I'd really like to avoid
> that.

Agreed.

> A question to you and to the WG(s) in general:
> - What are the main concerns that you have w/ the generic database
> exchange view of BGP (Let's call it the "Basically General Purpose"
> theory).

The fact that BGP is not a generic database exchange protocol, and I don't think it should be positioned as such.

> - Can we have a reasonable discussion about the best engineering
> approach to provide database exchange services for
> routing-related-applications without getting into a religious argument
> about "2547 is evil" ? i.e. can we try to separate how highly each one
> of us rates the actual application from this discussion ?

I think we'll have to agree on the definition of "database exchange services" and "routing-related-applications", but generally, yes, sure.

> - I believe one of the preconditions for a reasonable discussion is to
> realise that implementors are the most interested people in not
> introducing regressions to shipping code. They actually get to fix it
> after being screamed at for a considerable length of time.
> I'd really like to get past the "you can't implement a feature i don't
> want because you are going to break the code" kind of discussion.

I think there is a generally good understanding of this. I don't think this is something that is sufficient for the IETF to base technical conclusions on.

> - Are we going to have a similar discussion about LDP ?
LDP is not > any less relevant for network stability nor a protocol which is any > simpler than BGP (if anything the level of complexity is higher given > that LDP has all the db exchange problem of BGP + a non trivial > ammount of issues of its own).

I have absolutely no problem with this.

Thanks for your comments.

Alex

Date: Mon, 5 May 2003 16:38:15 -0700
From: Alex Zinin <zinin@psg.com>
X-Mailer: The Bat! (v1.62i) Personal
Reply-To: Alex Zinin <zinin@psg.com>
X-Priority: 3 (Normal)
Message-ID: <177177649135.20030505163815@psg.com>
To: idr@merit.edu
Cc: rtg-dir@ietf.org
Subject: AD-review comments on draft-ietf-idr-bgp4-20
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-idr@merit.edu
Precedence: bulk

Folks, Please find below my AD-review comments. Hopefully they will help improve the document.
I tried to consult Andrew's list as much as possible, but do feel free to point out if something has already been discussed and agreed upon. Thanks go to Yakov for kicking me often enough ;) -- Alex Zinin Some nits: - run it by a spelling checker, please - disable hyphenation if possible - include boilerplates for IPR notice, Copyright notice General comment: in some places I highlighted the fact that required behavior is not described using the 2119 language, so it is not clear if a MUST or SHOULD or MAY is applicable. I am sure I've missed some more places like this. I'd like to ask the editors to go through the doc and check this. > A Border Gateway Protocol 4 (BGP-4) > <draft-ietf-idr-bgp4-20.txt> > > > Status of this Memo > > ... > The list of Internet-Draft Shadow Directories can be accessed at > http://www.ietf.org/shadow.html. > > Specification of Requirements Nit: move Abstract here. Move requirements after the Acks. > Abstract Should the Abstract say that this spec covers IPv4 only? > 3. Summary of Operation ... > This document uses the term `Autonomous System' (AS) throughout. The > classic definition of an Autonomous System is a set of routers under > a single technical administration, using an interior gateway protocol > (IGP) and common metrics to determine how to route packets within the > AS, and using an inter-AS routing protocol to determine how to route > packets to other ASs. Since this classic definition was developed, it > has become common for a single AS to use several IGPs and sometimes > several sets of metrics within an AS. The use of the term Autonomous > System here stresses the fact that, even when multiple IGPs and met- > rics are used, the administration of an AS appears to other ASs to > have a single coherent interior routing plan and presents a consis- > tent picture of what destinations are reachable through it. Ed: Since 'AS' has been defined before, do we need to repeat the definition here? ... 
> peer in the same AS is referred to as an internal peer. Internal BGP > and external BGP are commonly abbreviated IBGP and EBGP. Ed: These two have been defined before too ... > Care must be taken to > ensure that the interior routers have all been updated with transit > information before the BGP speakers announce to other ASs that tran- > sit service is being provided. What does the last sentence really mean from the implementation perspective? It used to mean the BGP/IGP synchronization check. Now that iBGP everywhere is assumed, how do we check this condition? > This document specifies the base behavior of the BGP protocol. This > behavior can and is modified by extention specifications. When the Ed: "extension" > protocol is extended the new behavior is fully documented in the > extention specifications. Ed: "extension" > 3.1 Routes: Advertisement and Storage > > For the purpose of this protocol, a route is defined as a unit of > information that pairs a set of destinations with the attributes of a > path to those destinations. The set of destinations are systems whose > IP addresses are contained in one IP address prefix carried in the > Network Layer Reachability Information (NLRI) field of an UPDATE mes- > sage, and the path is the information reported in the path attributes > field of the same UPDATE message. Ed: Repeated definition again ... > If a BGP speaker chooses to advertise the route, it MAY add to or > modify the path attributes of the route before advertising it to a > peer. The intent here is to say that it's ok to modify the attribute set of a previously received route when it's announced further. The way it reads though is that self-originated routes are also within the context and MAY sounds like you don't have to add attributes when announcing those. ... > Changing attribute of a route is accomplished by advertising a > replacement route. The replacement route carries new (changed) > attributes and has the same NLRI as the original route. 
"same NLRI" implies the same prefix, but not the NLRI field, which can be different (containing other routes), should the use of this term be normalized throughout the document? > 4.2 OPEN Message Format > > After a TCP is established, the first message sent by each side is an "TCP connection" > 5. Path Attributes ... > If a path with recognized transitive optional attribute is accepted > and passed along to other BGP peers and the Partial bit in the > Attribute Flags octet is set to 1 by some previous AS, it is not 'MUST NOT' here? > set > back to 0 by the current AS. Unrecognized non-transitive optional > attributes MUST be quietly ignored and not passed along to other BGP > peers. ... > The same attribute (attribute with the same type) can not appear more > than once within the Path Attributes field of a particular UPDATE > message. What should an implementation do if this happens? > The mandatory category refers to an attribute which MUST be present > in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE Ed: "if the NLRI field is contained" instead? > 5.1.2 AS_PATH ... > b) When a given BGP speaker advertises the route to an external > peer, then the advertising speaker updates the AS_PATH attribute > as follows: > > 1) if the first path segment of the AS_PATH is of type > AS_SEQUENCE, the local system prepends its own AS number as the > last element of the sequence (put it in the leftmost position). 'Leftmost position'... isn't this still open for interpretation? How about wording this relative to the position of the octets in the protocol message? > If the act of prepending will cause an overflow in the AS_PATH > segment, i.e. more than 255 ASs, it is legal What's the recommended behavior here? > to prepend a new > segment of type AS_SEQUENCE and prepend its own AS number to > this new segment. 
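One way to pin down both the 'leftmost' wording and the overflow case is to describe the operation on an ordered list of segments. The following is a minimal sketch, not text from the draft: it assumes the AS_PATH is modeled as a list of (segment type, AS number list) pairs in which index 0 is whatever comes first in the attribute on the wire, and the function name is hypothetical. When the first segment is not an AS_SEQUENCE, or is already full, the sketch starts a new AS_SEQUENCE segment.

```python
AS_SET, AS_SEQUENCE = 1, 2   # path segment type codes
MAX_SEGMENT_LEN = 255        # segment length is carried in a single octet

def prepend_local_as(as_path, local_as):
    """Return a new AS_PATH with local_as placed first.

    as_path is a list of (segment_type, [as_numbers]); index 0 of the list,
    and index 0 of each AS number list, is what appears first in the
    attribute as encoded in the UPDATE message.
    """
    if as_path:
        seg_type, asns = as_path[0]
        if seg_type == AS_SEQUENCE and len(asns) < MAX_SEGMENT_LEN:
            # Room left: extend the existing first segment.
            return [(AS_SEQUENCE, [local_as] + asns)] + as_path[1:]
    # Empty path, leading AS_SET, or full leading segment: start a new
    # AS_SEQUENCE segment in front of everything else.
    return [(AS_SEQUENCE, [local_as])] + as_path

path = [(AS_SEQUENCE, [65002, 65003])]
print(prepend_local_as(path, 65001))
```

Wording the rule relative to the encoded octets, as in the comments above, avoids any dependence on how a given implementation prints the path.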
> 5.1.4 MULTI_EXIT_DISC > > > The MULTI_EXIT_DISC is an optional non-transitive attribute which is > intended to be used on external (inter-AS) links to discriminate > among multiple exit or entry points to the same neighboring AS. The > value of the MULTI_EXIT_DISC attribute is a four octet unsigned num- > ber which is called a metric. All other factors being equal, the exit > point with lower metric SHOULD be preferred. If received over EBGP, > the MULTI_EXIT_DISC attribute MAY be propagated over IBGP to other > BGP speakers within the same AS. The MULTI_EXIT_DISC attribute seems that a reference to 9.1.2.2 is due here, as using MED in local route calculation and not propagating it further is dangerous > received from a neighboring AS MUST NOT be propagated to other neigh- > boring ASs. > > A BGP speaker MUST IMPLEMENT a mechanism based on local configuration ^^^^^^^^^lower-case > which allows the MULTI_EXIT_DISC attribute to be removed from a > route. This MAY be done prior to determining the degree of preference what's the recommended behavior here? > of the route and performing route selection (decision process phases > 1 and 2). > > An implementation MAY also (based on local configuration) alter the > value of the MULTI_EXIT_DISC attribute received over EBGP. This MAY > be done prior to determining the degree of preference of the route what's the recommended behavior here? > 5.1.5 LOCAL_PREF ... > A BGP speaker SHALL calculate the degree of preference for > each external route based on the locally configured policy, and Should we be more honest here and say that the implementation must allow the admin to SET the degree of preference through the local policy to influence the best-path selection process, i.e., I don't think any implementation really *calculates* it. > 5.1.6 ATOMIC_AGGREGATE ... 
> A BGP speaker that receives a route with the ATOMIC_AGGREGATE > attribute MUST NOT make any NLRI of that route more specific (as > defined in 9.1.4) when advertising this route to other BGP speakers. Since deaggregation is not described in this document, do we need this para? > A BGP speaker that receives a route with the ATOMIC_AGGREGATE > attribute needs to be cognizant of the fact that the actual path to > destinations, as specified in the NLRI of the route, while having the > loop-free property, may not be the path specified in the AS_PATH > attribute of the route. What does this really mean from the implementation perspective? > 5.1.7 AGGREGATOR > > > AGGREGATOR is an optional transitive attribute which MAY be included > in updates which are formed by aggregation (see Section 9.2.2.2). A > BGP speaker which performs route aggregation MAY add the AGGREGATOR What's the recommended behavior here? Include or not, and under what circumstances? > 6. BGP Error Handling. ... > The phrase "the BGP connection is closed" means that the TCP connec- > tion has been closed, the associated Adj-RIB-In has been cleared, and > that all resources for that BGP connection have been deallocated. > Entries in the Loc-RIB associated with the remote peer are marked as > invalid. The fact that the routes have become invalid is passed to > other BGP peers before the routes are deleted from the system. What does "the fact is passed" mean? Should we instead say that local route recalculation happens and peers are sent either updated best routes or withdrawals? > 6.2 OPEN message error handling. ... > If the Autonomous System field of the OPEN message is unacceptable, > then the Error Subcode is set to Bad Peer AS. The determination of > acceptable Autonomous System numbers is outside the scope of this > protocol. Shouldn't we say that configuration based detection should be supported, i.e., when remote-as is configured for the peer? ... 
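To illustrate what configuration-based detection could look like, a minimal sketch follows. The function name and the idea of passing None for "no remote AS configured" are assumptions made for the example; the numeric values are the OPEN Message Error code and the Bad Peer AS subcode.

```python
OPEN_MESSAGE_ERROR = 2   # NOTIFICATION Error Code: OPEN Message Error
BAD_PEER_AS = 2          # Error Subcode: Bad Peer AS

def check_peer_as(received_as, configured_remote_as):
    """Return None if the AS in the OPEN is acceptable,
    otherwise the (error code, error subcode) to put in the NOTIFICATION."""
    if configured_remote_as is not None and received_as != configured_remote_as:
        return (OPEN_MESSAGE_ERROR, BAD_PEER_AS)
    return None

print(check_peer_as(65010, 65010))  # acceptable
print(check_peer_as(65011, 65010))  # mismatch with configured remote AS
```

What to do when no remote AS is configured is left open here, since the draft calls that determination out of scope.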
> If the BGP Identifier field of the OPEN message is syntactically > incorrect, then the Error Subcode is set to Bad BGP Identifier. Syn- > tactic correctness means that the BGP Identifier field represents a > valid IP host address. Is "valid IP host address" defined somewhere, btw? > 6.3 UPDATE message error handling. > > > All errors detected while processing the UPDATE message are indicated > by sending the NOTIFICATION message with Error Code UPDATE Message > Error. The error subcode elaborates on the specific nature of the > error. "are indicated..." is this a MUST, SHOULD, or MAY? ... > If the ORIGIN attribute has an undefined value, then the Error Sub- > code is set to Invalid Origin Attribute. The Data field contains the > unrecognized attribute (type, length and value). Curious: do we really have to drop a session on this condition? Given that the attribute was syntactically correct and the TLV was not broken, so the stream is still in sync and we can move on? Of course, if this is what current implementations do, we have no other choice. ... > If the UPDATE message is received from an external peer, the local > system MAY check whether the leftmost AS in the AS_PATH attribute is Same comment about 'leftmost'... Maybe we should define this somewhere in the beginning of the spec? ... > The NLRI field in the UPDATE message is checked for syntactic valid- > ity. If the field is syntactically incorrect, then the Error Subcode > is set to Invalid Network Field. Should we give more data on what syntactic validity means in this case so people behave consistently? > 6.7 Cease. ... > If the BGP speaker decides to terminate its BGP > connection with a neighbor because the number of address prefixes > received from the neighbor exceeds the locally configured upper > bound, then the speaker MUST send to the neighbor a NOTIFICATION mes- > sage with the Error Code Cease. 
Should we also say that when the peer decides to discard incoming prefixes, this event should be logged locally? > 8. BGP Finite State machine General comment: I would _really_ appreciate more people looking at this section. > The optional Session attributes are listed below. These optional > attributes may be supported either per connection or per local sys- > tem: > > 1) Delay Open flag Where's the description of this flag and how/when is it set? Same for others below. Should we have a brief description for each attribute? > 2) Open Delay Timer > 3) Perform automatic start flag > 4) Perform automatic stop flag > 5) Passive TCP establishment flag > 6) Perform BGP peer oscillation damping flag > (which will be denoted as stop_peer_flap in text) > 7) Idle Hold timer > 8) Perform Collision detect in Established flag > 9) Accept connections from un-configured peers > 10) Track TCP state flag > 11) Send NOTIFICATION without an OPEN flag > Suggestion: to make reading of the FSM description below easier, we could "merge" the multiword flag names and normalize them, e.g. 'perform automatic start flag' to 'PerformAutoStart flag'. 'Passive TCP establishment flag' to 'PassiveTCPEstablishment flag', 'stop_peer_flap' to 'StopPeerFlag'. > 8.1.1 Administrative Events > > > Please note that only Event 1 (manual start) and Event 2 (manual > stop) are mandatory administrative events. All other administrative > events are optional. The optional attributes do not have to be sup- > ported. However, if these attributes are supported, the state of the > flags should be as indicated. 'flags should be as indicated' does not give a clear understanding of what they are used for. Should the events be sanity-checked by checking those attributes? what's the recommended behavior when the flags are in a different state? > Event3: Automatic start > > Definition: Local system automatically starts the > BGP connection. > When is this event generated by the system? Under what conditions? 
> Status: Optional depending on local system. > > Optional > attributes: 1) Perform automatic start flag SHOULD be set > if this event occurs. > 2) if the passive Passive TCP establishment flag passive Passive? > Event5: Automatic start with passive TCP flag > > Definition: Local system automatically starts the > BGP connection with the passive flag > enabled. The passive flag indicates > Same question about generation conditions .. > Event23: Open collision dump > > Definition: An event generated administratively > when a connection collision has been > detected while processing an incoming > OPEN message and this connection has been > selected to disconnected. See Section 'to be disconnected' > 6.8 for more information on collision > detection. > > Event23 is an administrative based only 'based on'? > implementation specific policy. This > Event may occur if the FSM is implemented > as two linked state machines. > > > Status: Optional, depending on local system > > Optional > Attributes: If the state machine is to process this > attribute in Established state, > 1) Peform Collision detect in Established 'Perform' > flag SHOULD be set. ... > Event25: NotifMsg > > Definition: An event is generated when a > NOTIFICATION messages is received and message > the error code is anything but > "version error". > > Status: Mandatory > 8.2.1 FSM Definition > > > BGP MUST maintain a separate FSM for each configured peer, Each BGP > peer paired in a potential connection unless configured to remain in > the idle state, or configured to remain passive, will attempt to to to to > connect to the other. For the purpose of this discussion, the active > or connect side of the TCP connection (the side of a TCP connection 'active or connecting'? > sending the first TCP SYN packet) is called outgoing. The passive or > listening side (the sender of the first SYN ACK) is called an incom- > ing connection (see Section 8.2.1.1 on the terms active and passive > below). 
> > A BGP implementation MUST connect to and listen on TCP port 179 for > incoming connections in addition to trying to connect to peers. For > each incoming connection, a state machine MUST be instantiated. Is this true for implementations that resolve connection collision through one FSM with two transport connections? > 8.2.1.1 Terms "active" and "passive" > > > The terms active and passive have been in our vocabulary for almost a > decade and have proven useful. Ed: The style here is quite different from the rest of the document (i.e., personalization), plus time values tend become outdated with time :) > 8.2.1.2 FSM and collision detection > > > There is one FSM per BGP connection. Prior to determining what peer > a connection is associated with there may be two connections for a > given peer. There SHOULD be no more than one connection per peer. Is above "SHOULD" normative? I.e., should be "should" instead? > The collision detection identifies the case where there is more than > one connection per peer and provides guidance for which connection to > get rid of. When this occurs, the corresponding FSM for the connec- > tion that is closed SHOULD be disposed of. > BTW, I think the specification would really benefit from a section that describes processing of incoming transport connections. > 8.2.2 Finite State Machine > > > Idle state: > > Initially BGP is in the Idle state. Not BGP, but the peer FSM, right? > > In this state BGP refuses all incoming BGP connections. No all incoming connections from that peer? > > resources are allocated to the peer. In response to a > manual start event(Event1) or an automatic start > event(Event3), the local system: > - initializes all BGP resources, all BGP resources or only those needed for the peer? also, what does 'initialize' mean here? > - sets ConnectRetryCnt (the connect retry counter) to zero Seems we have inconsistency in FSM parameter naming here. 
> In response to a manual start event with the passive TCP connection > flag (Event 4) or automatic start with the passive TCP connection > flag (Event 5), the local system: > - initializes all BGP resources, > - sets ConnectRetryCnt (the connect retry counter) to zero, > - starts the Connect Retry timer with initial value, > - listens for a connection that may be initiated by > the remote peer, and > - changes its state to Active. Ditto comments here > The method of preventing persistent peer oscillation is > outside the scope of this document. So we have these events, but we don't define how to handle them? > Any other events [Events 9-12, 15-28] received in the Idle state > does not cause change in the state of the local system. 'do not cause changes' ? > In response to a manual stop event [Event2], the local system: > - drops the TCP connection, > - releases all BGP resources, > - sets ConnectRetryCnt (the connect retry count) to zero > - sets the Connect Retry timer to zero, and sets timer to zero? 'Stops the timer' instead? > - changes its state to Idle. ... > If the BGP port receives a valid TCP connection indication BGP port? > [Event 14], the TCP connection is processed and > the connection remains in the Connect state. > > If the TCP connection receives an invalid indication [Event 15]: TCP connection receives? > the local system rejects the TCP connection and the connection > remains in the Connect state. > > If the TCP connection succeeds [Event 16 or Event 17], > the local system checks the Delay Open flag prior to > processing. If the Delay Open flag is set, the local system: > - sets the Connect Retry timer to zero, "stops" instead? > - set the Open Delay timer to the initial value, and sets > - stays in the Connect state. > If the Delay Open flag is not set, the local system: > - sets the Connect Retry timer to zero, stops > - completes BGP initialization What does the above really mean? ... > the Open Delay Timer. 
If the Open Delay timer is running, > the local system: > - restarts the connect retry time with initial value, > - stops the Open Delay timer and resets value to zero, > - continues to listen for a connection that may be > initiated by the remote BGP peer, and > - changes its state to Active. > If the open Delay timer is not running, the local system: > - sets the Connect Retry timer to zero, > - drops the TCP connection, > - releases all BGP resources, and all BGP resources? > - changes its state to Idle. > > If an OPEN message is received with the Open Delay timer is > running [Event 20], the local system: > - sets the Connect Retry timer to zero, > - completes the BGP initialization, What does it mean? > - stops and clears the Open Delay timer (sets the value to zero), > - sends an OPEN message, > - sends a KEEPALIVE message, > - If the hold timer value is non-zero, > - start the keepalive timer to inital value, "starts"... start to initial value? > - reset the hold timer to the negotiated value, Resets > else if hold timer value is zero, > - reset the keepalive timer, and resets > - reset the hold timer value to zero resets > - and changes its state to OpenConfirm. > OK, I'll stop reviewing the FSM text here and will skip to the next section. Given the number of English grammar mistakes, it is clear to me that either it has not been sufficiently reviewed or even read by someone carefully enough or the comments have not been incorporated. Please address. ... > 9. UPDATE Message Handling > > > An UPDATE message may be received only in the Established state. What if it is received in another state? ... > 9.1 Decision Process > > > The Decision Process selects routes for subsequent advertisement by > applying the policies in the local Policy Information Base (PIB) to > the routes stored in its Adj-RIBs-In. 
The output of the Decision Pro- > cess is the set of routes that will be advertised to peers; the > selected routes will be stored in the local speaker's Adj-RIB-Out RIB-Out or RIBs-out (plural)? > according to policy. > > The selection process is formalized by defining a function that takes > the attribute of a given route as an argument and returns either (a) > a non-negative integer denoting the degree of preference for the > route, or (b) a value denoting that this route is ineligible to be > installed in LocRib and will be excluded from the next phase of route Loc-RIB > selection. ... > The Decision Process operates on routes contained in the Adj-RIB-In, Adj-RIBs-In (plural) ? > and is responsible for: > 9.1.1 Phase 1: Calculation of Degree of Preference ... > If the route is learned from an external peer, then the local BGP > speaker computes the degree of preference based on preconfigured > policy information. If the return value indicates that the route > is ineligible, the route MAY NOT serve as an input to the next > phase of route selection; otherwise the return value is used as > the LOCAL_PREF value in any IBGP readvertisement. So, AFAIK, the major implementations do not follow this step (calculating the degree of preference, and then announcing). Instead, implementations allow setting the LOCAL_PREF value locally, which is taken into consideration during the best path selection, and is also reannounced further. Also "is used" is not specific enough. Is it SHOULD or MUST? > 9.1.2 Phase 2: Route Selection ... > If the AS_PATH attribute of a BGP route contains an AS loop, the BGP > route should be excluded from the Phase 2 decision function. AS loop > detection is done by scanning the full AS path (as specified in the > AS_PATH attribute), and checking that the autonomous system number of > the local system does not appear in the AS path. 
Operations of a BGP > speaker that is configured to accept routes with its own autonomous > system number in the AS path are outside the scope of this document. If we're checking for an AS loop here (in Phase 2) as opposed to during the UPDATE message sanity checking, the route is already received and accepted in the peer's Adj-RIB-In. Those implementations I know don't even install such routes in the RIB... > 9.1.2.2 Breaking Ties (Phase 2) ... > Similarly, neighborAS(n) is a function which returns the neighbor > AS from which the route was received. If the route is learned via > IBGP, and the other IBGP speaker didn't originate the route, it is > the neighbor AS from which the other IBGP speaker learned the > route. If the route is learned via IBGP, and the other IBGP > speaker originated the route, it is the local AS. What if the route is locally originated? > 9.1.4 Overlapping Routes ... > When overlapping routes are present in the same Adj-RIB-In, the more > specific route takes precedence, in order from more specific to least > specific. > Doesn't this happen at the packet forwarding stage? > > The set of destinations described by the overlap represents a portion > of the less specific route that is feasible, but is not currently in > use. If a more specific route is later withdrawn, the set of desti- > nations described by the overlap will still be reachable using the > less specific route. > > If a BGP speaker receives overlapping routes, the Decision Process > MUST consider both routes based on the configured acceptance policy. > If both a less and a more specific route are accepted, then the Deci- > sion Process MUST either install both the less and the more specific Install where? > routes or it MUST aggregate the two routes and install the aggregated > route, provided that both routes have the same value of the NEXT_HOP > attribute. anyone really does the latter? 
> If a BGP speaker chooses to aggregate, then it SHOULD either include > all AS used to form the aggreagate in an AS_SET or add the > ATOMIC_AGGREGATE attribute to the route. This attribute is now pri- > marily informational. With the elimination of IP routing protocols > that do not support classless routing and the elimination of router > and host implementations that do not support classless routing, there > is no longer a need to deaggregate. Routes SHOULD NOT be de-aggre- > gated. A route that carries ATOMIC_AGGREGATE attribute in particular > MUST NOT be de-aggregated. That is, the NLRI of this route can not be > made more specific. Forwarding along such a route does not guarantee > that IP packets will actually traverse only ASs listed in the AS_PATH > attribute of the route. Since we don't do deaggregation any more, should we remove the discussion about it completely and indicate in the "changes" section that deaggregation has been deprecated? > 9.2 Update-Send Process ... > When a BGP speaker receives an UPDATE message from an internal peer, > the receiving BGP speaker SHALL NOT re-distribute the routing infor- > mation contained in that UPDATE message to other internal peers, > unless the speaker acts as a BGP Route Reflector [RFC2796]. Suggest to put "unless..." in brackets () to make it more apparent that this is not a normative ref. > 9.2.1.1 Frequency of Route Advertisement > Since fast convergence is needed within an autonomous system, either > (a) the MinRouteAdvertisementInterval used for internal peers SHOULD > be shorter than the MinRouteAdvertisementInterval used for external > peers, or (b) the procedure describe in this section SHOULD NOT apply > for routes sent to internal peers. It sounded like MinRouteAdvertisementInterval was an architectural constant, but now it sounds like either this is a timer that can be assigned different settings or there are two constants: MinRouteAdvIntIBGP and MinRouteAdvIntEBGP. 
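The "two constants" reading can be sketched as follows. This is an assumption-laden illustration, not the draft's procedure: the class and method names are hypothetical, the 30 second EBGP value is the default suggested later in the draft, and the 5 second IBGP value is a placeholder chosen only to be shorter. Time is passed in explicitly so the behavior is easy to follow.

```python
class AdvertisementPacer:
    """Per-peer, per-destination pacing with separate IBGP/EBGP intervals."""

    def __init__(self, ebgp_interval=30.0, ibgp_interval=5.0):
        # Hypothetical names for the two values discussed above.
        self.interval = {"ebgp": ebgp_interval, "ibgp": ibgp_interval}
        self.last_sent = {}  # (peer, prefix) -> time of last advertisement

    def may_advertise(self, peer, peer_type, prefix, now):
        """Return True (and record the send) if enough time has passed."""
        key = (peer, prefix)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval[peer_type]:
            return False
        self.last_sent[key] = now
        return True

pacer = AdvertisementPacer()
print(pacer.may_advertise("ext-peer", "ebgp", "10.0.0.0/8", now=0.0))
print(pacer.may_advertise("ext-peer", "ebgp", "10.0.0.0/8", now=10.0))
print(pacer.may_advertise("int-peer", "ibgp", "10.0.0.0/8", now=10.0))
```

Option (b) in the quoted text would correspond to skipping the check entirely for internal peers, which is the same as an IBGP interval of zero in this sketch.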
> 9.2.2.2 Aggregating Routing Information
>

Hmmm... I expected to see in this section some text talking about
when and how an aggregate would be announced, i.e., when an aggregate
prefix is configured, and more specific routes are present, the
aggregate is announced, when no specifics are left--withdraw the
aggregate. I haven't found anything on this topic...

> 9.3 Route Selection Criteria
>
> Generally speaking, additional rules for comparing routes among
> several alternatives are outside the scope of this document. There are
> two exceptions:
>
> - If the local AS appears in the AS path of the new route being
> considered, then that new route can not be viewed as better than
> any other route (provided that the speaker is configured to accept
> such routes). If such a route were ever used, a routing loop could
> result.
>
> - In order to achieve successful distributed operation, only
> routes with a likelihood of stability can be chosen. Thus, an AS
> SHOULD avoid using unstable routes, and it SHOULD NOT make rapid
> spontaneous changes to its choice of route. Quantifying the terms
> "unstable" and "rapid" in the previous sentence will require
> experience, but the principle is clear.
>

Where does this (the second one) fit within and how does this affect
the route selection criteria?

> Care must be taken to ensure that BGP speakers in the same AS do not
> make inconsistent decisions.

How? What does this mean for the implementor?

> 9.4 Originating BGP routes
>
> A BGP speaker may originate BGP routes by injecting routing
> information acquired by some other means (e.g. via an IGP) into BGP. A BGP
> speaker that originates BGP routes assigns the degree of preference

"assigns the degree of preference"... how do the implementations
really do that?

> 10 BGP Timers
...
> The suggested default value for the MinRouteAdvertisementInterval is
> 30 seconds.

This was described as a parameter, not a timer.
Further, it was earlier suggested that it should be shorter for iBGP
than it is for eBGP. I'd expect the document to specify the
recommended value for both.

> IANA Considerations
...
> All extensions to this protocol, including new message types and Path
> Attributes MUST only be made using the Standards Action process
> defined in [RFC2434].

This section should include the description of each registry that
needs to be created (if needed) and maintained by IANA, as well as
the allocation policy that is in the text already.

<EOM>

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id QAA25105 for <idr-archive@nic.merit.edu>; Mon, 5 May 2003 16:09:42 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id 403CA9123E; Mon, 5 May 2003 16:09:10 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 160439123F; Mon, 5 May 2003 16:09:10 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id D579C9123E for <idr@trapdoor.merit.edu>; Mon, 5 May 2003 16:09:08 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id C32DE5E23E; Mon, 5 May 2003 16:09:08 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from workhorse.fictitious.org (workhorse.fictitious.org [209.150.1.230]) by segue.merit.edu (Postfix) with ESMTP id 0C5FF5E23D for <idr@merit.edu>; Mon, 5 May 2003 16:09:08 -0400 (EDT)
Received: from workhorse.fictitious.org (localhost.fictitious.org [127.0.0.1]) by workhorse.fictitious.org (8.9.3/8.9.3) with ESMTP id QAA04798; Mon, 5 May 2003 16:09:20 -0400 (EDT) (envelope-from curtis@workhorse.fictitious.org)
Message-Id: <200305052009.QAA04798@workhorse.fictitious.org>
To: Jeffrey Haas <jhaas@nexthop.com>
Cc: Yakov Rekhter <yakov@juniper.net>, idr@merit.edu
Reply-To: curtis@fictitious.org
Subject: Re: Issue 19) Security Considerations
In-reply-to:
 Your message of "Mon, 05 May 2003 14:45:13 EDT." <20030505144513.B17555@nexthop.com>
Date: Mon, 05 May 2003 16:09:20 -0400
From: Curtis Villamizar <curtis@fictitious.org>
Sender: owner-idr@merit.edu
Precedence: bulk

In message <20030505144513.B17555@nexthop.com>, Jeffrey Haas writes:
> Yakov,
>
> On Mon, May 05, 2003 at 09:59:14AM -0700, Yakov Rekhter wrote:
> > I don't recall seeing any objections to adding this to the document.
>
> It was more that I hadn't heard anything from anyone one way or the
> other.
>
> > Yakov.
>
> --
> Jeff Haas
> NextHop Technologies

Neither did I, in case there was any question about whether something
was being worked out off list.

Curtis

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id OAA22918 for <idr-archive@nic.merit.edu>; Mon, 5 May 2003 14:46:46 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id A063491239; Mon, 5 May 2003 14:45:45 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 7021F9123B; Mon, 5 May 2003 14:45:45 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 3F87F91239 for <idr@trapdoor.merit.edu>; Mon, 5 May 2003 14:45:44 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 9220F5E1E9; Mon, 5 May 2003 14:45:40 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from presque.nexthop.com (dns.nexthop.com [65.247.36.216]) by segue.merit.edu (Postfix) with ESMTP id 65EA25E1DA for <idr@merit.edu>; Mon, 5 May 2003 14:45:37 -0400 (EDT)
Received: (from root@localhost) by presque.nexthop.com (8.12.8/8.11.1) id h45IjZh5075505; Mon, 5 May 2003 14:45:35 -0400 (EDT) (envelope-from jhaas@jhaas.nexthop.com)
Received: from jhaas.nexthop.com (jhaas.nexthop.com [65.247.36.31]) by presque.nexthop.com (8.12.8/8.12.8) with ESMTP id h45IjIWB075430; Mon, 5 May 2003 14:45:18 -0400 (EDT)
 (envelope-from jhaas@jhaas.nexthop.com)
Received: (from jhaas@localhost) by jhaas.nexthop.com (8.11.3nb1/8.11.3) id h45IjDt21595; Mon, 5 May 2003 14:45:13 -0400 (EDT)
Date: Mon, 5 May 2003 14:45:13 -0400
From: Jeffrey Haas <jhaas@nexthop.com>
To: Yakov Rekhter <yakov@juniper.net>
Cc: idr@merit.edu
Subject: Re: Issue 19) Security Considerations
Message-ID: <20030505144513.B17555@nexthop.com>
References: <20030430124022.K24007@nexthop.com> <200305051659.h45GxEu26987@merlot.juniper.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200305051659.h45GxEu26987@merlot.juniper.net>; from yakov@juniper.net on Mon, May 05, 2003 at 09:59:14AM -0700
X-Virus-Scanned: by AMaViS perl-11
Sender: owner-idr@merit.edu
Precedence: bulk

Yakov,

On Mon, May 05, 2003 at 09:59:14AM -0700, Yakov Rekhter wrote:
> I don't recall seeing any objections to adding this to the document.

It was more that I hadn't heard anything from anyone one way or the
other.

> Yakov.
--
Jeff Haas
NextHop Technologies

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id MAA19898 for <idr-archive@nic.merit.edu>; Mon, 5 May 2003 12:59:50 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id EE03491230; Mon, 5 May 2003 12:59:22 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id B7AC691231; Mon, 5 May 2003 12:59:22 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 8307C91230 for <idr@trapdoor.merit.edu>; Mon, 5 May 2003 12:59:21 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 6100B5E185; Mon, 5 May 2003 12:59:21 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id DAAFA5E17B for <idr@merit.edu>; Mon, 5 May 2003 12:59:20 -0400 (EDT)
Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h45GxEu26987; Mon, 5 May 2003 09:59:14 -0700 (PDT) (envelope-from yakov@juniper.net)
Message-Id: <200305051659.h45GxEu26987@merlot.juniper.net>
To: Jeffrey Haas <jhaas@nexthop.com>
Cc: idr@merit.edu
Subject: Re: Issue 19) Security Considerations
In-Reply-To: Your message of "Wed, 30 Apr 2003 12:40:23 EDT."
 <20030430124022.K24007@nexthop.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <31248.1052153954.1@juniper.net>
Date: Mon, 05 May 2003 09:59:14 -0700
From: Yakov Rekhter <yakov@juniper.net>
Sender: owner-idr@merit.edu
Precedence: bulk

Jeff,

> Curtis wrote in a previous message:
>
> On Thu, Apr 03, 2003 at 09:36:06AM -0500, Curtis Villamizar wrote:
> > --- draft-murphy-bgp-vuln-02.txt Wed Mar 5 21:00:00 2003
> > +++ draft-murphy-bgp-vuln-02.txt++ Thu Apr 3 09:18:12 2003
> > @@ -149,6 +149,7 @@
> > 3.2.2.2 Timer events .............................................. 16
> > 4 Security Considerations ......................................... 16
> > 4.1 Residual Risk ................................................. 16
> > +4.2 Practical Considerations ...................................... 16
> > 5 References ...................................................... 17
> > 6 Author's Address ................................................ 18
> >
> > @@ -901,6 +902,79 @@
> > Filtering is in use near some customer attachment points, but is not
> > effective near the Internet center. The other mechanisms are still
> > controversial and are not yet in common use.
> > +
> > +4.2 Practical Considerations
> [...]
>
> This looks like it has good merit. Shouldn't we add this to the document?
> (Well, not "we" since Sandy is authoring it, but it seems like a good idea.)

I don't recall seeing any objections to adding this to the document.

Yakov.
Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id JAA13154 for <idr-archive@nic.merit.edu>; Mon, 5 May 2003 09:22:43 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id 03B0991229; Mon, 5 May 2003 09:22:22 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id BFEB69122A; Mon, 5 May 2003 09:22:21 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 78F6491229 for <idr@trapdoor.merit.edu>; Mon, 5 May 2003 09:22:20 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 5B4FA5E129; Mon, 5 May 2003 09:22:20 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id D5A7D5E10E for <idr@merit.edu>; Mon, 5 May 2003 09:22:19 -0400 (EDT)
Received: from juniper.net (garnet.juniper.net [172.17.28.17]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h45DMDu06347; Mon, 5 May 2003 06:22:13 -0700 (PDT) (envelope-from yakov@juniper.net)
Message-Id: <200305051322.h45DMDu06347@merlot.juniper.net>
To: Shankar Vemulapalli <svemulap@cisco.com>
Cc: idr@merit.edu
Subject: Re: draft-ietf-idr-bgp4-20.txt
In-Reply-To: Your message of "Sat, 03 May 2003 15:06:46 PDT." <Pine.GSO.4.53.0305031457570.3207@sj-cse-138.cisco.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <60260.1052140933.1@juniper.net>
Date: Mon, 05 May 2003 06:22:13 -0700
From: Yakov Rekhter <yakov@juniper.net>
Sender: owner-idr@merit.edu
Precedence: bulk

Shankar,

> Hi -
>
> Not sure if this is already pointed out -
>
> In draft-ietf-idr-bgp4-20.txt - page 14
> [RFC2842] defines the Capabilities Optional Parameter
> and on page 85 -
> [RFC2842] R. Chandra, J. Scudder, "Capabilities Advertisement with
> BGP-4", RFC2842.
>
> should be changed to newer RFC - RFC3392 - to reflect the latest info.

Agreed.

Yakov.

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id SAA05271 for <idr-archive@nic.merit.edu>; Sat, 3 May 2003 18:07:42 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id CF16991330; Sat, 3 May 2003 18:07:06 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 963EC9132E; Sat, 3 May 2003 18:07:06 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id E74D591326 for <idr@trapdoor.merit.edu>; Sat, 3 May 2003 18:06:59 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 906F35E586; Sat, 3 May 2003 18:06:59 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from fire.cisco.com (firebird.cisco.com [171.68.227.73]) by segue.merit.edu (Postfix) with ESMTP id 474235E4AC for <idr@merit.edu>; Sat, 3 May 2003 18:06:47 -0400 (EDT)
Received: from sj-cse-138.cisco.com (sj-cse-138.cisco.com [171.69.98.126]) by fire.cisco.com (8.11.6+Sun/8.8.8) with ESMTP id h43M6k003556 for <idr@merit.edu>; Sat, 3 May 2003 15:06:46 -0700 (PDT)
Date: Sat, 3 May 2003 15:06:46 -0700 (PDT)
From: Shankar Vemulapalli <svemulap@cisco.com>
To: idr@merit.edu
Subject: draft-ietf-idr-bgp4-20.txt
Message-ID: <Pine.GSO.4.53.0305031457570.3207@sj-cse-138.cisco.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-idr@merit.edu
Precedence: bulk

Hi -

Not sure if this is already pointed out -

In draft-ietf-idr-bgp4-20.txt - page 14
[RFC2842] defines the Capabilities Optional Parameter
and on page 85 -
[RFC2842] R. Chandra, J. Scudder, "Capabilities Advertisement with
BGP-4", RFC2842.

should be changed to newer RFC - RFC3392 - to reflect the latest info.

Thanks,
/Shankar

Received: from trapdoor.merit.edu (postfix@trapdoor.merit.edu [198.108.1.26]) by nic.merit.edu (8.9.3/8.9.1) with ESMTP id EAA12707 for <idr-archive@nic.merit.edu>; Sat, 3 May 2003 04:06:02 -0400 (EDT)
Received: by trapdoor.merit.edu (Postfix) id 4DAC991316; Sat, 3 May 2003 04:05:45 -0400 (EDT)
Delivered-To: idr-outgoing@trapdoor.merit.edu
Received: by trapdoor.merit.edu (Postfix, from userid 56) id 0272C91317; Sat, 3 May 2003 04:05:44 -0400 (EDT)
Delivered-To: idr@trapdoor.merit.edu
Received: from segue.merit.edu (segue.merit.edu [198.108.1.41]) by trapdoor.merit.edu (Postfix) with ESMTP id 50D0E91316 for <idr@trapdoor.merit.edu>; Sat, 3 May 2003 04:05:43 -0400 (EDT)
Received: by segue.merit.edu (Postfix) id 3D1C85E398; Sat, 3 May 2003 04:05:43 -0400 (EDT)
Delivered-To: idr@merit.edu
Received: from merlot.juniper.net (natint.juniper.net [207.17.136.129]) by segue.merit.edu (Postfix) with ESMTP id A80775DFE5 for <idr@merit.edu>; Sat, 3 May 2003 04:05:42 -0400 (EDT)
Received: from roque-bsd.juniper.net (roque-bsd.juniper.net [172.17.12.183]) by merlot.juniper.net (8.11.3/8.11.3) with ESMTP id h4385Ku05753; Sat, 3 May 2003 01:05:20 -0700 (PDT) (envelope-from roque@juniper.net)
Received: (from roque@localhost) by roque-bsd.juniper.net (8.11.6/8.9.3) id h4385Kd51107; Sat, 3 May 2003 01:05:20 -0700 (PDT) (envelope-from roque)
Date: Sat, 3 May 2003 01:05:20 -0700 (PDT)
Message-Id: <200305030805.h4385Kd51107@roque-bsd.juniper.net>
From: Pedro Roque Marques <roque@juniper.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: Alex Zinin <zinin@psg.com>
Cc: ppvpn@nortelnetworks.com, idr@merit.edu
Subject: On BGP and VPLS
In-Reply-To: <51133594448.20030502191439@psg.com>
References: <51133594448.20030502191439@psg.com>
X-Mailer: VM 6.34 under 19.16 "Lille" XEmacs Lucid
Sender: owner-idr@merit.edu
Precedence: bulk

[crossposted to idr WG mailing list]

Alex Zinin writes:

> More specifically, below I tried to put
> together a list of concerns
> I have about the approach described in draft-kompella-ppvpn-vpls,
> that I would like the WG to consider.
>
> 1. Use of the NLRI field
>
> As an IP routing protocol, BGP uses the NLRI field to carry IP
> reachability information in the form of IP prefixes. Prefixes within
> the NLRI field are used for two main purposes in BGP: a) as the
> destination/mask pair in the routes installed by BGP in the routing
> table, and b) as a handle to an entry in the BGP RIBs.
>
> The document in subject changes the semantics of the NLRI field
> quite substantially even when compared to 2547. First, all of its IP
> prefix-related properties are lost. There is no more IP routing, or
> any addressing information in it. Second, the notion of TLVs is
> introduced inside this field, which a) is not needed in BGP as an IP
> routing protocol, and b) because of its variable length property
> changes the nature of the NLRI contents, i.e., it's being a stable
> handle in the BGP database. To solve these problems the
> implementations would need to use only a part of the contents of the
> NLRI field as the handle used to index within the RIBs, and store
> the rest as attributes.

Alex,

This point seems to be predicated on the statement that "BGP uses the
NLRI field to carry IP reachability"...

It opens up a sort of philosophical discussion on BGP. This is of
course a highly subjective topic which is hard to quantify or to prove
in logical terms.

Allow me to present my personal view. BGP is a particular
implementation of an algorithm that performs non-looping database
flooding distribution. That algorithm consists mostly of the path
vector (used both in ebgp and route reflection) plus route
advertisement rules. This is the publicly specified part of the beast.
However that ends up being about 10% of the database exchange algorithm.
Each implementation uses distinct algorithms to do the real heavy
lifting: the advertisement of database updates to its peers, given
that each peer is allowed to flow control and that the amount of
information to be distributed is typically non-trivial compared to the
resources of the system.

None of the functions above actually depend on the format of your
database records, as long as there is a primary key associated with
each record. Modern implementations, given that they are required to
handle 3 or 4 different types of records w/ different keys (ipv4, ipv6,
2547, 2547-for-ipv6, etc) will tend to treat these keys just as
database systems do: as a bit string without any semantics associated
w/ it.

Note also that the number of distinct tables exchanged in a 2547
implementation may be in the thousands. So segregation of which record
belongs to which table is necessarily a solved problem in practice.

There is however one part of BGP that interacts w/ the semantics of the
particular database being exchanged: route selection from the Loc-RIB.
The Loc-RIB is by definition where BGP interacts w/ remaining users of
the database and it includes rules that are system specific.

As an exercise, if we take the existing spec and do:

  s/route/record/g
  s/IP prefix/key/g

Do we still have a document that makes sense.. ? Except for the vague
bits about aggregation, about which BGP itself does little, i would
contend that the result would be pretty much the same.

2547 which you cite is a particularly good example, imho. A 2547 NLRI
ends up being used to create IP reachability information, but while it
is a safi 128 record, it is not IP reachability and it is not treated
as such.

> 2. Distribution of information
>
> When used as an IP routing protocol, BGP distributes routes among
> all participating routers.
> Each router (PE or P using VPN
> terminology) is interested in _all_ routes received from its peers;
> it selects the best path for each prefix if multiple are available
> and installs it in it's routing table; the best paths are propagated
> further to other peers.

That is not the case w/ 2547. PE routers typically have interest in
only a subset of the routing information. They tend to do inbound
filtering in current network deployments but one can also do outbound
filtering in the RRs via either extended-community ORF or subsequent
improvements to ORF (draft-marques-ppvpn-rt-contrain).

P routers do not carry 2547 routing information.

> The way BGP is used in the document results in a situation where
> information relevant only to a subset of routers (e.g. PW-specific,
> or VPLS-specific info) is sent to all PEs participating in the BGP
> domain. More than that, P routers, usually used as route reflectors
> in IP routing, end up storing all information while they are not
> using any of it locally.

RRs in VPN deployments are typically not in the topology. My
understanding of the P-router term is that it is a transit node that
does not have VPN information.

> Note also, that best path selection that is normally performed by
> BGP when it receives information about the same prefix from multiple
> peers, is not needed in the VPLS case, and (even if implementations
> decided to apply the same algo as in regular BGP) would just be an
> artifact.

Not really... i can advertise the same key from multiple sources in
L2VPNs also. All policy mechanisms do work... igp distance, etc. It is
just the semantics once the path is selected that are different. As an
example think working and protect PE for a given emulated circuit (or
lan).
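[Editorial illustration, not part of the original message.] The argument
above, that the table is indexed by an opaque key and that the same key
advertised from several sources still goes through ordinary path selection,
can be sketched in Python. All names here (Path, RecordTable, the key bytes,
the PE labels) are hypothetical, and the ordering used (higher local
preference, then lower IGP distance, then source name as a tie-break) is a
deliberately simplified stand-in for the real decision process, not a
description of any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    source: str        # advertising speaker, e.g. a PE (hypothetical label)
    local_pref: int    # higher is preferred
    igp_distance: int  # lower is preferred


class RecordTable:
    """Stores candidate paths per opaque key; the key is never interpreted."""

    def __init__(self):
        self._paths = {}  # key (bytes) -> {source: Path}

    def update(self, key, path):
        self._paths.setdefault(key, {})[path.source] = path

    def withdraw(self, key, source):
        paths = self._paths.get(key, {})
        paths.pop(source, None)
        if not paths:
            self._paths.pop(key, None)

    def best(self, key):
        """Simplified selection: local_pref, then IGP distance, then source."""
        paths = self._paths.get(key)
        if not paths:
            return None
        return min(paths.values(),
                   key=lambda p: (-p.local_pref, p.igp_distance, p.source))


if __name__ == "__main__":
    table = RecordTable()
    circuit = b"\x00\x01site-7"  # opaque key, treated as a bit string
    table.update(circuit, Path("pe-working", 100, 10))
    table.update(circuit, Path("pe-protect", 100, 20))
    print(table.best(circuit))
    table.withdraw(circuit, "pe-working")
    print(table.best(circuit))
```

What a consumer does with the selected path (install an IP route, program a
pseudowire) lies outside the table, which is the distinction the message
draws between selection and post-selection semantics.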

> The above exposes the difference between the routing nature of
> BGP when used for IP (where reachability info is distributed and the
> path properties are as important as the info itself), and its purely
> transport application in the proposal (where only the fact of
> information delivery is important.)
>
> Interestingly enough, from the transport perspective, BGP, though
> reduces the number of sessions a given PE has to maintain (and thus
> the sender's complexity), introduces additional overhead from the
> receiver's perspective--if a PE router has multiple BGP sessions
> (which is normally the case), it will receive the same information
> more than once, while clearly a single copy is enough.

I don't know which model you have in mind but in a typical VPN
deployment scenario (l3 or l2/vpls/etc) a PE has 2 peering sessions to
a RR outside of the topology. The second copy of the information is
there for redundancy... If a full mesh were used, only 1 copy would be
present.

> 3. Aggregation of information for large-scale operation
>
> When distributing information among a large number of systems, it
> is important to be able to aggregate information as it travels
> further ahead to ensure scalability of the system. In routing this
> is achieved by summarizing a set of prefixes and announcing them as
> a less specific prefix. For example, AS'es in the Internet do not
> exchange granular IP prefixes visible inside IGPs, but instead send
> each other aggregate prefixes via BGP.
>
> It is not clear to me how, given the format of the NLRI field,
> VPLS information can be aggregated using the proposal in the
> document.

To give you an example, in JunOS aggregation is implemented as a
separate routing protocol... if i'm not mistaken the model is lifted
from 'gated'. Clearly the idea that aggregation may be a distinct
component from BGP has been around for a while.

VPLS doesn't really need aggregation although it does use an IGP :-)
PE to PE connectivity is performed independently from the 'forwarding
distinguisher' advertisement (i.e. the inner label). Any or multiple
routing and signaling protocols may be used for this functionality.
Only the information exterior to the SP network (service attachments)
is carried through BGP.

> The above gives me a very uncomfortable feeling that the proposal
> is stretching BGP to perform functions it was not designed for.

Any successful protocol will be used for means other than it was
designed for. That is usually a sign that the designers got something
right.

Let me give you an example: BGP is currently used to block spam
propagating networks/hosts. Was this an original goal of the design?
Hardly. When used to block spam BGP does not advertise any valid
forwarding information for instance. And i'm sure it is a question of
time until we add port information to the record keys.

> Below are some additional points that should be taken into
> consideration.
>
> 4. Backwards compatibility and SW upgrade requirements
>
> Because the proposal suggests using a new AFI/SAFI combination,
> PE routers will not be able to announce VPLS information using the
> existing BGP infrastructure. All BGP speakers in a SP's network,
> including the P routers, will have to be upgraded with new SW,
> though information needs to be exchanged only among the PEs.

That is not an issue as we've seen above. The deployment model is
different from what you assume.

> 5. Coupling of VPLS and BGP SW
>
> Putting VPLS-related functions in BGP leads to two unwanted
> consequences:
>
> a) Lesser BGP code stability--bugs in the VPLS part of the code
> will likely affect parts of BGP used for Internet routing, thus
> increasing the chances of BGP failures in SP networks. The same
> argument works in the opposite direction.

You have no basis to conclude that.
Any modern BGP implementation worth its salt consists of
AF-independent code + AF-specific code. The fact is that you can
implement VPLS without touching the AF-independent code.

Any line of code change that you make to an implementation has the
potential to introduce bugs...

> b) Potential dynamic effects--since with a BGP-based approach,
> VPLS- and routing-related processes are likely to share the same
> internal router resources (such as timers, threads, locks/mutex'es,
> queues, memory pools), dynamics of the VPLS system are likely to
> influence the dynamics of the routing- related functions and vice
> versa. The larger the overlap between the two systems, the higher
> are the chances of such interference.

I'm sorry but this is just FUD. All router implementations do have
some level of resource sharing between completely unrelated features.
In some of them, all functionality shares all resources.

Don't want BGP sharing timers w/ X.25-over-TCP... disable one of them.

> My recommendation would be for the WG to consider these points.

The way i see it there is a high likelihood of this turning into a
"Yes, it is" "No, it isn't" discussion. And I'd really like to avoid
that.

A question to you and to the WG(s) in general:

- What are the main concerns that you have w/ the generic database
  exchange view of BGP (Let's call it the "Basically General Purpose"
  theory).

- Can we have a reasonable discussion about the best engineering
  approach to provide database exchange services for
  routing-related-applications without getting into a religious
  argument about "2547 is evil" ? i.e. can we try to separate how
  highly each one of us rates the actual application from this
  discussion ?

- I believe one of the preconditions for a reasonable discussion is to
  realise that implementors are the most interested people in not
  introducing regressions to shipping code. They actually get to fix
  it after being screamed at for a considerable length of time.
  I'd really like to get past the "you can't implement a feature i
  don't want because you are going to break the code" kind of
  discussion.

- Are we going to have a similar discussion about LDP ? LDP is not
  any less relevant for network stability nor a protocol which is any
  simpler than BGP (if anything the level of complexity is higher
  given that LDP has all the db exchange problems of BGP + a
  non-trivial amount of issues of its own).

regards,
Pedro.