[RTG-DIR]Re: Rtgdir last call review of draft-ietf-lsvr-bgp-spf-31

Acee Lindem <acee.ietf@gmail.com> Thu, 25 July 2024 21:41 UTC

Return-Path: <acee.ietf@gmail.com>
X-Original-To: rtg-dir@ietfa.amsl.com
Delivered-To: rtg-dir@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 684B3C14F70A; Thu, 25 Jul 2024 14:41:58 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -7.105
X-Spam-Level:
X-Spam-Status: No, score=-7.105 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id o4m_cutFTQi8; Thu, 25 Jul 2024 14:41:54 -0700 (PDT)
Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 62BCDC14E515; Thu, 25 Jul 2024 14:41:49 -0700 (PDT)
Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-2cd5d6b2581so238155a91.2; Thu, 25 Jul 2024 14:41:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721943708; x=1722548508; darn=ietf.org; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=UuBvjHV+w5O6uIKZpritHjUFrCVjcw6skErhhEG6xCU=; b=I6NWk1jom8KNS2ACkKzYTiM5Zph5rfLXso0G6s6caycQwDeFo99KYKVkBRfOV39htd UnVohM3L0f7zNfjKAN3486kM1HP+0L0Mm+Yb6MC4ATT28HpHKy16wMQV9qAD2iIQqJCj SMbRddGnNYPUJzNrMqUS+bvQmRfdsXe9Pi5B3NX5QdJFWNGVhyIPj/Y5iAZ6DtCAFwDy OvbySvW9zSrnYT/8iMJ2pIh+i9/rOtSj/yKVfX8lGnryCLL/NC51pgAKvKqR+pHUslSi 77HCCfSA0odHoMoQY7R9uF/wHHYg1/NQBXzdeAz7GWnERlVHzZcKwPfO+4Ye1FLDR3ib wHOQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721943708; x=1722548508; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=UuBvjHV+w5O6uIKZpritHjUFrCVjcw6skErhhEG6xCU=; b=GI3r1ML0H12+ee1e7LAqv8kgmUy1vMzMvQVQsHnFVAM8FjLMFK7zbFWiMpUd0mBv8r N9Jak80iduhV4KQ3azUlU8ZRG7o6tqEgRjVIw0XHJITatajHFSm/UPwHqge0df7zoayQ tDQM5v5haxA1rXD95pBX5LgWPZyLSyPx7qOfV0Twpdi5fT/bkC+qtLrwrpVhKeZLzrI2 9SvVpNtDgHxKTcvY+/z2GsT5OoCQbEe9UPk/fzPjkxLxSHH+dkhwFl1qVgwyWg2ellly NuQnZUb17bUoTxo22Z0S7duzCSkPT2Stlb70n4xGXDRVPQbH5qXgFqQcHvaHLAuBKm40 RuNQ==
X-Forwarded-Encrypted: i=1; AJvYcCUW5mRXIX/52tHSIOAgUQ7sLhNMvNHjKVXIQu2u+jfb05O5Vr4TTk+hF5eGlCoRS0VO8N74iAMyLUwHCZmJkXUIORyzK5s4UARC0p8VaHyGmBcollONPPfx0UPKQX/6XdJSYlKEeaWk72AbJV4wE/CFOKZ2h2t4gDrIGxpkhUQ9Nlft0HKaupkJQgiM
X-Gm-Message-State: AOJu0YzYt8RQxyBy1ajM4UBuHQKdoKmBx6try66cXPvczchcO/UlW+Da M+7vMeId61L0RLTnZtIoBbUiTCs2JFNR9Ykb5WIhYtBu5vvlr9JerEsiPQ==
X-Google-Smtp-Source: AGHT+IGbGCNpJ8n3sCA7XXMNafTRnj53DnhRQry9cdyqTVylylB+n/3VmC8sUMZ8Pm/RqISZAXL/2Q==
X-Received: by 2002:a17:90a:f48f:b0:2cd:b938:770b with SMTP id 98e67ed59e1d1-2cf2e9b30a2mr3573470a91.5.1721943708042; Thu, 25 Jul 2024 14:41:48 -0700 (PDT)
Received: from smtpclient.apple ([2001:67c:1232:144:d4cb:9c99:fe19:839a]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2cdb73ddb7dsm4023500a91.24.2024.07.25.14.41.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 25 Jul 2024 14:41:47 -0700 (PDT)
From: Acee Lindem <acee.ietf@gmail.com>
Message-Id: <1E7025AE-B4E6-41A9-9A46-A048262DB2F7@gmail.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_2AF34C07-A32D-49C9-832A-9542F710470D"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.600.62\))
Date: Thu, 25 Jul 2024 14:41:36 -0700
In-Reply-To: <CAH6gdPxuOZNeNmCGCKL0X1Cz06nLMR503V_hDu51cN3gLuPpfQ@mail.gmail.com>
To: Ketan Talaulikar <ketant.ietf@gmail.com>
References: <172021249217.1441177.16399497544421034270@dt-datatracker-5f88556585-g8gwj> <CAH6gdPxuOZNeNmCGCKL0X1Cz06nLMR503V_hDu51cN3gLuPpfQ@mail.gmail.com>
X-Mailer: Apple Mail (2.3774.600.62)
Message-ID-Hash: OPBK7FXP2R7E3K7ZVTS42ILZSGL4F4Q2
X-Message-ID-Hash: OPBK7FXP2R7E3K7ZVTS42ILZSGL4F4Q2
X-MailFrom: acee.ietf@gmail.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-rtg-dir.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: Adrian Farrel <adrian@olddog.co.uk>, Routing Directorate <rtg-dir@ietf.org>, draft-ietf-lsvr-bgp-spf.all@ietf.org, lsvr@ietf.org, lsvr-chairs@ietf.org
X-Mailman-Version: 3.3.9rc4
Precedence: list
Subject: [RTG-DIR]Re: Rtgdir last call review of draft-ietf-lsvr-bgp-spf-31
List-Id: Routing Area Directorate <rtg-dir.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/9NJR1Jl80rOtbXk48iw9yTTBC9c>
List-Archive: <https://mailarchive.ietf.org/arch/browse/rtg-dir>
List-Help: <mailto:rtg-dir-request@ietf.org?subject=help>
List-Owner: <mailto:rtg-dir-owner@ietf.org>
List-Post: <mailto:rtg-dir@ietf.org>
List-Subscribe: <mailto:rtg-dir-join@ietf.org>
List-Unsubscribe: <mailto:rtg-dir-leave@ietf.org>

Hi Ketan, Adrian, 

> On Jul 11, 2024, at 01:03, Ketan Talaulikar <ketant.ietf@gmail.com> wrote:
> 
> Hi Adrian,
> 
> Thanks for your review. Request the authors to respond and for the WG members to also chime in.
> 
> Thanks,
> Ketan
> 
> 
> On Sat, Jul 6, 2024 at 2:18 AM Adrian Farrel via Datatracker <noreply@ietf.org <mailto:noreply@ietf.org>> wrote:
>> Reviewer: Adrian Farrel
>> Review result: Not Ready
>> 
>> Hello
>> 
>> I have been selected to do a routing directorate "early" review of this draft.
>> https://datatracker.ietf.org/doc/draft-ietf-lsvr-bgp-spf/
>> 
>> The routing directorate will, on request from the working group chair, perform
>> an "early" review of a draft before it is submitted for publication to the
>> IESG. The early review can be performed at any time during the draft’s lifetime
>> as a working group document. The purpose of the early review depends on the
>> stage that the document has reached.
>> 
>> This document appears to have completed WG last call, been passed to the AD for
>> publication, received several Directorate reviews (at around revision -28), and
>> had a review by the AD. It seems to have been returned to the WG for further
>> work. Thus, this review comes before the document is returned to the AD for
>> publication. In that respect, the review is for consideration by the authors
>> and working group rather than for attention of the AD.
>> 
>> For more information about the Routing Directorate, please see
>> https://wiki.ietf.org/en/group/rtg/RtgDir
>> 
>> Document: draft-ietf-lsvr-bgp-spf-31.txt
>> Reviewer: Adrian Farrel
>> Review Date: date
>> Intended Status: Standards Track
>> 
>> Summary:
>> 
>> I have concerns about this document. I think it needs more work before being
>> submitted to the IESG.
>> 
>> Comments:
>> 
>> This draft is well written from a technical perspective. I believe it
>> could be implemented with a high probability of interoperability modulo
>> the issues noted below.
>> 
>> As noted in the comments below, some attention to the explanation in the
>> Introduction and the overview of the function of the protocol might be
>> helpful, and a little work on the English usage could be beneficial.
>> 
>> The document describes the use of BGP as an interior gateway protocol
>> for SPF calculations in a network environment that is assumed to be
>> highly stable and which might have a large degree of ECMP. The approach
>> builds on th e information elements developed for BGP-LS, but is not a
>> re-use of BGP-LS as a new SAFI is used.
>> 
>> I have a number of comments which I believe should be discussed before
>> this document progresses to publication.
>> 
>> = Major
>> 
>> While BGP-LS was originally invented as an East-West protocol, that idea
>> was discarded because of severe scaling and stability concerns and RFC
>> 7752 was shaped as a Northbound protocol allowing a BGP speaker in the
>> network to report to a controller, management station of route reflector
>> (and subject to filters, aggregation, and policy) the full link-state
>> database of the network.
>> 
>> This work obviously decides to change that approach and to use BGP-LS
>> with additional information to distribute the full set of routing
>> information between all nodes in the network.
>> 
>> I think that this document needs to more clearly set out this change in
>> philosophy, and (more importantly) discuss the assumptions of network
>> scale and stability that are necessary for this protocol to be used.
>> While a data centre environment may be adequately stable, this
>> definition of a protocol usage would be very unwise in a WAN. All you
>> have in this respect is a statement at the top of section 4 that
>> implies that there are choices based on the network scale, and an
>> observation in the Introduction that the solution would be beneficial
>> if the topology is very stable.
>> 
>> The minimum needed for this is an upfront statement along the lines of
>>    The protocol described in this document is intended for deployments
>>    where the network topology is very stable so that topology update
>>    advertisements will be rare. It's use in other networks, and in
>>    particular in the Internet, is NOT RECOMMENDED because the protocol
>>    might not be stable or scale well under high load caused by rapid or
>>    frequent changes in the topology.
>> (Your choice of words may vary.)
>> 
>> It might even help to not call this BGP-LS-SPF but BGP-DC-SPF or
>> something similar. Since it is a new SAFI, there is no need to persist
>> with the BGP-LS name and that might help make it clear what the intended
>> or presumed deployment scenarios are.
>> 
>> Of course, the wrinkle comes in 5.2.2.1 where you want to define a new
>> TLV that only has use in BGP SPF, but you want to assign it from the
>> BGP-LS registry. Have you thought of looking for a way to keep the SPF
>> information separate? Or would you expect a regular BGP-LS speaker to
>> also share this new TLV if it knows the information?


We can discuss this in today’s meeting. Perhaps, we should remove the reference to “very stable environments”. I’d expect BGP SPF to perform better than RFC 7938.



>> 
>> = Medium
>> 
>> I find the reference to draft-psarkar-lsvr-bgp-spf-impl unsatisfactory.
>> Not only did that draft expire a good while ago, but this draft has been
>> updated six times since that implementation statement was posted. Have
>> there been no changes to the technical content in all those revisions?
>> 
>> The implementation status covers a bit more than 1.5 implementations.
>> It's good information, but is it enough for changes to BGP?

I’m not sure why you’d consider this a medium comment. The reference is to be removed before publication. I’ll go ahead and remove it now. 



>> 
>> ---
>> 
>> The text at the top of section 4 says that there are deployment choices
>> based on the network scale, but gives no advice (or pointers to advice)
>> on how to select between deployment options, how to ensure that the same
>> option is used by all nodes in the network, and what to do if two peers
>> discover that they are using different options.
>> 
>> ---
>> 
>> In the RR mode of section 4.3, it is not clear what information the RR
>> readvertises and to which nodes.

Possibly we’ll remove this deployment model for now. 



>> 
>> ---
>> 
>> 8.2 names a registry that does not exist (with the given name).
>> Please don't leave IANA guessing.


Fixed. 


>> 
>> ---
>> 
>> 5.2.1.1, 5.2.2.2, and 5.2.3.1 describe "SPF Status" fields of
>> the various TLVs.

SPF Status is a TLV. 


>> 
>> It might help if the fields had different names to help not confuse
>> anything (e.g., Node SPF Status).


There is a single SPF Status TLV used for different NLRI. 


>> 
>> Adding in the Address Family field from 5.2.2.1, it is unclear whether
>> you intend the "undefined" settings to be available for use in future
>> specifications.
>>   - If you do, then you probably need to create registries, and while
>>     you describe how to handle the receipt of undefined statuses, you
>>     are missing a description of how to handle undefined address
>>     families. (Note, however, that in describing the handling of
>>     undefined status you say they SHOULD be readvertised. This means
>>     that there can be holes and discrepencies in the databases that
>>     different nodes see if new bits are defined in the future. So you
>>     probably need to make this MUST.)
>>   - If you don't then it is interesting to ask why you need to specify
>>     a difference between "reserved" and "undefined", and why you need
>>     more than a single bit flag (in fact, in the case of 5.2.2.2 and
>>     5.2.3.1 it would appear that the very presence of the TLV is enough
>>     and no status field is needed).


Fixed. 


>> 
>> ---
>> 
>> 5.2.2.1 has
>> 
>>    the Address Family (AF) Link
>>    Descriptor SHOULD be included with the Link Local/Remote Identifiers
>>    TLV so that the link can be used in the respective address family
>>    SPF.  If the Address Family Link Descriptor is not present for an
>>    unnumbered link, the link will not be used in the SPF computation for
>>    either address family.
>> 
>> The use of SHOULD means that an implementation MAY leave it out. Now, in
>> the case of an unnumbered link, it is clear that that is fine,

Actually, it is the opposite. It is needed for unnumbered links. 

>> but in
>> the case of a numbered link, leaving the link descriptor out would seem
>> to make the link advertisement a bit pointless. So why might an
>> implementation make that choice?
>> 
>> ---
>> 
>> In 5.2.2.2
>>    If a BGP SPF speaker received the Link NLRI but the SPF Status TLV is
>>    not received, then any previously received information is considered
>>    as implicitly withdrawn and the update is propagated to other BGP SPF
>>    speakers.
>> 
>> "is not received" in what time scale?

This applies to SPF status information only - so it is any pending SPF status information. 




>> 
>> Ditto 5.2.3.1

Same as above.


>> 
>> ---
>> 
>> 9.
>> 
>> Given that this protocol is targetted at stable environments, one might
>> consider that making a part of the network fragile could destabilise the
>> whole. So there is a physical attack of causing a link to flap that
>> could result in many updates being flooded through the network. Is that
>> a realistic attack? How can it be mitigated?

BGP SPF is no more fragile than BGP. 



>> 
>> = Minor
>> 
>> There is a "MAY" in 4.1 that is a bit odd. "MAY be expected..." Does
>> that mean that an implementation is allowed to expect, but that is
>> exceptional behaviour? Or does it mean that the condition might happen?
>> Or what? And what is the alternative to the "MAY"?
>> 
>> See also 4.2


Waiting for End-of-RIB is an optional feature. 


>> 
>> ---
>> 
>> 4.2 has
>> 
>>    Hence, this peering model is
>>    RECOMMENDED over the single-hop peering model Section 4.1.
>> 
>> "RECOMMENDED" has the same weight as "SHOULD". So you need to say why a
>> deployment would ever consider option 1.

BGP itself allow single hop or loopback peering. 



>> 
>> ---
>> 
>> Since the value of 266 is only suggested to IANA for the BGP-LS Address
>> Family Link Descriptor TLV in section 8.2, section 5.2.2.1 should really
>> use a "TBD" so that the RFC editor can spot and fix any issues that
>> might arise if IANA assigns a different value.


Fixed. 




>> 
>> ---
>> 
>> Why are the new registries in 8.4, 8.5, and 8.6 created as IETF Review?
>> If you follow the thrust of BGP-LS, then it would be reasonable to make
>> them Expert Review (and to write some advice for the Designated
>> Experts). I *guess* that since you ask for this to be in a new registry
>> group, you are free to set your own rules, but I do wonder.

Fixed. 

>> 
>> (By the way, since you call the new registry group "BGP SPF" this seems
>> to agree with my major point at the top of this review: this document
>> defines BGP-SPF not BGP-LS-SPF.

This registry falls under the category of BGP-LS. 

>> 
>> ---
>> 
>> 6.1
>> 
>>        NLRI originated by directly connected BGP SPF peers are
>>        preferred.
>> 
>> No objection to you setting this rule, but have you considered that NLRI
>> coming from remote peers are likely to convey updates relating to remote
>> resources faster than those that have been forwarded hop-by-hop with
>> each hop performing processing?


You are misinterpreting this rule. In other words, NLRI corresponding to the peer itself will be preferred. I changed this to “self-originated”. 



>> 
>> ---
>> 
>> 7.1 has
>> 
>>    When a BGP SPF speaker receives a BGP Update containing a malformed
>>    Node NLRI SPF Status TLV in the BGP-LS Attribute
>>    [I-D.ietf-idr-rfc7752bis], the corresponding Node NLRI is considered
>>    as malformed and MUST be handled as 'Treat-as-withdraw'.  An
>>    implementation SHOULD log an error (subject to rate-limiting) for
>>    further analysis.
>> 
>> But 5.2.4 has
>> 
>>    If the Sequence-Number TLV is not received, then the corresponding
>>    NLRI is considered as malformed and MUST be handled as 'Treat-as-
>>    withdraw'.  An implementation MAY log an error for further analysis.
>> 
>> SHOULD or MAY?

SHOULD


>> 
>> ---
>> 
>> 7.1
>> 
>>    Node attribute TLVs and their
>>    error handling rules are either defined in [I-D.ietf-idr-rfc7752bis]
>>    or derived from [RFC5305] and [RFC6119].
>> 
>> I suspect that this means that some TLVs are in one source, and some are
>> in another, not that there is a choic!

Please suggest text. 



>> 
>> ---
>> 
>> 7.2 tells us what checks to perform, but not what to do if a check
>> fails. Compare with 7.3.


These checks are defined in RFC 9552 and need not be repeated in this document. 


>> 
>> ---
>> 
>> 7.4
>> 
>> While it will be clear to one skilled in the art what it means to reset
>> a BGP session, a pointer to the procedure would be valuable.


TBD.

>> 
>> ---
>> 
>> Section 9 appears to use lower case rather than upper case BCP 14 terms.
>> Possibly worth resolving.


Fixed. 

>> 
>> ---
>> 
>> Thanks for including section 10, it's important.
>> 
>> 10.1 says that being in a single administrative domain "allows for
>> consistent configuration". This is true, but I don't think it in any way
>> guarantees it. What measures can be taken to ensure consistency?

This is beyond the scope of the document. 




>> 
>> ---
>> 
>> 10.2 makes a RECOMMENDATION. Looks like a wise one. Why would a
>> deployment deviate from this, and what are the consequences?

There may be use cases that go beyond the scope of the document. 


>> 
>> ---
>> 
>> In 10.2 you say...
>> 
>>    For
>>    non-loopback prefixes, the setting of the metric is a local matter
>>    and beyond the scope of this document.
>> 
>> ...and that seems good to me. Why do you then go on to discuss a
>> possible setting of the metric? i suggest deleting the paragraph...
>> 
>>    Algorithms such as setting the metric inversely to the link speed as
>>    supported in some IGP implementations MAY be supported.  However, the
>>    details of how the metric is computed are beyond the scope of this
>>    document.

Why? This is useful. 



>> 
>> ---
>> 
>> 10.2 has a paragraph with three cases of "SHOULD"
>> 
>>    Within a BGP SPF Routing Domain, the IGP metrics for all advertised
>>    links SHOULD be configured or defaulted consistently.  For example,
>>    if a default metric is used for one router's links, then a similar
>>    metric should be used for all router's links.  Similarly, if the link
>>    metric is derived from using the inverse of the link bandwidth on one
>>    router, then this SHOULD be done for all routers and the same
>>    reference bandwidth SHOULD be used to derive the inversely
>>    proportional metric.  Failure to do so will result in incorrect
>>    routing based on link metric.
>> 
>> There is some explanation of the consequences of not following these
>> SHOULDs (although the cascaded SHOULDs make it hard to determine where
>> the choice goes wrong). Since you are allowing variation from this
>> advice, you need to explain the alternative "MAY" : what can be done and
>> why would it be done?
>> 
>> ---
>> 
>> 10.3 has a reasonable RECOMMENDED. But, again, you need to talk about
>> the reasons to vary this and the consequences.


I don’t think this is reasonable - please suggest text. 



>> 
>> ---
>> 
>> 10.4 has two SHOULDs. This allows variation that you need to explain.


These are both SHOULDs. I don’t see that any more explanation necessary. 


Thanks,
Acee



>> 
>> = Nits
>> 
>> The English usage in the document could really do with some work. I know
>> that the RFC Editor is paid to fix this sort of thing, but there is
>> really quite a lot of work needed (more than I felt like dealing with in
>> this review). Perhaps you can get someone from the working group to take
>> a pass on it.
>> 
>> ---
>> 
>> draft-ietf-idr-rfc7752bis is now RFC 9552
>> 
>> 
>>