[bess] John Scudder's Discuss on draft-ietf-bess-evpn-fast-df-recovery-10: (with DISCUSS and COMMENT)
John Scudder via Datatracker <noreply@ietf.org> Tue, 20 August 2024 00:12 UTC
Return-Path: <noreply@ietf.org>
X-Original-To: bess@ietf.org
Delivered-To: bess@ietfa.amsl.com
Received: from [10.244.2.52] (unknown [104.131.183.230]) by ietfa.amsl.com (Postfix) with ESMTP id 0B120C1CAF32; Mon, 19 Aug 2024 17:12:31 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: John Scudder via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
X-Test-IDTracker: no
X-IETF-IDTracker: 12.22.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <172411275169.1985200.11309811863338610562@dt-datatracker-6df4c9dcf5-t2x2k>
Date: Mon, 19 Aug 2024 17:12:31 -0700
Message-ID-Hash: 3LIXLXXUTBOFWFQETOCLJMRFRQSWLI6E
X-Message-ID-Hash: 3LIXLXXUTBOFWFQETOCLJMRFRQSWLI6E
X-MailFrom: noreply@ietf.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-bess.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: draft-ietf-bess-evpn-fast-df-recovery@ietf.org, bess-chairs@ietf.org, bess@ietf.org, matthew.bocci@nokia.com
X-Mailman-Version: 3.3.9rc4
Reply-To: John Scudder <jgs@juniper.net>
Subject: [bess] John Scudder's Discuss on draft-ietf-bess-evpn-fast-df-recovery-10: (with DISCUSS and COMMENT)
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/h0is4D4nAsqKrcxC7cghj9O3Vk0>
List-Archive: <https://mailarchive.ietf.org/arch/browse/bess>
List-Help: <mailto:bess-request@ietf.org?subject=help>
List-Owner: <mailto:bess-owner@ietf.org>
List-Post: <mailto:bess@ietf.org>
List-Subscribe: <mailto:bess-join@ietf.org>
List-Unsubscribe: <mailto:bess-leave@ietf.org>
John Scudder has entered the following ballot position for draft-ietf-bess-evpn-fast-df-recovery-10: Discuss When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.) Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions. The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-fast-df-recovery/ ---------------------------------------------------------------------- DISCUSS: ---------------------------------------------------------------------- # John Scudder, RTG AD, comments for draft-ietf-bess-evpn-fast-df-recovery-10 CC @jgscudder Thanks for this document. I appreciate the fact it was straightforward to read and review, and of course, I appreciate the work on a useful protocol extension! ## DISCUSS I have two concerns I'd like to discuss. ### "Unreasonable" SCT offsets Apart from a few words in the Security Considerations section, you don't provide any instructions to the implementer about what to do with unreasonable SCT offsets, in particular unreasonable SCT future offsets. The language from Section 5 is, It is left to implementations to decide what consists an "unreasonably large" SCT value. The quotation marks imply that you think the implementor should do something about unreasonably large SCT values, but the spec doesn't supply any clarity about it. That's too bad, it should. I accept the argument in the preceding Security Considerations sentences (not quoted) that the ability to signal unreasonable SCT values doesn't introduce a new vulnerability vis-à-vis the pre-existing specifications. Nonetheless, it does seem like this specification should advise implementers to impose an upper limit on what SCT future offset they will accept. The quoted sentence is a backhanded hint in that direction. I think you should make it explicit, probably in Section 2 since that seems to be where elements of procedure mostly go (see also my later remarks about this in the COMMENTS section). One way to address this would be to insert something into Section 2.2, along the lines of, - Initialize SCT_delay to 0. - If an SCT timestamp is present (etc etc), then, - If the SCT timestamp is further in the future than max_SCT_delay, SCT_delay = max_SCT_delay. - If the SCT timestamp is less than SCT_skew time units in the future, SCT_delay = 0. (This includes the case where it is in the past.) - Else SCT_delay = (SCT timestamp value) - (present time) - SCT_skew and then write 9.1 in terms of "wait SCT_delay before proceeding to 9.2". This is crudely written, I don't expect you to use the text above, although you're welcome to if you want. I'm just trying to demonstrate one way to close the gap. In addition to inventing SCT_delay and max_SCT_delay, I've incorporated skew into the above (and renamed it SCT_skew). An underspecification of skew is another, lesser, concern I had (see COMMENTS) and this seemed the natural place to work it in. ### Assumptions about synchronized clocks Although the document assumes synchronized clocks, it doesn't provide any details about its assumptions, e.g. how close synchronization needs to be and what the consequences are if the requirement isn't met. Normally, specifications assuming synchronized clocks will provide at least some basic analysis. I think probably if you fix the max_SCT_delay thing, the analysis would be easy because it would in effect bound the maximum effect of clock desynchronization, and I bet the analysis would amount to "if you don't have badly desynchronized clocks, it acts like RFC7432." (But this is just a guess, you are the experts.) If you choose to add a short analysis, I could see it being a bullet in Section 1.4. ---------------------------------------------------------------------- COMMENT: ---------------------------------------------------------------------- ## COMMENTS ### Section 1.4, this sentence no verb * The fast DF recovery solution maintains backwards-compatibility (see Section 4) by ensuring that PEs any unrecognized new BGP Extended Community. The second clause should have a verb, but it doesn't. I'd propose a rewrite if I knew what you were trying to say. ### Section 1.4, "normalizes" * The fast DF recovery solution is agnostic of the actual time synchronization mechanism used, and normalizes to NTP for EVPN signalling only. I don't understand what you mean by this, in particular by "normalizes to NTP for EVPN signalling only". Can you rephrase it? (Also, BTW, "signaling" has only one ell.) ### Sections 2 and 3 aren't well-divided Section 2 is called "DF Election Synchronization Solution". Based on that, I would assume that the entire solution would be specified in this section and its subsections. Section 3 is called "Synchronization Scenarios". Based on that, I would assume it contains examples, intended to help the reader understand the operation of the solution. Unfortunately, that's not entirely the case. Section 3 contains instances of normative, or should-be-normative, text. I think you should restructure these two sections to put the entire solution in Section 2 and keep Section 3 as examples. By the way, I did find the examples very useful, thank you for including them. I've flagged suggestions for restructuring in other more specific comments. For that matter, there's also normative text in Section 5: "the receiving PE SHALL treat the DF Election at the peer as having occurred already, and proceed without starting any timer to futher delay service carving". That's not a consideration, that's a requirement, and I think it should also go into Section 2. The approach suggested in my DISCUSS would fix this issue. (Also s/futher/further/.) ### Section 2, several things about skew The receiving partner PEs add a skew (default = -10ms) to the Service Carving Time to enforce this mechanism. The previously inserted PE(s) must perform service carving first, followed shortly by the newly insterted PE, after the specified skew delay. The first sentence says that the default is -10 ms, so the receiving partner adds -10 ms, or more simply put, the receiving partner subtracts 10 ms. I think the whole quote holds together but it’s unnecessarily hard to follow. Perhaps something as basic as, NEW: The receiving partner PEs subtract a skew (default = 10ms) from the Service Carving Time to enforce this mechanism. The previously inserted PE(s) must perform service carving first, followed shortly by the newly inserted PE, after the specified skew delay. Note I also made the correction s/insterted/inserted/ Of somewhat greater concern, I don't think Section 2 is precise enough about when and how the skew is applied. It's possible to work it out by referring to the example in Section 3, but it shouldn't be necessary to do this to glean what an implementation MUST do. The approach suggested in my DISCUSS would probably make the problem go away. ### Section 2.1, that's not 8 octets ``` A new BGP extended community is defined to communicate the Service Carving Timestamp for each Ethernet Segment. A new transitive extended community where the Type field is 0x06, and the Sub-Type is 0x0F is advertised along with the Ethernet Segment route. The expected Service Carving Time is encoded as an 8-octet value as follows: 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Timestamp Seconds | Timestamp Fractional Seconds | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ``` Did you mean six octets? Because that’s what the diagram shows... unless you're calling 0x060F part of the service carving time, which doesn't make much sense. I don't know, maybe you meant that it's an eight-octet value encoded in six octets, but you explain that just fine in the text that follows. I think you'd be better off simplifying the introductory text to something like this, NEW: A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is defined to communicate the Service Carving Timestamp for each Ethernet Segment: 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Timestamp Seconds | Timestamp Fractional Seconds | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Note I also elided "new" because fairly soon after publication as an RFC, the community will no longer be "new". Then you let the following text supply the additional detail that I've removed from my NEW text. ### Section 2.1, "outside of the scope of this document" “A DF Election operation occurring exactly at the Era transition boundary some time in 2036 is outside of the scope of this document.” That’s fine as long as the consequences of such an incident wouldn’t be serious. Have you analyzed this? A few words to this effect would be desirable; the rollover is less than twelve years in the future and your specification is likely to still be in use then. I bet that, again, the change suggested in my DISCUSS would make this easy to cover because even a naïve implementation would treat era rollover as nothing worse than severe clock skew, and the window of vulnerability would be bounded to max_SCT_delay. I'd still encourage you to say a few words about it, to console any readers doing a Y2036 review sometime in the future. ### Section 3.1, concurrent elections In the case of multiple concurrent DF elections, each initiated by one of the recovering PEs, the SCTs must be ordered chronologically. All PEs shall execute only a single DF Election at the service carving time corresponding to the largest (latest) received timestamp value. This DF Election will involve all active PEs in a unified DF Election update. This looks and feels exactly like normative specification, which makes me think it belongs in Section 2. I also wonder if you want SHALL instead of "shall". ## Notes This review is in the ["IETF Comments" Markdown format][ICMF], You can use the [`ietf-comments` tool][ICT] to automatically convert this review into individual GitHub issues. [ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md [ICT]: https://github.com/mnot/ietf-comments
- [bess] John Scudder's Discuss on draft-ietf-bess-… John Scudder via Datatracker
- [bess] Re: John Scudder's Discuss on draft-ietf-b… John Scudder