Re: [Tsv-art] Tsvart early review of draft-ietf-lsvr-l3dl-03

Randy Bush <randy@psg.com> Wed, 06 May 2020 22:41 UTC

Return-Path: <randy@psg.com>
X-Original-To: tsv-art@ietfa.amsl.com
Delivered-To: tsv-art@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3EFCF3A0DA0; Wed, 6 May 2020 15:41:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level:
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hMTgDjicDLF6; Wed, 6 May 2020 15:40:55 -0700 (PDT)
Received: from ran.psg.com (ran.psg.com [IPv6:2001:418:8006::18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 121D33A0D9F; Wed, 6 May 2020 15:40:54 -0700 (PDT)
Received: from localhost ([127.0.0.1] helo=ryuu.rg.net) by ran.psg.com with esmtp (Exim 4.90_1) (envelope-from <randy@psg.com>) id 1jWSii-0005Ru-Nc; Wed, 06 May 2020 22:40:52 +0000
Date: Wed, 06 May 2020 15:40:52 -0700
Message-ID: <m2sggclma3.wl-randy@psg.com>
From: Randy Bush <randy@psg.com>
To: Jörg Ott <ott@in.tum.de>
Cc: tsv-art@ietf.org, draft-ietf-lsvr-l3dl.all@ietf.org, lsvr@ietf.org
In-Reply-To: <158870511665.7532.2079643708622987385@ietfa.amsl.com>
References: <158870511665.7532.2079643708622987385@ietfa.amsl.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) Emacs/26.3 Mule/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/BKw3AQUvnESqDpweerXhBEqxOe8>
Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-lsvr-l3dl-03
X-BeenThere: tsv-art@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Review Team <tsv-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsv-art>, <mailto:tsv-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsv-art/>
List-Post: <mailto:tsv-art@ietf.org>
List-Help: <mailto:tsv-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsv-art>, <mailto:tsv-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 06 May 2020 22:41:02 -0000

jörg:

again, thanks for taking the time for a strong review.  it is much
appreciated.

> could benefit from a bit of extra context and target application
> domain in the introduction. E.g., explaining explicitly who would talk
> L3DL to whom.

how about the last para of intro getting a few more words along the line
of

   In this document, the use case for L3DL is for point to point links
   in a datacenter Clos in order to exchange the data needed for BGP-SPF
   [I-D.ietf-lsvr-bgp-spf] bootstrap and continuity.  Once layer two
   connectivity has been leveraged to get layer three addressability and
   forwarding capabilities, normal layer three forwarding and routing
   can take over.

   L3DL might be found to be more widely applicable to a range of
   routing and similar protocols which need layer three discovery and
   characterisation.

> 1. Section 10 spells out a default HELLO interval of 60 seconds. With
> a large broadcast domain, this may create quite a bit of
> traffic. While this may not be an issue in well-provisioned data
> center networks, a remark about sensible value ranges and the
> implications may be worthwhile. Just to provide some guidelines to
> implementers (who want to offer choices) and operators (who pick
> them).

   If the configured destination address is one that is propagated by
   switches, the HELLO SHOULD be repeated at a configured interval, with
   a default of 60 seconds.  This allows discovery by new devices which
   come up on the layer-2 mesh.  In this multi-link scenario, the
   operator should be aware of the trade-off between timer tuning and
   network noise and adjust the inter-HELLO timer accordingly.

> 2. Section 10 also suggest that in response to HELLO messages nodes
> will issue OPEN PDUs to newly discovered peers. This appears to bear
> the clear risk of an OPEN implosion when many system come up at the
> same time. Shouldn't guidance be given to avoid repeated traffic
> surges and possible losses and thus unnecessary delays? (I noted that
> other places foresee exponential backoff when retransmitting OPEN and
> other ACKed PDUs).

added

   To ameliorate possible load spikes during bootstrap or event
   recovery, there SHOULD be a jittered delay between receipt of a HELLO
   and issue of the OPEN.  The default delay range SHOULD BE zero to
   five seconds, and MUST be configurable.

> 3. When the protocol applies fragmentation, should there be a note on
> preventing bursts?

likely part of this is our fault, as we did not mean 'fragmentation' in
the classic "oops!  we found a hop with a small mtu."

does augmenting sec 6 as follow help?

   This is not classic 'fragmentation', but rather decomposition at the
   origin to allow PDU payloads larger than the frame allows.  There are
   no intermediate devices capable of further fragmentation or
   reassembly.

   L3DL is carrying relatively small amounts of data on relatively high
   bandwidth links, and at a time when the link is not active with other
   data as it does not yet have layer three connectivity.  So congestion
   is not considered a sufficiently significant risk to warrent
   additional complexity.

> Section 7 on the checksum needs more detail.

could you be more specific re in what area?

> It also talks about a "suggested" algorithm but this should be clearly
> mandated or way to choose one by means of configuration for a complete
> data centre would need to be made explicit.

   The following code describes a suggested algorithm.  This
   specification avoids mandatory to implement, algorithm agility, etc.
   What matters is that the same algorithm is used consistently in any
   deployment.

> I also assume that the pseudo code on p.11 would benefit from a leader
> '0' in 0xffffffff -> 0x0ffffffff, otherwise expansion to 64 bits might
> fill the high order bits with '1's, which is clearly not intended.

ok, but what kinky compiler are you using?  :)

     result = (result >> 32) + (result & 0x0FFFFFFFF);
     result = (result >> 32) + (result & 0x0FFFFFFFF);

> Section 11, p.17, second to last para ("If a properly
> authenticated...").  From the text, it is unclear what is meant by an
> "OPEN with the Serial Number of the last data received".

   If a properly authenticated OPEN arrives with a new Nonce from an
   LLEI with which the receiving logical link endpoint believes it
   already has an L3DL session (OPENs have already been exchanged), and
   the Serial Number in the OPEN PDU is non-zero, the receiver SHOULD
   establish a new session by sending an OPEN with the Serial Number
   being the same as that of the last sent and ACKed PDU.  Each party
   MUST resume sending encapsulations etc. subsequent to the other
   party's Sequence Number.  And each MUST retain all previously
   discovered encapsulation and other data.

> I am curious about the error code, providing 16 bits for additional
> explanation.  Why not a text field?

parsing by the receiver of error codes then becomes an larger unbounded
chase.  but this bit of syntactic sugar may be more pain than it is
worth.  free form or under-defined fields are too fertile a ground for
interoperational incompatibility.  opinions solicited.

> Also wondering if repeated retries (due to failure, not lost packets)
> could yield fast repeated transmissions.

could you be specific about what you think could be improved in

   12.1.  Retransmission

   If a PDU sender expects an ACK, e.g. for an OPEN, an Encapsulation, a
   VENDOR PDU, etc., and does not receive the ACK for a configurable
   time (default one second), and the interface is live at layer 2, the
   sender resends the PDU using exponential back-off, see [RFC1122].
   This cycle MAY be repeated a configurable number of times (default
   three) before it is considered a failure.  The session MAY BE
   considered closed in case of this ACK failure.

> Section 15, should the KEEPALIVE interval have suggested (lower)
> bounds?  At the top of p.26, it says "One per second is the default",
> the previous page at the bottom refers to the inter-KEEPALIVE interval
> of ten seconds. Not sure if the two are the same, I suppose so. If
> they are, the numbers should match.  If they are not, we'll need some
> extra text to explain the difference.

whoops.  how about ten seconds for both?

> There are two spellings of "Encapsulation", capitalised and lower
> case. Use one consistently.

mind if we stall on this one?  we were attempting to differentiate
between the proper noun of a PDU type and normal use of the world.  but
looking at the text, it's a bit messy.  full employment for rfc editors!
:)

> p10, first para: comprise -> comprising

s/compare/comparing/  in

      [RFC1982] on DNS Serial Number Arithmetic for too much detail on
      comparing and incrementing a wrapping sequence number.

pre-published text and xml are at https://git.rg.net/randy/draft-l3dl

thanks again!

randy