Re: [Tsv-art] Tsvart early review of draft-ietf-lsvr-l3dl-03

Joerg Ott <ott@in.tum.de> Mon, 11 May 2020 20:40 UTC

Return-Path: <ott@in.tum.de>
X-Original-To: tsv-art@ietfa.amsl.com
Delivered-To: tsv-art@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 761643A0B6E; Mon, 11 May 2020 13:40:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.897
X-Spam-Level:
X-Spam-Status: No, score=-1.897 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2O7LySk89wwn; Mon, 11 May 2020 13:40:34 -0700 (PDT)
Received: from mail-out2.informatik.tu-muenchen.de (mail-out2.in.tum.de [131.159.0.36]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 6F92E3A0B86; Mon, 11 May 2020 13:40:32 -0700 (PDT)
Received: by mail.in.tum.de (Postfix, from userid 107) id CAA871C0851; Mon, 11 May 2020 22:40:30 +0200 (CEST)
Received: (Authenticated sender: ott) by mail.in.tum.de (Postfix) with ESMTPSA id F26971C084A; Mon, 11 May 2020 22:40:27 +0200 (CEST) (Extended-Queue-bit tech_mewbi@fff.in.tum.de)
To: Randy Bush <randy@psg.com>
Cc: tsv-art@ietf.org, lsvr@ietf.org, draft-ietf-lsvr-l3dl.all@ietf.org
References: <158870511665.7532.2079643708622987385@ietfa.amsl.com> <m2sggclma3.wl-randy@psg.com>
From: Joerg Ott <ott@in.tum.de>
Message-ID: <c1712c72-fcf1-f2d5-e5ab-f8f4eb3f911d@in.tum.de>
Date: Mon, 11 May 2020 22:40:27 +0200
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.8.0
MIME-Version: 1.0
In-Reply-To: <m2sggclma3.wl-randy@psg.com>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/XSwzcdljy7M7iMkNjBiN4CVlu8Q>
Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-lsvr-l3dl-03
X-BeenThere: tsv-art@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Review Team <tsv-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsv-art>, <mailto:tsv-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsv-art/>
List-Post: <mailto:tsv-art@ietf.org>
List-Help: <mailto:tsv-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsv-art>, <mailto:tsv-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 11 May 2020 20:40:37 -0000

Randy,

thanks much, inline.

> again, thanks for taking the time for a strong review.  it is much
> appreciated.
> 
>> could benefit from a bit of extra context and target application
>> domain in the introduction. E.g., explaining explicitly who would talk
>> L3DL to whom.
> 
> how about the last para of intro getting a few more words along the line
> of
> 
>     In this document, the use case for L3DL is for point to point links
>     in a datacenter Clos in order to exchange the data needed for BGP-SPF
>     [I-D.ietf-lsvr-bgp-spf] bootstrap and continuity.  Once layer two
>     connectivity has been leveraged to get layer three addressability and
>     forwarding capabilities, normal layer three forwarding and routing
>     can take over.
> 
>     L3DL might be found to be more widely applicable to a range of
>     routing and similar protocols which need layer three discovery and
>     characterisation.

Sounds (well: reads) good.

>> 1. Section 10 spells out a default HELLO interval of 60 seconds. With
>> a large broadcast domain, this may create quite a bit of
>> traffic. While this may not be an issue in well-provisioned data
>> center networks, a remark about sensible value ranges and the
>> implications may be worthwhile. Just to provide some guidelines to
>> implementers (who want to offer choices) and operators (who pick
>> them).
> 
>     If the configured destination address is one that is propagated by
>     switches, the HELLO SHOULD be repeated at a configured interval, with
>     a default of 60 seconds.  This allows discovery by new devices which
>     come up on the layer-2 mesh.  In this multi-link scenario, the
>     operator should be aware of the trade-off between timer tuning and
>     network noise and adjust the inter-HELLO timer accordingly.

Ack.

>> 2. Section 10 also suggest that in response to HELLO messages nodes
>> will issue OPEN PDUs to newly discovered peers. This appears to bear
>> the clear risk of an OPEN implosion when many system come up at the
>> same time. Shouldn't guidance be given to avoid repeated traffic
>> surges and possible losses and thus unnecessary delays? (I noted that
>> other places foresee exponential backoff when retransmitting OPEN and
>> other ACKed PDUs).
> 
> added
> 
>     To ameliorate possible load spikes during bootstrap or event
>     recovery, there SHOULD be a jittered delay between receipt of a HELLO
>     and issue of the OPEN.  The default delay range SHOULD BE zero to
>     five seconds, and MUST be configurable.

Ok.

>> 3. When the protocol applies fragmentation, should there be a note on
>> preventing bursts?
> 
> likely part of this is our fault, as we did not mean 'fragmentation' in
> the classic "oops!  we found a hop with a small mtu."

I didn't take it to mean classic fragmentation but rather ALF-style
operation.  Still, this could generate bursts depending on how much
information there is to 'fragment'.

> does augmenting sec 6 as follow help?
> 
>     This is not classic 'fragmentation', but rather decomposition at the
>     origin to allow PDU payloads larger than the frame allows.  There are
>     no intermediate devices capable of further fragmentation or
>     reassembly.
> 
>     L3DL is carrying relatively small amounts of data on relatively high
>     bandwidth links, and at a time when the link is not active with other
>     data as it does not yet have layer three connectivity.  So congestion
>     is not considered a sufficiently significant risk to warrent
>     additional complexity.

Fine by me.  I assume that the limited amount of information you carry
in the first place would not yield more than 2 or 3 packets, so that the
above sounds good.

>> Section 7 on the checksum needs more detail.
> 
> could you be more specific re in what area?

Let me go back:

The code is fine (and certainly a good idea).  I was kind of expecting
the code to be described in text, but given the code is there, the
concise description is fine.  Except that I got confused by
"repeat until zero".  What should be zero?  Looking at the code, surely
not the result.

>> It also talks about a "suggested" algorithm but this should be clearly
>> mandated or way to choose one by means of configuration for a complete
>> data centre would need to be made explicit.
> 
>     The following code describes a suggested algorithm.  This
>     specification avoids mandatory to implement, algorithm agility, etc.
>     What matters is that the same algorithm is used consistently in any
>     deployment.

Ok.

>> I also assume that the pseudo code on p.11 would benefit from a leader
>> '0' in 0xffffffff -> 0x0ffffffff, otherwise expansion to 64 bits might
>> fill the high order bits with '1's, which is clearly not intended.
> 
> ok, but what kinky compiler are you using?  :)
> 
>       result = (result >> 32) + (result & 0x0FFFFFFFF);
>       result = (result >> 32) + (result & 0x0FFFFFFFF);

We had this and I am happy to undo this comment.  Hasn't occurred
lately but surely earlier.

Probably not having the leading zero would ensure that people don't
assume that it's just 7 x F instead of 8.

>> Section 11, p.17, second to last para ("If a properly
>> authenticated...").  From the text, it is unclear what is meant by an
>> "OPEN with the Serial Number of the last data received".
> 
>     If a properly authenticated OPEN arrives with a new Nonce from an
>     LLEI with which the receiving logical link endpoint believes it
>     already has an L3DL session (OPENs have already been exchanged), and
>     the Serial Number in the OPEN PDU is non-zero, the receiver SHOULD
>     establish a new session by sending an OPEN with the Serial Number
>     being the same as that of the last sent and ACKed PDU.  Each party
>     MUST resume sending encapsulations etc. subsequent to the other
>     party's Sequence Number.  And each MUST retain all previously
>     discovered encapsulation and other data.

Clear now.

>> I am curious about the error code, providing 16 bits for additional
>> explanation.  Why not a text field?
> 
> parsing by the receiver of error codes then becomes an larger unbounded
> chase.  but this bit of syntactic sugar may be more pain than it is
> worth.  free form or under-defined fields are too fertile a ground for
> interoperational incompatibility.  opinions solicited.

fine by me either way, was just curious.

>> Also wondering if repeated retries (due to failure, not lost packets)
>> could yield fast repeated transmissions.
> 
> could you be specific about what you think could be improved in

I was trying to decipher my handwriting on the tablet, and it appears to
say "wait before retry to prevent fast loops", scribbled on the bottom
of p.22.  I had to go back tot he text and I believe this refers back to
p.18, first para:

"The Key is specific to the operational environment.  A failure to
    authenticate is a failure to start the L3DL session, an ERROR PDU
    MUST BE sent (Error Code 3), and HELLOs MUST be restarted."

If you have a systematic auth failure, this would cause HELLOs to be
restarted.  Without any jittering (now you introduced some above),
you could, in principle, get a loop:

HELLO -> OPEN -> (Auth) ERROR -> HELLO -> OPEN -> (Auth) ERROR -> ...

So misconfig could be bad unless there is some delay built in.

>     12.1.  Retransmission
> 
>     If a PDU sender expects an ACK, e.g. for an OPEN, an Encapsulation, a
>     VENDOR PDU, etc., and does not receive the ACK for a configurable
>     time (default one second), and the interface is live at layer 2, the
>     sender resends the PDU using exponential back-off, see [RFC1122].
>     This cycle MAY be repeated a configurable number of times (default
>     three) before it is considered a failure.  The session MAY BE
>     considered closed in case of this ACK failure.
> 
>> Section 15, should the KEEPALIVE interval have suggested (lower)
>> bounds?  At the top of p.26, it says "One per second is the default",
>> the previous page at the bottom refers to the inter-KEEPALIVE interval
>> of ten seconds. Not sure if the two are the same, I suppose so. If
>> they are, the numbers should match.  If they are not, we'll need some
>> extra text to explain the difference.
> 
> whoops.  how about ten seconds for both?

I have no specific opinion as long as it is consistent.  10s for both is
fine.

>> There are two spellings of "Encapsulation", capitalised and lower
>> case. Use one consistently.
> 
> mind if we stall on this one?  we were attempting to differentiate
> between the proper noun of a PDU type and normal use of the world.  but
> looking at the text, it's a bit messy.  full employment for rfc editors!
> :)

yay! :-)

>> p10, first para: comprise -> comprising
> 
> s/compare/comparing/  in
> 
>        [RFC1982] on DNS Serial Number Arithmetic for too much detail on
>        comparing and incrementing a wrapping sequence number.
> 
> pre-published text and xml are at https://git.rg.net/randy/draft-l3dl
> 

Thanks much for the super-quick responses!

Cheers,
Jörg