[Lsvr] Tsvart early review of draft-ietf-lsvr-l3dl-03

Joerg Ott via Datatracker <noreply@ietf.org> Tue, 05 May 2020 18:58 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/lsvr/jfh3Av08b3CqFD_NKtS4LgO92cY>

Reviewer: Joerg Ott
Review result: Ready with Issues

The draft describes a peer/neighbour discovery mechanism for large-scale L2/L3
topologies in data centres. The aim is to provide a protocol by means of which
the involved nodes can learn about other nodes connected to their (broadcast or
point-to-point) L2 links and about their respectively supported encapsulation
schemes, identifiers, L2/L3 addresses, etc. This information is then provided
to a higher layer for further processing.

The document is well written and fairly easy to follow, but the introduction
could benefit from a bit of extra context on the target application domain,
e.g., by explaining explicitly who would talk L3DL to whom.

From a transport perspective, I see three potential issues that deserve
clarification or reconsideration:

1. Section 10 spells out a default HELLO interval of 60 seconds. With a large
broadcast domain, this may create quite a bit of traffic. While this may not be
an issue in well-provisioned data centre networks, a remark about sensible
value ranges and their implications may be worthwhile, to provide some
guidelines to implementers (who want to offer choices) and operators (who pick
them).

2. Section 10 also suggests that in response to HELLO messages nodes will issue
OPEN PDUs to newly discovered peers. This appears to bear the clear risk of an
OPEN implosion when many systems come up at the same time. Shouldn't guidance
be given to avoid repeated traffic surges and possible losses, and thus
unnecessary delays? (I noted that other places foresee exponential backoff when
retransmitting OPEN and other ACKed PDUs.)

3. When the protocol applies fragmentation, should there be a note on
preventing bursts?
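
To make the three points above a bit more concrete, here is a rough Python
sketch of the kind of guidance I have in mind. None of it is taken from the
draft: the function names, the node count, the PDU size, the jitter bound and
the pacing rate are all illustrative assumptions.

```python
import random


def hello_load(nodes, interval_s, hello_bytes):
    """Point 1: HELLOs per second and bits per second that each listener
    sees on one broadcast domain, if every node sends one HELLO per interval."""
    pdus_per_s = nodes / interval_s
    return pdus_per_s, pdus_per_s * hello_bytes * 8


def open_delay(max_jitter_s, rng):
    """Point 2: uniform random delay before answering a HELLO with an OPEN,
    so that nodes booting together do not all send at the same instant."""
    return rng.uniform(0.0, max_jitter_s)


def retransmit_delay(attempt, base_s, cap_s):
    """Point 2: exponential backoff (base, 2*base, 4*base, ...) with a cap."""
    return min(cap_s, base_s * (2 ** attempt))


def fragment_send_offsets(fragment_sizes, rate_bps):
    """Point 3: send offsets in seconds that pace the fragments of one PDU
    at rate_bps instead of emitting them back to back."""
    offsets = []
    t = 0.0
    for size in fragment_sizes:
        offsets.append(t)
        t += size * 8 / rate_bps
    return offsets


# Assumed numbers: 6000 nodes, 60 s HELLO interval, 100-byte HELLO PDUs.
print(hello_load(6000, 60.0, 100))
# Assumed 2 s jitter bound for the first OPEN.
print(open_delay(2.0, random.Random()))
# Assumed 1 s base and 30 s cap for OPEN retransmissions.
print([retransmit_delay(n, 1.0, 30.0) for n in range(6)])
# Assumed three fragments paced at 8 kbit/s.
print(fragment_send_offsets([1000, 1000, 500], 8000.0))
```

Even a short paragraph in the draft stating which of these knobs exist and what
ranges are considered sensible would help implementers and operators.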

Other notes:
Section 7 on the checksum needs more detail. It also talks about a "suggested"
algorithm, but this should either be clearly mandated, or a way to choose one
by means of configuration for a complete data centre would need to be made
explicit. I also assume that the pseudo code on p.11 would benefit from a
leading '0' in 0xffffffff -> 0x0ffffffff, otherwise expansion to 64 bits might
fill the high order bits with '1's, which is clearly not intended.

Section 11, p.17, second to last para ("If a properly authenticated...").  From
the text, it is unclear what is meant by an "OPEN with the Serial Number of the
last data received".

I am curious about the error code, providing 16 bits for additional
explanation. Why not a text field? Also wondering if repeated retries (due to
failure, not lost packets) could yield fast repeated transmissions.

Section 15: should the KEEPALIVE interval have suggested (lower) bounds?
At the top of p.26, it says "One per second is the default", whereas the bottom
of the previous page refers to an inter-KEEPALIVE interval of ten seconds. I am
not sure if the two are the same; I suppose so. If they are, the numbers should
match. If they are not, we'll need some extra text to explain the difference.

Nits:
There are two spellings of "Encapsulation", capitalised and lower case. Use one
consistently.
p.10, first para: comprise -> comprising