Re: [Gen-art] Genart last call review of draft-ietf-tcpm-rto-consider-14

Mark Allman <mallman@icir.org> Fri, 05 June 2020 16:44 UTC

From: Mark Allman <mallman@icir.org>
To: Stewart Bryant <stewart.bryant@gmail.com>
Cc: gen-art@ietf.org, tcpm@ietf.org, last-call@ietf.org, draft-ietf-tcpm-rto-consider.all@ietf.org
Date: Fri, 05 Jun 2020 12:43:50 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/gen-art/8dAqkwKf7oFyORwkPZQA48JSu6M>

Hi Stewart!

Thanks for the feedback.  Sorry for the long RTT.  I had a recent
deadline and am now trying to dig out.

> Major issues:
>
> As far as I can see this text only applies to exchanges between
> applications and network support applications such as
> DNS. I.e. this is targeted at layer 4 and above. Given the
> religious nature of BCPs in the eyes of some reviewers, and to
> prevent endless explanations by those that design routing
> protocols, OAM and other lower layer sub-systems, I think there
> needs to be a scoping text in block capitals at the very start
> of the document.

I am not entirely sure what you're suggesting here.  Per my note to
Tom, I am going to add a few words to the intro.  Maybe that will
help.  I think it's unlikely I'll use block capitals! :-)

> =========
>
>       - The requirements in this document may not be appropriate in all
>         cases and, therefore, inconsistent deviations may be necessary
>         (hence the "SHOULD" in the last bullet).  However,
>         inconsistencies MUST be (a) explained and (b) gather consensus.
>
> SB> That can be quite an onerous obligation  and provide scope for
> SB> endless argument when reviewers are not domain experts in the
> SB> protocol being designed.

This was added because another reviewer thought it was for sure
necessary.

I guess I don't understand why you'd call this 'an onerous
obligation' since presumably you'd do it anyway without this
document.  Are we ramming things through without consensus?  If not
(my assumption), (b) is no sweat.  Are we ramming things through
without thought?  If not (my assumption), (a) is straightforward and
hopefully is being done anyway.  In other words, I don't understand
the complaint here because if you don't want to use the guidelines
then that is fine, but in going through the standard process to
define a loss detector you'll end up meeting this bullet.  Even if
this document doesn't get published or didn't exist our documents
should still be meeting this bullet.

> =======
>
>           While there are a bevy of uses for timers in protocols---from
>           rate-based pacing to connection failure detection and
>           beyond---these are outside the scope of this document.
>
> SB> I am not sure what that means for the applicability of this
> SB> document.

This was added at some point along the way because someone thought
something like rate-based pacing could be covered by the guidelines
and the intent is to say it is not.  I have zero love for this bit
and would happily remove it, but am loath to do so because the old
comment will then come back.

> =========
>
>     (1) As we note above, loss detection happens when a sender does not
>         receive delivery confirmation within some expected period of
>         time.  In the absence of any knowledge about the latency of a
>         path, the initial RTO MUST be conservatively set to no less than
>         1 second.
>
> SB> This issue may be addressed by the scoping text, but 1s is no
> SB> use when you are trying to detect sub 50ms of packet loss in
> SB> the infrastructure.

We have to start somewhere when we know nothing.

I think in my thread with Tom we hit upon this notion that the
document is really about sort of arbitrary, unknown and therefore
presumed unreliable networks.  I am going to add some words to this
effect.  Does this help?

Again, for specific environments where things are more nailed down
and known, deviations are fine and explicitly OK.  But, as a general
default I think saying "when you don't know anything < 50msec is
cool" is unlikely to be appropriate.  Well, no, I think it would be
quite inappropriate, actually.
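
To make the "start somewhere when we know nothing" default concrete, here is
a minimal sketch of choosing an initial RTO.  The function name and the
`known_path_rto` parameter are hypothetical and not taken from the draft;
the sketch only assumes the 1 second floor from requirement (1) and that a
smaller value is used solely where the environment is known and the
deviation has been explained and agreed.

```python
from typing import Optional

# Floor from requirement (1) quoted above: with no knowledge of path
# latency, the initial RTO is no less than 1 second.
DEFAULT_INITIAL_RTO = 1.0  # seconds


def initial_rto(known_path_rto: Optional[float] = None) -> float:
    """Return an initial RTO in seconds.

    known_path_rto is a hypothetical input: a value chosen for a specific,
    well-understood environment where a deviation from the default has
    been explained and has consensus.  None means "we know nothing".
    """
    if known_path_rto is None:
        return DEFAULT_INITIAL_RTO
    if known_path_rto <= 0:
        raise ValueError("RTO must be positive")
    return known_path_rto
```

With no information the sketch falls back to 1 second; a 50 msec value is
used only when it is passed in explicitly.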

> =============
>
>     (3) Each time the RTO is used to detect a loss, the value of the RTO
>         MUST be exponentially backed off such that the next firing
>         requires a longer interval.  The backoff SHOULD be removed after
>         either (a) the subsequent successful transmission of
>         non-retransmitted data, or (b) an RTO passes without detecting
>         additional losses.  The former will generally be quicker.  The
>         latter covers cases where loss is detected, but not repaired.
>
>         A maximum value MAY be placed on the RTO.  The maximum RTO MUST
>         NOT be less than 60 seconds (as specified in [RFC6298]).
>
>         This ensures network safety.
>
> SB> This does not work in OAM applications.

Well, OK, get consensus to do something different---which is
completely fine.  I think retransmission timers have shown
themselves to be crucial for preventing collapse and, again, as a
default I think this is our best advice.
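
For readers who want requirement (3) in concrete terms, the following is a
rough sketch of the backoff behaviour quoted above.  The class and method
names are hypothetical, and doubling on each timeout is an assumption (the
quoted text only says "exponentially"); the 60 second lower bound on any
maximum comes from the quoted text.

```python
from typing import Optional

MIN_ALLOWED_MAX_RTO = 60.0  # seconds; a configured maximum may not be lower


class RtoBackoff:
    """Illustrative RTO backoff state; not the draft's own algorithm."""

    def __init__(self, base_rto: float, max_rto: Optional[float] = 60.0):
        if base_rto <= 0:
            raise ValueError("base RTO must be positive")
        if max_rto is not None and max_rto < MIN_ALLOWED_MAX_RTO:
            raise ValueError("maximum RTO must not be less than 60 seconds")
        self.base_rto = base_rto
        self.max_rto = max_rto  # None means no maximum is applied
        self.backoff_count = 0

    def current(self) -> float:
        """Current RTO: base value doubled once per backoff, then capped."""
        rto = self.base_rto * (2 ** self.backoff_count)
        return rto if self.max_rto is None else min(rto, self.max_rto)

    def on_timeout(self) -> float:
        """The RTO fired: back off so the next firing takes longer."""
        if self.max_rto is None or self.current() < self.max_rto:
            self.backoff_count += 1
        return self.current()

    def on_new_data_acked(self) -> None:
        """Case (a): non-retransmitted data was delivered successfully."""
        self.backoff_count = 0

    def on_quiet_rto(self) -> None:
        """Case (b): an RTO passed without detecting additional losses."""
        self.backoff_count = 0
```

Starting from a 1 second base, successive timeouts give 2, 4, 8, 16 and 32
seconds and then hold at the 60 second cap until one of the two reset
conditions occurs.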

> Minor issues:
>
>  "By waiting long enough that we are unambiguously
>   certain a packet has been lost we cannot repair losses in a timely
>   manner and we risk prolonging network congestion."
>
> I have a concern here that the emphasis is on classical
> operation. We are beginning to see applications run over the
> network where the timely delivery of a packet is critical for
> correct operation of even SoL. As a BCP the text needs to
> recognise that the scope and purpose of IP is changing and that
> classical learning and rules derived from them may not apply.
>
> Also if not ruled out of scope earlier we need to be clear at this
> point that things like BFD have different considerations.

I am going to suggest we revisit this after I hack out a little
extra text for the intro.  You can see if that helps.

> ==========
>
>       "- This document does not update or obsolete any existing RFC.
>         These previous specifications---while generally consistent with
>         the requirements in this document---reflect community consensus
>         and this document does not change that consensus."
>
> I think it needs to be clear that adherence to this RFC is not
> required for minor updates and extensions to existing RFCs. Having
> seen minor routing extension held up by security concerns related
> to underlying protocols rather than the extension itself there is
> a lot of sensitivity on this point in some quarters of the IETF.

Um.  Do you have suggested words?  I am not much of a protocol
lawyer (thankfully!), but I am not really conjuring the case you're
concerned about.  Something like ...

  (1) RFC XXXX was published 10 years ago and violates
      rto-consider.
  (2) We want to do a XXXXbis.
  (3) The bis has to then explain why it's cool to violate
      rto-consider.

... ?

I would say if XXXX has a loss detector that had consensus and has
been in use for a while it'd be pretty easy to get consensus for
XXXXbis that we can still use it as it has worked fine.

> It might be useful to make it clear that there are some
> applications that would prefer no data to late data.

This document is about loss detection, not what one does after
detecting.  So, we do say ...

    However, as discussed above, the detected loss need not be
    repaired

I am happy to reinforce this point.  Text suggestions welcome.

> Nits/editorial comments:
>
> The terminology section confuses ID-nits - I think it should be a
> section in its own right later in the document.

Yeah, id-nits as it is run when submitting doesn't flag this.  It
was flagged by someone else in LC.  Because I am old school it's
hard to renumber everything and so I was just leaving this for the
rfc-ed to do something reasonable here.

> The following nits issues need looking at
>
>   == Missing Reference: 'RFC5681' is mentioned on line 377, but not defined
>
>   == Unused Reference: 'RFC3940' is defined on line 515, but no explicit
>      reference was found in the text
>
>   == Unused Reference: 'RFC4340' is defined on line 519, but no explicit
>      reference was found in the text
>
>   == Unused Reference: 'RFC6582' is defined on line 540, but no explicit
>      reference was found in the text

I will fix all these.  Again, I was trusting the id-nits when I
submitted and these were not flagged (or, if they were it wasn't in
a way that foisted them on my screen).  But, they're easy fixes, so
thanks!

allman