Re: [Gen-art] [tcpm] Genart last call review of draft-ietf-tcpm-rto-consider-14

Please see below.

On 05/06/2020 17:43, Mark Allman wrote:
> Hi Stewart!
>
> Thanks for the feedback.  Sorry for the long RTT.  I had a recent
> deadline and am now trying to dig out.
>
>> Major issues:
>>
>> As far as I can see this text only applies to exchanges between
>> applications and network support applications such as
>> DNS. I.e. this is targeted at layer 4 and above. Given the
>> religious nature of BCPs in the eyes of some reviewers, and to
>> prevent endless explanations by those that design routing
>> protocols, OAM and other lower layer sub-systems I think there
>> needs to be scoping text in block capitals at the very start
>> of the document.
> I am not entirely sure what you're suggesting here.  Per note to
> Tom, I am going to add a few words to the intro.  Maybe that will
> help.  I think it's unlikely I'll use block capitals! :-)
>
>> =========
>>
>>        - The requirements in this document may not be appropriate in all
>>          cases and, therefore, inconsistent deviations may be necessary
>>          (hence the "SHOULD" in the last bullet).  However,
>>          inconsistencies MUST be (a) explained and (b) gather consensus.
>>
>> SB> That can be quite an onerous obligation  and provide scope for
>> SB> endless argument when reviewers are not domain experts in the
>> SB> protocol being designed.
> This was added because another reviewer thought it was for sure
> necessary.
>
> I guess I don't understand why you'd call this 'an onerous
> obligation' since presumably you'd do it anyway without this
> document.  Are we ramming things through without consensus?  If not
> (my assumption), (b) is no sweat.  Are we ramming things through
> without thought?  If not (my assumption), (a) is straightforward and
> hopefully is being done anyway.  In other words, I don't understand
> the complaint here because if you don't want to use the guidelines
> then that is fine, but in going through the standard process to
> define a loss detector you'll end up meeting this bullet.  Even if
> this document doesn't get published or didn't exist our documents
> should still be meeting this bullet.
>
>> =======
>>
>>            While there are a bevy of uses for timers in protocols---from
>>            rate-based pacing to connection failure detection and
>>            beyond---these are outside the scope of this document.
>>
>> SB> I am not sure what that means for the applicability of this
>> SB> document.
> This was added at some point along the way because someone thought
> something like rate-based pacing could be covered by the guidelines
> and the intent is to say it is not.  I have zero love for this bit
> and would happily remove it, but am loath to do so because the old
> comment will then come back.
I think Mark is correct: there are many transport uses of timers, and
calling out a small number of other uses was important to scope this
within the transport discussions, even if it just says "timers also do
other stuff".
>> =========
>>
>>      (1) As we note above, loss detection happens when a sender does not
>>          receive delivery confirmation within an some expected period of
>>          time.  In the absence of any knowledge about the latency of a
>>          path, the initial RTO MUST be conservatively set to no less than
>>          1 second.
>>
>> SB> This issue may be addressed by the scoping text, but 1s is no
>> SB> use when you are trying to detect sub 50ms of packet loss in
>> SB> the infrastructure.
> We have to start somewhere when we know nothing.
>
> I think in my thread with Tom we hit upon this notion that the
> document is really about sort of arbitrary, unknown and therefore
> presumed unreliable networks.  I am going to add some words to this
> effect.  Does this help?
>
> Again, for specific environments where things are more nailed down
> and known, deviations are fine and explicitly OK.  But, as a general
> default I think saying "when you don't know anything < 50msec is
> cool" is unlikely to be appropriate.  Well, no, I think it would be
> quite inappropriate, actually.

This is, I think, a natural discussion based on a different perspective.
The 1 second initial starting value for a transport path has been there
for a long time, and transport reviewers will frequently quote it,
whether for SCTP, TCP, or UDP-based apps (BCP 145, Section 3.1.1). I'd
expect this is about the assumed starting position for an Internet path.

True, if we're talking about a link between adjacent peers, this is
something very different.

>> =============
>>
>>      (3) Each time the RTO is used to detect a loss, the value of the RTO
>>          MUST be exponentially backed off such that the next firing
>>          requires a longer interval.  The backoff SHOULD be removed after
>>          either (a) the subsequent successful transmission of
>>          non-retransmitted data, or (b) an RTO passes without detecting
>>          additional losses.  The former will generally be quicker.  The
>>          latter covers cases where loss is detected, but not repaired.
>>
>>          A maximum value MAY be placed on the RTO.  The maximum RTO MUST
>>          NOT be less than 60 seconds (as specified in [RFC6298]).
>>
>>          This ensures network safety.
>>
>> SB> This does not work in OAM applications.
> Well, OK, get consensus to do something different---which is
> completely fine.  I think retransmission timers have shown
> themselves to be crucial for preventing collapse and, again, as a
> default I think this is our best advice.
>
It should be applicable to OAM applications that use a path across the
Internet that can change, and it certainly could be bad advice for a
controlled environment. It's actually not new: BCP 145 also speaks of
backoff.
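
For concreteness, the backoff behaviour in requirement (3) quoted above
can be sketched roughly as follows. This is only an illustration in
Python, not text from the draft or from BCP 145. The class name
RtoTimer, its method names, and the doubling factor are assumptions made
for illustration (the quoted text only says "exponentially"). The
1 second initial value and the 60 second floor on any maximum come from
the quoted requirements.

```python
from typing import Optional

INITIAL_RTO = 1.0   # seconds; conservative start when path latency is unknown
MIN_MAX_RTO = 60.0  # seconds; if a cap is used, it must not be below this


class RtoTimer:
    """Hypothetical helper that tracks a backed-off retransmission timeout."""

    def __init__(self, base_rto: float = INITIAL_RTO,
                 max_rto: Optional[float] = MIN_MAX_RTO) -> None:
        # A maximum is optional (MAY), but if present it has a lower bound.
        if max_rto is not None and max_rto < MIN_MAX_RTO:
            raise ValueError("a maximum RTO must not be less than 60 seconds")
        self.base_rto = base_rto
        self.max_rto = max_rto
        self.backoff = 1  # multiplier applied to base_rto

    @property
    def rto(self) -> float:
        value = self.base_rto * self.backoff
        return value if self.max_rto is None else min(value, self.max_rto)

    def on_timeout(self) -> float:
        # The RTO fired: back off so the next firing takes longer.
        # Doubling is an assumption; the requirement is only "exponential".
        self.backoff *= 2
        return self.rto

    def on_new_data_confirmed(self) -> None:
        # Case (a): non-retransmitted data was delivered; remove the backoff.
        self.backoff = 1

    def on_rto_without_loss(self) -> None:
        # Case (b): a full RTO passed with no further loss detected.
        self.backoff = 1
```

With the defaults, successive timeouts give 2, 4, 8, 16, 32, 60 and 60
seconds, and the RTO returns to 1 second once new data is confirmed.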
>> Minor issues:
>>
>>   "By waiting long enough that we are unambiguously
>>    certain a packet has been lost we cannot repair losses in a timely
>>    manner and we risk prolonging network congestion."
>>
>> I have a concern here that the emphasis is on classical
>> operation. We are beginning to see applications run over the
>> network where the timely delivery of a packet is critical for
>> correct operation of even SoL. As a BCP the text needs to
>> recognise that the scope and purpose of IP is changing and that
>> classical learning and rules derived from them may not apply.
>>
>> Also if not ruled out of scope earlier we need to be clear at this
>> point that things like BFD have different considerations.
Isn't BFD a link protocol between adjacent systems?
> I am going to suggest we revisit this after I hack out a little
> extra text for the intro.  You can see if that helps.
>
>> ==========
>>
>>        "- This document does not update or obsolete any existing RFC.
>>          These previous specifications---while generally consistent with
>>          the requirements in this document---reflect community consensus
>>          and this document does not change that consensus."
>>
>> I think it needs to be clear that adherence to this RFC is not
>> required for minor updates and extensions to existing RFCs. Having
>> seen minor routing extensions held up by security concerns related
>> to underlying protocols rather than the extension itself there is
>> a lot of sensitivity on this point in some quarters of the IETF.
> Um.  Do you have suggested words?  I am not much of a protocol
> lawyer (thankfully!), but I am not really conjuring the case you're
> concerned about.  Something like ...
>
>    (1) RFC XXXX was published 10 years ago and violates
>        rto-consider.
>    (2) We want to do a XXXXbis.
>    (3) The bis has to then explain why it's cool to violate
>        rto-consider.
>
> .... ?
>
> I would say if XXXX has a loss detector that had consensus and has
> been in use for a while it'd be pretty easy to get consensus for
> XXXXbis that we can still use it as it has worked fine.
>
>> It might be useful to make it clear that there are some
>> applications that would prefer no data to late data.
> This document is about loss detection, not what one does after
> detecting.  So, we do say ...
>
>      However, as discussed above, the detected loss need not be
>      repaired
>
> I am happy to reinforce this point.  Text suggestions welcome.
>
>> Nits/editorial comments:
>>
>> The terminology section confuses ID-nits - I think it should be a
>> section in its own right later in the document.
> Yeah- id-nits as it is run when submitting doesn't flag this.  It
> was flagged by someone else in LC.  Because I am old school it's
> hard to renumber everything and so I was just leaving this for the
> rfc-ed to do something reasonable here.
>
>> The following nits issues need looking at
>>
>>    == Missing Reference: 'RFC5681' is mentioned on line 377, but not defined
>>
>>    == Unused Reference: 'RFC3940' is defined on line 515, but no explicit
>>       reference was found in the text
>>
>>    == Unused Reference: 'RFC4340' is defined on line 519, but no explicit
>>       reference was found in the text
>>
>>    == Unused Reference: 'RFC6582' is defined on line 540, but no explicit
>>       reference was found in the text
> I will fix all these.  Again, I was trusting the id-nits when I
> submitted and these were not flagged (or, if they were it wasn't in
> a way that foisted them on my screen).  But, they're easy fixes, so
> thanks!
>
> allman