[core] John Scudder's No Objection on draft-ietf-core-new-block-12: (with COMMENT)

John Scudder via Datatracker <noreply@ietf.org> Fri, 21 May 2021 16:22 UTC

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: John Scudder via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-core-new-block@ietf.org, core-chairs@ietf.org, core@ietf.org, marco.tiloca@ri.se, marco.tiloca@ri.se
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: John Scudder <jgs@juniper.net>
Message-ID: <162161414645.14500.9969284754936809565@ietfa.amsl.com>
Date: Fri, 21 May 2021 09:22:26 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/FvCzJEYV7xVT9JI78cFZH11AwDk>
Subject: [core] John Scudder's No Objection on draft-ietf-core-new-block-12: (with COMMENT)

John Scudder has entered the following ballot position for
draft-ietf-core-new-block-12: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-new-block/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

My further comments are resolved in the current GitHub copy of the document, so
once it's published as version 13 I think we're good to go as far as I'm
concerned. Thanks for the discussion and changes.

--

My initial comments have been resolved or partially resolved in version 12, see
https://mailarchive.ietf.org/arch/msg/core/6geK8P9I0jZBPp9a5qSrQwQNhUo/. Note
I've added two new ones at the end of the list.

(draft-ietf-core-new-block-11)

1. Section 3.2

   This mechanism is not intended for general CoAP usage, and any use
   outside the intended use case should be carefully weighed against the
   loss of interoperability with generic CoAP applications.

I’m curious: is the only reason the mechanism isn’t intended for general usage
the fact that some implementations won’t support it? Or does it have other
deficiencies that also make it unsuitable?

2. Section 4.1

   Q-Block2 Option is useful with GET, POST, PUT, FETCH, PATCH, and
   iPATCH requests and their payload-bearing responses (2.01, 2.02,
   2.03, 2.04, and 2.05) (Section 5.5 of [RFC7252]).

I found the list of codes incomprehensible on first encountering it, since the
concept of response codes hadn’t been introduced yet. I do understand that the
document assumes familiarity with CoAP; nonetheless for basic clarity I think
this should say “(response codes 2.01, 2.02…”. Additionally, the reference to
RFC 7252 §5.5 doesn’t seem to be especially germane?

By the way, is 2.03 indeed a payload-bearing response? The only other place the
spec touches on it is in §4.4, which says “the server could respond with a 2.03
(Valid) response with no payload”.

3. Section 4.1

   To indicate support for Q-Block2 responses, the CoAP client MUST
   include the Q-Block2 Option in a GET or similar request (FETCH, for
   example), the Q-Block2 Option in a PUT or similar request, or the
   Q-Block1 Option in a PUT or similar request so that the server knows
   that the client supports this Q-Block functionality should it need to
   send back a body that spans multiple payloads.  Otherwise, the server
   would use the Block2 Option (if supported) to send back a message
   body that is too large to fit into a single IP packet [RFC7959].

Is this paragraph really supposed to mention both Q-Block2 and Q-Block1? In
particular, I’m confused by the mention of both of these in relation to PUT.

4. Section 4.1

   The Q-Block1 and Q-Block2 Options are unsafe to forward.  That is, a
   CoAP proxy that does not understand the Q-Block1 (or Q-Block2) Option
   MUST reject the request or response that uses either option.

Presumably (hopefully) this is simply describing the behavior of existing
spec-compliant proxies when processing the new messages. As such, is the MUST
appropriate? I would think not.

5. Section 4.3

      body.  Note that the last received payload may not be the one with
      the highest block number.

“Might not” would be less ambiguous than “may not”.

6. Section 4.4 (also two places in §4.3)

(This comment rehashes, in more detail, the difficulty explained in my DISCUSS.
You may want to skip over it until we’ve resolved the DISCUSS, after which this
may, or may not, be relevant.)

   The client SHOULD wait for up to NON_RECEIVE_TIMEOUT (Section 7.2)

I read this as meaning the client should wait for as little as zero, or as long
as NON_RECEIVE_TIMEOUT — that’s my understanding of “up to”. Is that the
intended meaning? If it is, I think it’s worth writing out as I’ve done, for
clarity. If it’s not, it definitely needs to be fixed.

There’s a similar issue with “up to NON_PARTIAL_TIMEOUT” later in the section.

Referring ahead to Section 7.2 muddies the waters further. Even though the text
quoted above says NON_RECEIVE_TIMEOUT is an upper limit on how long to wait,
§7.2 says it’s a lower limit instead... maybe? From §7.2:

   NON_RECEIVE_TIMEOUT is the initial maximum time to wait for a missing

“Maximum”, ok great, that means “upper bound” and so lines up with §4.4
although the “initial” is surprising since §4.4 doesn’t say anything about the
upper limit increasing. It continues:

   payload before requesting retransmission for the first time.  Every
   time the missing payload is re-requested, the time to wait value
   doubles.  The time to wait is calculated as:

      Time-to-Wait = NON_RECEIVE_TIMEOUT * (2 ** (Re-Request-Count - 1))

But this part says it’s (a) an exact time-to-wait, not a “maximum”, and (b) it
says it increases exponentially, so NON_RECEIVE_TIMEOUT isn’t a maximum at all,
but a minimum.
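To make this concrete, here is a rough Python sketch of the quoted §7.2 formula. The helper name is mine, Re-Request-Count is assumed to start at 1 for the first re-request, and the NON_RECEIVE_TIMEOUT of 4 seconds is just an illustrative value, not one taken from the draft.

```python
def time_to_wait(non_receive_timeout: float, re_request_count: int) -> float:
    """Time-to-Wait = NON_RECEIVE_TIMEOUT * (2 ** (Re-Request-Count - 1)).

    Assumes re_request_count is 1 for the first re-request of a payload.
    """
    if re_request_count < 1:
        raise ValueError("Re-Request-Count is assumed to start at 1")
    return non_receive_timeout * (2 ** (re_request_count - 1))


# Illustrative NON_RECEIVE_TIMEOUT of 4 seconds (an assumed value).
waits = [time_to_wait(4, n) for n in range(1, 5)]
print(waits)  # [4, 8, 16, 32]
```

Every value in the series is at least NON_RECEIVE_TIMEOUT, which is why it behaves as a floor rather than a “maximum”.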

This later text in §7.2 implies that perhaps the problem in the above passages
is the word “maximum”, and it should simply be deleted:

   For the server receiving NON Q-Block1 requests, it SHOULD send back a
   2.31 (Continue) Response Code on receipt of all of the MAX_PAYLOADS
   payloads to prevent the client unnecessarily delaying.  If not all of
   the MAX_PAYLOADS payloads were received, the server SHOULD delay for
   NON_RECEIVE_TIMEOUT (exponentially scaled based on the repeat request
   count for a payload) before sending the 4.08 (Request Entity
   Incomplete) Response Code for the missing payload(s).

Similarly “up to” in the quote that began this comment should be “at least”.

Whether you adopt those suggestions or not, it seems as though all this needs
to be rewritten with careful attention to conveying what the desired behavior
is.

But the plot thickens. Later in §7.2 we have

   It is likely that the client will start transmitting the next set of
   MAX_PAYLOADS payloads before the server times out on waiting for the
   last of the previous MAX_PAYLOADS payloads.  On receipt of the first
   payload from the new set of MAX_PAYLOADS payloads, the server SHOULD
   send a 4.08 (Request Entity Incomplete) Response Code indicating any
   missing payloads from any previous MAX_PAYLOADS payloads.

The point being that the retransmission request can be triggered by an event
other than timer expiration. So in that sense, “maximum” is right — it provides
an upper bound on how long to wait before requesting a retransmission — but in
another sense it’s wrong because the exponential increase is applied to it. I
think the word “maximum” is trying to do too much work, and more words are
probably required in order to make this clear. I also think the problem is
exacerbated by the fact that both §4.4 and §7.2 are talking normatively about how to
use NON_RECEIVE_TIMEOUT. It seems as though the main description is found in
§7.2, and some confusion would be avoided by making §4.4 less specific, and
simply referring forward to §7.2.

And, as noted in my DISCUSS, example 10.2.3 muddies the waters still further
since it illustrates yet another behavior.
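To check my reading of the two quoted §7.2 passages, here is a toy Python model of the server side. It is a hypothetical sketch, not something from the draft: the class and method names are mine, it ignores NON_RECEIVE_TIMEOUT entirely, and it does not treat the final (possibly short) set specially. Its only point is that the 4.08 can be triggered by the arrival of a payload from a later set rather than by a timer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Q1SetTracker:
    """Hypothetical bookkeeping for a server receiving NON Q-Block1 payloads.

    Timer handling (NON_RECEIVE_TIMEOUT) and the final, possibly short,
    MAX_PAYLOADS_SET are deliberately left out.
    """
    max_payloads: int
    received: Set[int] = field(default_factory=set)
    current_set: int = 0

    def on_payload(self, block_num: int) -> Optional[Tuple[str, List[int]]]:
        """Return (response code, block list) to send now, or None to keep waiting."""
        self.received.add(block_num)
        set_index = block_num // self.max_payloads
        if set_index > self.current_set:
            # First payload of a newer set: report anything missing from earlier sets.
            self.current_set = set_index
            earlier = range(set_index * self.max_payloads)
            missing = [n for n in earlier if n not in self.received]
            if missing:
                return ("4.08", missing)
        first = set_index * self.max_payloads
        this_set = range(first, first + self.max_payloads)
        if all(n in self.received for n in this_set):
            # Every payload of this set has arrived: acknowledge so the client need not delay.
            return ("2.31", [])
        return None
```

With MAX_PAYLOADS of 3 and block 1 lost, the arrival of block 3 produces ("4.08", [1]) immediately, and the retransmitted block 1 then completes the first set and produces ("2.31", []).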

7. Section 4.4

   The client SHOULD wait for up to NON_RECEIVE_TIMEOUT (Section 7.2)
   after the last received payload for NON payloads before issuing a
   GET, POST, PUT, FETCH, PATCH, or iPATCH request that contains one or
   more Q-Block2 Options that define the missing blocks with the M bit
   unset.  The client MAY set the M bit to request this and later blocks
   from this MAX_PAYLOADS set.  Further considerations related to the
   transmission timing for missing requests are discussed in
   Section 7.2.

I find this whole paragraph pretty confusing with the dueling SHOULD and MAY,
where it appears the SHOULD might be doing two jobs at once. I *think* your
intent is something like the following?

“The client SHOULD wait as specified in Section 7.2 for NON payloads before
requesting retransmission of any missing blocks. Retransmission is requested by
issuing a GET, POST, PUT, FETCH, PATCH, or iPATCH request that contains one or
more Q-Block2 Options that define the missing block(s). Generally the M bit on
the Q-Block option(s) SHOULD be unset, although the M bit MAY be set to request
this and later blocks from this MAX_PAYLOADS set; see Section 10.2.4 for an
example of this in operation.”
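For the retransmission request itself, a sketch of how the Q-Block2 option values for the missing blocks could be built. It assumes the Q-Block2 value uses the same NUM/M/SZX layout as Block2 (Section 2.2 of RFC 7959), i.e. the NUM/M/size triple the examples write as QB2:10/0/1024; the helper names are hypothetical.

```python
from typing import Iterable, List


def szx_for_size(block_size: int) -> int:
    """Map a block size (16..1024) to SZX, where size = 2 ** (SZX + 4)."""
    szx = block_size.bit_length() - 5
    if not 0 <= szx <= 6 or 2 ** (szx + 4) != block_size:
        raise ValueError("block size must be a power of two from 16 to 1024")
    return szx


def block_option_value(num: int, more: bool, block_size: int) -> int:
    """Pack NUM/M/SZX as in RFC 7959 Section 2.2: NUM << 4 | M << 3 | SZX."""
    if not 0 <= num < 2 ** 20:
        raise ValueError("NUM must fit in 20 bits")
    return (num << 4) | (int(more) << 3) | szx_for_size(block_size)


def encode_uint(value: int) -> bytes:
    """Minimal-length big-endian encoding used for CoAP uint option values."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def missing_block_options(missing: Iterable[int], block_size: int = 1024,
                          more: bool = False) -> List[bytes]:
    """One Q-Block2 option value per missing block, M bit unset by default."""
    return [encode_uint(block_option_value(n, more, block_size)) for n in missing]


print(missing_block_options([1, 9, 10]))  # [b'\x16', b'\x96', b'\xa6']
```

Passing more=True gives the “this and later blocks” variant from the suggested text.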

8. Section 5

   If the size of the 4.08 (Request Entity Incomplete) response packet
   is larger than that defined by Section 4.6 [RFC7252], then the number
   of missing blocks MUST be limited so that the response can fit into a
   single packet.  If this is the case, then the server can send

Suggestion: “then the number of missing blocks reported MUST...” (The thing
being limited is not the actual number of missing blocks. You’re limiting the
number you report on.)
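A small sketch of what “limiting the number reported” could look like in practice. It assumes, per my understanding of the draft’s missing-blocks CBOR sequence, that each reported block number is encoded as a CBOR unsigned integer; the byte budget is a hypothetical parameter standing in for whatever room Section 4.6 of RFC 7252 leaves after headers and options, and reporting the lowest-numbered blocks first is my own choice.

```python
from typing import Iterable, List


def cbor_uint_size(n: int) -> int:
    """Encoded size in bytes of a CBOR unsigned integer (major type 0)."""
    if n < 0:
        raise ValueError("block numbers are non-negative")
    if n < 24:
        return 1
    if n < 2 ** 8:
        return 2
    if n < 2 ** 16:
        return 3
    if n < 2 ** 32:
        return 5
    return 9


def missing_blocks_to_report(missing: Iterable[int], payload_budget: int) -> List[int]:
    """Lowest-numbered missing blocks whose encodings fit in payload_budget bytes."""
    reported, used = [], 0
    for n in sorted(missing):
        size = cbor_uint_size(n)
        if used + size > payload_budget:
            break  # the rest of the missing blocks go unreported in this response
        reported.append(n)
        used += size
    return reported
```

The actual set of missing blocks is unchanged; only the list placed in this one 4.08 shrinks.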

9. Section 7.1

   It is implementation specific as to whether there should be any
   further requests for missing data as there will have been significant
   transmission failure as individual payloads will have failed after
   MAX_TRANSMIT_SPAN.

This paragraph seems as though it’s a non-sequitur. It just doesn’t make sense
to me. :-(

10. Section 7.2

(This comment relates to the difficulty explained in my DISCUSS. You may want
to skip over it until we’ve resolved the DISCUSS, after which this may, or may
not, be relevant.)

   NON_TIMEOUT is the maximum period of delay between sending sets of
   MAX_PAYLOADS payloads for the same body.  By default, NON_TIMEOUT has
   the same value as ACK_TIMEOUT (Section 4.8 of [RFC7252]).

Presumably the use of “maximum” means it’s fine to delay zero seconds (or any
value lower than NON_TIMEOUT).

11. General

By the way, none of the timers specify jitter (and indeed, if read literally,
jitter would be forbidden). Is this intentional?

12. Section 7.2

   If the CoAP peer reports at least one payload has not arrived for
   each body for at least a 24 hour period and it is known that there
   are no other network issues over that period, then the value of
   MAX_PAYLOADS can be reduced by 1 at a time (to a minimum of 1) and
   the situation re-evaluated for another 24 hour period until there is
   no report of missing payloads under normal operating conditions.  The
   newly derived value for MAX_PAYLOADS should be used for both ends of
   this particular CoAP peer link.  Note that the CoAP peer will not
   know about the MAX_PAYLOADS change until it is reconfigured.  As a
   consequence of the two peers having different MAX_PAYLOADS values, a
   peer may continue indicate that there are some missing payloads as
   all of its MAX_PAYLOADS set may not have arrived.  How the two peer
   values for MAX_PAYLOADS are synchronized is out of the scope.

I take it this is just thrown in here as an operational suggestion? It’s not
specifying protocol, right? It seems a little misplaced, if so.
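Written out as code, the quoted rule is a periodic adjustment made by whoever manages the link, driven by reports gathered over each 24-hour period, and nothing in it is carried in a message. The function names below are hypothetical; the logic is just my reading of the quoted paragraph.

```python
from typing import Iterable, Tuple


def next_max_payloads(current: int, every_body_had_missing: bool,
                      other_network_issues: bool) -> int:
    """One 24-hour evaluation step of the quoted MAX_PAYLOADS tuning rule."""
    if every_body_had_missing and not other_network_issues:
        return max(1, current - 1)  # reduce by 1 at a time, never below 1
    return current


def tune(initial: int, daily_reports: Iterable[Tuple[bool, bool]]) -> int:
    """Apply the rule once per (every_body_had_missing, other_issues) period."""
    value = initial
    for every_body_had_missing, other_issues in daily_reports:
        value = next_max_payloads(value, every_body_had_missing, other_issues)
    return value


print(tune(10, [(True, False), (True, False), (False, False)]))  # 8
```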

13. Section 10.1.3

(This comment relates to the aside in my DISCUSS. You may want to skip over it
until we’ve resolved the DISCUSS, after which this may, or may not, be
relevant.)

Why doesn’t the server request 1,9,10 in one go? Since its retransmission
request is triggered by receipt of block 11, one would think it could infer
that 10 had been lost.
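The inference I have in mind is just the gap computation below. The received set is hypothetical (chosen so that 1, 9 and 10 are missing), not copied from the example, and the function name is mine. It deliberately ignores reordering, which the draft acknowledges when it notes that the last received payload might not be the one with the highest block number.

```python
from typing import Iterable, List


def inferred_missing(received: Iterable[int], trigger_block: int) -> List[int]:
    """Blocks numbered below trigger_block that have not (yet) been received.

    Ignores reordering: a block reported as missing here might still be in flight.
    """
    have = set(received)
    return [n for n in range(trigger_block) if n not in have]


# Hypothetical arrivals: blocks 1, 9 and 10 lost, block 11 has just arrived.
print(inferred_missing([0, 2, 3, 4, 5, 6, 7, 8, 11], 11))  # [1, 9, 10]
```

The same computation applies to comment 14: once the block with More=0 arrives, its block number bounds the body, so any gaps below it are already known.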

14. Section 10.1.4 (also 10.3.3)

(This comment relates to the aside in my DISCUSS. You may want to skip over it
until we’ve resolved the DISCUSS, after which this may, or may not, be
relevant.)

Why doesn’t reception of a message with More=0 trigger the server to request
retransmission of the missing block? Why does it have to wait for timeout?

15. Section 10.2.3

(This comment relates to my DISCUSS. You may want to skip over it until we’ve
resolved the DISCUSS, after which this may, or may not, be relevant.)

Why doesn’t reception of QB2:10/0/1024 trigger the client to request
retransmission? Why does it have to wait for timeout? Similarly reception of
QB2:9/1/1024 later in the example.

16. Section 10.2.4

Since MAX_PAYLOADS is 10, why does the example say “MAX_PAYLOADS has been
reached” after payloads 2-9 have been retransmitted? That’s only 8 payloads.

--
I do have a couple of new comments raised during my review of the changes in
version 12:

(draft-ietf-core-new-block-12)

17. Section 1:

  This document introduces the CoAP Q-Block1 and Q-Block2 Options which
  allow block-wise transfer to work with series of Non-confirmable
  messages, instead of lock-stepping using Confirmable messages
  (Section 3).  In other words, this document provides a missing piece
  of [RFC7959], namely the support of block-wise transfer using Non-
  confirmable where an entire body of data can be transmitted without
  the requirement for an acknowledgement (but recovery is available
  should it be needed).

As far as I can tell the spec does not really remove the requirement for
acknowledgement, it just amortizes the acknowledgements by only sending them
every MAX_PAYLOADS_SET. Response Code 2.31 is essentially an acknowledgement,
and it gets sent that frequently, right? There’s also (if I recall correctly)
some flavor of acknowledgement that is sent when the entire body has been
transferred. So, I think the new paragraph isn’t accurate.

This observation also applies to this claimed benefit in §3:

  o  They support sending an entire body using NON messages without
     requiring an intermediate response from the peer.

Response Code 2.31 is exactly an intermediate response. I guess maybe your
focus is that if the intermediate response isn’t received, transmission
continues, albeit more slowly than it would otherwise, and unreliably too, so
in that sense the responses aren’t “required”. I think this requires awfully
close parsing of the word “required”, though.

18. Section 2:

  MAX_PAYLOADS_SET is the set of blocks identified by block numbers
  that, when divided by MAX_PAYLOADS, they have the same numeric

Remove “they”

  result.  For example, if MAX_PAYLOADS is set to '10', a
  MAX_PAYLOADS_SET could be blocks #0 to #9, #10 to #19, etc.
  Depending on the data size, the MAX_PAYLOADS_SET may not comprise all
  the MAX_PAYLOADS blocks.

I don’t understand the last sentence ("Depending on the data size, the
MAX_PAYLOADS_SET may not comprise all the MAX_PAYLOADS blocks.”) Are you trying
to say that if the body size isn’t evenly divisible by MAX_PAYLOADS then the
final MAX_PAYLOADS_SET will have fewer than MAX_PAYLOADS blocks in it?
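Under that reading (my interpretation, not text from the draft), grouping block numbers by their quotient looks like the sketch below; the helper names and the 25,000-byte body are hypothetical.

```python
from typing import List


def total_blocks(body_len: int, block_size: int) -> int:
    """Number of blocks needed for a body (ceiling division)."""
    return -(-body_len // block_size)


def max_payloads_sets(n_blocks: int, max_payloads: int) -> List[List[int]]:
    """Group block numbers 0..n_blocks-1 by block_num // max_payloads."""
    groups = {}
    for num in range(n_blocks):
        groups.setdefault(num // max_payloads, []).append(num)
    return [groups[k] for k in sorted(groups)]


# A 25,000-byte body in 1024-byte blocks needs 25 blocks (0..24).
sets = max_payloads_sets(total_blocks(25000, 1024), 10)
print([len(s) for s in sets])  # [10, 10, 5]
```

Here the final MAX_PAYLOADS_SET holds only blocks 20 to 24, five blocks rather than MAX_PAYLOADS.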

(I do think this change, to introduce the term MAX_PAYLOADS_SET, is generally
helpful; thanks.)
