[core] Benjamin Kaduk's Discuss on draft-ietf-core-new-block-11: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 06 May 2021 01:58 UTC
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-core-new-block@ietf.org, core-chairs@ietf.org, core@ietf.org, marco.tiloca@ri.se
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <162026630680.17506.6477675472375470197@ietfa.amsl.com>
Date: Wed, 05 May 2021 18:58:27 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/cTz5FQMhM5_4kWAuQ0fgFtk4xgE>
Benjamin Kaduk has entered the following ballot position for
draft-ietf-core-new-block-11: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-new-block/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I have a concern about the MAX_PAYLOADS congestion-control parameter.
In Section 7.2 it is stated that both endpoints only SHOULD have the
same value.  I don't see how this can be anything less than MUST, given
that we attribute semantics to whether NUM modulo MAX_PAYLOADS is zero or
non-zero in the processing of the Q-Block2 option.  If the endpoints
disagree on the value of MAX_PAYLOADS they will disagree on the
semantics of Q-Block2 -- how can that be interoperable?
(Being able to negotiate the value does not seem inherently problematic,
but since it is relevant for protocol semantics it seems like the value
must be identical on both endpoints.)
This seems especially important to have clarity on given that the
current specification allows for MAX_PAYLOADS to be decreased at runtime
in response to congestion feedback over a 24-hour period, with no
synchronization between peers provided ("Note that the CoAP peer will
not know about the MAX_PAYLOADS change until it is reconfigured".)
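
To make the concern concrete, here is a minimal sketch (not taken from
the draft) of two endpoints evaluating "NUM modulo MAX_PAYLOADS is zero"
with different local values.  The function names and the example values
10 and 8 are assumptions chosen only for illustration.

```python
def num_mod_is_zero(num: int, max_payloads: int) -> bool:
    """Evaluate the "NUM modulo MAX_PAYLOADS is zero" test for one block."""
    return num % max_payloads == 0


def disagreements(block_count: int, sender_max: int, receiver_max: int) -> list:
    """List the block numbers where the two endpoints reach different answers.

    sender_max and receiver_max are the (hypothetical) MAX_PAYLOADS values
    configured on each side.
    """
    return [
        num
        for num in range(block_count)
        if num_mod_is_zero(num, sender_max) != num_mod_is_zero(num, receiver_max)
    ]


# Hypothetical case: one peer lowered MAX_PAYLOADS from 10 to 8 at runtime.
print(disagreements(30, 10, 8))
# Identical values on both sides never disagree.
print(disagreements(30, 10, 10))
```

With mismatched values the peers attach different meaning to several
block numbers within the first 30 blocks; with identical values the list
is empty.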


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I made some editorial suggestions in a github pull request at
https://github.com/core-wg/new-block/pull/21 .  It seems that there are
now some merge conflicts; I cannot promise to have availability to try
to resolve them particularly quickly, but I can do so "eventually" if
needed.

Section 1

   There is a requirement for these blocks of data to be transmitted at
   higher rates under network conditions where there may be asymmetrical
   transient packet loss (i.e., responses may get dropped).  An example
   is when a network is subject to a Distributed Denial of Service
   (DDoS) attack and there is a need for DDoS mitigation agents relying
   upon CoAP to communicate with each other (e.g.,
   [RFC8782][I-D.ietf-dots-telemetry]).  As a reminder, [RFC7959]

I suppose the RFC Editor will do the right thing about referencing 8782
vs 8782bis ... and I am overdue in following up on the IETF LC results
for the latter :(

Section 2

We currently introduce the concept of MAX_PAYLOADS by implicit use in a
few places before it is actually given a proper definition.  I wonder if
mentioning here that it is used to group a batch of blocks would help
the reader.

Section 3

   o  They support sending an entire body using Non-confirmable (NON)
      messages without requiring a response from the peer.

I put this change in my github PR, but am repeating it here for
visibility since I am making an assumption: I propose adding
"intermediate" to get "without requiring an intermediate response from
the peer".  My understanding is that a final message indicating
successful reception is still used (or a selective ack in case of loss),
so the contrast to RFC 7959 is in the (lack of) need for intermediate
responses for each block.

   o  Mixing of NON and CON during requests/responses using Q-Block is
      not supported.

There are perhaps subtle differences among "not supported", "forbidden",
and "not defined in this specification".  Do we perhaps actually mean
"forbidden"?

Section 3.2

   (DOTS) that cannot use CON responses to handle potential packet loss
   and that support application-specific mechanisms to assess whether
   the remote peer is able to handle the messages sent by a CoAP
   endpoint (e.g., DOTS heartbeats in Section 4.7 of [RFC8782]).

Can we get greater clarity on what "able to handle" is intended to mean?
I can't tell if it's anywhere between "the transport is able to deliver
message bodies" and "the software stack implements and enables a
particular feature".

Section 4.1

   When the Content-Format Option is present together with the Q-Block1
   or Q-Block2 Option, the option applies to the body not to the payload
   (i.e., it must be the same for all payloads of the same body).

Do we have a normative requirement somewhere that the recipient track
and compare the content-format values across blocks?  If not, should we?

   Q-Block2 Option is useful with GET, POST, PUT, FETCH, PATCH, and
   iPATCH requests and their payload-bearing responses (2.01, 2.02,
   2.03, 2.04, and 2.05) (Section 5.5 of [RFC7252]).

Do we need an "e.g." in front of the list, to account for the potential
future registration of new payload-bearing response codes?

   If Q-Block1 Option is present in a request or Q-Block2 Option in a
   response (i.e., in that message to the payload of which it pertains),

Can we reword this parenthetical in a less convoluted way?  I'm not even
sure I'm parsing it properly.

   [RFC7252]).  To reliably get a rejection message, it is therefore
   REQUIRED that clients use a Confirmable message for determining
   support for Q-Block1 and Q-Block2 Options.

(I know that some other discussion happened on this mechanism, but I
forget if there are already plans to add a clarification that this is
only needed once per peer within a given set of exchanges.)

   The Q-Block2 Option is repeatable when requesting retransmission of
   missing blocks, but not otherwise.  Except that case, any request
   carrying multiple Q-Block1 (or Q-Block2) Options MUST be handled
   following the procedure specified in Section 5.4.5 of [RFC7252].

Since these are critical options, the referenced procedures involve
rejecting the message, right?  Is that important enough to note
directly?

   Note that if Q-Block1 or Q-Block2 Options are included in a packet as
   Inner options, Block1 or Block2 Options MUST NOT be included as Inner
   options.  Similarly there MUST NOT be a mix of Q-Block and Block for
   the Outer options.  [...]

(Hopefully a silly question, but do we make the analogous prohibition
against combining Q-Block and regular Block for non-OSCORE cases
anywhere?  I thought we did, but now I can't find it...)

Section 4.3

   being transferred.  The Request-Tag is opaque, the server still
   treats it as opaque but the client MUST ensure that it is unique for
   every different body of transmitted data.

(nit) the structure of this sentence seems off, to me.  I may just want
a comma after "server still treats it as opaque", but looking more
closely I might rewrite to more like "The Request-Tag is opaque to the
server, but the client MUST ensure that it is unique for every different
request body being transmitted".

      Implementation Note: It is suggested that the client treats the
      Request-Tag as an unsigned integer of 8 bytes in length.  An
      implementation may want to consider limiting this to 4 bytes to
      reduce packet overhead size.  The initial Request-Tag value should
      be randomly generated and then subsequently incremented by the
      client whenever a new body of data is being transmitted between
      peers.

In the vein of draft-gont-numeric-ids-sec-considerations, is the
increment necessarily 1 or can there be gaps?  Similarly, the risk of
information disclosure (via side channel) is reduced if the initial
random value is generated anew for each connection.  This is maybe
implied by the current text but could be stated more clearly.

   The client MUST send the payloads with the block numbers increasing,
   starting from zero, until the body is complete (subject to any
   congestion control (Section 7)).  Any missing payloads requested by
   the server must in addition be separately transmitted with increasing
   block numbers.

When I first read this, I thought that the block numbers of
retransmissions needed to continue to increase in the same sequence as
the original transmission, i.e., retransmitted blocks are assigned new
block numbers.  The examples do not bear this out (and it seems like it
would be complicated to specify clearly), so I suggest rephrasing to "in
order of increasing block number".

      If the FETCH request includes the Observe Option, then the server
      MUST use the same token as used for the initial response for
      returning any Observe triggered responses so that the client can
      match them up.

      The client should then release all of the tokens used for this
      body unless a resource is being observed.

If a resource is being observed, should the client release all tokens
other than the one used for the initial response?

Also, is the "initial response" the first response for the blockwise
transfer (which might be a 2.31 or 4.08 for NON requests), or the first
one with response code 2.05?

   2.31 (Continue)

      This Response Code can be used to indicate that all of the blocks
      up to and including the Q-Block1 Option block NUM (all having the
      M bit set) have been successfully received.  The token used MUST
      be one of the tokens that were received in a request for this
      block-wise exchange.  However, it is desirable to provide the one
      used in the last received request.

Can the client release any tokens upon receipt of such a response?

   4.02 (Bad Option)

      This Response Code MUST be returned for a Confirmable request if
      the server does not support the Q-Block Options.  Note that a
      reset message must be sent in case of Non-confirmable request.

Reset only needs to be sent if the server is not ignoring the request
entirely, though, right?


%%%
The following few comments are interrelated:

      This Response Code returned with Content-Type "application/
      missing-blocks+cbor-seq" indicates that some of the payloads are
      missing and need to be resent.  The client then retransmits the
      missing payloads using the same Request-Tag, Size1 and Q-Block1 to
      specify the block NUM, SZX, and M bit as appropriate.

Is the 'M' bit set "as appropriate" for the new flight of messages, or
as it was sent initially?  (The examples in §10.x suggest "as was sent
initially".)

      The Request-Tag value to use is determined by taking the token in
      the 4.08 (Request Entity Incomplete) response, locating the
      matching client request, and then using its Request-Tag.

The "value to use" here seems to be indicating the value to use in the
retransmitted request...

      The token used MUST be one of the tokens that were received in a
      request for this block-wise exchange.  However, it is desirable to
      provide the one used in the last received request.  See Section 5
      for further information.

... but here the "token used" seems to be indicating the token to be
used in constructing the response that has response code 4.08.

If my understanding is correct, we really should have more clarity on
which value is "used" for which message.

Additionally, in the last quoted paragraph we refer to Section 5 for
further information, which includes a SHOULD-level requirement to
"provide the [token] used in the last received request".  It is very
surprising to have the normative requirements for behavior split across
sections in this manner.  (Or was the intent that Section 5 also use the
"desirable" wording?)
%%%

Section 4.4

   The ETag is opaque, the client still treats it as opaque but the
   server MUST ensure that it is unique for every different body of
   transmitted data.

[analogous comment as for Request-Tag]

      Implementation Note: It is suggested that the server treats the
      ETag as an unsigned integer of 8 bytes in length.  An
      implementation may want to consider limiting this to 4 bytes to
      reduce packet overhead size.  The initial ETag value should be
      randomly generated and then subsequently incremented by the server
      whenever a new body of data is being transmitted between peers.

[analogous comment as for Request-Tag]

   The client SHOULD wait for up to NON_RECEIVE_TIMEOUT (Section 7.2)
   after the last received payload for NON payloads before issuing a
   GET, POST, PUT, FETCH, PATCH, or iPATCH request that contains one or
   more Q-Block2 Options that define the missing blocks with the M bit
   unset.  The client MAY set the M bit to request this and later blocks
   from this MAX_PAYLOADS set.  Further considerations related to the
   transmission timing for missing requests are discussed in
   Section 7.2.

Does the MAY grant permission to send with M bit set prior to
NON_RECEIVE_TIMEOUT, or just permission to send with M bit set in
addition to with M bit unset (but still after the timeout)?

   For Confirmable responses, the client continues to acknowledge each
   packet.  Typically, the server acknowledges the initial request using
   an ACK with the payload, and then sends the subsequent payloads as
   CON responses.  The server will detect failure to send a packet, but
   the client can issue, after a MAX_TRANSMIT_SPAN delay, a separate
   GET, POST, PUT, FETCH, PATCH, or iPATCH for any missing blocks as
   needed.

Starting out with "for confirmable responses" implies that we're going
to separately cover non-confirmable responses later, or at some point
transition to statements of general applicability (to both confirmable
and non-confirmable responses).  Where does that happen?

   A client SHOULD maintain a partial body (missing payloads) for up to
   NON_PARTIAL_TIMEOUT (Section 7.2) or as defined by the Max-Age Option
   (or its default of 60 seconds (Section 5.6.1 of [RFC7252])),
   whichever is the less.  On release of the partial body, the client
   should then release all of the tokens used for this body unless a
   resource is being observed.

[as above, can the client release any subset of tokens in the case of
observe?]

   It is RECOMMENDED that the server maintains a cached copy of the body
   when using the Q-Block2 Option to facilitate retransmission of any
   missing payloads.

It's surprising to write that the client SHOULD but it is RECOMMENDED
that the server cache, when those two requirements keywords have an
equivalent strength per BCP 14.  Can't we use consistent terminology
for the same requirement level?

   If the server detects part way through a body transfer that the
   resource data has changed and the server is not maintaining a cached
   copy of the old data, then the transmission is terminated.  Any
   subsequent missing block requests MUST be responded to using the
   latest ETag and Size2 Option values with the updated data.

This sounds like the server starts responding "in the middle" of the new
representation, so the client would need to go back and re-request the
initial parts, possibly across multiple groups of MAX_PAYLOADS blocks.
It seems like this requirement for client behavior should be more
clearly documented somewhere.  We do go on to talk about the client
removing the stale partial body, but not about completing the new body.

Section 4.5

   For a response that uses Q-Block2, the Observe value MUST be the same
   for all the payloads of the same body.  This is different from Block2
   usage where the Observe value is only present in the first block
   (Section 3.4 of [RFC7959]).  This includes payloads transmitted
   following receipt of the 'Continue' Q-Block2 Option (Section 4.4) by
   the server.  If a missing payload is requested by a client, then both
   the request and response MUST NOT include the Observe Option.

(side note?) It seems very surprising to omit Observe from only
retransmitted payloads but keep it in all initial payload transmissions.

Section 4.6

   The Size1 or Size2 option values MUST exactly represent the size of
   the data on the body so that any missing data can easily be
   determined.

Is this MUST duplicating the behavior already specified by RFC 7959?

Section 5

   The data payload of the 4.08 (Request Entity Incomplete) response is
   encoded as a CBOR Sequence [RFC8742].  It comprises of one or more

I think we want some qualifying text that reaffirms that the behavior
being described is applicable only to the
application/missing-blocks+cbor-seq content-type case, possibly by
having the previous discussion state that "this section defines the
behavior and semantics for 4.08 responses using the new content-type."

   The Concise Data Definition Language [RFC8610] (and see Section 4.1
   [RFC8742]) for the data describing these missing blocks is as
   follows:

(Should we mention that this is only informational and that the prose
description is normative, in line with RFC 8610 being only an
informative reference?)

         ; A notional array, the elements of which are to be used
         ; in a CBOR Sequence:

(nit) Is there a reason to use a different wording than the referenced
example from RFC 8742?

Section 6

   Implementation Note:  By using 8-byte tokens, it is possible to
      easily minimize the number of tokens that have to be tracked by
      clients, by keeping the bottom 32 bits the same for the same body
      and the upper 32 bits containing the current body's request number
      (incrementing every request, including every re-transmit).  This
      allows the client to be alleviated from keeping all the per-
      request-state, e.g., in Section 3 of [RFC8974].

If we're going to introduce structure into a nominally opaque
identifier, we need to discuss the consequences of that in the security
considerations.  draft-gont-numeric-ids-sec-considerations has some
guidance in this regard.
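
For reference, the layout described in the quoted Implementation Note
can be sketched as below.  This is only an illustration under stated
assumptions: the names make_token and split_token are hypothetical, and
network byte order is assumed for packing.  It also shows why the
structure matters for the security considerations, since anyone who sees
a token on the wire can recover both fields.

```python
import struct
from typing import Tuple


def make_token(body_id: int, request_number: int) -> bytes:
    """Build an 8-byte token: upper 32 bits carry the request number,
    lower 32 bits stay the same for every request of one body."""
    for value in (body_id, request_number):
        if not 0 <= value < 2**32:
            raise ValueError("each field must fit in 32 bits")
    # "!II" = two unsigned 32-bit integers, big-endian (assumed byte order).
    return struct.pack("!II", request_number, body_id)


def split_token(token: bytes) -> Tuple[int, int]:
    """Recover (body_id, request_number) from an 8-byte token."""
    request_number, body_id = struct.unpack("!II", token)
    return body_id, request_number


# Hypothetical body identifier; the request number increments per request.
first = make_token(0xAABBCCDD, 1)
retransmit = make_token(0xAABBCCDD, 2)
print(first.hex(), retransmit.hex())
print(split_token(retransmit))
```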

Section 7.1

   Congestion control for CON requests and responses is specified in
   Section 4.7 of [RFC7252].  For faster transmission rates, NSTART will
   need to be increased from 1.  However, the other CON congestion
   control parameters will need to be tuned to cover this change.  [...]

I thought there had been some discussion in a different AD's ballot
thread on this text, but I can't find it now.  I'm happy to defer to the
previous discussion if I'm not just imagining it.
Anyways, I might suggest phrasing this as "if faster transmission rates
are needed, NSTART will need to be increased from 1".

   It is implementation specific as to whether there should be any
   further requests for missing data as there will have been significant
   transmission failure as individual payloads will have failed after
   MAX_TRANSMIT_SPAN.

(editorial) I don't think I can successfully parse this sentence.  There
may be a few missing words, and splitting into multiple sentences would
likely help as well.

Section 7.2

   NON_RECEIVE_TIMEOUT is the initial maximum time to wait for a missing
   payload before requesting retransmission for the first time.  Every
   time the missing payload is re-requested, the time to wait value
   doubles.  The time to wait is calculated as:

Thank you for being very clear about the exponential backoff procedure
:)
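
The doubling rule in the quoted text can be sketched as follows.  This
is a minimal illustration, not the draft's own formula (which is not
reproduced in the quote above): the function name, the way the
re-request count is indexed, and the 4-second base value are all
assumptions.

```python
def time_to_wait(base_timeout: float, re_requests_so_far: int) -> float:
    """Return how long to wait before the next retransmission request.

    base_timeout stands in for NON_RECEIVE_TIMEOUT; re_requests_so_far is
    the number of times the missing payload has already been re-requested
    (0 before the first request).  The wait doubles on every re-request.
    """
    if re_requests_so_far < 0:
        raise ValueError("re_requests_so_far must be non-negative")
    return base_timeout * (2 ** re_requests_so_far)


# Hypothetical base value of 4 seconds, chosen only for the example.
for attempt in range(4):
    print(attempt, time_to_wait(4.0, attempt))
```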

   payloads to prevent the client unnecessarily delaying.  If not all of
   the MAX_PAYLOADS payloads were received, the server SHOULD delay for
   NON_RECEIVE_TIMEOUT (exponentially scaled based on the repeat request
   count for a payload) before sending the 4.08 (Request Entity
   Incomplete) Response Code for the missing payload(s).  If this is a
   repeat for the 2.31 (Continue) response, the server SHOULD send a
   4.08 (Request Entity Incomplete) response detailing the missing
   payloads after the block number that would have been indicated in the
   2.31 (Continue).  [...]

I don't understand what "if this is a repeat for the 2.31 (Continue)
response" is intended to mean.

   The client does not need to acknowledge the receipt of the entire
   body.

Does that mean that the last group of response blocks will always be
retransmitted NON_MAX_RETRANSMIT times?

Section 10

            QB1: Q-Block1 Option values NUM/More/SZX
            QB2: Q-Block2 Option values NUM/More/SZX

What's depicted in the figure seems to be the actual block size, and not
the three-bit SZX value.
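
As a reminder of the distinction, RFC 7959 derives the block size from
the three-bit SZX field as 2**(SZX + 4), with SZX values 0 through 6 in
use.  The sketch below (helper names are hypothetical) shows that the
1024 appearing in the figures is a block size, corresponding to SZX 6.

```python
def block_size_from_szx(szx: int) -> int:
    """Block size in bytes for an SZX value, per RFC 7959: 2**(SZX + 4)."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be in the range 0..6")
    return 1 << (szx + 4)


def szx_from_block_size(size: int) -> int:
    """Inverse mapping: the SZX value that encodes a given block size."""
    # A valid size is a power of two between 16 and 1024 bytes.
    if size <= 0 or size & (size - 1):
        raise ValueError("block size must be a positive power of two")
    szx = size.bit_length() - 5
    if not 0 <= szx <= 6:
        raise ValueError("block size must be between 16 and 1024 bytes")
    return szx


print(szx_from_block_size(1024))
print([block_size_from_szx(szx) for szx in range(7)])
```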

Section 10.1.3

Should we indicate somehow in Figure 6 that the 4.08 responses use the
new content-format?

Also, is there any value in indicating that there might be a race
between the client continuing to send the next set of payloads and the
initial 4.08 response?

Section 10.2.3

I don't understand why the NON_RECEIVE_TIMEOUT (client) triggers --
shouldn't the delivery of the 11th block indicate that the server thinks
it sent a full MAX_PAYLOADS group, and thus prompt a selective ACK after
perhaps just a modest reordering delay?

Section 10.3.2

   [[MAX_PAYLOADS has been reached]]
      |     [[MAX_PAYLOADS blocks acknowledged by client using
      |       'Continue' Q-Block2]]
      +--------->| NON FETCH /path M:0x3b T:0xab QB2:10/1/1024
      |<---------+ NON 2.05 M:0x8b T:0xaa O:1334 ET=21 QB2:10/0/1024

Shouldn't the server switch to using T:0xab now?

      +--------->| NON FETCH /path M:0x3c T:0xac QB2:10/1/1024
      |<---------+ NON 2.05 M:0x96 T:0xaa O:1335 ET=22 QB2:10/0/1024

and 0xac here?

Section 10.3.3

          |<---------+ NON 2.05 M:0xa6 T:0xc6 ET=23 QB2:3/0/1024
          |   ...    |
       [[NON_RECEIVE_TIMEOUT (client) delay expires]]

Why does the client time out here (at least with the full
NON_RECEIVE_TIMEOUT)?  The final-message indication seems like it would
allow for an ~immediate response (delayed only for some reordering
threshold).