Re: [core] WG Last Call on draft-ietf-core-new-block

Christian Amsüss <christian@amsuess.com> Wed, 23 December 2020 05:01 UTC

Date: Wed, 23 Dec 2020 06:00:42 +0100
From: Christian Amsüss <christian@amsuess.com>
To: draft-ietf-core-new-block@ietf.org
Cc: "core@ietf.org WG (core@ietf.org)" <core@ietf.org>, dots@ietf.org
Message-ID: <X+LO+lfQLd73LMRM@hephaistos.amsuess.com>
References: <263d6f84-4a57-2085-288f-068b1d78f7ae@ri.se>
In-Reply-To: <263d6f84-4a57-2085-288f-068b1d78f7ae@ri.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/YydfrhQ3wQgdj-LrvWAIVdABHxQ>
Subject: Re: [core] WG Last Call on draft-ietf-core-new-block

Hello new-block authors, hello CoRE,

I've finally managed to have another look at the document (and it's
still the 22nd *somewhere*...).

My over-all summary is that while this document has matured quite a bit
(and turned out better than I had expected for the special-purpose
blocks it is), I think that the sections on response suppression and
congestion control still need to be sharpened, and the examples could
strike a more realistic balance between going all-NON and all-CON (or
explain why such a mix wouldn't be used, even though the text reads as
if that's what's expected; but maybe it's just me).


Except for the first item (which is coming back to the notes for the
WGLC), all should be more or less sequential in the document (with
forward/back references as needed).

* On the downref to No-Response: I don't see the downref as too big an
  issue. It's a bit odd that that document didn't go through the CoRE
  WG, but my impression is that it's widely used (on a "when the library
  I maintain broke it people filed issues" level), and both OSCORE and
  groupcomm-bis (which is to replace RFC7390) reference it. It's an
  informative reference because they don't mechanically depend on it,
  but it shows how well No-Response has been taken up.

  (Frankly, I'd support a retroactive adoption. I don't suppose we can
  do that with "adults"?).

  From how I understand Q-Block to be intended to work, No-Response does
  not cover all of the behavior (as it's timeout based); still, that
  document lays good groundwork for what's done here in a rather precise
  way (see comment on "For Confirmable transmission, the server MUST
  continue" later on) where this document needs the examples to be fully
  understood.

  Based on the later text on MAX_TRANSMIT_SPAN, why not at least
  acknowledge the equivalence? Maybe like this:

  > The behavior of endpoints following this draft can equivalently be
  > described in terms of the No-Response option {{?RFC7967}}:
  >
  > For a message with a Q-Block1 option with M=1 that is followed by
  > another message on the same Request-Tag within MAX_TRANSMIT_SPAN [
  > or MAX_BLOCK_JITTER, see later ], the default value for No-Response
  > is 8 (suppressing the 4.xx code that would be returned due to the
  > incomplete request); an explicitly set No-Response option would
  > override that.

  although I'd still advocate just using No-Response for the whole
  definition even if No-Response is not typically expressed.

  (I'm not really sure what the correct response would be for an
  individual non-terminal request if it were to be answered due to an
  explicit No-Response:0 -- might be 2.31 Continue that's just not
  used in this specification because it's always suppressed? I'll come
  back to that -- but then it'd rather be No-Response: 10).
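
To make the bit arithmetic above concrete, here is a minimal Python sketch of how an endpoint might decode a No-Response value per RFC 7967 (2 suppresses 2.xx, 8 suppresses 4.xx, 16 suppresses 5.xx), together with the default proposed in the quoted text. The function names and the `followed_within_span` flag are hypothetical; the flag stands for "another message on the same Request-Tag arrived within MAX_TRANSMIT_SPAN (or MAX_BLOCK_JITTER)".

```python
from typing import Optional

# Bit values of the No-Response option (RFC 7967, Section 2.1).
SUPPRESS_2XX = 2
SUPPRESS_4XX = 8
SUPPRESS_5XX = 16

_CLASS_BITS = {2: SUPPRESS_2XX, 4: SUPPRESS_4XX, 5: SUPPRESS_5XX}


def is_suppressed(no_response: int, code: str) -> bool:
    """True if a response with `code` (e.g. "4.08") is suppressed."""
    code_class = int(code.split(".")[0])
    return bool(no_response & _CLASS_BITS.get(code_class, 0))


def effective_no_response(followed_within_span: bool,
                          explicit: Optional[int]) -> int:
    """Hypothetical default for a Q-Block1 request with M=1, as in the
    quoted text: 8 (suppress 4.xx) if another block on the same
    Request-Tag follows in time; an explicit option always wins."""
    if explicit is not None:
        return explicit
    return SUPPRESS_4XX if followed_within_span else 0
```

With this reading, No-Response: 10 (2 + 8) suppresses both a 2.31 and a 4.08, which is the value alluded to in the parenthetical above.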

* intro: WebSockets (capitalization)

* The list of pros and cons (with the cons being almost trivial) does
  not explain to the reader why these are not a replacement; I suggest
  adding:

  * The Q-Block options do not support stateless operation / random
    access.

  * Proxying of Q-Block is limited to caching full representations.

  (The latter might be mitigated by additional text around caching, but
  I doubt it's worth the effort given it's not part of the use case).

* "compromises of": I don't understand that sentence.

* "the asymmetrical packet loss is not a benefit here": It never is;
  what is meant here?

* "Updated CoAP Response Code": "This document updates" sounds like a
  formal update to RFC7959, which it neither is nor needs to be.
  Phrasing it along the lines of "adds a media type that can be used
  with 4.08" would ease that.

* "Only C and U columns are marked": "The Q-Block1 option is critical, and
  unsafe for proxies to forward" would be easier to read -- which
  checkboxes are marked is already visible from the table. (Or just leave
  it for the later paragraphs that say that in more detail).

* "Q-Block1 Option is useful with": ... and FETCH. (Also in "Using the
  Q-Block1 option" first paragraph).

* "Is opaque in nature, but it is RECOMMENDED" on being a counter and
  starting off random: Like other similar suggestions (ETag), this is
  implementation guidance level and not required for interoperability.
  Maybe phrase this the same way as the recommendations on tokens?

* "For Confirmable transmission, the server MUST continue": This reads
  like new-block could change anything about that, when all it does is
  do things on the response level. [1] has a good note on that
  separation topic.

  [1]: https://tools.ietf.org/html/rfc7967#section-2

* "If the client transmits a new body of data with a new Request-Tag
  to": Processing parallel requests is something Request-Tag opens up. I
  don't see why there's a MUST to that; the server certainly MAY drop
  the old request, but it may just as well process them in parallel. 

* "If the server receives a duplicate block with the same Request-Tag":
  Why? Being silent is the default on nonterminal blocks already, but in
  a situation like figure 5, if the 2.04 is lost, that rule would make
  it impossible for the client to ever get a successful response.

  A better rule here may be to say that it processes it all the same
  (and if the payload is distinct from the first transmission's payload,
  it should err out.)
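
A minimal Python sketch of the rule suggested above, assuming the server keeps one hypothetical `Reassembly` object per Request-Tag: a duplicate block is processed exactly like the first copy (so a retransmitted final block can elicit the 2.04 again), and only a duplicate with a different payload is treated as an error. The class and exception names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class ConflictingBlock(Exception):
    """A retransmitted block arrived with a different payload."""


@dataclass
class Reassembly:
    """Hypothetical per-Request-Tag reassembly state on the server."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    last_num: Optional[int] = None  # number of the block with M=0, once seen

    def add(self, num: int, more: bool, payload: bytes) -> bool:
        """Store a block and return True if the body is now complete.

        A duplicate with an identical payload is processed like the first
        copy, so a retransmitted final block can elicit the 2.04 again."""
        previous = self.blocks.get(num)
        if previous is not None and previous != payload:
            raise ConflictingBlock(num)
        self.blocks[num] = payload
        if not more:
            self.last_num = num
        return self.is_complete()

    def is_complete(self) -> bool:
        return self.last_num is not None and all(
            n in self.blocks for n in range(self.last_num + 1))

    def body(self) -> bytes:
        if not self.is_complete():
            raise ValueError("body not complete yet")
        return b"".join(self.blocks[n] for n in range(self.last_num + 1))
```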

* "If the server receives multiple requests (implied or otherwise) for
  the same block, it MUST only send back one instance of that block.":
  This might be read as "ever" rather than "per incoming request", where
  probably the latter is meant.

* "The ETag Option MUST NOT be used": This is more a factual than a
  normative statement; it *cannot* be used there, as the server would
  respond thusly. It may be used, but then that indicates that the
  client is trying to verify freshness. (However, the client should
  not *start* sending an ETag once it learned the current resource's
  ETag when still attempting to pull out more blocks, but that's also not
  a normative requirement but a consequence of those two requests not
  being matchable any more.)

* "then the client SHOULD drop all the payloads for the current body":
  "Drop" is overly prescriptive; the client may well keep them, but
  just can't consider them fresh any more. (If the client has ample
  caching abilities, they might come in handy if the resource goes back
  to that ETag state). Same for later "the client MUST remove any
  partially received".

* "For Confirmable transmission, the client SHOULD continue to": As
  above in the other direction, that's not news.

* "If there is insufficient space to create a response PDU": I don't
  understand what that means. (Where are request options reflected
  back?).

* "If the client requests missing blocks, this is treated as a new
   request.": I don't think the client should even make these follow-up
   requests Observe, as it already has an ongoing observation. They'd be
   sent on a different token too, so setting Observe would be opening
   observation up on that token, which AFAIU is not intended. (Figure 7
   example looks good to me in that respect.)

   (It may make sense to ask the client to keep Observe to make the
   requests matchable just for the sake of staying in atomic-request
   mode. Either way, the server should then not accept that observation
   as it's not for a block 0.)

* "First is CBOR encoded Request-Tag": Why? Each 4.08 response can be
  matched by the token to a unique request that already had a
  Request-Tag, and the client needs to have kept that token around
  matched to the transfer to make sense of it.

  No need to move that value around between subsystems, and just
  dropping it from here would also remove the need for the "If the
  client does not recognize the Request-Tag" clause (which would
  otherwise need clarification as to what it'd mean if it recognizes it
  but it doesn't match what the request was for).
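
A minimal Python sketch of the token-based matching described above: the client keeps a map from each request's token to the transfer it belongs to, so an incoming 4.08 identifies its body without any Request-Tag in the payload. `Transfer` and `BlockClient` are hypothetical names, not taken from the draft or any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Transfer:
    """Hypothetical client-side state for one body being sent."""
    request_tag: bytes
    missing: List[int] = field(default_factory=list)


class BlockClient:
    def __init__(self) -> None:
        # Every request of a transfer is sent on its own token; all of
        # them map back to the same Transfer (and thus Request-Tag).
        self._by_token: Dict[bytes, Transfer] = {}

    def on_request_sent(self, token: bytes, transfer: Transfer) -> None:
        self._by_token[token] = transfer

    def on_408(self, token: bytes, missing: List[int]) -> Optional[Transfer]:
        """Match a 4.08 by token alone; no Request-Tag needed in the payload."""
        transfer = self._by_token.get(token)
        if transfer is None:
            return None  # not a response to any of our requests
        transfer.missing = list(missing)
        return transfer
```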

* "limit the array count to 23 (Undefined value)": 23 is the largest
  count that fits directly into the initial byte (i.e., with zero
  additional length bytes); it is not the indefinite-length indicator
  (additional information 31). Using either 23 or 31 makes sense here
  (up to 23 to have a definite length that can be updated in-place, or
  when exceeding that, switching to indefinite length if the items still
  fit), but the paragraph seems to be mixing them up.
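
The distinction can be shown with a small Python sketch of the CBOR array header (major type 4, RFC 8949): up to 23 items the count lives in the single initial byte 0x80 | n, which is what makes an in-place update possible, and beyond that the sketch switches to an indefinite-length array (0x9f ... 0xff). The helper names are hypothetical, and, following the previous comment, the sketch encodes only the missing block numbers without a leading Request-Tag.

```python
import struct
from typing import List


def cbor_uint(n: int) -> bytes:
    """Encode an unsigned integer (CBOR major type 0), up to 2**32 - 1."""
    if n < 24:
        return bytes([n])
    if n < 0x100:
        return bytes([0x18, n])
    if n < 0x10000:
        return b"\x19" + struct.pack(">H", n)
    if n < 0x100000000:
        return b"\x1a" + struct.pack(">I", n)
    raise ValueError("block number too large for this sketch")


def encode_missing(blocks: List[int]) -> bytes:
    """Encode missing block numbers as a CBOR array.

    Up to 23 items the count fits in the initial byte (0x80 | n), so it
    can be rewritten in place as items are appended. With more items,
    an indefinite-length array (0x9f ... 0xff break) is used instead."""
    items = b"".join(cbor_uint(b) for b in blocks)
    if len(blocks) <= 23:
        return bytes([0x80 | len(blocks)]) + items
    return b"\x9f" + items + b"\xff"
```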

* "Each new request MUST use a unique token": Like above, this is
  stating something that's not intended to be changed.

Congestion Control:

* "Each NON 4.08 (Request Entity Incomplete) Response Codes is subjected
   to PROBING_RATE.": That is unexpected here. At most one such
   response is sent to each request message, so why is additional
   congestion control needed?

   On the other hand, *every* NON request is subject to PROBING_RATE, so
   why point out the body of blocks and "GET or similar" particularly?

* "a delay is introduced of ACK_TIMEOUT": As I understand MAX_PAYLOADS,
  this is (rather implicitly) introduced as the package count up to
  which it is OK to exceed PROBING_RATE temporarily (but after that it
  kicks in all the harder by requiring to wait until complete-sent-bytes
  over PROBING_RATE has expired). If that holds, at that time a much
  larger delay than just ACK_TIMEOUT is needed to get a response from
  the server: About 3 hours (see later note on parameters).

  This is the crucial point in the document, and for it a recommendation
  alone is not good enough. The protocol can be run with a vastly
  increased PROBING_RATE (however externally determined) and from the
  point of MAX_PAYLOADS just observe it. Or it has to get feedback from
  the server; a single 4.08 is probably enough to kick off another
  volley of blocks. (How many? MAX_PAYLOADS for every response?).
  Both can be permitted, but just waiting ACK_TIMEOUT isn't doing any
  good.
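
A minimal Python sketch of the two alternatives above, under the assumption that MAX_PAYLOADS is the burst size up to which PROBING_RATE may be exceeded: once a volley is complete, the next one may start either when any response from the server has been seen since the volley began, or when the bytes sent divided by PROBING_RATE have elapsed. `VolleyPacer` is a hypothetical name; the defaults (10 payloads, 1 byte/second) are the values discussed in this review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolleyPacer:
    """Hypothetical sender-side pacing for NON blocks."""
    max_payloads: int = 10       # MAX_PAYLOADS
    probing_rate: float = 1.0    # PROBING_RATE, bytes/second
    sent_in_volley: int = 0
    bytes_in_volley: int = 0
    volley_start: Optional[float] = None
    # Set by any response received since this volley began.
    response_seen: bool = False

    def earliest_next_volley(self) -> float:
        """Time at which the burst averages out to PROBING_RATE."""
        assert self.volley_start is not None
        return self.volley_start + self.bytes_in_volley / self.probing_rate

    def may_send(self, now: float) -> bool:
        if self.sent_in_volley < self.max_payloads:
            return True
        if self.response_seen:
            return True  # server feedback kicks off the next volley
        return now >= self.earliest_next_volley()

    def on_sent(self, now: float, size: int) -> None:
        if self.sent_in_volley >= self.max_payloads:
            # The previous volley is complete; this send starts a new one.
            self.sent_in_volley = 0
            self.bytes_in_volley = 0
            self.volley_start = None
            self.response_seen = False
        if self.volley_start is None:
            self.volley_start = now
        self.sent_in_volley += 1
        self.bytes_in_volley += size

    def on_response(self) -> None:
        self.response_seen = True
```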

* "For NON transmissions": This seems to imply that the full exchange of
  a body is either NON or CON; I don't see where that'd come from. I'd
  have expected to read something like "Each individual request can be
  NON or CON independent of the others. In particular, it can be
  convenient to send the ultimate payload...".
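
As an illustration of choosing the message type per request rather than per body, here is a Python sketch of one hypothetical policy in the spirit of the mixed "a CON every MAX_PAYLOADS" approach mentioned under the examples: blocks go out as NON, except that every MAX_PAYLOADS-th block and the final block are sent as CON. The function name and policy are assumptions, not something the draft specifies.

```python
from typing import List


def message_types(num_blocks: int, max_payloads: int = 10) -> List[str]:
    """Hypothetical policy: NON by default, CON at the end of each
    MAX_PAYLOADS volley and for the final block of the body."""
    types = []
    for num in range(num_blocks):
        last = num == num_blocks - 1
        end_of_volley = (num + 1) % max_payloads == 0
        types.append("CON" if last or end_of_volley else "NON")
    return types
```

For a 25-block body this yields CONs at blocks 9, 19 and 24, giving the sender regular feedback without confirming every block.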

* "If a Confirmable packet is used, then the transmitting peer MUST wait
  for the ACK": Why? A NSTART > 1 would give it leisure to still
  transmit.

* General on congestion control: It may help implementors if this were
  split up into newly introduced rules and concepts (that is,
  MAX_PAYLOADS and the answer to whether you may send MAX_PAYLOADS en
  bloc again after having received even just one response from the
  last round, and probably the recommended parameters of the "Also on
  parameters" comment), and another subsection on how Q-Block behaves
  well when observing these.

Caching:

* "are not part of the cache key": How about "are removed as part of the
  block assembly and thus do not reach the cache"?

* "When the next client completes building the body": If the proxy
  chooses not to let them happen in parallel (which it may, see above on
  parallel requests, although the server might still not support it and
  cancel one of them), why bother letting the first finish just to abort
  it? (IOW: If the proxy does not intend to see both through, which it
  could if it held back the second until the first is through on the
  uplink, it could just as well err out of one of them early, but it may
  also rather see both through.)

* Examples:

  * Figure 5: The ... between e3 request and response indicate the
    MAX_TRANSMIT_SPAN before sending the 4.08 response. I suppose there
    should be the same kind of delay between the failed e5 transmission
    and the e4 response.

  * If the second burst had 3 requests out of which 2 made it, is there
    any guidance for which of them the 4.08 would come back on? (In the
    end, none of them is terminal).

  * If that e4 response gets lost, does the whole mechanism recover from
    it in any way?

    Generally, the all-NON and all-CON examples don't look to me like
    what I'd be doing with this spec; the mixed "a CON every
    MAX_PAYLOADS" appears much more realistic.

  * Figure X: The request has M unset and thus indicates a request for
    just that block. If more than one is expected, it should say
    QB2:0/1/1024.

* New Content Format: I think this needs a media type registration to go
  with it first; based on that, a content format can be registered.

* General on MAX_TRANSMIT_SPAN and other timing parameters: I'm not sure
  they're 1:1 applicable here. For example, MAX_TRANSMIT_SPAN is defined
  in terms of reliable transmission, but used for NONs as well. (So is
  the alternative of 2x ACK_TIMEOUT).

  For the purpose of delaying a 4.08 or a follow-up GET, it may make
  more sense to define a new parameter based on MAX_LATENCY and the time
  it takes the sender to pump out the options (which I don't think we
  have a good factor for, but may even be negligible here).

  Could read like this:

  > The timing parameter MAX_BLOCK_JITTER is introduced, and by default
  > takes a value of MAX_LATENCY + MAX_PAYLOADS * MTU / BANDWIDTH.
  >
  > With Q-Block2, a client can ask for any missing blocks after not
  > having received any further response for the duration of
  > MAX_BLOCK_JITTER.
  >
  > With Q-Block1, a server holds off any response for MAX_BLOCK_JITTER
  > unless all blocks have been received. Only then it evaluates whether
  > to respond with a 2.0x code, a 4.08 with payload, or not at all
  > (because it responded to a later request).

  This also brings me back to the earlier matter of 2.31: What is a
  server supposed to send when no packets were lost, but it's passing
  the timeout and wants to help the client flush out more packets by
  confirming something? It says 4.08 in 3.3, but it's not like there's a
  hole in the contiguous range. Does it need to send 4.08 enumerating
  all (or at least some) numbers between the first unreceived and what's
  indicated by Size1? Or can it just send 2.31 and the client knows all
  it needs to know b/c the response came to the largest block that was
  sent and 2.31 indicates that everything is good up to that point?
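
The proposed parameter and the 4.08 question can be made concrete with a short Python sketch. MAX_LATENCY defaults to 100 seconds as in RFC 7252; the MTU (1280 bytes) and BANDWIDTH (1000 bytes/second) values are assumptions chosen only for illustration. The second helper lists the block numbers a 4.08 would have to enumerate, given the total implied by Size1.

```python
import math
from typing import List, Set


def max_block_jitter(max_latency: float = 100.0,  # s, RFC 7252 default
                     max_payloads: int = 10,
                     mtu: int = 1280,             # bytes, assumed
                     bandwidth: float = 1000.0    # bytes/s, assumed
                     ) -> float:
    """MAX_BLOCK_JITTER as proposed above (a hypothetical parameter)."""
    return max_latency + max_payloads * mtu / bandwidth


def missing_blocks(received: Set[int], size1: int,
                   block_size: int) -> List[int]:
    """Block numbers not yet received, up to the total implied by Size1.
    In the no-loss case discussed above, this is just the tail after the
    last block sent."""
    total = math.ceil(size1 / block_size)
    return [n for n in range(total) if n not in received]
```

With these assumed values, MAX_BLOCK_JITTER comes to 112.8 seconds, dominated by MAX_LATENCY. For a 10240-byte body of which blocks 0 to 4 arrived without loss, a 4.08 would have to list blocks 5 to 9, whereas a 2.31 would only confirm the contiguous prefix.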

* Also on parameters: This document is describing flow control stuff
  around a situation CoAP was not originally designed for. Wouldn't it
  make sense to include a set of parameters (PROBING_RATE, MAX_LATENCY,
  ACK_TIMEOUT) that's suitable for the DOTS use case? I doubt that
  PROBING_RATE will be left at 1 byte/second for any DOTS application
  using this (sending 10KiB in the initial 10-packet MAX_PAYLOADS
  burst would mark that connection as unusable for about 3 hours if they
  all get lost), so better give justifiable numbers here than rely on
  implementors to come up with unreviewed numbers or disregard
  PROBING_RATE altogether.

  I don't know if it needs additional justification, but an increased
  NSTART may be justifiable there.
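
The "about 3 hours" figure follows from dividing the unanswered burst by PROBING_RATE, assuming 1 KiB per packet as above; the second line uses a purely hypothetical PROBING_RATE of 1000 B/s to show how strongly the result depends on that parameter:

```latex
t_{\text{quiet}} = \frac{\text{burst size}}{\text{PROBING\_RATE}}
  = \frac{10 \times 1024\ \text{B}}{1\ \text{B/s}}
  = 10240\ \text{s} \approx 2.84\ \text{h}

t_{\text{quiet}} = \frac{10240\ \text{B}}{1000\ \text{B/s}} = 10.24\ \text{s}
```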

* Somewhere (never comes up but I think it should): When CONs are used,
  a 4.08 (or 2.31?) response to a later request can indicate to the
  client that an earlier CON request has been processed successfully. If
  the client can match that up (and it should be able to), then it can
  (and should) cancel that particular CON request.
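
A minimal Python sketch of that bookkeeping, assuming the client tracks its outstanding CON block requests per body: when a 4.08 (or 2.31) arrives for a later block, every earlier block not listed as missing has evidently been processed, so its retransmissions can stop. `PendingCons` is a hypothetical name; the actual retransmission machinery of the CoAP stack is not shown, and the answered request itself completes through its own exchange.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PendingCons:
    """Hypothetical client bookkeeping for one body (one Request-Tag):
    CON block requests still awaiting an ACK, keyed by block number."""
    outstanding: Dict[int, bytes] = field(default_factory=dict)  # num -> token

    def on_con_sent(self, num: int, token: bytes) -> None:
        self.outstanding[num] = token

    def on_later_response(self, answered_num: int,
                          missing: Set[int]) -> List[bytes]:
        """Handle a 4.08 (or 2.31) to the request for `answered_num`.

        Earlier blocks not listed as missing were processed, so their
        CON retransmissions are cancelled; returns their tokens."""
        cancelled = []
        for num in sorted(self.outstanding):
            if num < answered_num and num not in missing:
                cancelled.append(self.outstanding.pop(num))
        return cancelled
```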

Best regards
Christian

-- 
There's always a bigger fish.
  -- Qui-Gon Jinn