[core] John Scudder's Discuss on draft-ietf-core-new-block-11: (with DISCUSS and COMMENT)
John Scudder via Datatracker <noreply@ietf.org> Thu, 06 May 2021 00:41 UTC
Return-Path: <noreply@ietf.org>
X-Original-To: core@ietf.org
Delivered-To: core@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id BBDB53A1B0C; Wed, 5 May 2021 17:41:32 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: John Scudder via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-core-new-block@ietf.org, core-chairs@ietf.org, core@ietf.org, marco.tiloca@ri.se, marco.tiloca@ri.se
X-Test-IDTracker: no
X-IETF-IDTracker: 7.28.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: John Scudder <jgs@juniper.net>
Message-ID: <162026169267.30008.8195219304146866165@ietfa.amsl.com>
Date: Wed, 05 May 2021 17:41:32 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/yB3vwuumabvv-ZepbU4AW9DP5Y8>
Subject: [core] John Scudder's Discuss on draft-ietf-core-new-block-11: (with DISCUSS and COMMENT)
X-BeenThere: core@ietf.org
X-Mailman-Version: 2.1.29
List-Id: "Constrained RESTful Environments \(CoRE\) Working Group list" <core.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/core>, <mailto:core-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/core/>
List-Post: <mailto:core@ietf.org>
List-Help: <mailto:core-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/core>, <mailto:core-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 06 May 2021 00:41:33 -0000
John Scudder has entered the following ballot position for
draft-ietf-core-new-block-11: Discuss

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-new-block/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

For the most part I found this document relatively easy to follow, considering my complete lack of background in CoAP. However, despite a concerted effort I have not been able to nail down with confidence what the intended semantics of several of your timeouts are, notably NON_RECEIVE_TIMEOUT.

Some of the text (for example, §4.4) implies that the timeout is an upper bound on how long an implementation should wait before declaring a block to have been lost (“The client SHOULD wait for up to NON_RECEIVE_TIMEOUT”). At the very least, this is imprecise because the timeout increases exponentially with repeated timeouts, but this is a relatively minor matter, discussed further in my comments.

Later, in §7.2, you say that expiry of the timeout is not the only trigger for a 4.08 response:

   It is likely that the client will start transmitting the next set of MAX_PAYLOADS payloads before the server times out on waiting for the last of the previous MAX_PAYLOADS payloads. On receipt of the first payload from the new set of MAX_PAYLOADS payloads, the server SHOULD send a 4.08 (Request Entity Incomplete) Response Code indicating any missing payloads from any previous MAX_PAYLOADS payloads.

It makes sense to me that you use this additional trigger. At this point in my reading of the spec, my understanding of the retransmission algorithm was that a 4.08 should be sent when either a payload is received from a new set of MAX_PAYLOADS, or NON_RECEIVE_TIMEOUT expires. But then I got to the example in 10.2.3, which shows the client waiting for the expiration of NON_RECEIVE_TIMEOUT even though it has received the first of a new set of MAX_PAYLOADS, and I concluded that either I’ve missed something basic, or the document is internally inconsistent.

As an aside, I’m also unclear as to why the only trigger you specify for sending a 4.08 is the arrival of the first of a new MAX_PAYLOADS flight. Other possible triggers I noticed include a gap in the sequence, and reception of a payload with More=0.

Some of these issues are repeated in my comments, below; I’ve noted those in the comment. Possibly in addressing this DISCUSS we’ll clear up some of those comments too.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Comments: (draft-ietf-core-new-block-11)

1. Section 3.2

   This mechanism is not intended for general CoAP usage, and any use outside the intended use case should be carefully weighed against the loss of interoperability with generic CoAP applications.

I’m curious: is the only reason the mechanism isn’t intended for general usage, the fact some implementations won’t support it? Or does it have other deficiencies that also make it unsuitable?

2. Section 4.1

   Q-Block2 Option is useful with GET, POST, PUT, FETCH, PATCH, and iPATCH requests and their payload-bearing responses (2.01, 2.02, 2.03, 2.04, and 2.05) (Section 5.5 of [RFC7252]).

I found the list of codes incomprehensible on first encountering it, since the concept of response codes hadn’t been introduced yet. I do understand that the document assumes familiarity with CoAP; nonetheless for basic clarity I think this should say “(response codes 2.01, 2.02…”. Additionally, the reference to RFC 7252 §5.5 doesn’t seem to be especially germane?

By the way, is 2.03 indeed a payload-bearing response? The only other place the spec touches on it is in §4.4, which says “the server could respond with a 2.03 (Valid) response with no payload”.

3. Section 4.1

   To indicate support for Q-Block2 responses, the CoAP client MUST include the Q-Block2 Option in a GET or similar request (FETCH, for example), the Q-Block2 Option in a PUT or similar request, or the Q-Block1 Option in a PUT or similar request so that the server knows that the client supports this Q-Block functionality should it need to send back a body that spans multiple payloads. Otherwise, the server would use the Block2 Option (if supported) to send back a message body that is too large to fit into a single IP packet [RFC7959].

Is this paragraph really supposed to mention both Q-Block2 and Q-Block1? In particular, I’m confused by the mention of both of these in relation to PUT.

4. Section 4.1

   The Q-Block1 and Q-Block2 Options are unsafe to forward. That is, a CoAP proxy that does not understand the Q-Block1 (or Q-Block2) Option MUST reject the request or response that uses either option.

Presumably (hopefully) this is simply describing the behavior of existing spec-compliant proxies when processing the new messages. As such, is the MUST appropriate? I would think not.

5. Section 4.3

   body. Note that the last received payload may not be the one with the highest block number.

“Might not” would be less ambiguous than “may not”.

6. Section 4.4 (also two places in §4.3)

(This comment rehashes, in more detail, the difficulty explained in my DISCUSS. You may want to skip over it until we’ve resolved the DISCUSS, after which this may, or may not, be relevant.)

   The client SHOULD wait for up to NON_RECEIVE_TIMEOUT (Section 7.2)

I read this as meaning the client should wait for as little as zero, or as long as NON_RECEIVE_TIMEOUT; that’s my understanding of “up to”. Is that the intended meaning? If it is, I think it’s worth writing out as I’ve done, for clarity. If it’s not, it definitely needs to be fixed. There’s a similar issue with “up to NON_PARTIAL_TIMEOUT” later in the section.

Referring ahead to Section 7.2 muddies the waters further. Even though the text quoted above says NON_RECEIVE_TIMEOUT is an upper limit on how long to wait, §7.2 says it’s a lower limit instead... maybe? From §7.2:

   NON_RECEIVE_TIMEOUT is the initial maximum time to wait for a missing

“Maximum”, ok great, that means “upper bound” and so lines up with §4.4 although the “initial” is surprising since §4.4 doesn’t say anything about the upper limit increasing. It continues:

   payload before requesting retransmission for the first time. Every time the missing payload is re-requested, the time to wait value doubles. The time to wait is calculated as:

      Time-to-Wait = NON_RECEIVE_TIMEOUT * (2 ** (Re-Request-Count - 1))

But this part says it’s (a) an exact time-to-wait, not a “maximum”, and (b) it says it increases exponentially, so NON_RECEIVE_TIMEOUT isn’t a maximum at all, but a minimum.

This later text in §7.2 implies that perhaps the problem in the above passages is the word “maximum”, and it should simply be deleted:

   For the server receiving NON Q-Block1 requests, it SHOULD send back a 2.31 (Continue) Response Code on receipt of all of the MAX_PAYLOADS payloads to prevent the client unnecessarily delaying. If not all of the MAX_PAYLOADS payloads were received, the server SHOULD delay for NON_RECEIVE_TIMEOUT (exponentially scaled based on the repeat request count for a payload) before sending the 4.08 (Request Entity Incomplete) Response Code for the missing payload(s).

Similarly “up to” in the quote that began this comment should be “at least”.

Whether you adopt those suggestions or not, it seems as though all this needs to be rewritten with careful attention to conveying what the desired behavior is.

But the plot thickens. Later in §7.2 we have

   It is likely that the client will start transmitting the next set of MAX_PAYLOADS payloads before the server times out on waiting for the last of the previous MAX_PAYLOADS payloads. On receipt of the first payload from the new set of MAX_PAYLOADS payloads, the server SHOULD send a 4.08 (Request Entity Incomplete) Response Code indicating any missing payloads from any previous MAX_PAYLOADS payloads.

The point being that the retransmission request can be triggered by an event other than timer expiration. So in that sense, “maximum” is right (it provides an upper bound on how long to wait before requesting a retransmission), but in another sense it’s wrong because the exponential increase is applied to it. I think the word “maximum” is trying to do too much work, and more words are probably required in order to make this clear.

I also think the problem is exacerbated by the fact both §4.4 and §7.2 are talking normatively about how to use NON_RECEIVE_TIMEOUT. It seems as though the main description is found in §7.2, and some confusion would be avoided by making §4.4 less specific, and simply referring forward to §7.2.

And, as noted in my DISCUSS, example 10.2.3 muddies the waters still further since it illustrates yet another behavior.

7. Section 4.4

   The client SHOULD wait for up to NON_RECEIVE_TIMEOUT (Section 7.2) after the last received payload for NON payloads before issuing a GET, POST, PUT, FETCH, PATCH, or iPATCH request that contains one or more Q-Block2 Options that define the missing blocks with the M bit unset. The client MAY set the M bit to request this and later blocks from this MAX_PAYLOADS set. Further considerations related to the transmission timing for missing requests are discussed in Section 7.2.

I find this whole paragraph pretty confusing with the dueling SHOULD and MAY, where it appears the SHOULD might be doing two jobs at once. I *think* your intent is something like the following?

   “The client SHOULD wait as specified in Section 7.2 for NON payloads before requesting retransmission of any missing blocks. Retransmission is requested by issuing a GET, POST, PUT, FETCH, PATCH, or iPATCH request that contains one or more Q-Block2 Options that define the missing block(s). Generally the M bit on the Q-Block option(s) SHOULD be unset, although the M bit MAY be set to request this and later blocks from this MAX_PAYLOADS set, see Section 10.2.4 for an example of this in operation.”

8. Section 5

   If the size of the 4.08 (Request Entity Incomplete) response packet is larger than that defined by Section 4.6 [RFC7252], then the number of missing blocks MUST be limited so that the response can fit into a single packet. If this is the case, then the server can send

Suggestion: “then the number of missing blocks reported MUST...” (The thing being limited is not the actual number of missing blocks. You’re limiting the number you report on.)

9. Section 7.1

   It is implementation specific as to whether there should be any further requests for missing data as there will have been significant transmission failure as individual payloads will have failed after MAX_TRANSMIT_SPAN.

This paragraph seems as though it’s a non-sequitur. It just doesn’t make sense to me. :-(

10. Section 7.2

(This comment relates to the difficulty explained in my DISCUSS. You may want to skip over it until we’ve resolved the DISCUSS, after which this may, or may not, be relevant.)

   NON_TIMEOUT is the maximum period of delay between sending sets of MAX_PAYLOADS payloads for the same body. By default, NON_TIMEOUT has the same value as ACK_TIMEOUT (Section 4.8 of [RFC7252]).

Presumably the use of “maximum” means it’s fine to delay zero seconds (or any value lower than NON_TIMEOUT).

11. General

By the way, none of the timers specify jitter (and indeed, if read literally, jitter would be forbidden). Is this intentional?

12. Section 7.2

   If the CoAP peer reports at least one payload has not arrived for each body for at least a 24 hour period and it is known that there are no other network issues over that period, then the value of MAX_PAYLOADS can be reduced by 1 at a time (to a minimum of 1) and the situation re-evaluated for another 24 hour period until there is no report of missing payloads under normal operating conditions. The newly derived value for MAX_PAYLOADS should be used for both ends of this particular CoAP peer link. Note that the CoAP peer will not know about the MAX_PAYLOADS change until it is reconfigured. As a consequence of the two peers having different MAX_PAYLOADS values, a peer may continue indicate that there are some missing payloads as all of its MAX_PAYLOADS set may not have arrived. How the two peer values for MAX_PAYLOADS are synchronized is out of the scope.

I take it this is just thrown in here as an operational suggestion? It’s not specifying protocol, right? It seems a little misplaced, if so.

13. Section 10.1.3

(This comment relates to the aside in my DISCUSS. You may want to skip over it until we’ve resolved the DISCUSS, after which this may, or may not, be relevant.)

Why doesn’t the server request 1,9,10 in one go? Since its rxmt request is triggered by rx of 11, one would think it could infer 10 had been lost.

14. Section 10.1.4 (also 10.3.3)

(This comment relates to the aside in my DISCUSS. You may want to skip over it until we’ve resolved the DISCUSS, after which this may, or may not, be relevant.)

Why doesn’t reception of a message with More=0 trigger the server to request retransmission of the missing block? Why does it have to wait for timeout?

15. Section 10.2.3

(This comment relates to my DISCUSS. You may want to skip over it until we’ve resolved the DISCUSS, after which this may, or may not, be relevant.)

Why doesn’t reception of QB2:10/0/1024 trigger the client to request retransmission? Why does it have to wait for timeout? Similarly reception of QB2:9/1/1024 later in the example.

16. Section 10.2.4

Since MAX_PAYLOADS is 10, why does the example say “MAX_PAYLOADS has been reached” after payloads 2-9 have been retransmitted? That’s only 8 payloads.
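
For readers following the arithmetic, here is a minimal Python sketch of two points raised above: the Time-to-Wait formula quoted from §7.2 in comment 6, and the gap inference suggested in comment 13. It is only an illustration, not text or behavior taken from the draft. The function names, the 4-second base value, and the exact set of received block numbers are assumptions made for the example; only the formula and the "1, 9, 10 missing when 11 arrives" scenario come from the comments.

```python
from typing import Iterable, List

# Placeholder base value in seconds, chosen only for illustration;
# it is not taken from the message above.
NON_RECEIVE_TIMEOUT = 4.0


def time_to_wait(re_request_count: int, base: float = NON_RECEIVE_TIMEOUT) -> float:
    """Formula quoted in comment 6:
    Time-to-Wait = NON_RECEIVE_TIMEOUT * (2 ** (Re-Request-Count - 1)).
    """
    if re_request_count < 1:
        raise ValueError("re_request_count is assumed to start at 1")
    return base * (2 ** (re_request_count - 1))


def gaps_below_highest(received: Iterable[int]) -> List[int]:
    """Block numbers lower than the highest received one that are missing.

    Hypothetical helper for the "gap in the sequence" trigger: once a
    higher-numbered block has arrived, every unseen lower number is a
    candidate for a retransmission request.
    """
    seen = set(received)
    if not seen:
        return []
    return [n for n in range(max(seen)) if n not in seen]


if __name__ == "__main__":
    # The wait doubles with each re-request, so the base value is the
    # smallest wait in the series rather than an upper bound.
    print([time_to_wait(n) for n in (1, 2, 3)])

    # Assumed reception pattern matching comment 13: block 11 arrives
    # while 1, 9 and 10 have not been seen.
    print(gaps_below_highest([0, 2, 3, 4, 5, 6, 7, 8, 11]))
```

With these assumptions the wait series starts at the base value and doubles from there, which is why comment 6 reads NON_RECEIVE_TIMEOUT as a minimum, and the arrival of block 11 is enough to list 1, 9 and 10 together.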