[core] Benjamin Kaduk's Discuss on draft-ietf-core-stateless-06: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Mon, 20 April 2020 21:06 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-core-stateless@ietf.org, core-chairs@ietf.org, core@ietf.org, Carsten Bormann <cabo@tzi.org>, cabo@tzi.org
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Date: Mon, 20 Apr 2020 14:06:39 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/ne6kSEmaDFQEJItOMkKKwShhJ4M>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-core-stateless-06: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-stateless/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Let's discuss whether the various and sundry conditional SHOULDs in Section
3.1 are better written as conditional MUSTs (i.e., with the listed
exclusions being the only ones allowed).

Also, Appendix A.2 seems to show "Len (extended)" as just 0-2 bytes when
IIUC it is 0-4 bytes.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 2.1

   The new definition of the TKL field increases the maximum token
   length that can be represented in a message to 65804 bytes.  However,
   the maximum token length that sender and recipient implementations
   support may be shorter.  For example, a constrained node of Class 1
   [RFC7228] might support extended token lengths only up to 32 bytes.

Is there anything to say about IP MTU here?
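For reference, my reading of the Section 2.1 encoding (which mirrors the
CoAP option-length extension scheme) can be sketched as follows; the
function names are illustrative, not from the draft:

```python
def decode_token_length(tkl, ext):
    """Decode the token length from the 4-bit TKL field plus the value
    of its extension bytes (ext is those bytes as an unsigned integer)."""
    if tkl <= 12:
        return tkl            # no extension bytes
    if tkl == 13:
        return ext + 13       # 1 extension byte: lengths 13..268
    if tkl == 14:
        return ext + 269      # 2 extension bytes: lengths 269..65804
    raise ValueError("TKL 15 is reserved (message format error)")

def encode_token_length(length):
    """Return (tkl, extension_bytes) for a given token length."""
    if length <= 12:
        return length, b""
    if length <= 268:
        return 13, bytes([length - 13])
    if length <= 65804:
        return 14, (length - 269).to_bytes(2, "big")
    raise ValueError("token longer than 65804 bytes cannot be represented")
```

This is where the 65804 maximum comes from: 65535 + 269 with TKL = 14.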

Section 2.2.2

   In CoAP over UDP, the way a request message is rejected depends on
   the message type.  A Confirmable message with a message format error
   is rejected with a Reset message (Section 4.2 of RFC 7252).  A Non-
   confirmable message with a message format error is either rejected
   with a Reset message or just silently ignored (Section 4.3 of RFC
   7252).  It is therefore RECOMMENDED that clients use a Confirmable
   message for determining support.

When might one want to use non-confirmable messages for probing (i.e., why
is this not a MUST)?

   Since network addresses may change, a client SHOULD NOT assume that
   extended token lengths are supported by a server later than 60
   minutes after receiving a response with an extended token length.

nit: maybe "after receiving the most-recent response with [...]"?

   If a server supports extended token lengths but receives a request
   with a token of a length it is unwilling or unable to handle, it MUST
   NOT reject the message, as that would imply that extended token
   lengths are not supported at all.  Instead, if the server cannot
   handle the request at the time, it SHOULD return a 5.03 (Service
   Unavailable) response; if the server will never be able to handle
   (e.g., because the token is too large), it SHOULD return a 4.00 (Bad
   Request) response.

This is a fairly subtle way of saying that core RFC 7252
procedures/semantics are being updated; I'd suggest calling out (alongside
the other updates) the new(?) requirement for distinguishing whether an
extension is unrecognized vs. an invalid value by producing reset vs. a
distinguished error code.  (I do see that the semantics for 5.03 and 4.00
are not changing, but the use of Reset vs. error code for feature
negotiation seems important for implementors to be aware of.)
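To make the implied decision procedure concrete, a server that advertises
extended-token support would pick its response roughly like this (a
sketch; the max_supported/overloaded inputs and names are illustrative):

```python
def choose_response(token: bytes, max_supported: int, overloaded: bool) -> str:
    """Pick a response for a well-formed request whose token the server
    may be unwilling or unable to handle.  Crucially, never Reset: a
    Reset would signal that extended token lengths are unsupported."""
    if len(token) > max_supported:
        return "4.00 Bad Request"          # will never be able to handle
    if overloaded:
        return "5.03 Service Unavailable"  # cannot handle right now
    return "2.05 Content"                  # normal processing
```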

Section 3

   As servers are just expected to return any token verbatim to the
   client, this implementation strategy for clients does impact the
   interoperability of client and server implementations.  However,

nit: is this intended to be "does not impact"?
Given the subsequent sentence I might suggest "does not substantially
impact" instead, though.

Section 3.1

   o  A client SHOULD integrity protect the state information serialized
      in a token, unless processing a response does not modify state or
      cause any other significant side effects.

If the intent is that the "does not modify state" clause is the only case
when one would disregard the need for integrity protection, "MUST [...]
unless" seems more appropriate.  (I would prefer unconditional MUST and am
not sure I understand the cases where there is a need to skip integrity
protection.)

   o  Even when the serialized state is integrity protected, an attacker
      may still replay a response, making the client believe it sent the
      same request twice.  For this reason, the client SHOULD implement

(Basically the same comments about "SHOULD".)

      cause other any significant side effects.  For replay protection,
      integrity protection is REQUIRED.

I'm not entirely sure if the normative keyword is needed for effect, here;
it's simply a fact that replay protection is impossible in the absence of
integrity protection, isn't it?

   o  If processing a response without keeping request state is
      sensitive to the time elapsed since sending the request, then the
      serialized state SHOULD include freshness information (e.g., a
      timestamp).

Continuing the theme, this seems like a conditional MUST (not SHOULD).
Actually, all the rest of the SHOULDs in this section do.
This particular one should also note that the response processing needs to
actually check the timestamp and reject ones that are insufficiently fresh.
Also, integrity protection is again required for this to work.
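Putting these bullets together (integrity protection plus a checked
freshness timestamp), one shape the serialized state could take is a
timestamped, HMAC-protected blob, roughly as below; this is only a
sketch, and key management and the exact layout are my own assumptions:

```python
import hmac, hashlib, struct

MAC_LEN = 32  # full HMAC-SHA-256 output; could be truncated at some cost

def seal_state(state: bytes, key: bytes, now: int) -> bytes:
    """Serialize client state into a token: timestamp || state || MAC."""
    ts = struct.pack("!Q", now)
    mac = hmac.new(key, ts + state, hashlib.sha256).digest()
    return ts + state + mac

def open_state(token: bytes, key: bytes, now: int, max_age: int):
    """Return the state if the token verifies and is fresh, else None."""
    if len(token) < 8 + MAC_LEN:
        return None
    ts_raw, state, mac = token[:8], token[8:-MAC_LEN], token[-MAC_LEN:]
    expect = hmac.new(key, ts_raw + state, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expect):
        return None                      # integrity failure: discard
    (ts,) = struct.unpack("!Q", ts_raw)
    if now - ts > max_age:
        return None                      # insufficiently fresh: discard
    return state
```

Note that the freshness check is only meaningful because the timestamp is
covered by the MAC, which is the point about integrity protection above.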

Section 3.2

   A client that depends on support for extended token lengths
   (Section 2) from the server to avoid keeping request state SHOULD
   perform a discovery of support (Section 2.2) before it can be
   stateless.

This feels like a descriptive "needs to" rather than normative "SHOULD".
Stateless operation just isn't going to work if the server doesn't support
extended token lengths and the client needs it.

Section 3.3

   Reset messages, however.  Non-confirmable messages are therefore
   better suited.  In any case, a client still needs to keep congestion

nit: better suited for what?

   o  If a piggybacked response passes the token integrity protection
      and freshness checks, the client processes the message as
      specified in RFC 7252; otherwise, it silently discards the
      message.

It sounds like this entails discarding even the ACK portion of the
piggybacked response, which seems like it might interact oddly with the
retransmit schedule.

Section 4

Perhaps it's worth noting that this nesting of state will necessarily
increase the token size as it progresses along a chain of intermediaries?

There are also some considerations relating to how the freshness windows
of the client and intermediary interact, with the client effectively
being limited to the minimum of all windows in use by the client and
intermediaries on the path.

If the intermediary has a very long freshness window, it could be tricked
into sending "replies" to addresses that it thinks belong to clients but
may no longer do so, e.g., allowing a DoS attack to traverse a NAT or
firewall.

Section 4.3

RFC 7252 doesn't really suggest that there's a protocol element that would
be set to "infinite" here; perhaps we should just say that "in this case,
the gateway cannot return such a response and as such cannot implement such
a timeout".

Section 5

With no integrity protection on the rejection in trial-and-error discovery
(Section 2.2.2), it's susceptible to downgrade, IIUC even by an off-path
attacker.  (I did not think too hard about whether OSCORE could protect
the Resets in question or not, though.)  It seems like such a forced
downgrade would have second-order effects in causing clients to use more
local state and thus be more readily susceptible to other DoS vectors.

Also, when integrity protection is not in use, the client is susceptible to
spoofed responses that had no corresponding request -- only a very limited
subset of request/response pairs are safe to convert to "unauthenticated
server push", as that would effectively do, and we should probably mention
that explicitly.

I'd also suggest noting that a self-encrypted state token bears significant
resemblance to a TLS self-encrypted session ticket, and reference the RFC
5077 security considerations.  (Yes, I know that RFC 8446 Obsoletes RFC
5077; it would be an informational reference only.)

This could also lead to some discussion about applying, in general, an
appropriate set of sanity checks to the returned token (which may or may
not reflect serialized state), to limit the scope of various attacks even
in the absence of cryptographic protections.

Section 5.1

   size that need to be mitigated.  A node in the server role supporting
   extended token lengths may be vulnerable to a denial-of-service when
   an attacker (either on-path or a malicious client) sends large tokens
   to fill up the memory of the node.  Implementations need to be
   prepared to handle such messages.

This seems particularly problematic given that we disallow sending Reset in
response to too-large tokens and instead imply that it should echo the large
token in a 4.00 response.  I guess technically this is a SHOULD and not a
MUST, so there is some leeway to do something else, but what would that
"something else" be in this case?  It seems like we have a hard requirement
to do something sane with a token as large as 65804 bytes.

Section 5.2

   The use of encryption, integrity protection, and replay protection of
   serialized state is recommended, unless a careful analysis of any
   potential attacks to security and privacy is performed.  [...]

I suggest an alternative wording:

% It is generally expected that the use of encryption, integrity protection,
% and replay protection for serialized state is appropriate.  However, a
% careful analysis of any potential attacks to the security and privacy
% properties of the system might reveal that there are cases where such
% cryptographic protections do not add value in a specific case.

   a 64 bit tag is recommended, combined with a sequence number and a
   replay window.  Where encryption is not needed, HMAC-SHA-256,
   combined with a sequence number and a replay window, may be used.

Can we give guidance on sizing the replay window?
Should the HMAC-SHA-256 output be truncated akin to the truncated CCM tag?
In what cases would one want to use an absolute timestamp instead of/in
addition to a sequence-based replay window?
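On window sizing: one common answer is an RFC 4303 (IPsec ESP)-style
sliding bitmap, sized to the expected reordering depth (32 or 64 entries
is typical); a sketch, with names of my own choosing:

```python
class ReplayWindow:
    """Anti-replay check over monotonically assigned sequence numbers,
    modeled on the IPsec (RFC 4303) sliding-window algorithm."""

    def __init__(self, size: int = 64):
        self.size = size
        self.top = -1      # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => (top - i) was already accepted

    def accept(self, seq: int) -> bool:
        if seq > self.top:                 # advances the window
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False                   # too old to judge: reject
        if self.bitmap & (1 << offset):
            return False                   # replay
        self.bitmap |= 1 << offset
        return True
```

Anything that falls off the left edge of the window is rejected outright,
which is why the window has to be at least as deep as plausible reordering.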

   guarantees are voided.  Devices with low-entropy sources -- as is
   typical with constrained devices, which incidentally happen to be a
   natural candidate for the stateless mechanism described in this

nit: "low-entropy sources" is a weird phrasing; "low-quality entropy
sources" would feel more natural to me.
Also, draft-irtf-cfrg-randomness-improvements may be of interest to at least
some such devices.

   provides the above uniqueness guarantee.  Additionally, since it can
   be difficult to use AES-CCM securely when using statically configured
   keys, implementations should use automated key management [RFC4107].

This is BCP 107, so I think we could use stronger language than "should
use".  Also we should cite it as the BCP.

Section 6.1

Should the table formatting be consistent between here and Section 2.2.1?