Re: [core] I-D Action: draft-ietf-core-oscore-groupcomm-10.txt

Christian Amsüss <christian@amsuess.com> Tue, 10 November 2020 17:16 UTC

Date: Tue, 10 Nov 2020 18:15:30 +0100
From: Christian Amsüss <christian@amsuess.com>
To: core@ietf.org
Message-ID: <20201110171530.GA1301772@hephaistos.amsuess.com>
References: <160433906029.4820.14907779807204481594@ietfa.amsl.com>
In-Reply-To: <160433906029.4820.14907779807204481594@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/pXEyxhbf-s2wgGDzrDhUNPsHZZc>

Hello OSCORE-groupcomm authors,

thanks for incorporating my earlier comments.

While reading -10 for the purpose of implementing it, a few more points
came up (in linear order of the document):

* "Furthermore, sufficiently large replay windows should be
  considered,": This paragraph reads as if a server receiving a high
  sequence number gets completely out of sync (to the point where it
  needs to recover), when actually the only ill effect of, say, a jump
  from 1000 to 1100 with a 32bit replay window is that a delayed message
  with sequence number 1002 will be rejected. The next message at 1101
  will still be accepted.

  Recovery *will* be necessary if the server has, in the meantime,
  evicted the client's replay window out of its list of available
  windows. So maybe the recommendation should rather be to not have
  overly long replay windows (or to suggest the replay windows may be
  truncated when unused), as that'd allow the server to keep more
  clients' replay windows around.

  (Or I just misread the paragraph).
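  A minimal sketch of the behaviour described above, assuming a
  bitmap-based sliding window of 32 entries anchored at the highest
  sequence number accepted so far (the `ReplayWindow` class is made up
  for this illustration and is not taken from the draft or from any
  implementation):

```python
class ReplayWindow:
    """Sliding replay window (hypothetical helper, for illustration only).

    Tracks the highest sequence number accepted so far plus a bitmap of
    which of the `size` numbers at or below it have been seen.
    """

    def __init__(self, size: int = 32) -> None:
        self.size = size
        self.highest = None  # no message accepted yet
        self.bitmap = 0      # bit i set => (highest - i) was already seen

    def accept(self, seq: int) -> bool:
        """Return True (and remember seq) if seq is fresh, False otherwise."""
        if self.highest is None:
            self.highest, self.bitmap = seq, 1
            return True
        if seq > self.highest:
            # Move the window forward; entries that fall off the end are lost.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # too old to judge, so it has to be rejected
        if self.bitmap & (1 << offset):
            return False  # already seen: a replay
        self.bitmap |= 1 << offset
        return True


w = ReplayWindow(size=32)
for n in range(990, 1001):
    w.accept(n)
print(w.accept(1100))  # True: the jump itself is fine
print(w.accept(1002))  # False: delayed message now outside the window
print(w.accept(1101))  # True: normal operation simply continues
print(w.accept(1090))  # True: within the window and not seen before
```

  Nothing needs recovering after the jump; the only loss is the delayed
  1002. Recovery only becomes necessary once the whole window state has
  been evicted, and the memory needed per client grows with `size`.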

* It felt like there are different ways for the sender sequence number
  to go back to 0. (Search for " to 0", first three occurrences). What's
  really the case is that the first two *trigger* the third, but the
  repetition around there made them read like separate things.

* "and SHOULD preserve the current value of the Sender ID of each group
  member.": Why bother? It's not like anything can be reused after the
  master secret changed.

  ("bother", because the GM may want to hand out KIDs sequentially
  rather than tracking a bitfield).

  * Oh, I see -- later on it says that KIDs can never be reused, even
    when master parameters change. Doesn't that lead to irrecoverably
    large KIDs over time? (I.e., they're nice and short at first, but
    after a few joinings, group leaves, device restarts and other
    fluctuations, they wind up in the >=4-byte range).

    This also reads as a bit contradictory -- it says "Even if Gid and
    Master Secret are renewed as described in this section, the Group
    Manager MUST NOT reassign", which sounds like "never ever", but
    misses the master salt from the line above. And it refers to
    2.4.3.1, where it only talks about KID reuse ~"if none of the master
    parameters change". 3.2 item 5 reads like "never ever" again, so
    this needs clarification first and then editing.
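  To put a rough number on the KID-growth concern, here is a sketch of a
  Group Manager handing out KIDs from a counter, comparing a "never
  reuse" policy with a hypothetical policy under which KIDs released
  under one Group ID may be reused under the next one. `KidAllocator`,
  `longest_kid` and the churn pattern (a fixed set of members, some of
  which restart and rejoin in every rekeying epoch) are assumptions for
  illustration only:

```python
import heapq


def kid_bytes(n: int) -> bytes:
    """Shortest big-endian encoding of a KID counter (0 -> b'\\x00')."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


class KidAllocator:
    """Hypothetical sequential KID allocation by a Group Manager."""

    def __init__(self, reuse_after_gid_change: bool) -> None:
        self.reuse = reuse_after_gid_change
        self.next_fresh = 0
        self.retired = []   # KIDs released under the current Group ID
        self.reusable = []  # min-heap of KIDs free for reuse under this Group ID

    def allocate(self) -> int:
        if self.reusable:
            return heapq.heappop(self.reusable)  # smallest free KID first
        kid = self.next_fresh
        self.next_fresh += 1
        return kid

    def release(self, kid: int) -> None:
        # Never handed out again under the same Group ID.
        self.retired.append(kid)

    def change_group_id(self) -> None:
        if self.reuse:
            for kid in self.retired:
                heapq.heappush(self.reusable, kid)
        self.retired = []


def longest_kid(members: int, restarts_per_epoch: int, epochs: int,
                reuse: bool) -> int:
    """Length in bytes of the longest KID handed out during the simulation."""
    alloc = KidAllocator(reuse)
    live = [alloc.allocate() for _ in range(members)]
    longest = max(len(kid_bytes(k)) for k in live)
    for epoch in range(epochs):
        for i in range(restarts_per_epoch):
            slot = (epoch * restarts_per_epoch + i) % members
            alloc.release(live[slot])        # member leaves (e.g. restarts)
            live[slot] = alloc.allocate()    # ... and rejoins with a new KID
            longest = max(longest, len(kid_bytes(live[slot])))
        alloc.change_group_id()              # rekeying with a new Group ID
    return longest


print(longest_kid(10, 5, 20_000, reuse=False))  # 3
print(longest_kid(10, 5, 20_000, reuse=True))   # 1
```

  With ten members and five restarts per epoch, 20 000 rekeyings push
  the counter past 65535 under "never reuse", while the reuse policy
  stays at one-byte KIDs and still leaves every non-restarting member's
  KID unchanged.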

* "The value of the 'kid' parameter in the 'unprotected' field of
   response messages MUST be set to the Sender ID of the endpoint
   transmitting the message."

  Is this the case even when requests were made in pairwise mode?

  (For comparison, the section above (4.1) is explicitly "for group mode
  only", and 4.3 says "in group and pairwise mode".)

  * Does it make sense to use the 4.3 extended parameters (with the new
    algorithm data and the added group ID) in pairwise mode?

    After all, everything else in there (including the group mode bit)
    is really just vanilla OSCORE. Or are observations in
    pairwise-pairwise mode supposed to survive rekeying? (If that's the
    case, the question may arise whether anyone might be tempted to use
    pairwise-pairwise in a 1:1 setting just to use that property).

* external_aad: Why is countersign_key_type_capab in there twice?

* "The new element 'request_kid_context' contains the value of" could
  use an explanation of why it is now in here.

  AFAIR, it is to allow observations to extend over rekeyings.
  (Shouldn't also, due to this, KIDs be reusable once the KID-Context is
  changed?)

  Same for the OSCORE_option being present in the signature
  external_aad. (A forward reference to 10.6.2 may do).

* Examples in Group Mode: I think it's easier to grasp if rather than
  using the value "COUNTERSIGN", a signature value a la "de 9e ... f1" /
  0xde9e...f1 (depending on the representation) were used; the
  COUNTERSIGN value reads too much like a constant in the
  before-compression COSE object ("What's that, a numeric, a tagged
  value?").

* "Note that, if the Group Flag is set to 0, and the recipient endpoint
   retrieves a Security Context which is valid to process the message
   but is not associated to an OSCORE group,"

  I don't have a fleshed-out proposal here, but we may manage to use the
  same bit for other purposes in non-group-OSCORE messages.

  As the introduction to that section states, the receiver MUST be able
  to distinguish from the KID context and KID whether it's from a Group
  OSCORE or a different context. For Group OSCORE, that bit
  distinguishes between group and pairwise mode. For other contexts,
  whatever mechanism established them may use that bit.

  (I know that this is contradicting the previous suggestion that led to
  this bit being flipped in -09; recent discussions about flag bit usage
  indicate to me that other ways of getting OSCORE keys may have
  different varieties as well, and this would align well then).
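  A sketch of what that would look like on the receiving side: the flag
  byte is parsed as in RFC 8613 Section 6.1, the Security Context is
  looked up from KID context and KID, and only then is the extra bit
  interpreted. The mask 0x20 (bit position 2) for the Group Flag, the
  `contexts` table and the wording for non-group contexts are
  assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

GROUP_FLAG = 0x20  # assumed: bit position 2 of the OSCORE flag byte


@dataclass
class OscoreOption:
    flags: int
    partial_iv: bytes
    kid_context: Optional[bytes]
    kid: Optional[bytes]


def parse_oscore_option(value: bytes) -> OscoreOption:
    """Split the OSCORE option value into its fields (RFC 8613 Section 6.1)."""
    if not value:
        return OscoreOption(0, b"", None, None)
    flags = value[0]
    pos = 1
    n = flags & 0x07                   # length of the Partial IV
    partial_iv = value[pos:pos + n]
    pos += n
    kid_context = None
    if flags & 0x10:                   # h: KID context present, length-prefixed
        s = value[pos]
        kid_context = value[pos + 1:pos + 1 + s]
        pos += 1 + s
    kid = value[pos:] if flags & 0x08 else None  # k: KID is the remainder
    return OscoreOption(flags, partial_iv, kid_context, kid)


def interpret_extra_bit(
        opt: OscoreOption,
        contexts: Dict[Tuple[Optional[bytes], Optional[bytes]], str]) -> str:
    """Interpret the bit only after the Security Context has been selected."""
    kind = contexts.get((opt.kid_context, opt.kid))
    if kind is None:
        raise LookupError("no Security Context for this KID context / KID")
    bit_set = bool(opt.flags & GROUP_FLAG)
    if kind == "group":
        return "group mode" if bit_set else "pairwise mode"
    # Any other kind of context would define its own meaning for the bit.
    return f"{kind}: bit {'set' if bit_set else 'clear'}"


contexts = {(b"\xaa", b"\x42"): "group", (None, b"\x01"): "other"}
opt = parse_oscore_option(bytes([0x39, 0x05, 0x01, 0xAA, 0x42]))
print(interpret_extra_bit(opt, contexts))  # group mode
```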

* "For each ongoing observation, the server MUST include in the first
  notification": This may get sucked up by a proxy that doesn't forward
  until a following notification arrives.

  Moreover, it may be unnecessary if the rekeying fits snugly between
  notifications.

  Suggested alternative:

  The server can help the client synchronize by sending the Gid as the
  ID Context for notifications following a rekeying. If there is a known
  upper limit to the duration of a full rekeying, it SHOULD set the Gid
  during that time. Otherwise, it SHOULD set it until the Max-Age of the
  last notification before the rekeying has expired.
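  The suggested rule can be read as a simple time window. A sketch,
  assuming times in seconds on a common clock and hypothetical parameter
  names (`max_rekey_duration` for the known upper limit of a full
  rekeying, `last_notification_at` and `last_max_age` for the last
  notification sent before the rekeying, the latter defaulting to CoAP's
  default Max-Age of 60 seconds):

```python
from typing import Optional


def gid_hint_until(rekey_at: float,
                   max_rekey_duration: Optional[float] = None,
                   last_notification_at: Optional[float] = None,
                   last_max_age: float = 60.0) -> float:
    """End of the period during which notifications carry the Gid."""
    if max_rekey_duration is not None:
        # A known upper limit on the rekeying duration bounds the period.
        return rekey_at + max_rekey_duration
    if last_notification_at is None:
        raise ValueError("need either a rekeying bound or the last notification")
    # Otherwise: until the last pre-rekeying notification has expired.
    return last_notification_at + last_max_age


def include_gid(now: float, rekey_at: float, **kwargs) -> bool:
    """Whether a notification sent at `now` should carry the Gid as ID Context."""
    return rekey_at <= now < gid_hint_until(rekey_at, **kwargs)


print(include_gid(105.0, 100.0, max_rekey_duration=10.0))   # True
print(include_gid(140.0, 100.0, last_notification_at=90.0))  # True: 90 + 60 = 150
print(include_gid(150.0, 100.0, last_notification_at=90.0))  # False
```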

* "Senders MUST NOT use the pairwise mode to protect a message intended
  for multiple recipients.": That's a statement of fact rather than an
  interoperability requirement: it cannot.

  If that's meant to rule out a pairwise request to the broadcast
  address (hoping that the right node will answer), it should say
  "protect a message sent to multiple recipients".

  (But probably that's not what's meant, given 9.1 proposes an address
  discovery resource, and for accessing that by a single KID the
  pairwise mode would make a good choice).

* Sections 9.2ff are described as a delta to 8.1ff.

  Given the external_aad and key choices are already in the section 9
  header, would there be any per-step delta left to explain at all if
  they were instead relative to RFC8613 Section 8? That'd make Section 9
  a lot slimmer.

* I'm still not fully sold on why the Object-Security option needs to be
  in the external_aad -- but having two versions of the external_aad
  seems extraneous to me. If the option needs to be in there, I think
  it'd be easier to have it in the external_aad for all purposes right
  away.

  Also, was treating the OSCORE option as Class I considered? That
  doesn't solve the problem of having many different external_aad
  constructions, but it'd be using existing mechanism. (Although
  implementors may curse if they didn't do Class I yet).

  I'll come back to that once I've actually implemented it.

* "Unless exchanges in a group rely only on unicast messages": I think
  that's only understandable for readers that have an active memory of
  Section 7.4 of RFC8613; it'd need either more context or slimming
  down, maybe like this?

  "As multicast usually involves unreliable transports, the
  simplification of the replay window to a size of 1 that's suggested in
  RFC8613 Section 7.4 is not viable with Group OSCORE."
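  For readers without RFC 8613 Section 7.4 at hand, a tiny sketch of why
  a window of size 1 (keeping only the highest sequence number seen)
  breaks down when messages can be reordered, as is common with
  multicast; `HighestOnlyWindow` and `deliver` are illustrative names
  only:

```python
from typing import Iterable, List


class HighestOnlyWindow:
    """Replay window of size 1: only the highest sequence number is kept."""

    def __init__(self) -> None:
        self.highest = -1

    def accept(self, seq: int) -> bool:
        if seq > self.highest:
            self.highest = seq
            return True
        return False  # anything not newer is treated as a replay


def deliver(arrival_order: Iterable[int]) -> List[int]:
    """Sequence numbers that get through, in arrival order."""
    window = HighestOnlyWindow()
    return [seq for seq in arrival_order if window.accept(seq)]


print(deliver([5, 6, 7]))  # [5, 6, 7]
print(deliver([5, 7, 6]))  # [5, 7]: the fresh message 6 is dropped as a "replay"
```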

* Summarizing the KID-reuse questions, I think a diagram of the
  components may help, like:

                        ←--→ KID
  Secret ←--→ Group ID  ←--→ KID
                        ←--→ KID
          ↑
  Change in             ↑  ↑-- Group ID changes allow old KIDs to be reused
  lockstep              |      (but KIDs usually stay the same for efficient rollout)
                        --- KID changes don't necessitate a Group ID change

  (are there any other components that can change?)

  In that context: Can an ancient group ID be reused?

* E.1 and E.2 Best Effort Synchronization / Baseline Synchronization

  A server doing any of these needs to use its own Partial IVs all the time.
  It may be acceptable in a scenario to eat the replays, but it's almost
  certainly not acceptable to commit nonce reuse (which *can* happen
  with Best Effort Synchronization).
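  To make the nonce-reuse point concrete: the AEAD nonce is derived from
  the Sender ID that produced the Partial IV and the Partial IV itself
  (RFC 8613 Section 5.2). A server that lost its replay state and answers
  a replayed request by reusing the request's nonce encrypts a possibly
  different response under a nonce it already used, while a server that
  always uses its own Partial IV does not, as long as its own sender
  sequence number never rolls back. The Common IV, the KIDs and the
  `Server` class below are made up for illustration:

```python
def oscore_nonce(common_iv: bytes, id_piv: bytes, piv: bytes) -> bytes:
    """AEAD nonce construction following RFC 8613 Section 5.2."""
    nonce_len = len(common_iv)
    padded_piv = piv.rjust(5, b"\x00")                # Partial IV to 5 bytes
    padded_id = id_piv.rjust(nonce_len - 6, b"\x00")  # ID_PIV to nonce_len - 6
    block = bytes([len(id_piv)]) + padded_id + padded_piv
    return bytes(a ^ b for a, b in zip(block, common_iv))


def piv_bytes(n: int) -> bytes:
    """Partial IV encoding of a sequence number (0 -> b'\\x00')."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


class Server:
    def __init__(self, kid: bytes, common_iv: bytes) -> None:
        self.kid = kid
        self.common_iv = common_iv
        self.ssn = 0  # own sender sequence number

    def response_nonce(self, req_kid: bytes, req_piv: bytes,
                       use_own_piv: bool) -> bytes:
        if not use_own_piv:
            # Response reuses the request's nonce (no Partial IV in the response).
            return oscore_nonce(self.common_iv, req_kid, req_piv)
        piv = piv_bytes(self.ssn)
        self.ssn += 1
        return oscore_nonce(self.common_iv, self.kid, piv)


common_iv = bytes.fromhex("00112233445566778899aabbcc")  # 13 bytes, arbitrary
server = Server(kid=b"\x52", common_iv=common_iv)
# The same request (client KID 0x25, Partial IV 5) is accepted twice
# because the replay state was lost in between.
a = server.response_nonce(b"\x25", b"\x05", use_own_piv=False)
b = server.response_nonce(b"\x25", b"\x05", use_own_piv=False)
print(a == b)  # True: same nonce for two responses
c = server.response_nonce(b"\x25", b"\x05", use_own_piv=True)
d = server.response_nonce(b"\x25", b"\x05", use_own_piv=True)
print(c == d)  # False: own Partial IVs keep nonces distinct
```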

* E.3 Challenge-Response Synchronization:

  "The server MUST NOT set the Echo Option to a value which is both
  predictable and reusable." While this is technically correct, it's
  hard to evaluate as a reader. How about "The Echo option value SHOULD
  NOT be reused, and when it is, it MUST be highly unlikely to have been
  used with this client recently."?

  * "sends a request as a unicast message addressed to the same server":

    That should indicate pairwise mode.

    There are later paragraphs on this where 10.7 is mentioned, but given
    how widely this opens things up, pairwise mode should be mandatory
    for Echo recovery.

  * "The server either delivers the request to the application if it is
    an actual retransmission of the original one, or discards it
    otherwise": For that, the server would need to remember. There *is*
    something to take care of, though: Only the first time the Echo
    request arrives can it be processed; otherwise it may reset an
    already good replay window.

    Suggesting to replace the paragraph with:

    "If the verification is successful and the replay window has not
    been set yet, the server updates its Replay Window to mark the
    current sequence number as seen (but all newer ones as new), and
    forwards the message as fresh to the application.

    Otherwise, it discards the verification result and treats the
    message as fresh or replay according to the existing replay window."

    "(believed to be) lost"

    How is that not a binary thing? (May relate to the first question).
    Either there is a replay window, in which case it can be used (and
    it does get fast-forwarded if need be), or not and it needs to be
    recovered.

    The following paragraphs elaborate on what a loss would mean, but
    don't explain why it would be relevant. Something like this would be
    applicable if we only ever transported the last bits of the sequence
    numbers in a sliding-window fashion, but that's not what happens in
    OSCORE.
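  Putting the suggested replacement text into code: a sketch of
  per-client state on the server, assuming a random 8-byte Echo value
  (which satisfies "not reused, or highly unlikely to have been used
  recently"), a simplified replay window that treats everything up to
  the Echo-verified sequence number as seen, and hypothetical names
  throughout (`EchoSync`, `FloorWindow`, `on_request`, the returned
  status strings). The pairwise-mode requirement and the actual CoAP
  messages are left out:

```python
import secrets
from typing import Dict, Optional, Tuple


class FloorWindow:
    """Simplified window: everything at or below `floor` counts as seen."""

    def __init__(self, initial_seq: int, size: int = 32) -> None:
        self.size = size
        self.floor = initial_seq    # the Echo-verified request and all older ones
        self.highest = initial_seq
        self.seen = set()

    def accept(self, seq: int) -> bool:
        if (seq <= self.floor or seq <= self.highest - self.size
                or seq in self.seen):
            return False
        self.seen.add(seq)
        self.highest = max(self.highest, seq)
        # Forget entries that have slid out of the window.
        self.seen = {s for s in self.seen if s > self.highest - self.size}
        return True


class EchoSync:
    def __init__(self) -> None:
        self.windows: Dict[bytes, FloorWindow] = {}
        self.challenges: Dict[bytes, bytes] = {}

    def on_request(self, kid: bytes, seq: int,
                   echo: Optional[bytes] = None) -> Tuple[str, Optional[bytes]]:
        window = self.windows.get(kid)
        if window is not None:
            # Window already set: any Echo is ignored, the window decides.
            return ("deliver" if window.accept(seq) else "replay", None)
        expected = self.challenges.get(kid)
        if (echo is not None and expected is not None
                and secrets.compare_digest(echo, expected)):
            # First successful verification: set up the window from seq.
            self.windows[kid] = FloorWindow(seq)
            del self.challenges[kid]
            return ("deliver", None)
        # No window yet: challenge with a fresh, unpredictable Echo value.
        value = secrets.token_bytes(8)
        self.challenges[kid] = value
        return ("challenge", value)


server = EchoSync()
status, echo = server.on_request(b"\x01", 100)   # ("challenge", <8 bytes>)
print(server.on_request(b"\x01", 101, echo)[0])  # deliver
print(server.on_request(b"\x01", 101, echo)[0])  # replay: window not reset
print(server.on_request(b"\x01", 102)[0])        # deliver
```

  In this model there is no intermediate "believed to be lost" state:
  for a given client, either a window exists and decides, or it does not
  and a challenge is sent.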

KR
Christian
