[core] Review of draft-ietf-core-oscore-groupcomm-17

Christian Amsüss <christian@amsuess.com> Fri, 07 April 2023 16:11 UTC

Date: Fri, 07 Apr 2023 18:10:14 +0200
From: Christian Amsüss <christian@amsuess.com>
To: core@ietf.org, draft-ietf-core-oscore-groupcomm@ietf.org
Message-ID: <ZDBAZn9uxU6eB9IY@hephaistos.amsuess.com>
In-Reply-To: <167155165139.12883.5074467066117832003@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/yAiNmM9_FxhSIvmXpt4WGt8bBx0>
Subject: [core] Review of draft-ietf-core-oscore-groupcomm-17

Hello OSCORE groupcomm authors,

I've finally managed to update my aiocoap implementation, and can thus
now use the experience gained there in the long-promised shepherd
review. Some suggestions were already placed in PRs (but are still
listed here). Items are grouped by topic as well as I could; I hope
rearranging didn't break the reading flow too much.

As a whole, I feel rather positive about the protocol described; as for
the way it is described, I think that it can still be enhanced, and
hope that discussions spinning off this mail will contribute towards
that.

Notes from updating implementations / plugtesting
=================================================

* For the alg_pairwise_key_agreement, it is unclear which number to put
  there -- and which effect the chosen algorithm has.

  Concretely, one implementer (me) left -8 in the
  alg_pairwise_key_agreement field -- given that there is an obvious way
  to do pairwise key agreement on Ed25519 (that is, through the
  Montgomery coordinate change described). Rikard pointed out to me that
  -27 goes in there (our cryptographic primitives interoperated fine, we
  just disagreed on the numbers) -- but I couldn't find a place that
  said which algorithm to name here. We tested with -27 (for ECDH-SS +
  HKDF-256), but given we're only using the ECDH-SS portion of that key,
  it was not clear whether to pick -27 or -28, given that both are the
  same in the used property. (The HKDF property of that registered
  algorithm is not used; instead, the material is used with OSCORE's
  custom KDF steps that use the HKDF hash function).

  This issue seems to be similar in style to the one we've had with
  said hash function in OSCORE and its key inputs: Is "Use SHA-256 as
  the HKDF algorithm in the common context" expressed as COSE algorithm
  5 or -10? In both cases, the trouble appears to stem from the use of
  COSE algorithms for something that is not a first-class COSE citizen.
  Unlike in that old discussion, here the number does wind up in the
  eAAD already, and is thus critical for interoperability.

  It would be helpful if at some place in the specification, discussed
  algorithms were mentioned with their full names (and numbers), and/or
  what preferred numbers are (eg. in the case of the -27/-28 case).

  Things might be easier if the reader had a way of deciding easily
  which algorithms fit in which position. The COSE algorithms registry
  is probably lacking information for that purpose.  If the information
  were available in the table, in Section 2 Security Context / Common
  Context, it would be easy to say that the Signature Encryption
  Algorithm is any out of the registered ones with "usage pattern":
  "symmetric encryption" or "symmetric authenticated encryption", or
  even that the Pairwise Key Agreement Algorithm needs to have "usage
  pattern" "key agreement" (sounds obvious phrased like that, but it's
  not clear in the table which these are). Ideally, the reader could
  even look up which Pairwise Key Agreement Algorithm goes with which
  Signature Algorithm (which also does not follow from the tables; just
  two of them are made compatible explicitly citing Thormarker).

  Maybe that last aspect of compatibility is best asked about through a
  scenario: Put yourself in the shoes of a lowly implementer who is told
  by The Bosses to implement Group OSCORE with Ed448 signatures, and
  secp256k1 pairwise mode. What can the lowly implementer cite to show
  The Bosses it doesn't work that way?

* (Not from own experience, but from looking into how the long-term
  static-static secrets could be stored). The way several secrets are
  concatenated in the HKDF input does, AFAICT, not work with the current
  PSA Crypto API which I think is gaining some traction in the embedded
  space -- it can use keys as inputs to other key derivations (without
  ever exposing the key to the application), but not a concatenation of
  several keys. I don't think that the protocol should change, but hope
  that this can be used as a motivating example to allow a concatenating
  key derivation function into the library.
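
To make the first item above concrete, here is a minimal sketch of the lookup an implementer would want to be able to do. The numbers and names are those registered in the IANA COSE Algorithms registry; the "usage pattern" column, the slot table and the `fits_slot` helper are hypothetical (the registry has no such column, which is the gap the item points at).

```python
from enum import Enum


class Usage(Enum):
    """Hypothetical "usage pattern" column; not part of the IANA registry."""
    SIGNATURE = "signature"
    KEY_AGREEMENT = "key agreement"
    AEAD = "symmetric authenticated encryption"
    MAC = "message authentication"
    KEY_DERIVATION = "direct key derivation"


# COSE algorithm value -> (registered name, assumed usage pattern).
COSE_ALGS = {
    -8: ("EdDSA", Usage.SIGNATURE),
    -27: ("ECDH-SS + HKDF-256", Usage.KEY_AGREEMENT),
    -28: ("ECDH-SS + HKDF-512", Usage.KEY_AGREEMENT),
    10: ("AES-CCM-16-64-128", Usage.AEAD),
    5: ("HMAC 256/256", Usage.MAC),
    -10: ("direct+HKDF-SHA-256", Usage.KEY_DERIVATION),
}

# Which usage pattern each Common Context parameter would accept.
# This mapping is an assumption made for illustration only.
SLOT_USAGE = {
    "Signature Algorithm": Usage.SIGNATURE,
    "Pairwise Key Agreement Algorithm": Usage.KEY_AGREEMENT,
    "Signature Encryption Algorithm": Usage.AEAD,
}


def fits_slot(slot: str, alg: int) -> bool:
    """Return True if the algorithm's usage pattern matches the slot."""
    _name, usage = COSE_ALGS[alg]
    return SLOT_USAGE[slot] is usage


if __name__ == "__main__":
    for alg in (-8, -27, -28):
        name = COSE_ALGS[alg][0]
        ok = fits_slot("Pairwise Key Agreement Algorithm", alg)
        print(f"{alg:4d} {name:20s} pairwise key agreement: {ok}")
```

With such a column, -8 would be rejected for the Pairwise Key Agreement Algorithm while -27 and -28 would both be accepted; picking between those two would still need a stated preference in the specification.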

Naming things
=============

* The Group Encryption Key is merely used to encrypt the signature (and
  not for encrypting group messages). That's explicit in its
  definition, but maybe we could still have a better name. Also, while
  being a key and being used for encryption, it's not a classical
  encryption key. So, "Signature Encryption Key"? "Signature Encryption
  IKM"?

* Conversely, the Signature Encryption Algorithm is *not* used to
  encrypt signatures, but as the (A)E(AD) algorithm used when a
  signature is present. Not only would a more fitting name make the
  specification easier to understand, it would also allow removing the
  "This algorithm is not used to encrypt the countersignature" sentence.
  If the Group Encryption Key is renamed, this could become the Group
  Encryption Algorithm (because the term group encryption is then no
  longer hogged by the signature encryption key, and it actually *is*
  the encryption used in group mode).

Consistency
===========

* The string "Group Encryption Key" is used in Group Encryption Key
  derivation, even though the keystream request and keystream response
  types already went for short CBOR values. (IIRC that was to avoid
  lugging large byte strings through firmware images needlessly). Why is
  a string still used here?

  The keystream ones also have subtle differences in structure from the
  usual content -- all fine given how they're used, just thought it
  might have played into the current state (but it's unclear how).

* 2.3 Authentication Credentials: "All authentication credentials MUST
  be encoded according to the same format used in the group". Is that
  format part of the common context? I think it'd make sense to list
  that piece of information there. (Otherwise, the reader will likely
  group that information it'll later obtain from the GM together with
  the common context anyway).

  Also, in tune with the confusion about numbers such as -27 and -10,
  it'd be helpful to point to concrete example numbers such as TBD2 from
  lake-edhoc for kccs.

* 9 / 9.1, "a sender can use the pairwise mode to protect a message
  sent to (but not intended for) multiple recipients": This refers to
  the service roughly outlined in 9.1, but 9.1 describes "a resource to
  which a client can send a group request".

  A concrete suggestion is in https://github.com/core-wg/oscore-groupcomm/pull/101

Clarity
=======

* In the 2.1.6 Group Encryption Key derivation, the key length is chosen
  to match the Signature Encryption Algorithm. This is a sound choice,
  but could be confusing to a reader who is wearing a "dimensional
  analysis" hat, where it would trigger the expectation that this key
  would be used with that algorithm (which it is not -- it's only used
  as input to a KDF where length only matters to the extent it limits
  entropy). I suggest adding a note here:

  > [are the size of the key for the Signature Encryption Algorithm from
  > the Common Context (see Section ...), in bytes.] While the obtained
  > key is never used with the Signature Encryption Algorithm, its
  > length was chosen to obtain a matching level of security.

* 2.1.1 Common Context / AEAD Algorithm: Is this not exactly *the* AEAD
  algorithm of 8613? Why is it an extension (as the 2.1 introductory
  text labels all its subsections)?

* "before a further following group rekeying occurs, the Group Manager
  MUST NOT rekey the group upon their re-joining". I don't quite get
  what is forbidden here. If a rekeying happens before the elder member
  joins, it's OK. If a rekeying happens right after the elder member
  joins, it's OK. Why is rekeying in that very instant forbidden, when
  "and we're using this opportunity to also rekey" sounds like an
  otherwise common thing?

* 4.1, COSE_Countersignature0: Is it justified to call this parameter
  COSE_Countersignature0, when it contains an *encrypted*
  countersignature? Is COSE_Countersignature0 even a parameter name, or
  is it rather a type (which typically sits in a "Countersignature0
  version 2" field)?

  It does not matter on the wire (where the compression elides the
  field's name, and it is not an input to any cryptographic operation),
  but it matters for specification clarity.

* 8. Message Processing in Group Mode: "Instead, applications should
  store such outgoing messages for a predefined, sufficient amount of
  time, in order to correctly perform potential retransmissions at the
  application layer." -- I think the intention here is that the
  application sends repetitions of the same protected message (and
  doesn't protect the message multiple times), but that is not said.
  (Nothing in that paragraph *needs* to be said, for it is all just
  reiterating other documents' requirements, but this line might be
  putting implementers on the wrong track).

Editorials
==========

* The references "CWT Claims Sets (CCSs) [RFC8392]" are a bit weird, fix
  suggested in https://github.com/core-wg/oscore-groupcomm/pull/100

* The title contains an abbreviation that's not in the RFC editor's
  asterisk list[1]. It may be possible to argue that "Group Object
  Security for Constrained RESTful Environments (Group OSCORE) -- Secure
  Group Communication for CoAP" is an 'impossibly ugly title'. Do you
  plan to use the impossibly-ugly defense, or is there a different title
  you'd fall back to?

* -17's PDF has a very weird authors wrapping (but that's probably more
  of a tooling issue)

* "Each endpoint of a group is aware of whether the group uses the group
  mode, or the pairwise mode, or both": I think this sentence is a bit
  early here -- as a reader, is this somehow more special information I
  need to keep in mind than the rest? (Similar also for "Group Manager
  indicates whether the group uses the group mode, the pairwise mode, or
  both of them"). AIU, what the group members actually agree on
  (directed by the GM) are the values for Signature Encryption
  Algorithm, Signature Algorithm and Pairwise Key Agreement Algorithm.
  If they're set (or: set to a value other than null), "group mode is
  used" or "pairwise mode is used".

  I think the document would be easier to digest if whether the group
  "uses X mode" were not emphasized as an extra property, but a
  consequence of whether the relevant X algorithms are null.

* 2.1.6 "The label is an ASCII string": Given that the type element in
  the info array is a CBOR item, it can only be a byte or a text string;
  I assume that the latter is meant. (And these don't typically contain
  trailing NUL bytes).
  
* 2.2 "An endpoint admits": "The endpoint may admit"?

* 2.5.2 Exhaustion of Sender Sequence Numbers: "has consumed the largest
  usable value". This may be a good point to informatively point out
  that more considerations may play in here than the mere 40-bit
  technical limit.

* 6.2 Update of Replay Window speaks of "if its installation has
  required to delete another Recipient Context", and similarly 2.5.1.2
  of "Every time this happens". This is assuming that dropping of
  security contexts will happen precisely when the group's quota is
  "full". That works fine assuming a fixed number of recipient contexts
  per group, but doesn't work so well if there is an LRU cache of
  recipient contexts for multiple groups a node is member of (or any
  even wider memory pressure situation).

  I suggest that wording conditional on "what is happening in the
  instant the context is created" be changed to wording based on "if
  since the last re-keying any recipient context has been dropped".

* 4., "OSCORE uses the untagged COSE_Encrypt0 structure with an
  Authenticated Encryption with Associated Data (AEAD) algorithm":
  After compression, it doesn't matter any more whether it's the tagged
  or the untagged COSE_Encrypt0 structure.

* 5.1.1 Examples in Group Mode: I think its readability would be
  enhanced if rather than duplicating the values in the "before"
  description, CBOR Diagnostic Notation comments were used. An example
  of that is in [4].

* In various places, data loss is associated with "rebooting" (eg. in
  6.3, Message Freshness). A device can go through a reboot without
  losing any synchronization state. It's the "unplanned", or
  "spontaneous" or maybe "unprepared" reboots that cause (unavoidable:
  keeping replay windows persisted is just not practical) loss of state.

* 8. Message Processing in Group Mode: The sentence "For encryption and
  decryption operations, the Signature Encryption Algorithm from the
  Common Context is used." doesn't fit with the topic of the rest of
  that paragraph. I think it can be dropped without replacement; it is
  reiterated in 8.1, 8.2 etc anyway.

* "However, a recipient MUST stop processing": The wording gives me the
  impression that this is deviating from RFC8613, but it's not clear to
  me how.

* Appendix D is good content, but it feels like it was placed in an
  appendix just because no suitable location in the introduction could
  be found at the time of writing.
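
The suggestion in the 6.2 item above (base the decision on "has any recipient context been dropped since the last rekeying" rather than on what happens at the instant of creation) can be sketched as follows. This is an illustration only: the class names, the shared LRU across groups and the per-group flag are assumptions, not text from the draft.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class RecipientContext:
    sender_id: bytes
    replay_window_valid: bool


class RecipientContextCache:
    """One LRU cache of Recipient Contexts shared by all groups of a node."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.contexts = OrderedDict()      # (group, sender_id) -> context
        self.dropped_since_rekey = set()   # groups that lost a context

    def get(self, group: bytes, sender_id: bytes) -> RecipientContext:
        key = (group, sender_id)
        ctx = self.contexts.get(key)
        if ctx is not None:
            self.contexts.move_to_end(key)  # mark as most recently used
            return ctx
        if len(self.contexts) >= self.capacity:
            # Evict the least recently used context, whichever group it is in.
            (old_group, _old_sender), _old_ctx = self.contexts.popitem(last=False)
            self.dropped_since_rekey.add(old_group)
        # A new context may belong to a sender whose earlier context was
        # dropped, so its replay window only starts out valid if nothing
        # was dropped in this group since the last rekeying.
        valid = group not in self.dropped_since_rekey
        ctx = RecipientContext(sender_id, replay_window_valid=valid)
        self.contexts[key] = ctx
        return ctx

    def rekey(self, group: bytes) -> None:
        """New keying material: forget the group's contexts and its flag."""
        for key in [k for k in self.contexts if k[0] == group]:
            del self.contexts[key]
        self.dropped_since_rekey.discard(group)
```

Note that the eviction may hit a different group than the one the new context is created for, which is exactly the case the "in the instant the context is created" wording does not cover.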

Cryptographic aspects that are actually above my level
======================================================

* In verifying that the now explicit arithmetic for the Ed25519 /
  Curve25519 mapping is what I'm using, I found that libsodium (from
  where I lifted my implementation) since very recently contains a note
  on its ed25519_..._to_curve25519 functions that "An alternative is do
  the ECDH operation over the Edwards curve, avoiding the conversion
  altogether"[2].

  Given that a lot of work has gone into making the Ed25519/X25519
  conversion work, I hope that no easy alternatives to this moderately
  cumbersome process (admittedly, it's the hardest in terms of spec
  text, and probably easier on the target CPU than on the programmer)
  have been missed, but I'd appreciate if someone who understood what's
  actually going on here could verify that whatever they're proposing
  doesn't apply here.
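
For reference, the public-key half of the coordinate change discussed above is small. The following is a minimal sketch of the birational map from RFC 7748, u = (1 + y) / (1 - y) mod p, in plain integer arithmetic. The function name is made up for this example, it does not check that the input is a valid curve point, and it is not meant as a replacement for a vetted library routine.

```python
P = 2**255 - 19  # field prime shared by Ed25519 and Curve25519


def ed25519_pk_to_x25519(pk: bytes) -> bytes:
    """Map an encoded Ed25519 public key to a Curve25519 u-coordinate."""
    if len(pk) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    # The encoding is the y-coordinate, little-endian, with the sign of x
    # stored in the top bit; the map only needs y.
    y = int.from_bytes(pk, "little") & ((1 << 255) - 1)
    if y >= P:
        raise ValueError("non-canonical y-coordinate")
    if y == 1:
        raise ValueError("1 - y is not invertible (neutral element)")
    u = (1 + y) * pow((1 - y) % P, -1, P) % P  # modular inverse, Python 3.8+
    return u.to_bytes(32, "little")


if __name__ == "__main__":
    # The Ed25519 base point has y = 4/5 and maps to the X25519 base point.
    base_y = 4 * pow(5, -1, P) % P
    u = ed25519_pk_to_x25519(base_y.to_bytes(32, "little"))
    print(int.from_bytes(u, "little"))
```

The private-key side (hashing the seed and clamping) is separate and not shown here.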

Points where I'm not sure what Group OSCORE can or should accomplish
====================================================================

* 4.3 has a provision that 'If no Group Manager maintains the group,
  this parameter MUST encode the CBOR simple value "null"'. While that's
  kind of convenient (during plug tests, we had no GM, so we put null in
  there), other places in the text don't give the impression that it's
  optional to have a gm_cred in the first place ("Each group member MUST
  obtain the authentication credential of the Group Manager").

  This is inconsistent; I don't have a pronounced preference in which
  direction it should be resolved.

  To cater for exotic distributed GMs, it may be good to allow making
  this optional -- but such setups then lose the protection against
  Group Cloning. (But it is doubtful whether that attack is meaningful
  there in the first place, given that the GM creating the new group is
  after all accepted by its new members).

* Now that authentication credentials are a thing in Group OSCORE, and
  these can in particular be CWTs or X.509 certificates that can have
  chains leading there: Does this mean that for deployments such as the
  pairwise-only one sketched at the last paragraph above Section 1.1,
  it's possible to run a group on more-or-less unchanged group key
  material, and members join by means of learning the current key
  material and obtaining a signed certificate to their key? (When they
  then arrive in the group, they start communicating, and when their
  peers need the credentials, those can be disseminated in a
  self-contained way, eg. by the new node itself -- without involving
  the GM).

* "By removing authentication credentials and deleting Recipient
  Contexts associated with stale Sender IDs": The rationale sounds
  flawed to me. We could be way less strict about the remembering old
  credentials (eg. allowing to get the current key material even when
  there is no precise list of stale IDs), and would still not
  communicate with non-group peers, because those are excluded by lack
  of key material. The better reason to me is that this strictness
  ensures that we never use the wrong (old) credential with a sender ID
  that has now been issued to a different group member, and thus don't
  need to recover from such a mixup.

  AIU, this is the one and only purpose of the Key Generation Number. By
  having a never-wrapping, monotonic key generation number, groups can
  effectively not be updated forever, but are limited by the size of the
  number that is supported in the peers.

Freshness, replay window and ordering
=====================================

* The term "absolute freshness" is not defined, and I don't understand
  it. (Note to self: When clarified, revisit the "Assuming an honest
  server, the message binding guarantees that a response is not older
  than its request. Hence, the following holds." paragraph).

* 2.5.1.1/2.5.1.2 Reboot and Total Loss, and related "invalid replay
  window" sections:

  There are two aspects to replay protection:

  * Nonce reuse: Can I respond to this message without using an own
    Partial IV? From this PoV, it's OK to initialize with "valid" in
    section 2.5.1.1.

  * Replay detection: Have I never processed this request before? From
    that PoV, the device is prone to processing requests again that it
    has processed before, and it'd be advisable to initialize the replay
    window with "not valid" (triggering the 2.5.1.2 item 2 mechanism).

  (Message freshness is a third aspect, that's related insofar as that
  message freshness can be used to infer absence of replays).

  So after reading 2.5.1.1, it appears we mainly care about nonce reuse
  (which is absolutely critical), and treat replay protection more
  sloppily. Treating replay protection sloppily is, in my opinion, often
  fine depending on the message (I've sketched a bit on that in [3]),
  but so far that has not been taken up in OSCORE.

  Inconsistently, when 2.5.1.2 talks of silent servers (which can't do
  the Echo dance to re-initialize their replay windows), the only option
  described for a silent server is to wait for a rekeying, whereas it
  already has all the properties a regular server gains by doing 2.5.1.2
  item 1 (retrieve a new security context) implicitly: It will not do
  nonce-reuse (because it never sends anything), and it will be just as
  prone to replays as a server that just obtained a new security
  context. So by the standards of everything else, silent servers could
  just keep going. (But 6.4 implies that replay protection is just as
  good as with regular OSCORE, so getting a new security context may
  *not* be good enough).

* 10. Challenge-Response synchronization: I don't think it's a good idea
  to allow this mechanism when there is no pairwise mode. If the client
  protects the follow-up request (the one with the Echo) in group mode,
  even if it sends it to a unicast address, that request could be
  replayed to any server that already processed the request's original
  instance. This leads to the request being processed twice.

  The whole section is quite hard to process, as it mixes the topics of
  initializing a replay window and finding freshness properties. In
  particular, it speaks of not delivering non-fresh requests to
  the application, when generally freshness is not an absolute term, and
  can often only be judged by the application that sets the
  requirements. The large-enough gap mechanism introduced in this
  section is weird to me:

  If, as a server, my concern is that a malicious party is delaying
  requests to me (say, a sensor sends values every 5 seconds with a
  max-age of 11 seconds, and the attacker delays messages such that the
  server receives a single message every 9 seconds), then if anything, a
  large gap is an indication of *not* being under attack any more -- and
  is thus a bad criterion. IMO the level of freshness that is implied
  here is unobtainable without excessive synchronization, at which point
  it is easier to rekey frequently (thus ensuring that the request is
  more recent than the last rekeying).

  The challenge-response mechanism makes sense for recovering a replay
  window, particularly for reusing the client's PIVs -- where it is
  nice to have, but not critical to have (because it's always an option
  to take an own PIV instead). It makes some sense for avoiding
  duplicate processing of requests, but whether it's necessary depends
  on the application semantics (not really needed for idempotent
  requests). For freshness, I'm not convinced the full screen page of
  spec text gives any benefits.

* A.2 freshness: Unlike for general freshness (difficult, see above),
  freshness in the sense of the last item of A.2 (ie. ordering of
  messages) is a good goal.

  The term freshness is not well defined here. The use for ordering is
  inconsistent with the term's use in RFC9175 (to which this document
  doesn't explicitly defer, but whose mechanisms it uses for freshness),
  as RFC9175 freshness is a measure of when the message was sent on the
  recipient's time scale, whereas the message ordering used here is on
  the sender's time scale.

  As a note, when implementing Group OSCORE for constrained devices, I
  would likely not attempt to provide any total order of received
  messages to the application. For received requests, it will need to
  suffice that the request is not replayed (ie., that I'll only send
  every request once to the application). For responses, the two easy
  behaviors are to "accept responses in any sequence" or to "accept only
  the latest and discard all other" messages. There might be a third
  option to allow the application to allocate some counter (so that
  families of responses could each go through an accept-only-the-latest
  filter). Keeping a persistent total order on all messages received for
  a request is likely not practical without extra allocations in the
  first place.

* 13.13: See other comments related to freshness and ordering. I think
  the only practical freshness properties are "request is newer than the
  last rekeying", and the ordering of messages (which is not freshness
  as defined in RFC9175).
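
The delayed-sensor scenario from the Challenge-Response item above can be worked through numerically. The sketch below uses the numbers from that item (send every 5 seconds, max-age 11 seconds, one delivery every 9 seconds) and assumes, as one possible reading, that the attacker forwards every message in order and merely stretches the spacing; the function name is made up for this example.

```python
SEND_INTERVAL = 5      # seconds between values sent by the sensor
MAX_AGE = 11           # seconds a value is considered usable
DELIVERY_INTERVAL = 9  # seconds between deliveries under attack


def delayed_deliveries(count: int):
    """Return (index, sent_at, delivered_at, delay) for each message."""
    rows = []
    for i in range(count):
        sent_at = i * SEND_INTERVAL
        delivered_at = i * DELIVERY_INTERVAL  # attacker stretches the spacing
        rows.append((i, sent_at, delivered_at, delivered_at - sent_at))
    return rows


if __name__ == "__main__":
    for i, sent_at, delivered_at, delay in delayed_deliveries(11):
        print(f"msg {i:2d}: sent {sent_at:3d}s, "
              f"delivered {delivered_at:3d}s, delay {delay:2d}s")
```

Under these assumptions the delay grows by 4 seconds per message without bound, while the gap between arrivals stays at 9 seconds, below max-age, so the server always holds a value that looks current. No gap appears while the attack is running.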

I don't have a good section for this
====================================

* OSCORE was described to also apply to HTTP. While HTTP can't do
  multicast, multicast is just the *typical* transport of Group OSCORE.
  Both when doing multicast-ish things at a proxy and when using
  pairwise mode, HTTP would work for Group OSCORE just as well, with
  (AFAICT) no need for any normative statements (because the text in
  RFC8613 Section 11.1 should suffice), but a simple statement that this
  contains no further provisions because said section makes everything
  work might be good to have.

  (It's mainly the "for CoAP" in the title that prompted this comment,
  and the strong focus on CoAP vs. RESTful environments in the intro).

* Non-notification group exchange: I still think this is better unified
  with observations -- if only for the sole reason that before
  nontraditional-responses (or some document that does them implicitly,
  like groupcomm-proxy) lands, there are no requests with multiple
  responses from the same source other than observations.

  Also, in sentences like "This ensures that: i) an Observe
  notification", I'd suggest saying "This ensures that Observe
  notifications (or any other subsequent responses to a request) may
  never match a request other than their original one" instead of going
  through the cases separately.

  In particular, I'm playing around with wording updates for 6.4.1 -- if
  this item is acted on before I've made a more concrete suggestion via
  PR, please remind me to revisit that section.

* Non-notification group exchanges: In all the 8.x entries, doing a
  non-notification group exchange triggers requirements of storing the
  kid and kid_context. Is that not just as well true of requests that
  just take a long time to process, so that a rekeying may happen
  in between? (A rekeying may happen any time, and as per 8.3, it's
  always the latest context that's used to protect the response). The
  requirement "The server MUST use the stored value of the 'kid'
  parameter from the group request" is true no matter whether "the
  server intends to reply with multiple non-notification responses" or
  not.

  Also, in those "The server MUST store", the "MUST" is oddly normative.
  It *needs to* store them, because it MUST use those that were in the
  request when sending the response; I don't think this is an
  interoperability requirement. ("Storing" is a flexible term anyway
  -- the server can also keep the request around and look up the
  KID/context from there.)

* "the untagged CWT associated with an entity is stored in the Security
  Context and used as authentication credential for that entity" / "then
  the authentication credential associated with that entity that the
  signature checker stores and uses is the untagged CWT.": Why does the
  credential get untagged? I was under the impression that one of the
  points of the introduction of credentials was that there is an opaque
  thing that can be put into the key derivation reproducibly by all
  parties (and they'd then verify their common understanding of that
  credential by extracting a public key), but untagging it (or,
  generally, making rules about how to process a CWT, a CCS or an X.509
  thing) seems to counteract that goal.

* 3.2, "The Group Manager MUST rekey the group when one or more endpoints
  leave the group.". "MUST ... when" sounds like no delay is
  permissible, but unless the endpoint removal happens atomically, there
  will be delay. May the GM cut itself and the network a little slack,
  and have a policy that dictates how long the group may keep running
  after a member is removed? After all, rekeying is not an immediate
  process anyway. On the other hand, "MUST eventually" would be an
  invitation to delay it for a long time (which is also not desirable).
  Would "without undue delay" work?

* Is the attack described in 13.7 ("Cross-group Message Injection")
  still possible now that request_kid_context and gm_cred are part of
  the eAAD? My impression (but without thorough checking) is that it
  wouldn't work any more because the gm_cred differ.

  Having the OSCORE option in the signature is a bit convoluted in
  implementations (not impossible, and can be optimized out, but
  requires some mental gymnastics), and AIU 13.7 is the only reason to
  have the OSCORE option in the eAAD in the first place.

* 8.3 Protecting the Response: This starts with "If a server generates a
  CoAP message in response to a Group OSCORE request, then the server
  SHALL follow". I think the criterion is wrong here -- these apply
  whenever the response is in group mode, which is independent of
  whether the request was in group mode.

* "When receiving another valid non-notification response to the same
  group request": Why are non-notification responses (and, later,
  observes) pinned to a particular KID? Observations can jump KIDs, so
  why not other multiple-responses? I'd think that the ordering is still
  clear because whenever the server jumps KIDs, the client will still
  obtain the credentials (which hopefully are the same, otherwise the
  notification has jumped servers) from the GM in a particular order.

* A.1 Assumptions / Group size: Do these upper limits also apply to
  silent servers? I figure that as long as key updates can be
  distributed to them (and that's up to the GM protocol), a group with
  thousands of (maybe silent) servers should be well possible.
  (Metropolitan area info boards about public transport delays come to
  mind).

* A.1, forward and backward security: These look more like objectives
  (A.2) than assumptions (A.1) to me.

* C, "Thus, the expected highest rate for addition/removal of group
  members and consequent group rekeying should be taken into account for
  a proper dimensioning of the Group Epoch size." When the GM is capable
  of reassigning GIDs, is this really a concern? I'd expect that if
  triggers to rekeyings happen with a frequency in the order of
  magnitude of the rekeying time, then the rekeyings themselves would be
  delayed rather than churning through parallel rekeyings. (See also the
  comment about "undue delay" -- and if that wording is not accepted,
  the GM can still make the "evict device" operation take some time). So
  rekeyings would always be roughly ordered, and even a nibble of group
  epoch values would by far suffice.

  Suggestions for the paragraph are in PR
  https://github.com/core-wg/oscore-groupcomm/pull/102

* 13.5.1, "the recipient can still try to process the received message
  using the old retained Security Context as a second attempt": Why is
  that a second attempt? This makes it sound like the server attempts
  decryption, encounters a bad signature, and then retries with the old
  context, whereas what actually happens is that the server sees the old
  Gid, and still has that security context around.

* I should comment on core-groupcomm-bis that I don't quite see why
  Block2 requests to group addresses are ruled out: Fetching data may
  well be efficient by GETting Block2:n from a multicast address (where
  servers with shorter resource representations just reject the request
  silently).

  Anyhow, this document reiterates that requirement (eg. in "With
  particular reference to block-wise transfers") in a fashion I'd
  consider needless. (If the requirement in groupcomm-bis persists, it
  is well stated there -- if not, this one would need an update).

* 13.14: Does it make sense to show client aliveness? The server gets
  client aliveness relative to the last rekeying, but an application
  that requires aliveness exceeding that would cause a storm of Echo
  options that could have better been handled by 1:1 communication (or
  more frequent rekeying) in the first place.

Scope notes
===========

* I have to admit I occasionally zoned out going through 8.1..8.4 and
  9.3..9.6. These are highly repetitive -- as were their counterparts in
  RFC8613.

Best regards
Christian

[1]: https://www.rfc-editor.org/materials/abbrev.expansion.txt
[2]: https://github.com/jedisct1/libsodium-doc/commit/0732187608798b7b6d48d291ed1562fb28cf1e36 / https://doc.libsodium.org/advanced/ed25519-curve25519
[3]: https://www.ietf.org/archive/id/draft-amsuess-lwig-oscore-00.html#name-replay-freshness-and-safety
[4]: https://www.ietf.org/archive/id/draft-ietf-core-comi-12.html#section-4.2.3.1-6