Re: [ippm] Secdir early review of draft-ietf-ippm-ioam-data-integrity-07

Justin Iurman <justin.iurman@uliege.be> Fri, 26 January 2024 15:57 UTC

Return-Path: <justin.iurman@uliege.be>
X-Original-To: ippm@ietfa.amsl.com
Delivered-To: ippm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 759FFC14F6A0; Fri, 26 Jan 2024 07:57:30 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.405
X-Spam-Level:
X-Spam-Status: No, score=-4.405 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=uliege.be
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QRdRxUPMAOd2; Fri, 26 Jan 2024 07:57:26 -0800 (PST)
Received: from serv108.segi.ulg.ac.be (serv108.segi.ulg.ac.be [139.165.32.111]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C07FDC14F61A; Fri, 26 Jan 2024 07:57:20 -0800 (PST)
Received: from [192.168.1.55] (125.179-65-87.adsl-dyn.isp.belgacom.be [87.65.179.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by serv108.segi.ulg.ac.be (Postfix) with ESMTPSA id 923C9200C968; Fri, 26 Jan 2024 16:57:18 +0100 (CET)
DKIM-Filter: OpenDKIM Filter v2.11.0 serv108.segi.ulg.ac.be 923C9200C968
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uliege.be; s=ulg20190529; t=1706284638; bh=PAlDQZ4Y469JqXqyq7uqhGJZ+c7j7cDCuRmG50P0C0A=; h=Date:From:Subject:To:Cc:References:In-Reply-To:From; b=Ehk0aIVU418EpZMOHNy8r7nCHoZ56/sDNWOiBVCZPZ01dCFZ7NwJgBj0ELwNlgwKS RL2WvffHPgn9N9efD8wabjaktlnZo/LNKZNGhno3dW1ccVL2NSrHuy9hpOw9BUh/RK 6YEcBbIzFW2NBuWjcDqcQGf2Eo1N8VJ82TngQU8ibxYqxJub591+li8w9w+oMk8BwX +JFFbD63gmyX7wVLgEbANtLV2r55bJoAL7OMwLQBcaWmowxrmByX4VXMMfDf49MarM jAjh5cwUTN/qZQzTjoywlkhSk9k1pcgymR6ZWWK9U9C9i/li18TfG1xIjB5eNA/dyt WWIQSqgD18i2w==
Message-ID: <569bdc8d-f4d1-4f83-9487-f36adf991477@uliege.be>
Date: Fri, 26 Jan 2024 16:57:18 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Justin Iurman <justin.iurman@uliege.be>
To: Benjamin Kaduk <kaduk@mit.edu>, secdir@ietf.org
Cc: draft-ietf-ippm-ioam-data-integrity.all@ietf.org, ippm@ietf.org
References: <170320017390.55336.7890784505052194132@ietfa.amsl.com>
Content-Language: en-US
In-Reply-To: <170320017390.55336.7890784505052194132@ietfa.amsl.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/32pZbJc-8aEf6_MpH_2Sr4-sBSc>
Subject: Re: [ippm] Secdir early review of draft-ietf-ippm-ioam-data-integrity-07
X-BeenThere: ippm@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: IETF IP Performance Metrics Working Group <ippm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ippm>, <mailto:ippm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ippm/>
List-Post: <mailto:ippm@ietf.org>
List-Help: <mailto:ippm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ippm>, <mailto:ippm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 26 Jan 2024 15:57:30 -0000

Hi Ben,

Thank you very much for this awesome and huge review.

Please see inline ([JI] tag).

On 12/22/23 00:09, Benjamin Kaduk via Datatracker wrote:
> Reviewer: Benjamin Kaduk
> Review result: Serious Issues
> 
> # secdir review of draft-ietf-ippm-ioam-data-integrity-06
> CC kaduk
> 
> I have reviewed this document as part of the security directorate's
> ongoing effort to review all IETF documents being processed by the
> IESG. These comments were written primarily for the benefit of the
> security area directors. Document editors and WG chairs should treat
> these comments just like any other last call comments.
> 
> The summary of the review is that this is still early-stage work and we haven't
> landed on the right cryptographic mechanisms yet -- what's currently in place
> has some issues that would be very significant issues for some deployment
> sites/scenarios, and there are some aspects that are not fully specified yet. A
> lot of my comments cover topics that will merit mention in the security
> considerations section, though I did not attempt to call each one out as such
> or produce an exhaustive list at the end.
> 
> ## Discuss
> 
> ### Signature vs MAC
> 
> I'm pretty uncomfortable about using the word "signature" to describe the
> protection that we're applying here.  The current text in §5 reads as if the
> need for a symmetric-key mechanism is an intrinsic requirement of the mechanism
> (which seems reasonable given the use-case -- true asymmetric signatures would
> be too computationally expensive to be suitable here), but in that case it
> seems that we are really expecting a MAC (message authentication code) rather
> than a signature (that provides both data authentication and source
> authentication).  In particular, we cite NIST.800-38D for the AES-256-GCM
> "signature algorithm", when I think we are really using GMAC -- the NIST
> document talks about GMAC a lot and does not use the word "signature" anywhere,
> which seems particularly illustrative to me.

[JI] Good point, it is indeed GMAC and "signature" is quite ambiguous. 
That said, this document not only defines a new Integrity Protection 
Method, it also defines new Option-Types and a generic header (the 
generic part of the document). We therefore propose to replace 
"signature" (and similar) with "Integrity Check Value" as the new 
generic term. The reason we want to keep a generic term is for future 
Integrity Protection Methods (who knows what they will use). Note that 
section 5 defining the new Integrity Protection Method would of course 
mention GMAC when going into details.

> ### Pre-hash vs. direct signature
> 
> The mechanism specified in §5 uses a sign(hash(...)) construction, rather than
> signing directly.  This is sometimes needed for asymmetric signature primitives
> like RSA, but typically a MAC construction will be able to authenticate large
> quantities of data directly, without the need to pre-digest the input data. It
> is worth noting that the current construction does not actually provide the
> MAC's authentication guarantee to the underlying packet fields, just to the has
> value.

[JI] Another good point. For the "signature", we were initially using 
the encryption result, not the tag, which explains the confusion. We 
will rewrite that part accordingly.

> ### Key management
> 
> While the specifics of key distribution and management will inherently be out
> of scope for this specification, I think we do need to settle a core question:
> will each IOAM Node have its own unique key for signature generation, or do we
> expect some level of key reuse across machines within a given domain?  The
> latter scenario intuitively seems like it would make for a simpler deployment,
> but would place some quite stringent limitations on what cryptographic
> mechanisms we can use at runtime.  NIST's key usage limitations for GCM, in
> particular, might actually prove prohibitive for the "one key shared across
> nodes" option in practice.

[JI] I replied to your point "GCM Key usage limitations" below, which is 
directly related to this one.

> ### Nonce guidance
> 
> I think this document is incomplete without some discussion of the properties
> that the nonce can provide, especially the factors that go into selection of
> the nonce length.  I can accept that the actual "common methodology to keep the
> Nonce valid only for a specific period of time" would be outside of the scope
> of this document (though we could certainly provide one or more examples along
> with their caveats and areas of applicability if we wanted to), but there is a
> lot that we can and should talk about.  For example, consumers will need to
> know if the nonce must be unpredictable as well as unique, and whether there
> are factors going into the selection of nonce length other than the number of
> available nonce values without reuse (and the likelihood of collision if a
> random nonce-selection procedure is used).  For GMAC in particular, there is a
> pretty strong argument in favor of using 96-bit nonces (since all other nonce
> lengths internally get hashed down to 96 bits), even though the statistics for
> 96-bit nonces can hinder certain use cases.  We could probably also talk about
> the benefits and risks of using sequential values as the nonce (bearing in mind
> the discussions in RFC 9416) even if there is not a general recommendation one
> way or the other on their use.

[JI] Same, see "GCM Key usage limitations" below.

> ### Signature as nonce for transit nodes
> 
> The specification of the actual cryptographic protection algorithm in §5
> includes a provision (step 2) for a transit node to compute a new signature
> that accounts for the additions it has made to the IOAM data.  It seems to be
> saying that the corresponding cryptographic computation involves using the
> received signature value as the nonce input for producing this new signature.
> While SP800-38D does seem to permit using multiple IV lengths with a given key,
> using the received signature as a nonce seems to set us up for using nonces of
> length other than 96 bits, which (per {{GCM-Key-usage-limitations}}) strongly
> limits how much traffic we can send in a given key.  It also would entail using
> data received from the network as the nonce for a given node's private key,
> which seems like it would make it easy for an attacker to induce (key,nonce)

[JI] Correct, see my reply under "GCM Key usage limitations" below.

> reuse.  (This might be mitigated if the transit nodes diligently validate the
> received signature prior to using it as a nonce, but even that would not
> obivate the need for a fully reliable replay detection mechanism for the
> lifetime of the key, which seems prohibitively expensive.) This scheme also
> seems to make validation quite complicated and expensive, since in order to
> verify the final received signature we'd need to reconstruct the whole chain of
> signature from initial encapsulation through all transit nodes.  While being
> forced to validate all the intermediate signatures does provide a fairly strong
> indication of non-tampering along the path, it's also a lot of recomputation. A
> scheme where the transit nodes that regenerate the signature also generate a
> new nonce that goes in the packet would be simpler/faster, at the cost of
> trustng all the transit nodes to properly validate the received signature and
> to be operating correctly.  This is a trade-off, and could be decided either
> way, but if we opt to go for the stronger validation we should specifically say
> that we made the choice to do that and accept the costlier validation and
> replay protection.

[JI] Unfortunately, the trade-off you mention would defeat the purpose 
of this document (assumption: don't trust nodes), even though we totally 
agree on the fact it would be simpler/faster. We will add some text to 
specifically say why we chose to go that way.

[JI] OTOH, since we now require a 96-bit nonce based on a deterministic 
construction (see my reply for "GCM Key usage limitations" below), we 
cannot ask for a transit node to use the previous "signature" (i.e., 
GMAC tag in this context) as its nonce anymore. But we still need to 
kind of include the previous tag in the process, so that we can make 
sure we validate the whole chain, each node depending on previous ones 
on the path. What we suggest is therefore to also pass the previous tag 
to the AAD. Thoughts on this?

> ### Bit-level hash inputs (Protected flags for trace and DEX option-types)
> 
> I think we need to provide more clear guidance on how to handle the protected
> flags for the trace and DEX option-types.  First, a note in passing that
> interoperability does require locking in which flag bits are (not) covered by
> the MAC at the time the protected option type is specified (i.e., now), which
> blocks off future extensions since any new flag bits cannot be integrity
> protected using this mechanism.  That said, it would be possible to divide the
> unallocated flags range into a protected and unprotected portion, to leave a

[JI] Unfortunately, this is indeed a well known problem we chose to 
"ignore" as a trade-off. Of course, your proposition makes sense but 
sounds complicated. It would require to update both RFC 9197 and RFC 
9326 (or are you thinking of having a centralized IANA registry for IOAM 
integrity fields?). Regardless, it seems tricky to divide flags into 
protected and unprotected portions. For example, the 4-bit flags from 
the Trace Option-Type has only one free bit left, namely bit 3 (RFC 9197 
defines the Overflow flag on bit 0, while RFC 9322 defines the Loopback 
flag and Active flag resp. on bits 1 and 2). Do we want bit 3 to be 
protected or not? We don't know, because we can't predict the future. 
And since it is the only bit left, it's 50/50. Also, while bit 0 
(Overflow) MUST NOT be protected (because it is mutable), bits 1 and 2 
SHOULD/MUST be protected. If you want contiguous portions, it forces bit 
3 to be protected as well. What if the future flag defined on bit 3 MUST 
NOT be protected (i.e., an Overflow-like flag)? For all these reasons, 
we chose to go the current way, that is to "freeze" header fields and 
delegate the decision to a document defining a new flag (which of course 
still makes interop difficult, but at least we won't break 
implementations when new flags arrive).

[JI] The other (easiest) solution would be to *not* have integrity 
protection of the IOAM header (or, for instance, only have the 
Namespace-ID which is common to all IOAM Option-Types and is very 
important for the context of IOAM data). Been there, done that. It 
worked very well, and was so much lighter for, e.g., a Trace Option-Type 
with the Loopback flag set. But it was an incomplete solution. If you 
don't protect the header, then DEX or a Trace+Loopback (for example) 
would trigger actions on nodes, even if the header was altered (e.g., 
Loopback flag set by a compromised node on the path, making each IOAM 
node on the path to loop truncated packets with IOAM data back to the 
sender). Or, even simpler, if a compromised node alters the IOAM 
Trace-Type field (for a Trace Option-Type): you wouldn't notice either. 
Best case would be a malformed IOAM-Data-Fields block or malformed 
packet, worst case would be a misinterpretation of IOAM data. All that 
to say, we definitely need header protection too, even if it makes 
things a bit more challenging.

> little bit of flexibility for the future. My main concern here, though, is to
> concretely specify, at the bit level, what the input to the MAC (or hash)
> function is -- do we mask out the unprotected bits (if so, to 0s or 1s?), or do
> we literally just extract the two bits in question and make the bitstream input
> to the MAC (or hash) be not byte aligned?  I strongly suggest the former, since
> implementation handling for non-octet inputs to hash functions is very poor,
> but reading the current text I would conclude that I must attempt to implement
> the latter.  (I note on re-read that SP800-38D does require the inputs to have
> bit lengths be a multiple of 8, so if we decide to use GMAC directly rather
> than hash-and-MAC, we would need to specify padding if we opt for the latter
> approach.)

[JI] The intent is indeed the former, i.e., we mask out the unprotected 
bits (to 0s). We initially thought it was unnecessary to specify it in 
the document and let implementers decide. But, as you mentioned, it is 
indeed important to specify it, thanks. We will add some text.

> ### GCM Key usage limitations
> 
> Since we're using GMAC as the integrity protection mechanism, we need to look
> at the GCM key usage limitations to know how many times a given key can be
> used.  Unfortunately, what SP800-32D says is quite restrictive: its §8.3 lays
> out scenarios where the total number of invocations of the authenticated
> encryption function cannot exceed 2**32 for a given key.  Even if we have
> unique keys per node generating a signtaure, this limit can still be hit fairly
> quickly at modrately high traffic rates.  To avoid that limit we'd need to
> exclusively use 96-bit IVs that use the "determinstic construction" and in that
> case would be limited by the need to avoid reusing "invocation field" values on
> a given device.  Depending on what deployment scenarios we are thinking about,
> there's a significant chance that we'll need to have the protocol be able to
> accomodate key rotations in order to avoid the key usage limits.  This might
> take the place of an in-protocol key identifier field, or guidance to use some
> other protocol element (such as Namespace-ID) to select which key to use.

[JI] Based on NIST recommendations, we will now require (MUST) a 96-bit 
nonce with a deterministic construction, which one will also follow 
their recommendations, namely a 32-bit fixed field and a 64-bit 
invocation field (probably more RECOMMENDED than MUST, to be discussed). 
The invocation field could be a counter or similar, or a timestamp (as a 
solution for replay attacks). The fixed field could be completely random 
(despite its name), or a 16-bit unique ID with a 16-bit random part, or 
completely fixed (e.g., a unique ID). Those should be recommendations, 
we probably don't want to force the content of a nonce. So, the limit 
will no longer be 2**32, it will be 2**S where S is the number of bits 
used by the invocation field. And, depending on how the fixed field is 
built, it can even be much bigger than 2**64. Anyway, such a "limit" is 
much better than 2**32, and a key could be used much much longer. Do we 
really need key rotations? Maybe, although not required in that case... 
but it might also be fine with the deterministic 96-bit nonce described 
above anyway, right? If you agree, we will rewrite the current text and 
add some new text to reflect the above.

> ### GMAC output length
> 
> The specification of integrity protection signature suite 1 in §5 says that
> we're using AES-256-GCM and that "the signature consumes 32 octets".  I'm
> having trouble understanding where that number comes from.  While it's true
> that AES-256-GCM needs a 32-byte key and SHA-256 produces a 32-byte output, the
> GCM authentication tag length is specified separately, and has to be one of a
> handful of preordained values (128, 120, 112, 104, of 96 bits per SP800-38D).
> I would assume that we want the full 16-byte authentication tag for our
> purposes, but then what are the other 16 bytes of "signature" supposed to be?

[JI] Our mistake. Mainly due to the fact that, as mentioned above, we 
were using the encryption result (pre-hashed with SHA-256, which 
explains the "32-octet signature"), not the tag. We will rewrite 
accordingly.

> ## Comments
> 
> ### Requirement for security
> 
> When the abstract says
> 
>     IETF protocols require features to ensure their security.
> 
> This is true in a certain sense, but the sense may be quite subtle for
> some readers.  In particular, we require (of new work) that they have
> the capability to be used in a secure fashion, but in many cases we do
> not require that the security mechanisms are actually used at runtime.
> So, for example, while TLS 1.3 does require that you actually provide
> (server) authentication, data confidentiality, and in most cases forward
> secrecy, we also see that (to pick an arbitrary example) one can use the
> RFC 8300 Network Service Header without using RFC 9145's integrity
> protection.
> 
> Which is a long-winded way of saying that I'd propose to say "require
> features that can provide secure operation" rather than "require
> features to ensure their security".

[JI] Ack.

> ### References for Ping and Traceroute
> 
> Do we want to provide references or links for Ping and/or Traceroute
> (mentioned in passing in §1)?

[JI] §1 is the same as in RFC 9197, where Ping and Traceroute are also 
mentioned without reference. Not sure this is really necessary here either.

> ### Threat model
> 
> §1 lays out the scenario as having an IOAM-Domain that's under a single
> administrative control but invokes the possibility of data collected in
> untrused or semi-trusted environments as a motivation for integrity protection.
>   Is this just a risk that nodes which are supposed to be under the domain's
> administrative control get compromised, or are we intending to consider a
> subtler scenario with semi-trusted entities being authorized parts of the
> administrative domain directly, or something else?

[JI] The former, to keep things simple. I.e., a compromised node in a 
single IOAM domain. Do you think we should add more text here?

> ### detectability problem
> 
> In a certain sense a nit, but coupled enough to the core intent of the document
> that I promote it to a comment.  I think we need to more concretely introduce
> the "detectability problem", c.f.  §1¶3:
> 
>> The following considerations and requirements are to be taken into account in
> addition to addressing the problem of detectability of any integrity breach of
> the IOAM-Data-Fields collected:
> 
> The "detectability" problem hasn't been introduced yet, as such.  Maybe we want
> another paragraph before this one, like
> 
> % Since arbitrary nodes and middleboxes are free to tamper with all packet
> data, including IOAM fields, and the packets are (in general) processed by
> other intermediary nodes before they might come to a node in a position to
> verify the packet's contents, there is little value in attempting to use
> cryptographic mechanisms to prevent such modifications to the packet contents.
> Instead, we limit ourselves to the "detectability problem", namely, to allow an
> endpoint or IOAM control point to detect that such modification has occurred
> since the generation of the IOAM fields.  (Note that, as an IOAM-layer
> mechanism, the scope of modifications that can be detected may be limited to
> just the IOAM fields themselves.) % % In addition to this detectability
> problem, the following considerations are to be taken into account in
> constructing an IOAM integrity mechanism:
> 
> (This also serves to give a bit more motivation for why we don't consider
> confidentiality protection.  That content might be applicable in §3 as well,
> but I put it in the proposal here since it appears prior to that section.)

[JI] Agree, thanks for the text!

> ### Separation of layers
> 
> While it's generally reasonable (as §3 does) to require the lower layer
> protocol to handle threats at their own layer, I would probably call out that
> since IOAM is defined as data fields rather than a dedicated packet structure,
> we also rely on the lower layer to provide integrity protection for whch data
> fields (that is, IOAM Option-Types) are present in a given packet.

[JI] Will remove that part.

> ### limited off-path attackers
> 
> §3 refers to RFC 9055 for definition of on- and off-path attackers.  QUIC (RFC
> 9000) considers an additional case of "limited on-path" attackers that are
> initially off-path but in some cases can change packet routing and become on
> path for some portions of a flow.  Do you think that considering this level of
> subtlety is relevant for this document?  On initial read, I'm not really seeing
> much where considering this distinction would actually change what we say, so
> perhaps not.

[JI] It seems that the additional case does not change what is already 
described in the document and would fall under one or the other anyway. 
Thanks for mentioning it though.

> ### threat: false error injection
> 
> The discussion in §3 around creating a failure report for a nonexistent failure
> mentions the potential for additional processing/export by IOAM nodes along the
> path.  Could this be a privacy concern where the additional reported data
> contains "sensitive" information (for some definition of "sensitive")?  I am
> not sure if it is worth also mentioning the time of humans who get to look at
> the false positive reports and analyze them, only to ultimately discard them as
> bogus.

[JI] This is more a question of availability than privacy. Although 
management/exporting is out of scope, it is supposed to have its own 
secure scheme. Do you think this sentence should be clearer?

> ### threat: removal of fields
> 
> We cover modification and injection already, but might have a bit of discussion
> on what happens when the attacker just removes some or all IOAM fields.

[JI] Fields removal was implicitly considered part of fields 
modification. Basically, the consequences would be the same for both 
modification or removal, with the latter having an additional possible 
consequence: ending up with a malformed packet. Will add specific text 
in fields modification to mention and distinguish it.

> ### IOAM-Data-Fields modification
> 
> In §3.1 I might expand "false picture of the paths in the network" to cover
> both the notion of providing false paths in the network (topology-wise) and
> providing false data about (real or false) paths in the network.

[JI] Not sure if we can easily distinguish the two. IMO, a false 
(topology-wise) path can't be provided in the network as is. If so, it 
would come indirectly from data modification as well. You might for 
instance switch some node data, therefore representing a path with nodes 
in different positions. OTOH, you might modify some data fields 
directly, making a node appear as healthy or faulty (even if it's not), 
and making the current path to be modified if the system reacts 
accordingly, whether it's a real or false path (which can be the same 
consequence if you change some nodes position too). Proposed change:

OLD:

"[...] an attacker can create a false picture of the paths in the 
network, the existence of faulty nodes and their location, and the 
network performance."

NEW:

"[...] an attacker can create a false picture of the network status. For 
example, the attacker can hide the existence of a faulty node or make a 
healthy node appear as faulty. Despite the likely global impact on 
network performance, a direct consequence can be a change in network 
paths, either based on false node positions or false data provided by 
the attacker to fool the system that ingests IOAM-Data-Fields."

> ### Option-Type Headers scope
> 
> In §3.2 we talk about the implications of changing the header of IOAM
> Option-Types, but the discussion here is intrinsically limited to the
> option-types defined at the time of this writing; we should probably
> acknowledge that limitation explicitly (and possibly just say that the listed
> implications are intended to be examples rather than exhaustive, as well). We

[JI] Well, we could do so. However, it would clearly make a difference 
for future Option-Types, which is not the intent here. More like the 
opposite, actually. Modifying future Option-Types' headers is assumed to 
be the same issue as the one described here, based on existing 
Option-Types. What we can do, though, is to rewrite that part a little 
so that it makes it clear that these are examples, not an exhaustive 
list of possibilities. Proposed change:

OLD:

"Changing the header of IOAM Option-Types may have several implications. 
An attacker can maliciously increase the processing overhead in nodes 
that process IOAM-Data-Fields and increase the on-the-wire overhead of 
IOAM-Data-Fields, for example by modifying the IOAM-Trace-Type field in 
the IOAM Trace Option-Type header. An attacker can also prevent some of 
the nodes that process IOAM-Data-Fields from incorporating 
IOAM-Data-Fields, by modifying the RemainingLen field in the IOAM Trace 
Option-Type header."

NEW:

"Changing the header of currently defined IOAM Option-Types, just like 
it is assumed for future IOAM Option-Types, may have several 
implications. The following list of examples is not exhaustive, and is 
based on existing IOAM Option-Types. An attacker could maliciously 
increase the processing overhead in nodes that process IOAM-Data-Fields 
and increase the on-the-wire overhead of IOAM-Data-Fields, by modifying 
the IOAM-Trace-Type field in the IOAM Trace Option-Type header. An 
attacker could also prevent some of the nodes that process 
IOAM-Data-Fields from incorporating IOAM-Data-Fields, by modifying the 
RemainingLen field in the IOAM Trace Option-Type header. [...]"

[JI] See next two points for the remaining part.

> might also want to say that modifying the headers can have similar effects to
> modifying the data-fields directly, in terms of making the interpreted data
> useless.

[JI] Correct, we can add a sentence at the end of the (new, see next 
point below) paragraph. Proposed text:

NEW:

"[...] This attack can also lead to similar impacts described in Section 
3.1."

> ### Namespace-ID modification
> 
>> Another possibility for the attacker is to change the context of
> IOAM-Data-Fields by modifying the Namespace-ID field in IOAM Option-Type
> headers, which makes the integrity protection of IOAM-Data-Fields completely
> useless.
> 
> This "completely useless" probaly merits further exposition.  (That is to say,
> I don't think I actually know what you mean by it.)

[JI] An IOAM-Namespace, identified by a Namespace-ID, plays the role of 
context for IOAM-Data-Fields. You can think of it as metadata for data. 
We can refer to RFC 9197, Section 4, last paragraph, which says: 
"IOAM-Namespaces allow for a namespace-specific definition and 
interpretation of IOAM-Data-Fields". Proposed change:

OLD:

"Another possibility for the attacker is to change the context of 
IOAM-Data-Fields by modifying the Namespace-ID field in IOAM Option-Type 
headers, which makes the integrity protection of IOAM-Data-Fields 
completely useless."

NEW:

"Another possibility for the attacker is to change the definition or 
interpretation of IOAM-Data-Fields by modifying the Namespace-ID field, 
which is common to all IOAM Option-Type headers. Indeed, an 
IOAM-Namespace, identified by a Namespace-ID, allows for a 
namespace-specific definition and interpretation of IOAM-Data-Fields 
(see [RFC9197] Sec. 4.2.). Without the right context (i.e., 
Namespace-ID), IOAM-Data-Fields become useless, just like data without 
metadata."

> ### Injection defenses
> 
> While I agree that the impacts of injection (§3.3,3.4) are similar to
> modification in general, it does seem that an IOAM deployment would be able to
> protect itself from injection (but not modification) if it know a priori what
> IOAM mechanisms were going to be in use on each flow, so that unexpected ones
> could be rejected.  That said, this scenario may not be worth mentioning in the
> document, since an attacker in a position to inject IOAM content into otherwise
> valid packets would very likely also be able to modify preexisting IOAM
> content...but an off-path attacker could inject wholly new packets with IOAM
> content while being unable to modify existing IOAM content.

[JI] Not sure what your intention is here. Are you suggesting we should 
remove both sections 3.3 and 3.4?

> ### Replay scope
> 
> § 3.5 mentions that an attacker can replay an IOAM Option-Type on a new data
> packet as a specific example; I'd suggest prefacing that remark with a
> statement that "In addition to wholesale replay of old packets" to highlight
> this scenario as a special case of the more generic replay topic. As far as
> impact goes, I might also add that replaying old IOAM data might allow an
> attacker to mask other elements of an attack, such as a change in network path.

[JI] +1. And, for the "Impact" paragraph, I think we could just say that 
it is the same as the impact of "IOAM-Data-Fields modification" (see new 
proposed text above), e.g., "The impacts of this attack are similar to 
Section x.x".

> ### Clarity on Management Attacks
> 
> I'm not entirely sure what the scope and intent of §3.6 is.  While the overall
> statement that management-plane attacks are out of scope for this document is
> clear (and probably reasonable), I don't have a picture of what attacks are
> envisioned -- are we looking at changing the data reported by IOAM as it goes
> from IOAM nodes to a reporting system?  Or are we including management-plane
> traffic that configures nodes on what IOAM traffic to expect/process, what
> IOAM-domain and/or namespace to participate in, etc?

[JI] Both, see below.

> A message of "once it leaves the IOAM layer, we can't do more for it" is simple
> and easy to explain, but some of the other more complicated topics may also be
> interesting to talk about if we want to consider the security of the overall
> ecosystem.

[JI] This document is about the integrity protection of IOAM-Data-Fields 
(which also includes headers). So management-plane traffic that 
configures IOAM nodes is way out of scope, but is still covered so that 
readers know what does and does not apply to this document. What's more 
in scope (but still out of it) is the exporting part, because here we 
have IOAM-Data-Fields. Not sure we should discuss all of the above in 
more detail, as it would unnecessarily clutter the document and diverge 
from the main topic. After all, they're out of scope, and §3.6 probably 
does a reasonable job already. It might be better suited to another 
document that would describe the security of a specific part or even the 
ecosystem as a whole.

> ### Anti-Delay
> 
> While §3.7 does a reasonable job discussing delay, there is a niche case of
> anti-delay attacks possible as well, where an attacker has acces to a faster
> path and can skew the delay measurements in the "wrong way".  I am not sure if
> this presents any sufficiently interesting consequences to merit its own
> mention, though

[JI] Not sure either. If we follow the same logic as the previous point, 
probably not. However, we can add a sentence to describe this use case 
if you wish.

> ### DEX Integrity Protected
> 
> §4 lists IOAM Option-Type 68 as allocated to DEX Integrity Protected, but I do
> not see this allocation reflected at
> https://www.iana.org/assignments/ioam/ioam.xhtml .

[JI] IOAM Option-Type 68 (DEX Integrity Protected) is now allocated, 
sorry for that. It was a pending request for IANA and, once privately 
approved by email, we submitted a new version to include the code point 
(maybe too early, it took some time to be officially allocated).

> ### Order of Headers and Data
> 
> I'd suggest being very explicit about the relative ordering of the Option-Type
> header being integrity protected, the Integrity Protection Header, and the
> actual IOAM data/data-fields.  Almost everything I see suggests that we insert
> the Integrity Protection Header between the Option-Type header being protected
> and its corresponding data, but in §4.4 we say that the optional fields in the
> DEX Option-Type header are treated as optional IOAM-Data-Fields while appearing
> before the Integrity Protection Header -- that leaves me unsure where the data
> fields go for the other Option-Header types.  Some statement and/or diagram
> (perhaps in toplevel §4) would avoid any ambiguity.

[JI] You're right, it's ambiguous. We will clarify and add 
IOAM-Data-Fields too in diagrams to make it crystal clear.

> ### Protected flags for POT option-type
> 
> Right now we (implicitly, by not protecting the IOAM-POT-Flags field at all)
> say that any future POT flags will not be integrity protected.  Is that the
> right choice?  There are not currently any POT flags defined, so it's a little
> hard to predict what kinds of use cases they might find.  As for the trace
> option-types mentioned above, it would be possible to subdivide the flags space
> into a protected and unprotected range, to leave a little flexibility for the
> future.

[JI] See my reply to your related DISCUSS above.

[JI] TL;DR -> we prefer not to go that way, even if we agree on the main 
problem (just another trade-off).

> ### Protected flags for DEX Option-Type
> 
> There are currently no "Flags" defined for DEX, which leaves it in a similar
> state as POT.  I comment separately on DEX because of the "Extension-Flags"
> field -- I would expect that all of this field, not just two bits, should be
> protected.  That's because these bits determine the structure of the following
> packet, which is something that we have been consistently applying protection
> throughout the document, and I don't see a reason to diverge from that pattern.

[JI] Good catch, will fix that.

> ### Mutable fields to skip for signing
> 
> The discussion in §5 very quickly glosses over "IOAM-Data-Fields supposed to be
> modified by other IOAM nodes on the path MUST be excluded from the signature".
> This is actually a critical point for constructing and validating ciphertexts,
> and seems like it would merit a longer treatment.  Most notably, I would
> probably want to have a central table (or maybe add to the IANA registry?) to
> indicate "does this field get skipped for integrity protection: Y/N" to try to
> leave out any guesswork by the implementor as to what is mutable or not.

[JI] We probably want to avoid the IANA registry for this, because it's 
related to the method defined in this document, and this document also 
(generically) sets the stage for future methods. Again, who knows what 
they will do and how. +1 for the central table though. Do you think it 
should be in addition to sections 4.1.1., 4.2.1., 4.3.1., 4.4.1., or 
should we remove them, or add a reference to the table, e.g., "see table x"?

> ### Signature-as-nonce is taking action
> 
> I'd clarify in §5 step (3) first bullet point that an intermediate node using a
> received signature as a nonce counts as "taking action" triggered by a field in
> the protected header and thus incurs the obligation to validate first.  So this
> requirement would be in force regardless of whether Loopback or Active are
> used, IMO.

[JI] As mentioned in my reply under your related DISCUSS, we want to 
avoid a validation on each transit node (for all the reasons described 
in the reply above). For that, we distinguish "taking actions" from 
"inserting IOAM-Data-Fields". Moreover, a validation on each transit 
node would be heavy (node n would have to check the whole chain from n0 
to n-1), and so for each transit node, which is definitely not what we 
want (not even talking about the key exchange nightmare). It's currently 
the case for the Loopback flag, though, but we don't have a choice for 
that one. Well, actually, there is a solution but it would require two 
"Integrity Check Values": one for the header (set by the encap), and the 
other for IOAM data (modified by transit nodes). This way, if we 
distinguish the two, a transit node could only check the "Integrity 
Check Value" related to the header without the need to recompute the 
whole chain for the second "Integrity Check Value" related to IOAM data 
(i.e., a node won't execute a triggered action if there is a problem 
with the header, but it does not care when it comes to inserting its 
data; it will be up to the validator to check data, in addition to the 
header). But, to be honest, I don't know how we could specify this 
solution without being hacky.

> ### Guidance on what fields to integrity protect
> 
> In §6.1 I'd want to include alongside the requirement for new integrity
> protected option types to specfy which fields they protect, some guidance on
> which sorts of things to protect or not protect.  (This would be a place to
> codify the behavior I noted as being implicitly present in the document, that
> fields that affect the structure/interpretation of the rest of the packet
> should be integrity protected.)

[JI] Ack. Proposed change:

OLD:

"A document defining a new IOAM Integrity Protected Option-Type MUST 
define the IOAM Option-Type header fields involved in the integrity 
protection of IOAM-Data-Fields, as done in Section 4.1.1, Section 4.2.1, 
Section 4.3.1, and Section 4.4.1 of this document."

NEW:

"A document defining a new IOAM Integrity Protected Option-Type MUST 
specify the IOAM Option-Type header fields involved in the integrity 
protection of IOAM-Data-Fields, as done in Section 4.1.1, Section 4.2.1, 
Section 4.3.1, and Section 4.4.1 of this document. The following rule 
MUST be followed to determine what IOAM Option-Type header fields to 
include in the integrity protection: ''Mutable IOAM Option-Type header 
fields MUST NOT be integrity protected. Immutable IOAM Option-Type 
header fields that affect the structure or interpretation of 
IOAM-Data-Fields, or that trigger actions on IOAM nodes, MUST be 
integrity protected. IOAM Option-Type header bit fields can be partially 
protected (e.g., flags), following the same logic on a bit level (i.e., 
mutable vs immutable bits).''"

> ## Nits

[JI] Thanks for the nits below, will fix them!

Justin

> ### proving
> 
> ¶1 of §1 mentions that IOAM might be used for "proving that a certain traffic
> flow takes a pre-defined path"; my sense is that some readers will read more
> stringent requirements into the word "proving" than are met by the current
> technologies.  I would suggest rephrasing to "verifying that a certain traffic
> flow takes a pre-defined path" or "assuring that...".
> 
> ### IOAM-Domain as "set of nodes"
> 
> ¶2 of §1 leads with a few sentences about IOAM-Domains, one of which is "An
> IOAM-Domain is a set of nodes that use IOAM."  This is true, but when read
> without the caveats of the adjacent sentences, gives a misleading sense that it
> could be a set of unrelated nodes.  Please consider joining this sentence with
> the following one ("...that use IOAM, bounded by ..."), or adding a qualifier
> like "related" ("set of related nodes").
> 
> ### in the clear
> 
> §1 ¶2 s/in clear/in the clear/.
> 
> ### the viability
> 
> §1 first numbered point, s/viability/the viability/
> 
> ### false illusion
> 
> Used a couple times in §3, I believe that "false illusion" is redundant and
> just "illusion" would do fine.
> 
> ### time synchronization
> 
> In §3.7, I suggest s/synchronization/time synchronization" since that's more
> common in RFC 7384 and is less ambiguous.
> 
> ### Delay non-mitigation
> 
> Also §3.7, I propose s/It is noted that this threat is not within the scope of
> the threats that are mitigated in this document/Note that the mechanisms in
> this document do not attempt to provide any mitigation against this threat/.
> 
> ### Trivial validation
> 
> In §5 step (3), I'd s/trivial/one-step/ -- verifying a single MAC is not
> exactly trivial, it's just simpler than the iterative scheme that's needed in
> the general case.
> 
> ### Transit Node-IDs
> 
> Also §5 step (3), I'd expound on "node-ids MUST be included in IOAM
> Data-Fields" to clarify that we need to be able to identify the nodes that
> regenerated the signature so that we can look up their keys, and so accordingly
> we require those node-ids to be present in the packet alongside the signature.
> 
> ## Notes
> 
> This review is formatted in the "IETF Comments" Markdown format, see
> https://github.com/mnot/ietf-comments.
> 
> 
>