[ippm] Secdir early review of draft-ietf-ippm-ioam-data-integrity-07

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 21 December 2023 23:09 UTC
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: secdir@ietf.org
Cc: draft-ietf-ippm-ioam-data-integrity.all@ietf.org, ippm@ietf.org
Message-ID: <170320017390.55336.7890784505052194132@ietfa.amsl.com>
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Date: Thu, 21 Dec 2023 15:09:33 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/o_3kqvEucFcbYTGgnkO1pIaX6ek>
Reviewer: Benjamin Kaduk
Review result: Serious Issues

# secdir review of draft-ietf-ippm-ioam-data-integrity-06
CC kaduk

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

The summary of the review is that this is still early-stage work and we haven't
landed on the right cryptographic mechanisms yet -- what's currently in place
has some issues that would be very significant for some deployment
sites/scenarios, and there are some aspects that are not fully specified yet. A
lot of my comments cover topics that will merit mention in the security
considerations section, though I did not attempt to call each one out as such
or produce an exhaustive list at the end.

## Discuss

### Signature vs MAC

I'm pretty uncomfortable about using the word "signature" to describe the
protection that we're applying here.  The current text in §5 reads as if the
need for a symmetric-key mechanism is an intrinsic requirement of the mechanism
(which seems reasonable given the use-case -- true asymmetric signatures would
be too computationally expensive to be suitable here), but in that case it
seems that we are really expecting a MAC (message authentication code) rather
than a signature (that provides both data authentication and source
authentication).  In particular, we cite NIST.800-38D for the AES-256-GCM
"signature algorithm", when I think we are really using GMAC -- the NIST
document talks about GMAC a lot and does not use the word "signature" anywhere,
which seems particularly illustrative to me.

### Pre-hash vs. direct signature

The mechanism specified in §5 uses a sign(hash(...)) construction, rather than
signing directly.  This is sometimes needed for asymmetric signature primitives
like RSA, but typically a MAC construction will be able to authenticate large
quantities of data directly, without the need to pre-digest the input data. It
is worth noting that the current construction does not actually provide the
MAC's authentication guarantee to the underlying packet fields, just to the hash
value.
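To make the distinction concrete, here is a minimal Python sketch contrasting the two constructions. It uses HMAC-SHA-256 from the standard library purely as a stand-in MAC (the draft's suite uses AES-256-GCM, i.e. GMAC, which is not in the Python standard library); the key and field bytes are hypothetical placeholders.

```python
import hashlib
import hmac


def mac(key: bytes, data: bytes) -> bytes:
    """Stand-in MAC (HMAC-SHA-256); NOT the GMAC used by the draft's suite."""
    return hmac.new(key, data, hashlib.sha256).digest()


def tag_prehash(key: bytes, fields: bytes) -> bytes:
    # mac(hash(...)): the MAC only ever sees the 32-octet digest, so its
    # guarantee attaches to the digest, not directly to the packet fields.
    return mac(key, hashlib.sha256(fields).digest())


def tag_direct(key: bytes, fields: bytes) -> bytes:
    # Direct MAC: the protected fields themselves are the MAC input.
    return mac(key, fields)


def verify(expected: bytes, received: bytes) -> bool:
    # Constant-time comparison of tags.
    return hmac.compare_digest(expected, received)


key = bytes(32)  # hypothetical all-zero demo key; never use in practice
fields = b"hypothetical IOAM fields"
prehash_tag = tag_prehash(key, fields)
direct_tag = tag_direct(key, fields)
```

With the pre-hash form, integrity of the fields additionally rests on the collision resistance of the hash, since any two inputs with the same digest share a tag; the direct form relies on the MAC alone.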

### Key management

While the specifics of key distribution and management will inherently be out
of scope for this specification, I think we do need to settle a core question:
will each IOAM Node have its own unique key for signature generation, or do we
expect some level of key reuse across machines within a given domain?  The
latter scenario intuitively seems like it would make for a simpler deployment,
but would place some quite stringent limitations on what cryptographic
mechanisms we can use at runtime.  NIST's key usage limitations for GCM, in
particular, might actually prove prohibitive for the "one key shared across
nodes" option in practice.

### Nonce guidance

I think this document is incomplete without some discussion of the properties
that the nonce can provide, especially the factors that go into selection of
the nonce length.  I can accept that the actual "common methodology to keep the
Nonce valid only for a specific period of time" would be outside of the scope
of this document (though we could certainly provide one or more examples along
with their caveats and areas of applicability if we wanted to), but there is a
lot that we can and should talk about.  For example, consumers will need to
know if the nonce must be unpredictable as well as unique, and whether there
are factors going into the selection of nonce length other than the number of
available nonce values without reuse (and the likelihood of collision if a
random nonce-selection procedure is used).  For GMAC in particular, there is a
pretty strong argument in favor of using 96-bit nonces (since all other nonce
lengths internally get hashed down to 96 bits), even though the statistics for
96-bit nonces can hinder certain use cases.  We could probably also talk about
the benefits and risks of using sequential values as the nonce (bearing in mind
the discussions in RFC 9416) even if there is not a general recommendation one
way or the other on their use.
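As one input to that discussion, the following Python sketch estimates, via the standard birthday-bound approximation, how quickly uniformly random nonces start to collide for a given nonce length. It models only uniqueness (not unpredictability, and not GCM's internal hashing of non-96-bit IVs); the message counts are arbitrary examples.

```python
import math


def random_nonce_collision_probability(messages: int, nonce_bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    `messages` uniformly random `nonce_bits`-bit nonces are equal."""
    exponent = messages * (messages - 1) / 2.0 / (2.0 ** nonce_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


def messages_for_probability(target: float, nonce_bits: int) -> float:
    """Approximate number of random nonces at which the collision
    probability reaches `target`."""
    return math.sqrt(2.0 * (2.0 ** nonce_bits) * -math.log1p(-target))


for bits in (64, 96):
    for n in (2**20, 2**32):
        p = random_nonce_collision_probability(n, bits)
        print(f"{bits}-bit nonces, {n} messages: p ~ {p:.3e}")
```

For example, 2**32 random 64-bit nonces already give roughly a 39% chance of at least one repeat, while the same number of random 96-bit nonces stays around 1.2e-10.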

### Signature as nonce for transit nodes

The specification of the actual cryptographic protection algorithm in §5
includes a provision (step 2) for a transit node to compute a new signature
that accounts for the additions it has made to the IOAM data.  It seems to be
saying that the corresponding cryptographic computation involves using the
received signature value as the nonce input for producing this new signature.
While SP800-38D does seem to permit using multiple IV lengths with a given key,
using the received signature as a nonce seems to set us up for using nonces of
length other than 96 bits, which (per {{GCM-Key-usage-limitations}}) strongly
limits how much traffic we can send under a given key.  It also would entail using
data received from the network as the nonce for a given node's private key,
which seems like it would make it easy for an attacker to induce (key,nonce)
reuse.  (This might be mitigated if the transit nodes diligently validate the
received signature prior to using it as a nonce, but even that would not
obviate the need for a fully reliable replay detection mechanism for the
lifetime of the key, which seems prohibitively expensive.) This scheme also
seems to make validation quite complicated and expensive, since in order to
verify the final received signature we'd need to reconstruct the whole chain of
signatures from initial encapsulation through all transit nodes.  While being
forced to validate all the intermediate signatures does provide a fairly strong
indication of non-tampering along the path, it's also a lot of recomputation. A
scheme where the transit nodes that regenerate the signature also generate a
new nonce that goes in the packet would be simpler/faster, at the cost of
trusting all the transit nodes to properly validate the received signature and
to be operating correctly.  This is a trade-off, and could be decided either
way, but if we opt to go for the stronger validation we should specifically say
that we made the choice to do that and accept the costlier validation and
replay protection.
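To illustrate the validation cost, here is a Python sketch of the chained construction under one reading of step 2: the encapsulating node MACs its data under an initial nonce, and each transit node MACs its own additions using the received tag as the nonce. HMAC-SHA-256 over nonce || data stands in for GMAC, and the `Hop` structure, keys, and data encodings are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List


def mac(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in for a nonce-taking MAC such as GMAC: HMAC-SHA-256 over
    # nonce || data.  Not the draft's algorithm.
    return hmac.new(key, nonce + data, hashlib.sha256).digest()


@dataclass
class Hop:
    node_id: int  # needed by the verifier to look up this hop's key
    data: bytes   # IOAM data added at this hop (hypothetical encoding)


def chain_tag(keys: Dict[int, bytes], initial_nonce: bytes, hops: List[Hop]) -> bytes:
    """Encapsulating node uses initial_nonce; each later hop uses the
    previously received tag as its nonce."""
    tag = mac(keys[hops[0].node_id], initial_nonce, hops[0].data)
    for hop in hops[1:]:
        tag = mac(keys[hop.node_id], tag, hop.data)
    return tag


def verify_chain(keys: Dict[int, bytes], initial_nonce: bytes,
                 hops: List[Hop], received: bytes) -> bool:
    # The verifier must redo every hop's MAC: cost grows with path length.
    return hmac.compare_digest(chain_tag(keys, initial_nonce, hops), received)


keys = {1: b"\x01" * 32, 2: b"\x02" * 32, 3: b"\x03" * 32}  # hypothetical per-node keys
path = [Hop(1, b"encap"), Hop(2, b"transit-a"), Hop(3, b"transit-b")]
final_tag = chain_tag(keys, b"\x00" * 12, path)
```

The verifier needs every hop's Node-ID to find the right key, and an alteration at any single hop invalidates the final tag; the price is one MAC recomputation per hop.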

### Bit-level hash inputs (Protected flags for trace and DEX option-types)

I think we need to provide more clear guidance on how to handle the protected
flags for the trace and DEX option-types.  First, a note in passing that
interoperability does require locking in which flag bits are (not) covered by
the MAC at the time the protected option type is specified (i.e., now), which
blocks off future extensions since any new flag bits cannot be integrity
protected using this mechanism.  That said, it would be possible to divide the
unallocated flags range into a protected and unprotected portion, to leave a
little bit of flexibility for the future. My main concern here, though, is to
concretely specify, at the bit level, what the input to the MAC (or hash)
function is -- do we mask out the unprotected bits (if so, to 0s or 1s?), or do
we literally just extract the two bits in question and make the bitstream input
to the MAC (or hash) be not byte aligned?  I strongly suggest the former, since
implementation handling for non-octet inputs to hash functions is very poor,
but reading the current text I would conclude that I must attempt to implement
the latter.  (I note on re-read that SP800-38D does require the inputs to have
bit lengths that are multiples of 8, so if we decide to use GMAC directly rather
than hash-and-MAC, we would need to specify padding if we opt for the latter
approach.)
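A minimal Python sketch of the masking approach, assuming unprotected bits are forced to 0 (0 rather than 1 is itself a choice the document would need to pin down). The 4-octet header word and mask are purely hypothetical and do not reflect the draft's actual choice of protected flag bits.

```python
def masked_mac_input(header: bytes, protected_mask: bytes) -> bytes:
    """Zero every bit not covered by protected_mask (unprotected bits -> 0),
    so the MAC/hash input stays a whole number of octets."""
    if len(header) != len(protected_mask):
        raise ValueError("mask must be the same length as the header")
    return bytes(h & m for h, m in zip(header, protected_mask))


# Hypothetical 4-octet header word and mask: the first two octets are fully
# protected and two bits of the third octet are protected flags.  This layout
# is illustrative only, not the draft's actual choice of protected bits.
HYPOTHETICAL_MASK = bytes([0xFF, 0xFF, 0x06, 0x00])

a = masked_mac_input(bytes([0x12, 0x34, 0x07, 0x9A]), HYPOTHETICAL_MASK)
b = masked_mac_input(bytes([0x12, 0x34, 0x06, 0x11]), HYPOTHETICAL_MASK)
```

Here the two headers differ only in unprotected bits (the low-order bit of the third octet and all of the fourth), so they yield identical MAC inputs, while flipping a protected bit would change the input.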

### GCM Key usage limitations

Since we're using GMAC as the integrity protection mechanism, we need to look
at the GCM key usage limitations to know how many times a given key can be
used.  Unfortunately, what SP800-38D says is quite restrictive: its §8.3 lays
out scenarios where the total number of invocations of the authenticated
encryption function cannot exceed 2**32 for a given key.  Even if we have
unique keys per node generating a signature, this limit can still be hit fairly
quickly at moderately high traffic rates.  To avoid that limit we'd need to
exclusively use 96-bit IVs that use the "deterministic construction" and in that
case would be limited by the need to avoid reusing "invocation field" values on
a given device.  Depending on what deployment scenarios we are thinking about,
there's a significant chance that we'll need to have the protocol be able to
accommodate key rotations in order to avoid the key usage limits.  This might
take the place of an in-protocol key identifier field, or guidance to use some
other protocol element (such as Namespace-ID) to select which key to use.
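For a sense of scale, the Python sketch below computes how long a single key lasts against the 2**32 invocation limit at a given packet rate, and shows one way to lay out a 96-bit deterministic IV. The 32-bit fixed field / 64-bit invocation field split and the example rate are assumptions for illustration.

```python
import struct

# Limit from SP800-38D §8.3 as discussed above (applies unless only 96-bit
# deterministically constructed IVs are used).
INVOCATION_LIMIT = 2**32


def seconds_until_limit(packets_per_second: float, limit: int = INVOCATION_LIMIT) -> float:
    """How long one key lasts if every protected packet is one invocation."""
    return limit / packets_per_second


def deterministic_iv(fixed_field: int, invocation: int) -> bytes:
    """96-bit IV = 32-bit fixed field || 64-bit invocation field.
    The 32/64 split is an assumption for this sketch."""
    if not 0 <= fixed_field < 2**32:
        raise ValueError("fixed field must fit in 32 bits")
    if not 0 <= invocation < 2**64:
        raise ValueError("invocation field exhausted: rotate the key")
    return struct.pack(">IQ", fixed_field, invocation)
```

At an assumed 1,000,000 protected packets per second, the 2**32 budget is spent in roughly 4,295 seconds (about 72 minutes), which illustrates why key rotation may need protocol support.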

### GMAC output length

The specification of integrity protection signature suite 1 in §5 says that
we're using AES-256-GCM and that "the signature consumes 32 octets".  I'm
having trouble understanding where that number comes from.  While it's true
that AES-256-GCM needs a 32-byte key and SHA-256 produces a 32-byte output, the
GCM authentication tag length is specified separately, and has to be one of a
handful of preordained values (128, 120, 112, 104, or 96 bits per SP800-38D).
I would assume that we want the full 16-byte authentication tag for our
purposes, but then what are the other 16 bytes of "signature" supposed to be?
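The arithmetic can be checked with a short Python sketch: the longest tag among the values listed above is 16 octets, and the two 32-octet quantities in play are the AES-256 key and the SHA-256 digest, neither of which is the tag. The constant names are illustrative.

```python
import hashlib

# GCM authentication tag lengths (bits) listed above from SP800-38D.
ALLOWED_TAG_BITS = (128, 120, 112, 104, 96)
AES_256_KEY_OCTETS = 32  # key size; independent of the tag size


def gcm_tag_octets(tag_bits: int) -> int:
    """Octets on the wire for a GCM/GMAC tag of the given bit length."""
    if tag_bits not in ALLOWED_TAG_BITS:
        raise ValueError(f"not an allowed GCM tag length: {tag_bits} bits")
    return tag_bits // 8


full_tag = gcm_tag_octets(128)             # 16 octets
sha256_len = hashlib.sha256().digest_size  # 32 octets
```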

## Comments

### Requirement for security

When the abstract says

   IETF protocols require features to ensure their security.

This is true in a certain sense, but the sense may be quite subtle for
some readers.  In particular, we require (of new work) that they have
the capability to be used in a secure fashion, but in many cases we do
not require that the security mechanisms are actually used at runtime.
So, for example, while TLS 1.3 does require that you actually provide
(server) authentication, data confidentiality, and in most cases forward
secrecy, we also see that (to pick an arbitrary example) one can use the
RFC 8300 Network Service Header without using RFC 9145's integrity
protection.

Which is a long-winded way of saying that I'd propose to say "require
features that can provide secure operation" rather than "require
features to ensure their security".

### References for Ping and Traceroute

Do we want to provide references or links for Ping and/or Traceroute
(mentioned in passing in §1)?

### Threat model

§1 lays out the scenario as having an IOAM-Domain that's under a single
administrative control but invokes the possibility of data collected in
untrusted or semi-trusted environments as a motivation for integrity protection.
Is this just a risk that nodes which are supposed to be under the domain's
administrative control get compromised, or are we intending to consider a
subtler scenario with semi-trusted entities being authorized parts of the
administrative domain directly, or something else?

### detectability problem

In a certain sense a nit, but coupled enough to the core intent of the document
that I promote it to a comment.  I think we need to more concretely introduce
the "detectability problem", cf. §1¶3:

> The following considerations and requirements are to be taken into account in
addition to addressing the problem of detectability of any integrity breach of
the IOAM-Data-Fields collected:

The "detectability" problem hasn't been introduced yet, as such.  Maybe we want
another paragraph before this one, like

% Since arbitrary nodes and middleboxes are free to tamper with all packet
% data, including IOAM fields, and the packets are (in general) processed by
% other intermediary nodes before they might come to a node in a position to
% verify the packet's contents, there is little value in attempting to use
% cryptographic mechanisms to prevent such modifications to the packet contents.
% Instead, we limit ourselves to the "detectability problem", namely, to allow an
% endpoint or IOAM control point to detect that such modification has occurred
% since the generation of the IOAM fields.  (Note that, as an IOAM-layer
% mechanism, the scope of modifications that can be detected may be limited to
% just the IOAM fields themselves.)
%
% In addition to this detectability problem, the following considerations are
% to be taken into account in constructing an IOAM integrity mechanism:

(This also serves to give a bit more motivation for why we don't consider
confidentiality protection.  That content might be applicable in §3 as well,
but I put it in the proposal here since it appears prior to that section.)

### Separation of layers

While it's generally reasonable (as §3 does) to require the lower layer
protocol to handle threats at its own layer, I would probably call out that
since IOAM is defined as data fields rather than a dedicated packet structure,
we also rely on the lower layer to provide integrity protection for which data
fields (that is, IOAM Option-Types) are present in a given packet.

### limited off-path attackers

§3 refers to RFC 9055 for definition of on- and off-path attackers.  QUIC (RFC
9000) considers an additional case of "limited on-path" attackers that are
initially off-path but in some cases can change packet routing and become on
path for some portions of a flow.  Do you think that considering this level of
subtlety is relevant for this document?  On initial read, I'm not really seeing
much where considering this distinction would actually change what we say, so
perhaps not.

### threat: false error injection

The discussion in §3 around creating a failure report for a nonexistent failure
mentions the potential for additional processing/export by IOAM nodes along the
path.  Could this be a privacy concern where the additional reported data
contains "sensitive" information (for some definition of "sensitive")?  I am
not sure if it is worth also mentioning the time of humans who get to look at
the false positive reports and analyze them, only to ultimately discard them as
bogus.

### threat: removal of fields

We cover modification and injection already, but might have a bit of discussion
on what happens when the attacker just removes some or all IOAM fields.

### IOAM-Data-Fields modification

In §3.1 I might expand "false picture of the paths in the network" to cover
both the notion of providing false paths in the network (topology-wise) and
providing false data about (real or false) paths in the network.

### Option-Type Headers scope

In §3.2 we talk about the implications of changing the header of IOAM
Option-Types, but the discussion here is intrinsically limited to the
option-types defined at the time of this writing; we should probably
acknowledge that limitation explicitly (and possibly just say that the listed
implications are intended to be examples rather than exhaustive, as well). We
might also want to say that modifying the headers can have similar effects to
modifying the data-fields directly, in terms of making the interpreted data
useless.

### Namespace-ID modification

> Another possibility for the attacker is to change the context of
IOAM-Data-Fields by modifying the Namespace-ID field in IOAM Option-Type
headers, which makes the integrity protection of IOAM-Data-Fields completely
useless.

This "completely useless" probably merits further exposition.  (That is to say,
I don't think I actually know what you mean by it.)

### Injection defenses

While I agree that the impacts of injection (§§3.3 and 3.4) are similar to
modification in general, it does seem that an IOAM deployment would be able to
protect itself from injection (but not modification) if it knew a priori what
IOAM mechanisms were going to be in use on each flow, so that unexpected ones
could be rejected.  That said, this scenario may not be worth mentioning in the
document, since an attacker in a position to inject IOAM content into otherwise
valid packets would very likely also be able to modify preexisting IOAM
content...but an off-path attacker could inject wholly new packets with IOAM
content while being unable to modify existing IOAM content.
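A minimal Python sketch of such an a priori check, assuming a hypothetical per-flow policy (keyed here by a source/destination address pair) that lists the IOAM Option-Types each flow is expected to carry; the flow keys and Option-Type values are illustrative only.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical flow key: (source, destination) addresses as strings.
Flow = Tuple[str, str]


def unexpected_option_types(policy: Dict[Flow, FrozenSet[int]], flow: Flow,
                            present: Iterable[int]) -> Set[int]:
    """Return IOAM Option-Types present on `flow` that the a priori policy
    does not list; a non-empty result suggests injection."""
    return set(present) - policy.get(flow, frozenset())


# Example policy with illustrative Option-Type values only.
policy = {("2001:db8::1", "2001:db8::2"): frozenset({0, 2})}
```

Such a check flags injected option-types (including in wholly forged packets from an off-path attacker) but does nothing against modification of the option-types that are expected.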

### Replay scope

§3.5 mentions that an attacker can replay an IOAM Option-Type on a new data
packet as a specific example; I'd suggest prefacing that remark with a
statement that "In addition to wholesale replay of old packets" to highlight
this scenario as a special case of the more generic replay topic. As far as
impact goes, I might also add that replaying old IOAM data might allow an
attacker to mask other elements of an attack, such as a change in network path.

### Clarity on Management Attacks

I'm not entirely sure what the scope and intent of §3.6 is.  While the overall
statement that management-plane attacks are out of scope for this document is
clear (and probably reasonable), I don't have a picture of what attacks are
envisioned -- are we looking at changing the data reported by IOAM as it goes
from IOAM nodes to a reporting system?  Or are we including management-plane
traffic that configures nodes on what IOAM traffic to expect/process, what
IOAM-domain and/or namespace to participate in, etc?

A message of "once it leaves the IOAM layer, we can't do more for it" is simple
and easy to explain, but some of the other more complicated topics may also be
interesting to talk about if we want to consider the security of the overall
ecosystem.

### Anti-Delay

While §3.7 does a reasonable job discussing delay, there is a niche case of
anti-delay attacks possible as well, where an attacker has access to a faster
path and can skew the delay measurements in the "wrong way".  I am not sure if
this presents any sufficiently interesting consequences to merit its own
mention, though.

### DEX Integrity Protected

§4 lists IOAM Option-Type 68 as allocated to DEX Integrity Protected, but I do
not see this allocation reflected at
https://www.iana.org/assignments/ioam/ioam.xhtml .

### Order of Headers and Data

I'd suggest being very explicit about the relative ordering of the Option-Type
header being integrity protected, the Integrity Protection Header, and the
actual IOAM data/data-fields.  Almost everything I see suggests that we insert
the Integrity Protection Header between the Option-Type header being protected
and its corresponding data, but in §4.4 we say that the optional fields in the
DEX Option-Type header are treated as optional IOAM-Data-Fields while appearing
before the Integrity Protection Header -- that leaves me unsure where the data
fields go for the other Option-Header types.  Some statement and/or diagram
(perhaps in toplevel §4) would avoid any ambiguity.

### Protected flags for POT option-type

Right now we (implicitly, by not protecting the IOAM-POT-Flags field at all)
say that any future POT flags will not be integrity protected.  Is that the
right choice?  There are not currently any POT flags defined, so it's a little
hard to predict what kinds of use cases they might find.  As for the trace
option-types mentioned above, it would be possible to subdivide the flags space
into a protected and unprotected range, to leave a little flexibility for the
future.

### Protected flags for DEX Option-Type

There are currently no "Flags" defined for DEX, which leaves it in a similar
state as POT.  I comment separately on DEX because of the "Extension-Flags"
field -- I would expect that all of this field, not just two bits, should be
protected.  That's because these bits determine the structure of the rest of the
packet, which is something we have consistently protected throughout the
document, and I don't see a reason to diverge from that pattern.

### Mutable fields to skip for signing

The discussion in §5 very quickly glosses over "IOAM-Data-Fields supposed to be
modified by other IOAM nodes on the path MUST be excluded from the signature". 
This is actually a critical point for constructing and validating integrity tags,
and seems like it would merit a longer treatment.  Most notably, I would
probably want to have a central table (or maybe add to the IANA registry?) to
indicate "does this field get skipped for integrity protection: Y/N" to try to
leave out any guesswork by the implementor as to what is mutable or not.
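A minimal Python sketch of how such a table could drive construction of the integrity input. The field names and Y/N values are hypothetical illustrations, not taken from the draft; the point is that an unlisted field is an error rather than a guess.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of a "skip for integrity protection?" table.
# Field names and values are illustrative only, not taken from the draft.
SKIP_FOR_INTEGRITY: Dict[str, bool] = {
    "namespace_id": False,
    "remaining_len": True,   # illustrative: assumed to be updated along the path
    "trace_type": False,
}


def integrity_input(fields: List[Tuple[str, bytes]],
                    table: Dict[str, bool] = SKIP_FOR_INTEGRITY) -> bytes:
    """Concatenate, in packet order, the fields the table marks as protected."""
    out = bytearray()
    for name, value in fields:
        if name not in table:
            # Refuse to guess: an unlisted field is an interoperability bug.
            raise KeyError(f"no integrity-protection entry for field {name!r}")
        if not table[name]:
            out += value
    return bytes(out)
```

Because the generator and verifier consult the same table, a change to a skipped field (here `remaining_len`) leaves the input unchanged, while a change to any protected field does not.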

### Signature-as-nonce is taking action

I'd clarify in §5 step (3) first bullet point that an intermediate node using a
received signature as a nonce counts as "taking action" triggered by a field in
the protected header and thus incurs the obligation to validate first.  So this
requirement would be in force regardless of whether Loopback or Active are
used, IMO.

### Guidance on what fields to integrity protect

In §6.1 I'd want to include alongside the requirement for new integrity
protected option types to specify which fields they protect, some guidance on
which sorts of things to protect or not protect.  (This would be a place to
codify the behavior I noted as being implicitly present in the document, that
fields that affect the structure/interpretation of the rest of the packet
should be integrity protected.)

## Nits

### proving

¶1 of §1 mentions that IOAM might be used for "proving that a certain traffic
flow takes a pre-defined path"; my sense is that some readers will read more
stringent requirements into the word "proving" than are met by the current
technologies.  I would suggest rephrasing to "verifying that a certain traffic
flow takes a pre-defined path" or "assuring that...".

### IOAM-Domain as "set of nodes"

¶2 of §1 leads with a few sentences about IOAM-Domains, one of which is "An
IOAM-Domain is a set of nodes that use IOAM."  This is true, but when read
without the caveats of the adjacent sentences, gives a misleading sense that it
could be a set of unrelated nodes.  Please consider joining this sentence with
the following one ("...that use IOAM, bounded by ..."), or adding a qualifier
like "related" ("set of related nodes").

### in the clear

§1 ¶2 s/in clear/in the clear/.

### the viability

§1 first numbered point, s/viability/the viability/

### false illusion

Used a couple times in §3, I believe that "false illusion" is redundant and
just "illusion" would do fine.

### time synchronization

In §3.7, I suggest s/synchronization/time synchronization/ since that's more
common in RFC 7384 and is less ambiguous.

### Delay non-mitigation

Also §3.7, I propose s/It is noted that this threat is not within the scope of
the threats that are mitigated in this document/Note that the mechanisms in
this document do not attempt to provide any mitigation against this threat/.

### Trivial validation

In §5 step (3), I'd s/trivial/one-step/ -- verifying a single MAC is not
exactly trivial, it's just simpler than the iterative scheme that's needed in
the general case.

### Transit Node-IDs

Also §5 step (3), I'd expound on "node-ids MUST be included in IOAM
Data-Fields" to clarify that we need to be able to identify the nodes that
regenerated the signature so that we can look up their keys, and so accordingly
we require those node-ids to be present in the packet alongside the signature.

## Notes

This review is formatted in the "IETF Comments" Markdown format, see
https://github.com/mnot/ietf-comments.