[tsvwg] Review of draft-ietf-tsvwg-udp-options

Tom Herbert <tom@herbertland.com> Thu, 11 April 2024 16:32 UTC

MIME-Version: 1.0
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 11 Apr 2024 09:32:04 -0700
Message-ID: <CALx6S378TXtzgH7aE6_C-jr4zMtK6iE5rdVaFw-4CBcwp-=UkA@mail.gmail.com>
To: tsvwg <tsvwg@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/qbc3I4qHDHj43vBrJWJFKoI-Byk>
Subject: [tsvwg] Review of draft-ietf-tsvwg-udp-options
Precedence: list

Hello,

I have reviewed draft-ietf-tsvwg-udp-options-32. I will reiterate my
primary objection to UDP options: APC and Auth are defined as SAFE
options and not UNSAFE options. As described below, I believe this can
harm the user through silent data corruption and security risks. I
also have a number of other comments below. Additionally, the UDP
options draft diverges in several respects from the precedents set by
similar protocols, which I point out below.

Tom

**** Major objections

#1 If a sender sends an APC the receiver should not be allowed to ignore it

APC is a UDP option containing a CRC. Per the draft, a receiver may
freely ignore an APC option. This is contrary
to the precedent of other IETF protocols that contain data check
mechanisms: a data check may be optional at the sender but not
optional at a receiver, and if a receiver fails to validate a received
checksum or CRC the behavior is to discard the packet. The most
pertinent example is the UDP checksum. From RFC1122:

"If a UDP datagram is received with a checksum that is non-zero and
invalid, UDP MUST silently discard the datagram"

But the UDP options draft states "APC needs to be silently ignored
when failing by default"

That is almost the complete opposite of how every protocol with a CRC
or checksum works. For those, if CRC or checksum fails the packet is
summarily dropped-- that's not only the default behavior, it's almost
always the only behavior.

The argument is that APC should be a SAFE option to ensure behavioral
compatibility with legacy receivers. However, even in the case when
UDP Options are explicitly sent to a non-legacy receiver in a FRAG
option, the receiver may still ignore the APC. There is no way in UDP
Options for a sender to *insist* that the receiver validates the CRC
in an APC, and so when a sender sends an APC it never knows for sure
whether the receiver will actually check the APC (again this is
different behavior than UDP checksum and other protocols). Also, even
if a receiver does process an APC, the draft offers no guidance on
what to do in the case that verification fails. The only requirement
is that "UDP packets with incorrect APC checksums MUST be passed to
the application by default, e.g., with a flag indicating APC
failure.", and there is no recommendation as to what the application
should do in the case of a bad CRC (IMO, minimally it's at least a
SHOULD that a packet with corrupted data be dropped).
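
To make the contrast concrete, here is a minimal Python sketch of the
two receiver policies discussed above. It is an illustration only:
the function names and sample payloads are made up, and zlib.crc32 is
used as a stand-in CRC rather than whatever CRC the draft specifies
for APC.

```python
import zlib

def make_packet(payload: bytes):
    """Sender side: attach a CRC to the payload (zlib.crc32 is a stand-in)."""
    return payload, zlib.crc32(payload)

def receive_discard(payload: bytes, crc: int):
    """RFC 1122 style: a failed check means the datagram is silently dropped."""
    if zlib.crc32(payload) != crc:
        return None
    return payload

def receive_pass_with_flag(payload: bytes, crc: int):
    """Default quoted from the draft: deliver anyway, plus a pass/fail flag."""
    return payload, zlib.crc32(payload) == crc

payload, crc = make_packet(b"transfer 100 to alice")
corrupted = b"transfer 900 to alice"  # one byte changed in transit

print(receive_discard(corrupted, crc))         # corrupted data never delivered
print(receive_pass_with_flag(corrupted, crc))  # corrupted data delivered, flag False
```

With the second policy, the corrupted payload reaches the application,
and whether it is used depends entirely on the application checking
the flag.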

All of this leads to the possibility of silent data corruption. When a
sender chooses to use the APC it is doing so to protect the data it's
sending. If a receiver ignores the APC then it will inevitably miss
data errors. If there are no other mechanisms to catch those errors,
then corrupted data may go all the way to the application. This is
silent data corruption which could be a very costly problem and bring
harm to the user. IMO, the risk of silent data corruption outweighs
the ease of deployment argument.

#2 If a sender sends an Auth the receiver should not be allowed to ignore it

The Auth is also a SAFE Option that can be ignored by a receiver, and
this is also a divergence from established precedent. For instance,
the Auth option is essentially equivalent to the IP Authentication
Header (AH); however, from RFC4302:

"If the computed and received ICVs match, then the datagram is valid,
and it is accepted.  If the test fails, then the receiver MUST discard
the received IP datagram as invalid."

Ignoring Auth is perhaps more dangerous than ignoring APC since it is
a security risk. If a sender sends an Auth and it's arbitrarily
ignored by a receiver then that means that most likely no packets are
authenticated and hence there is *no* security for the user. However,
Auth has another property that motivates it to always be an UNSAFE
option: given that authentication requires some key negotiation at
both the sender and the receiver, there doesn't seem to be a valid use
case for a sender ever sending Auth to a legacy receiver. That is, why
would the system negotiate a key for a legacy receiver that doesn't
even support UDP Options? IMO, the security risks outweigh the ease of
deployment argument.

**** Other comments on the draft

Section 3:

Definition of User-- in most other protocols User means users as in
people. I think Application would be a better term for this document

SAFE and UNSAFE options should be defined in this section

Section 5:

"UDP options provide a soft control plane to UDP."

I'm not sure what this means. Isn't UDP Options a data plane protocol?

"Past experience confirms that static length limits will always need
to be exceeded. Each implementation can limit how long/many options
there are, but the specification should not introduce such a limit."

If each implementation can set arbitrary limits that makes
interoperability really difficult. One application might accept 100
options, another might only accept one and the sender doesn't know.
This is a common problem with stateless options. I suggest at least
referencing draft-ietf-6man-eh-limits as an example of providing
useful guidance for limiting stateless options (without mandating
static limits).

"UDP options are a framework, not a protocol."

I don't think this is true. The draft describes packet formats, sender
and receiver normative requirements for handling UDP options. Pretty
much by any definition, UDP options is a protocol. I think the point
of this principle is that UDP options are an extensible protocol where
we don't define all possible extensions up front.

"Examples herein include REQ/RES and TIME; in both cases, the option
format is defined, but the protocol that uses these is specified
elsewhere (REQ/RES for DPLPMTUD [Fa24]) or left undefined (TIME)."

Why not just define the option format with the specification of the
option? The potential problem I see is that the format specified here
might not be the final format when someone looks deep into all the
requirements. I would recommend removing any option formats and type
assignments that aren't fully specified in this draft-- they can be
specified in other docs.

"Options that do not modify user data should (by default) result in
the user data also being passed, even if, e.g., option checksums or
authentication fails."

As mentioned above, I disagree with this as a default behavior. If
the networking stack *knows* that a packet is corrupted (CRC failed)
or that authentication failed (Auth failed), the behavior should be to
drop the packet-- that is how nearly all other protocols behave.

Section 7:

Please change "Next HDR" to "Next Header" to be consistent with RFC8200

"In effect, this document redefines the UDP "Length" field as a
"trailer options offset"."

This text seems unnecessary. I would simply say that the surplus area
offset is derived from the UDP Length.

"They commence with a 2-byte Option Checksum (OCS) field aligned to
the first 2-byte boundary (relative to the start of the IP datagram) of
that area, using zeroes for alignment."

This description should be more specific, i.e., if the offset from the
first byte of the UDP header is even then OCS begins at the start of
the surplus area; if the offset is odd then there is a zero byte
followed by the OCS.
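
The even/odd rule can be written down in a few lines. This is a
minimal Python sketch, not text from the draft; the function name is
made up, and the offset is assumed to be measured in bytes from the
first byte of the UDP header.

```python
def ocs_position(surplus_offset: int):
    """Return (pad_bytes, ocs_offset) for a surplus area at surplus_offset.

    surplus_offset is assumed to be counted in bytes from the first byte
    of the UDP header. An even offset needs no padding; an odd offset
    needs one zero byte before the 2-byte OCS.
    """
    if surplus_offset < 0:
        raise ValueError("offset must be non-negative")
    pad = surplus_offset % 2
    return pad, surplus_offset + pad

print(ocs_position(20))  # even offset: OCS starts the surplus area
print(ocs_position(21))  # odd offset: one zero byte, then OCS
```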

"OCS is not intended to prevent future non-standard uses of the
surplus area, nor does it enable shared use with mechanisms that do
not comply with UDP options."

What about existing non-standard uses of the surplus area?

"The design enables traversal of errant middleboxes that incorrectly
compute the UDP checksum over the entire IP payload [Fa18][Zu20],
rather than only the UDP header and UDP payload (as indicated by the
UDP header length)."

The procedures for computing and processing the OCS should be
articulated here, maybe incorporate those from
draft-fairhurst-udp-options-cco. That includes the extra pseudo header
which I believe contains the UDP options length to satisfy middleboxes
that use the IP length instead of UDP length in the pseudo header for
computing the UDP checksum.
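
As a rough illustration of why a checksum placed in the surplus area
can help with such middleboxes, the Python sketch below uses the
standard Internet checksum (RFC 1071). It is not the draft's
procedure: it deliberately leaves out all pseudo-header terms,
including the length term mentioned above, and the function names and
sample bytes are made up. It only shows that a region carrying its
own correct checksum adds nothing to a one's complement sum taken
over a larger span.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(data: bytes) -> int:
    return ~ones_complement_sum(data) & 0xFFFF

def build_surplus(options: bytes) -> bytes:
    """Prefix options with a 2-byte checksum computed with the field zeroed."""
    ocs = internet_checksum(b"\x00\x00" + options)
    return ocs.to_bytes(2, "big") + options

udp_part = b"\x12\x34\x56\x78"                # stand-in for UDP header + payload
surplus = build_surplus(b"\x01\x01\x05\x04\xab\xcd")

# A checksum over udp_part alone equals one over udp_part + surplus.
print(hex(internet_checksum(udp_part)))
print(hex(internet_checksum(udp_part + surplus)))
```

The equality relies on the surplus area starting on a 2-byte boundary,
which is what the alignment padding is for.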

"Like the UDP checksum, the OCS is optional under certain
circumstances and contains zero when not used. UDP checksums can be
zero for IPv4 [RFC791] and for IPv6 [RFC8200] when UDP payload already
covered by another checksum, as might occur for tunnels [RFC6935]."

Typo: "when UDP payload already" should be "when UDP payload is already"

As I've mentioned before, there is no correlation between the UDP
checksum and OCS, so if the UDP payload is already covered by another
checksum that is no indication to the user that not using OCS is safe
and we should not imply otherwise. I think the intent here is that the
OCS is required if the UDP checksum is non-zero, but optional if the
UDP checksum is zero (for either IPv4 or IPv6). If it is optional,
then we should provide guidance to the user on the risks of setting a
zero OCS.

"OCS can be disabled, e.g., to conserve energy or processing resources
or when it can improve performance"

This is dependent on implementation. Given the way modern NICs and
stacks work, using an optional OCS is more likely to be a negligible
performance improvement or even a slight performance degradation.
Note, this is also inconsistent with the precedent of TCP options
which are always covered by the TCP checksum.

">> UDP user data that is validated by a correct UDP checksum MUST be
delivered to the application layer, even if the OCS fails, unless the
endpoints have negotiated otherwise for this UDP packet's socket
pair."

Okay, but what is the application supposed to do with a bad OCS?
Should it drop the packet or pretend like nothing is wrong? What if
there was an APC in the data that would need to be validated before
accepting the packet? Please provide clear guidance to the developer
here.

Section 11.3:

"It is not an alternative to the UDP checksum because it does not
cover the IP pseudoheader or UDP header, and it is not a supplement to
the OCS because the latter covers the surplus area only."

It doesn't supplement the OCS, but the OCS is needed to protect the
APC option itself from being corrupted since the APC option can't
protect itself. For instance, if someone sends an APC option but the
kind byte flips to some unknown value then the receiver would
completely miss the APC. For this reason, it should be strongly
RECOMMENDED that the OCS be used when the APC is used (note that the
computation required to compute the CRC over the packet dwarfs that
for computing the checksum over the surplus area so the overhead of
OCS in this case is inconsequential)
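
The kind-byte scenario is easy to reproduce. In the minimal Python
sketch below, the kind number assigned to APC, the
kind/length/value layout, and the function name are all assumptions
made for illustration; they are not taken from the draft.

```python
KIND_APC = 2  # hypothetical kind number, for illustration only

def find_apc(options: bytes):
    """Walk kind/length/value options; return the APC value or None.

    Assumes each option is: kind (1 byte), length (1 byte, counting the
    whole option), then the value. Unknown kinds are skipped.
    """
    i = 0
    while i + 2 <= len(options):
        kind, length = options[i], options[i + 1]
        if length < 2 or i + length > len(options):
            return None                       # malformed option, stop parsing
        if kind == KIND_APC:
            return options[i + 2:i + length]
        i += length                           # unknown kind: skip it
    return None

good = bytes([KIND_APC, 6]) + b"\xde\xad\xbe\xef"
flipped = bytes([KIND_APC ^ 0x40, 6]) + good[2:]  # one bit flipped in the kind

print(find_apc(good))     # the CRC value is found
print(find_apc(flipped))  # the receiver never sees an APC
```

Without a checksum over the options themselves, the receiver in the
second case has no indication that an APC was ever sent.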

"Like all SAFE UDP options, APC needs to be silently ignored when
failing by default, unless the receiver has been configured to do
otherwise."

Accepting a packet that is known to be corrupted is a major departure
from how other protocols work. If TCP checksum fails, Ethernet CRC
fails, UDP checksum fails, or IPv4 header checksum fails to be
validated then the packet is dropped (this isn't just default
behavior, this is the only behavior for those protocols)

Section 11.9:

"Authentication (AUTH), RESERVED Only"

I suggest removing this section. There's little value in reserving the
kind number without any specification of the protocol. Also, this
makes a design decision that Auth is a SAFE option, which I disagree
with, so if Auth is removed from the draft then we can defer the
discussion as to whether Auth should be a SAFE or UNSAFE option.

There seem to be at least three definitions of UNSAFE options in the draft:

Section 10: "Kind values in the range 192..255 are known as UNSAFE
options because might interfere with use by legacy receiving
endpoints"

Section 12: "UNSAFE options are not safe to ignore"

Section 10: "They stand in contrast to UNSAFE options, which modify
UDP user data in ways that render it unusable by legacy receivers"

Please provide a crisp definition of what UNSAFE and SAFE options are
and apply that definition consistently throughout the draft to avoid
any ambiguity.
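
For example, a single definition could be reduced to one test on the
kind value plus one rule for unknown options. The Python sketch below
assumes only what is quoted above (kinds 192..255 are UNSAFE, and
UNSAFE options are not safe to ignore); the function names and the
returned strings are made up, and what a receiver actually does with
an unknown UNSAFE option is left to the draft.

```python
def is_unsafe(kind: int) -> bool:
    """Kinds 192..255 are UNSAFE per the Section 10 text quoted above."""
    if not 0 <= kind <= 255:
        raise ValueError("kind must fit in one byte")
    return kind >= 192

def unknown_option_rule(kind: int) -> str:
    """One rule for options the receiver does not implement."""
    return "must-not-ignore" if is_unsafe(kind) else "ignore"

print(unknown_option_rule(66))   # SAFE range
print(unknown_option_rule(200))  # UNSAFE range
```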

Appendix A:

This section would be more relevant if it had a reference to source
code (I believe there's a FreeBSD implementation). A basic design and
a list of supported options would also be more useful than a list of
sysctls.

Regarding the design of the implementation, there are two basic
approaches to processing options in transport layer protocols:

1) Process the options completely in the kernel stack (like how TCP is
implemented)
2) Process the options completely in the application (like how QUIC is
implemented)

The UDP options draft seems to specify a hybrid approach, where the
options would be processed in the kernel and the results somehow
passed to the application (presumably by setting the results in the
ancillary data of recvmsg). Honestly, I think this hybrid approach is
going to make it a hard sell to upstream UDP options in Linux. I
suggest that the second approach be used for UDP options.

In this model, we would add a UDP socket option indicating that the
UDP payload and any surplus area are sent to the application as part of
the data in a recvmsg. Ancillary data in the receive message could
contain the length of the UDP payload and we could probably provide
the checksum over the surplus area as well for OCS computation. Once
the payload+surplus area is in userspace then the application can
process UDP options as needed (presumably using a common library for
that).
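
On the userspace side, the first step of such a library would be
small. The Python sketch below assumes the hypothetical socket option
described above: recvmsg returns payload plus surplus area in one
buffer, and the UDP payload length arrives in ancillary data. No such
socket option exists in this sketch; only the split itself is shown,
and the function name is made up.

```python
def split_datagram(data: bytes, udp_payload_len: int):
    """Split a received buffer into (payload, surplus_area).

    udp_payload_len is assumed to come from ancillary data supplied by
    a hypothetical UDP socket option; it is not a real kernel API.
    """
    if not 0 <= udp_payload_len <= len(data):
        raise ValueError("payload length outside received data")
    return data[:udp_payload_len], data[udp_payload_len:]

payload, surplus = split_datagram(b"hello" + b"\x00\x00\x01\x01", 5)
print(payload)  # application data
print(surplus)  # bytes handed to the userspace UDP options library
```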

This approach greatly simplifies the kernel implementation (probably
<50 LOC) since all we need to do is figure out how to post the surplus
area data on the socket. Moving the bulk of protocol processing to
userspace is a huge simplification and allows much faster time to
implement and deploy features (this is one of the major advantages of
QUIC over TCP). Once we have that, userspace implementation can be
freely modified and extended.

The justification for UDP options as a Proposed Standard would be much
stronger if there was some deployment experience that could be
described.
