[TLS] Spec issue with RFC 7627 (EMS) and resumption

David Benjamin <davidben@chromium.org> Mon, 25 October 2021 20:01 UTC

From: David Benjamin <davidben@chromium.org>
Date: Mon, 25 Oct 2021 16:01:07 -0400

Hi all,

In diagnosing an interop issue, I noticed RFC 7627 did not describe the
correct server behavior for EMS very well. Seemingly as a result, some
server implementation has gotten this wrong. I'd like to fix this in the
spec so this doesn't happen again. I think, at minimum, we need to replace
the last paragraph of section 5.4.

The issue is that a server that *doesn't* implement EMS, when presented a
ClientHello containing a ticket or session ID issued by a server that *did*
implement EMS, must ignore the session and continue with a full handshake. Failing to
do so will trip the client check in Section 5.3, "If a client receives a
ServerHello that accepts an abbreviated handshake, [...]". This is
important to meet these three properties:

- If the client and server both support EMS, the connection must negotiate
EMS
- On resumption, the EMS status of the connection must match the EMS status
of the session
- In order for EMS to be safely deployable, it must be possible to roll EMS
out gradually, or roll it back, without breaking connections. This means a
mixed pre-EMS and post-EMS server deployment must work.
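
To make the second property concrete, here is a minimal sketch of the client-side check that a non-conforming pre-EMS server would trip. The function and enum names are hypothetical, and this only models the "statuses must match" rule as stated above, not the full text of Section 5.3.

```python
from enum import Enum


class Outcome(Enum):
    RESUME = "resume"
    ABORT = "abort"


def client_check_resumption(session_used_ems: bool,
                            server_hello_has_ems: bool) -> Outcome:
    """Hypothetical check run when a ServerHello accepts an abbreviated handshake."""
    # The EMS status of the resumed connection must match the EMS status
    # of the session being resumed; any mismatch is fatal for the client.
    if session_used_ems != server_hello_has_ems:
        return Outcome.ABORT
    return Outcome.RESUME
```

If a pre-EMS server resumes a session that a post-EMS server issued, its ServerHello will lack the extension, so the call is `client_check_resumption(True, False)` and the client aborts. Had the server fallen back to a full handshake instead, this check would never run.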

Note that, although this behavior is only visible at the pre-EMS server
(not directly in scope for this document), it is actually a requirement on
the post-EMS server. When the post-EMS server issues a session, it must
arrange for the pre-EMS server to ignore it. For example, if the pre-EMS
server rejects sessions with unparsable fields (the safest option), the
post-EMS server can add a new field to the session state serialization.
Failing that, it can bump some internal version number. Another strategy is
to rotate session ticket keys alongside the version, but this can be tricky
given the way deployments and software updates are often split.
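
As a rough illustration of the first two strategies, the sketch below uses an invented session-state layout (a version byte, a length-prefixed secret, and, post-EMS, a trailing flag byte). The layout, constants, and function names are all assumptions for this example and are not taken from any real implementation.

```python
import struct
from typing import Optional

PRE_EMS_VERSION = 1   # hypothetical internal format version before EMS
POST_EMS_VERSION = 2  # hypothetical version bump made when adding the EMS flag


def serialize_pre_ems(master_secret: bytes) -> bytes:
    # Layout: version (1 byte) | secret length (2 bytes) | secret
    return struct.pack("!BH", PRE_EMS_VERSION, len(master_secret)) + master_secret


def serialize_post_ems(master_secret: bytes, used_ems: bool) -> bytes:
    # Same layout with a bumped version and one extra trailing byte for the flag.
    header = struct.pack("!BH", POST_EMS_VERSION, len(master_secret))
    return header + master_secret + struct.pack("!B", int(used_ems))


def parse_pre_ems(blob: bytes) -> Optional[bytes]:
    """Pre-EMS parser: return the secret, or None to force a full handshake."""
    if len(blob) < 3:
        return None
    version, length = struct.unpack("!BH", blob[:3])
    if version != PRE_EMS_VERSION:
        return None  # unknown version: ignore the session
    if len(blob) != 3 + length:
        return None  # trailing or missing bytes: treat as unparsable
    return blob[3:]
```

Either check on its own is enough here: `parse_pre_ems(serialize_post_ems(secret, True))` returns `None` because of the version mismatch, and a strict parser would also reject the extra flag byte even if the version had not been bumped.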

There's an analogous, though less likely, client scenario: a pre-EMS
client must not offer a post-EMS session. Otherwise it will run afoul of a
server requirement. This can be relevant for clients that serialize their
session cache.
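
The client-side version of this might look like the following sketch, which assumes a JSON-serialized cache entry with invented field names. A pre-EMS client that rejects entries containing fields it does not recognize will never offer a session written by a post-EMS build.

```python
import json
from typing import Optional

# The only fields a hypothetical pre-EMS client knows how to interpret.
KNOWN_FIELDS = {"session_id", "master_secret"}


def load_session_pre_ems(serialized: str) -> Optional[dict]:
    """Return a cached session to offer, or None to skip it."""
    entry = json.loads(serialized)
    if not isinstance(entry, dict):
        return None
    # Reject anything with unknown (or missing) fields rather than guessing.
    if set(entry) != KNOWN_FIELDS:
        return None
    return entry
```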

As far as I can tell, RFC 7627 does not specify any of this. The first
paragraph of section 5.4 talks about adding a flag, but doesn't talk about
how pre-EMS servers interact with that flag. The last paragraph discusses
this scenario, but says something very strange, if not plain wrong:

   If the original session uses an extended master secret but the
   ClientHello or ServerHello in the abbreviated handshake does not
   include the extension, it MAY be safe to continue the abbreviated
   handshake since it is protected by the extended master secret of the
   original session.  This scenario may occur, for example, when a
   server that implements this extension establishes a session but the
   session is subsequently resumed at a different server that does not
   support the extension.  Since such situations are unusual and likely
   to be the result of transient or inadvertent misconfigurations, this
   document recommends that the client and server MUST abort such
   handshakes.

First, the "MAY" is immediately contradicted by the following "MUST", and
by section 5.3. It seems it should have been an English lowercase "may",
not a normative RFC 2119 "MAY". It is also wrong in calling this situation
"unusual and likely to be the result of transient or inadvertent
misconfigurations". Rather, it is the natural transition state of any large
server rollout. I think we need to delete that entire paragraph and replace
it with text that describes the rules above. If we were doing a whole new
version of the document, I think the text could do with reorganization. But
that may not be worth doing, given folks should be using TLS 1.3 now.

Thoughts? I can put together some replacement text if folks agree. What
would be the best way to do this? Just an erratum?