Re: [MMUSIC] Benjamin Kaduk's Discuss on draft-ietf-mmusic-sdp-uks-06: (with DISCUSS and COMMENT)

"Martin Thomson" <> Tue, 06 August 2019 07:42 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 3A26A120059; Tue, 6 Aug 2019 00:42:40 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -2.701
X-Spam-Status: No, score=-2.701 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: (amavisd-new); dkim=pass (2048-bit key) header.b=fpX5pCOs; dkim=pass (2048-bit key) header.b=mtiCyxyz
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id TvUUXx9hQQ67; Tue, 6 Aug 2019 00:42:35 -0700 (PDT)
Received: from ( []) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 36294120121; Tue, 6 Aug 2019 00:42:35 -0700 (PDT)
Received: from compute1.internal (compute1.nyi.internal []) by mailout.nyi.internal (Postfix) with ESMTP id 1FA1820FBE; Tue, 6 Aug 2019 03:42:34 -0400 (EDT)
Received: from imap2 ([]) by compute1.internal (MEProxy); Tue, 06 Aug 2019 03:42:34 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; h=mime-version:message-id:in-reply-to:references:date:from:to :cc:subject:content-type; s=fm2; bh=jBtPCCo6KAP4BZ5siz5MbSeXveK/ 8c0MmkcyVmJ9UHo=; b=fpX5pCOsFq6xKMd/PTMRgHTeWXpHpVtSQiSrLVHikdO+ YuCrfwGkpuXtPGAJd/7+kp7IXsB1j+l9owZg37vOZi2nSJ/YuYh0Ist0/RBFr8Bg Mp90GSu4Q8x0/gdIQo8uzQTjoAkWRwnGMl0mSlBengaIxIFdZeMZzkKROS7u6RTo 3xXVmPFGt82sSg1MOaHh8cNgSAY/hJP6W12nTGTN8BDCqt/MBs4sBZZwNkKbcPG6 GioRYPnYXW8GfBP95kY3YlGyvtB3aZFNVU6U2maehCPamuU3S8LDK/GcgIhy54Aq OSMuhYsPU5zFR3enWPOXZQDOBDZjxEuuzpM2XlE00Q==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=jBtPCC o6KAP4BZ5siz5MbSeXveK/8c0MmkcyVmJ9UHo=; b=mtiCyxyzDBk5C29diLdYE+ ae7sU+0UFuHfobcj8IfTvPg8MMLnaUN8GBzBZf1dtJH+4fLVGJqGeMsA53OOpegD aFBx0Elpy17mpb5Y98s2Tdj3T5jo7KUQhQwBD8Q7ysjaZkXGwfmfiXPaTjQ8Aws4 YhBoLZjs6UR/4LqPmdSjTeck5zjLeGlcyR4X5ZJQhdCsZzvgJ+aZQqH3RGE2e55i Pk+AUbeaMXw/DUkAip3OcqS5uzWJvt5GUFWfBRE69tplyexXD2k3vySRzljp/rVe 20Z4AMHb6+CM7e9jjGCdq2FtZahVRjKiJWC0HEd6TSQnl7f9Fmd3+SHywfsh3xFA ==
X-ME-Sender: <xms:aS9JXdy9tQQl0tzvovdW7pCJUYplPyX188m3ugSGyXAMMCvQNM1zgA>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduvddruddtledguddvhecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefofgggkfgjfhffhffvufgtsehttdertderreejnecuhfhrohhmpedfofgr rhhtihhnucfvhhhomhhsohhnfdcuoehmtheslhhofigvnhhtrhhophihrdhnvghtqeenuc ffohhmrghinhepghhithhhuhgsrdgtohhmnecurfgrrhgrmhepmhgrihhlfhhrohhmpehm theslhhofigvnhhtrhhophihrdhnvghtnecuvehluhhsthgvrhfuihiivgeptd
X-ME-Proxy: <xmx:aS9JXRh06rntO9PkbIIYtwmCigN678foKfUKr5VHOLSiFp-eHNtzFg> <xmx:aS9JXSXr62RK1tAESf-FSyVFe_Y8WOY67OpxR88manRABJaTsv9Gow> <xmx:aS9JXT1m0djTmN1BeIebIXRzIYwVn0XLVONsqGCSqBZ47MRv5ctIMA> <xmx:ai9JXdwKzAjG_fLSWLq8Pft6RFsIrmVLM_UUtuGsVeNWVwjKxeedkg>
Received: by mailuser.nyi.internal (Postfix, from userid 501) id 59F20E00A2; Tue, 6 Aug 2019 03:42:33 -0400 (EDT)
X-Mailer: Webmail Interface
User-Agent: Cyrus-JMAP/3.1.6-808-g930a1a1-fmstable-20190805v2
Mime-Version: 1.0
Message-Id: <>
In-Reply-To: <>
References: <>
Date: Tue, 06 Aug 2019 17:42:36 +1000
From: "Martin Thomson" <>
To: "'Benjamin Kaduk'" <>, "The IESG" <>
Cc:, mmusic <>, "" <>
Content-Type: text/plain
Archived-At: <>
Subject: Re: [MMUSIC] =?utf-8?q?Benjamin_Kaduk=27s_Discuss_on_draft-ietf-mmus?= =?utf-8?q?ic-sdp-uks-06=3A_=28with_DISCUSS_and_COMMENT=29?=
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Multiparty Multimedia Session Control Working Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 06 Aug 2019 07:42:40 -0000

Hi Ben,

Thanks for the thorough review.  I've made a few clarifications based on this.

I've combined these changes with those suggested by Roman, which you can preview here:

On Tue, Aug 6, 2019, at 02:28, Benjamin Kaduk via Datatracker wrote:
> There are both pretty minor points, in the grand scheme of things, but I
> do think it would be hazardous to publish the document without
> addressing them.
> The semantics surrounding the "external_id_hash" TLS extension seem
> insufficiently specified to admit interoperable implementation.  In
> Section 3.2 we read that it "carries a hash of the identity assertion that
> communicating peers have exchanged", as if there was a single
> distinguished identity assertion for the session.  But, if we read on,
> we learn that there is not one identity assertion, but (in the general
> case) two, one for each party, and that what seems to actually be
> intended is that each party sends the hash of the identity assertion
> corresponding to the sender's identity, with the requirements to send an
> empty external_id_hash if the party in question is not providing
> identity bindings.  Additionally, the text about having an empty
> "external_id_hash" extension in ClientHello or
> ServerHello/EncryptedExtensions is written in a way that implies that
> all parties generate a ClientHello and all parties generate a
> ServerHello or EncryptedExtensions message, whereas these are actually
> conditional on whether the party is acting as (D)TLS client or server.
> Similarly, the current text for the last sentence of Section 3.2 ("In
> TLS 1.3, the "external_id_hash" extension MUST be sent in the
> EncryptedExtensions message.") can be (mis)read as implying that all
> EncryptedExtensions messages sent by TLS servers that implement this
> specification must include this extension, which would violate the TLS
> extension-negotiation model since it mandates the server sending an
> extension without regard to the client having indicated support for the
> extension.  Perhaps "MUST NOT be sent in the TLS 1.3 ServerHello message"
> conveys the restriction more clearly?
> (A similar comment applies to the corresponding statement in Section
> 4.3, which interestingly enough already has a "In TLS 1.3, the
> "external_session_id" extension MUST NOT be included in a ServerHello."
> disclaimer in addition to the problematic sentence.)
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> Section 2
>    A similar attack can be mounted without any communications
>    established based on the SDP "fingerprint" attribute [FINGERPRINT].
> At this point in the document, I don't know how to interpret "without
> any communications established based on".

Yeah, that is pure gibberish.  Sorry about that.  Changed to:

A similar attack can be mounted based solely on the SDP `fingerprint` attribute
[FINGERPRINT] without compromising the integrity of the signaling channel.

> Section 2.1
>    1.  An attacker can only modify the parts of the session signaling
>        for a session that they are part of, which is limited to their
>        own offers and answers.
> nit(?): the first part of the sentence suggests that the attacker can
> modify their peers' offers/answers, and it's not entirely clear (from a
> rhetorical sense) how the latter clause is supposed to relate to the
> first.


   1. An attacker can only modify the parts of the session signaling that they are
   responsible for producing, namely their own offers and answers.

>    The combination of these two constraints make the spectrum of
>    possible attacks quite limited.  An attacker is only able to switch
>    its own certificate fingerprint for a valid certificate that is
>    acceptable to its peer.  Attacks therefore rely on joining two
>    separate sessions into a single session.
> nit: It's not clear to me (at this point in the document) whether this
> is "victim A connects to attacker and also to victim B, and attacker
> merges the first session into the second", or "victim A connects to
> attacker and attacker connects to victim B, and attacker splices the two
> together and steps out of the way".  (I assume the latter, but the text
> hasn't clarified it yet.)

It's really difficult to provide some context without providing all of it.  Writing out the gory details is tricky, so your little uncertainty is hard to resolve (it's both, but the former has greater potential for mischief).  I've resorted to forward references here rather than explaining this further at this stage of the document.

> Section 2.3
>    Third-party call control (3PCC) [RFC3725] is a technique where a
>    signaling peer establishes a call that is terminated by a different
>    entity.  This attack is very similar to the 3PCC technique, except
>    where the TLS peers are aware of the use of 3PCC.
> nit: Rhetorically-wise, I don't know what "except" is intended to mean
> here.  Is the attack like 3PCC but in normal 3PCC the peers are unaware
> of 3PCC use and in the attack they are?  The other way around?  ("except
> that in the 3PCC case the TLS peers are aware of its use" would
> disambiguate fine, I think.)

This turns out to be subtly wrong.  The "except" is in two parts now:

Third-party call control (3PCC) [RFC3725] is a technique where a signaling
peer establishes a call that is terminated by a different entity.  This attack
is very similar in effect to some 3PCC practices, so use of 3PCC could appear to
be an attack.  However, 3PCC that follows RFC 3725 guidance is unaffected, and
peers that are aware of changes made by a 3PCC controller can correctly
distinguish actions of a 3PCC controller from attack.

> Section 3.2
>    A WebRTC identity assertion is provided as a JSON [JSON] object that
>    is encoded into a JSON text.  The resulting string is then encoded
>    using UTF-8 [UTF8].  The content of the "external_id_hash" extension
> I don't really understand the separate UTF-8 step -- RFC 8259 already
> requires text to be UTF-8 encoded.

As I said in response to Roman's email, I don't think that's true.  RFC 8259 says that you need to encode JSON texts using UTF-8 if you aren't in a closed system:

   JSON text exchanged between systems that are not part of a closed
   ecosystem MUST be encoded using UTF-8 [RFC3629].

But this has a comparison requirement that is stronger than "not part of a closed system", so I think that  the encoding rule bears repeating.

> I think this section would be easier to read if the different cases of
> identity encoding/transmission were broken out into a bulleted or
> enumerated list (the latter might make it easier to extend in the
> future): right now I think we have (1) pure WebRTC, (2) SDP "identity",
> and (3) SIP PASSPoRT, but I'm not 100% sure I'm reading the text
> properly.
> If that's done, it would also be a good opportunity to clarify that the
> note about hash agility applies to the TLS extension as a whole, not
> just the PASSPoRT case.

I have done that.  It required a tiny bit of glue, but it seems like it worked well.

>                                                     This allows its peer
>    to include a hash of its identity binding.  An endpoint without an
>    identity binding MUST include an empty "external_id_hash" extension
>    in its ServerHello or EncryptedExtensions message, to indicate
>    support for the extension.
> nit: and that it has validated the client's identity binding?

That isn't possible until the connection is established generally.  Maybe it indicates a willingness to validate an identity binding, but really all that is needed here is that it is willing to receive a binding.

>    A peer that receives an "external_id_hash" extension that does not
>    match the value of the identity binding from its peer MUST
>    immediately fail the TLS handshake with an error.  This includes
>    cases where the binding is absent, in which case the extension MUST
>    be present and empty.
> nit: I'd suggest rewording the second sentence as follows (since the
> conditional logic on "extension present but binding absent" could be
> confusing:
> % The absence of an identity binding does not relax this requirement --
> % an extension received when the peer has not provided an identity
> % binding on the signalling layer must still be validated to have the
> % zero-length extension body.

I went with "The absence of an identity binding
does not relax this requirement; if a peer provided no identity binding, a
zero-length extension MUST be present to be considered valid."

>    A peer that receives an identity binding, but does not receive an
>    "external_id_hash" extension MAY choose to fail the connection,
>    though it is expected that implementations written prior to the
>    definition of the extensions in this document will not support both
>    for some time.
> nit: I don't think the comma after "binding" is needed.
> Also, is the "not" intended?  I'm not entirely sure what "both" is
> intended to refer to.

Reworded to:
Implementations written prior to the definition of the extensions in this
document will not support this extension for some time.  A peer that receives an
identity binding but does not receive an `external_id_hash` extension MAY accept
a TLS connection rather than fail a connection where the extension is absent.

> Section 4.1
>    another honest endpoint.  The attacker convinces the endpoint that
>    their session has completed, and that the session with the other
>    endpoint has succeeded.
> Even with the benfit of the figure, I'm not sure I am properly
> understanding the distinction between "completed" and "succeeded".  Is
> the idea that the "completed" session finishes a DTLS handshake and then
> immediately hangs up?  Or is this entirely at the signalling layer?

reworded to:
"The attacker convinces the first endpoint that their session with the attacker has
been successfully established, but media is exchanged with the other honest
endpoint.  The attacker permits the session with the other honest endpoint to
complete only to the extent necessary to convince the other honest endpoint to
participate in the attacked session."

>                  For this reason, it might be necessary to permit the
>    signaling from Patsy to reach Norma to allow Patsy to receive a call
>    setup completion signal, such as a SIP ACK.  Once the second session
>    is established, Mallory might cause DTLS packets sent by Norma to
>    Patsy to be dropped.  It is likely that these DTLS packets will be
>    discarded by Patsy as Patsy will already have a successful DTLS
>    connection established.
> nit: Is this "it is likely these packets would be discarded even if
> Mallory lets them through"?

Yes.  "However, if Mallory allows DTLS packets to pass, it is
likely that Patsy will discard them as Patsy will already have a successful DTLS
connection established."

>    This attack creates an asymmetry in the beliefs about the identity of
>    peers.  However, this attack is only possible if the victim (Norma)
>    is willing to conduct two sessions nearly simultaneously, if the
>    attacker (Mallory) is on the network path between the victims, and if
>    the same certificate - and therefore SDP "fingerprint" attribute
>    value - is used in both sessions.
> This is the same certificate used by Norma in both sessions, right?

Ack.  Fixed.

> Section 4.3
>    Where RTP and RTCP [RTP] are not multiplexed, it is possible that the
>    two separate DTLS connections carrying RTP and RTCP can be switched.
>    This is considered benign since these protocols are designed to be
>    distinguishable.  RTP/RTCP multiplexing is advised to address this
>    problem.
> What does "switched" mean?  That Mallory could swap the data contents
> around as an active MITM?


RTP and RTCP are usually multiplexed onto the same path today, where they need to be fully distinguishable.  This is by far the most common deployment model, but that doesn't really matter for the purposes of analysis.

If RTP and RTCP use different DTLS connections, an attacker could switch port numbers (for instance) on the two and cause peers to believe that RTP is RTCP and vice versa.  That's highly unlikely to produce meaningful results SRTP (RFC 3711) provides key separation.  Key separation would be enough if it weren't for ludicrously short authentication tags in common use (we still see 32-bit tags; bits, not bytes!).  If key separation fails, the protocols are generally distinguishable, though not perfectly so where multiplexing is not chosen.

> If my understanding is correct, we should probably add a bit more text
> here indicating the need for more validation than just the validation of
> the TLS extension contents that this document describes.

Yes, there are multiple validation steps.  This document assumes that the validation described in RFC 8122 (a=fingerprint) is always followed.  It also assumes that if you are using one of the identity assertions, you follow the respective validation processes.

Added to both sections (the bracket part only for the first):
Any validation performed of the `external_id_hash|external_session_id` extension is done in addition
to the validation required by {{!FINGERPRINT}} [and any identity assertion
> Section 5
>    In the absence of any higher-level concept of peer identity, the use
>    of session identifiers does not prevent session concatenation if the
>    attacker is able to copy the session identifier from one signaling
>    session to another.  This kind of attack is prevented by systems that
>    enable peer authentication such as WebRTC identity [WEBRTC-SEC] or
>    SIP identity [SIP-ID].  However, session concatenation remains
>    possible at higher layers: an attacker can establish two independent
>    sessions and simply forward any data it receives from one into the
>    other.
> And in such a case the attacker has access to the media plaintext, too,
> right?

Yeah, but that doesn't expose the victims to the consequences of mistaken channel bindings.

>    Use of the "external_session_id" does not guarantee that the identity
>    of the peer at the TLS layer is the same as the identity of the
>    signaling peer.  The advantage an attacker gains by concatenating
>    sessions is limited unless it is assumed that signaling and TLS peers
>    are the same.  If a secondary protocol uses the signaling channel
>    with the assumption that the signaling and TLS peers are the same
>    then that protocol is vulnerable to attack unless they also validate
>    the identity of peers at both layers.
> Is this paragraph describing a case like in
> draft-ietf-rtcweb-security-arch, where we do send (and verify) identity
> assertions at the signalling layer?  That is, the verification at the
> IdP counts for validation at the "secondary protocol" layer and the
> verification that the TLS extension matches the signalling constitutes
> verification at the TLS layer, thereby achieving the validation "at both
> layers"?

WebRTC doesn't include validation at both layers in the way that this describes.

Imagine that Mallory pretends that they own fingerprints and assertions from Norma when communicating with Patsy and (vice versa) pretends that they own the Patsy's values when communicating with Norma.  They can then forward packets.  That puts Mallory on the signaling path, but not the media path (aside from having to forward encrypted packets).  If Norma and Patsy were to assume that signaling and media peers were the same, that leads to problems.

Neither Norma nor Patsy are under misapprehension of each other's identity, but they might not realize that they have an on-path attacker that can see (and modify) signaling.  If they were to send something derived from signaling in DTLS, that might fail.  If they were to send something derived from DTLS in signaling, that might leak to Mallory.

>    It is important to note that multiple connections can be created
>    within the same signaling session.  An attacker might concatenate
>    only part of a session, choosing to terminate some connections (and
>    optionally forward data) while arranging to have peers interact
>    directly for other connections.  It is even possible to have
>    different peers interact for each connection.  This means that the
>    actual identity of the peer for one connection might differ from the
>    peer on another connection.
> How do or could the mitigations specified in this document address these
> attacks?

The point of this text is to make it clear that no guarantees transfer from media to signaling or between different connections.

With the mitigations described AND identity assertions, you can be confident that the peer at the media layer is the peer that the identity assertion reports, even in the presence of attackers on the signaling path.

With only fingerprints, the signaling system needs to have integrity, including safeguards on replacement of fingerprints by endpoints.

I've added those notes as security considerations.