Re: [MMUSIC] Benjamin Kaduk's Discuss on draft-ietf-mmusic-sdp-uks-06: (with DISCUSS and COMMENT)

Benjamin Kaduk <> Tue, 06 August 2019 19:32 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id DB61812013C; Tue, 6 Aug 2019 12:32:19 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id m1tQVQ1f8lKz; Tue, 6 Aug 2019 12:32:16 -0700 (PDT)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id BE539120059; Tue, 6 Aug 2019 12:32:15 -0700 (PDT)
Received: from ([]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by (8.14.7/8.12.4) with ESMTP id x76JWA03005506 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 6 Aug 2019 15:32:13 -0400
Date: Tue, 6 Aug 2019 14:32:10 -0500
From: Benjamin Kaduk <>
To: Martin Thomson <>
Cc: The IESG <>,, mmusic <>, "" <>
Message-ID: <>
References: <> <>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <>
Subject: Re: [MMUSIC] Benjamin Kaduk's Discuss on draft-ietf-mmusic-sdp-uks-06: (with DISCUSS and COMMENT)
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Multiparty Multimedia Session Control Working Group <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Tue, 06 Aug 2019 19:32:20 -0000

On Tue, Aug 06, 2019 at 05:42:36PM +1000, Martin Thomson wrote:
> Hi Ben,
> Thanks for the thorough review.  I've made a few clarifications based on this.
> I've combined these changes with those suggested by Roman, which you can preview here:

Thanks; I left a few comments there.

> On Tue, Aug 6, 2019, at 02:28, Benjamin Kaduk via Datatracker wrote:
> > There are both pretty minor points, in the grand scheme of things, but I
> > do think it would be hazardous to publish the document without
> > addressing them.
> > 
> > The semantics surrounding the "external_id_hash" TLS extension seem
> > insufficiently specified to admit interoperable implementation.  In
> > Section 3.2 we read that it "carries a hash of the identity assertion that
> > communicating peers have exchanged", as if there was a single
> > distinguished identity assertion for the session.  But, if we read on,
> > we learn that there is not one identity assertion, but (in the general
> > case) two, one for each party, and that what seems to actually be
> > intended is that each party sends the hash of the identity assertion
> > corresponding to the sender's identity, with the requirements to send an
> > empty external_id_hash if the party in question is not providing
> > identity bindings.  Additionally, the text about having an empty
> > "external_id_hash" extension in ClientHello or
> > ServerHello/EncryptedExtensions is written in a way that implies that
> > all parties generate a ClientHello and all parties generate a
> > ServerHello or EncryptedExtensions message, whereas these are actually
> > conditional on whether the party is acting as (D)TLS client or server.
> > 
> > Similarly, the current text for the last sentence of Section 3.2 ("In
> > TLS 1.3, the "external_id_hash" extension MUST be sent in the
> > EncryptedExtensions message.") can be (mis)read as implying that all
> > EncryptedExtensions messages sent by TLS servers that implement this
> > specification must include this extension, which would violate the TLS
> > extension-negotiation model since it mandates the server sending an
> > extension without regard to the client having indicated support for the
> > extension.  Perhaps "MUST NOT be sent in the TLS 1.3 ServerHello message"
> > conveys the restriction more clearly?
> > (A similar comment applies to the corresponding statement in Section
> > 4.3, which interestingly enough already has a "In TLS 1.3, the
> > "external_session_id" extension MUST NOT be included in a ServerHello."
> > disclaimer in addition to the problematic sentence.)

(I'm not sure we quite got all of these in the linked pull request.)

> > 
> > ----------------------------------------------------------------------
> > ----------------------------------------------------------------------
> > 
> > Section 2
> >    A similar attack can be mounted without any communications
> >    established based on the SDP "fingerprint" attribute [FINGERPRINT].
> > 
> > At this point in the document, I don't know how to interpret "without
> > any communications established based on".
> Yeah, that is pure gibberish.  Sorry about that.  Changed to:
> A similar attack can be mounted based solely on the SDP `fingerprint` attribute
> [FINGERPRINT] without compromising the integrity of the signaling channel.
> > Section 2.1
> > 
> >    1.  An attacker can only modify the parts of the session signaling
> >        for a session that they are part of, which is limited to their
> >        own offers and answers.
> > 
> > nit(?): the first part of the sentence suggests that the attacker can
> > modify their peers' offers/answers, and it's not entirely clear (from a
> > rhetorical sense) how the latter clause is supposed to relate to the
> > first.
> better?
>    1. An attacker can only modify the parts of the session signaling that they are
>    responsible for producing, namely their own offers and answers.

yes, thanks

> >    The combination of these two constraints make the spectrum of
> >    possible attacks quite limited.  An attacker is only able to switch
> >    its own certificate fingerprint for a valid certificate that is
> >    acceptable to its peer.  Attacks therefore rely on joining two
> >    separate sessions into a single session.
> > 
> > nit: It's not clear to me (at this point in the document) whether this
> > is "victim A connects to attacker and also to victim B, and attacker
> > merges the first session into the second", or "victim A connects to
> > attacker and attacker connects to victim B, and attacker splices the two
> > together and steps out of the way".  (I assume the latter, but the text
> > hasn't clarified it yet.)
> It's really difficult to provide some context without providing all of it.  Writing out the gory details is tricky, so your little uncertainty is hard to resolve (it's both, but the former has greater potential for mischief).  I've resorted to forward references here rather than explaining this further at this stage of the document.

That should work well.

> > Section 2.3
> > 
> >    Third-party call control (3PCC) [RFC3725] is a technique where a
> >    signaling peer establishes a call that is terminated by a different
> >    entity.  This attack is very similar to the 3PCC technique, except
> >    where the TLS peers are aware of the use of 3PCC.
> > 
> > nit: Rhetorically-wise, I don't know what "except" is intended to mean
> > here.  Is the attack like 3PCC but in normal 3PCC the peers are unaware
> > of 3PCC use and in the attack they are?  The other way around?  ("except
> > that in the 3PCC case the TLS peers are aware of its use" would
> > disambiguate fine, I think.)
> This turns out to be subtly wrong.  The "except" is in two parts now:
> Third-party call control (3PCC) [RFC3725] is a technique where a signaling
> peer establishes a call that is terminated by a different entity.  This attack
> is very similar in effect to some 3PCC practices, so use of 3PCC could appear to
> be an attack.  However, 3PCC that follows RFC 3725 guidance is unaffected, and
> peers that are aware of changes made by a 3PCC controller can correctly
> distinguish actions of a 3PCC controller from attack.

Ah, thank you.

> > Section 3.2
> > 
> >    A WebRTC identity assertion is provided as a JSON [JSON] object that
> >    is encoded into a JSON text.  The resulting string is then encoded
> >    using UTF-8 [UTF8].  The content of the "external_id_hash" extension
> > 
> > I don't really understand the separate UTF-8 step -- RFC 8259 already
> > requires text to be UTF-8 encoded.
> As I said in response to Roman's email, I don't think that's true.  RFC 8259 says that you need to encode JSON texts using UTF-8 if you aren't in a closed system:
>    JSON text exchanged between systems that are not part of a closed
>    ecosystem MUST be encoded using UTF-8 [RFC3629].
> But this has a comparison requirement that is stronger than "not part of a closed system", so I think that  the encoding rule bears repeating.

I guess in my head once you are doing webrtc/etc. you are out of a closed
system and the loophole disappears, but there's a reason this was only a
non-blocking comment.

> > I think this section would be easier to read if the different cases of
> > identity encoding/transmission were broken out into a bulleted or
> > enumerated list (the latter might make it easier to extend in the
> > future): right now I think we have (1) pure WebRTC, (2) SDP "identity",
> > and (3) SIP PASSPoRT, but I'm not 100% sure I'm reading the text
> > properly.
> > 
> > If that's done, it would also be a good opportunity to clarify that the
> > note about hash agility applies to the TLS extension as a whole, not
> > just the PASSPoRT case.
> I have done that.  It required a tiny bit of glue, but it seems like it worked well.

It does look like it (just from the markdown; hopefully also when

> >                                                     This allows its peer
> >    to include a hash of its identity binding.  An endpoint without an
> >    identity binding MUST include an empty "external_id_hash" extension
> >    in its ServerHello or EncryptedExtensions message, to indicate
> >    support for the extension.
> > 
> > nit: and that it has validated the client's identity binding?
> That isn't possible until the connection is established generally.  Maybe it indicates a willingness to validate an identity binding, but really all that is needed here is that it is willing to receive a binding.

Ah, good point.

> >    A peer that receives an "external_id_hash" extension that does not
> >    match the value of the identity binding from its peer MUST
> >    immediately fail the TLS handshake with an error.  This includes
> >    cases where the binding is absent, in which case the extension MUST
> >    be present and empty.
> > 
> > nit: I'd suggest rewording the second sentence as follows (since the
> > conditional logic on "extension present but binding absent" could be
> > confusing:
> > 
> > % The absence of an identity binding does not relax this requirement --
> > % an extension received when the peer has not provided an identity
> > % binding on the signalling layer must still be validated to have the
> > % zero-length extension body.
> I went with "The absence of an identity binding
> does not relax this requirement; if a peer provided no identity binding, a
> zero-length extension MUST be present to be considered valid."
> >    A peer that receives an identity binding, but does not receive an
> >    "external_id_hash" extension MAY choose to fail the connection,
> >    though it is expected that implementations written prior to the
> >    definition of the extensions in this document will not support both
> >    for some time.
> > 
> > nit: I don't think the comma after "binding" is needed.
> > Also, is the "not" intended?  I'm not entirely sure what "both" is
> > intended to refer to.
> Reworded to:
> Implementations written prior to the definition of the extensions in this
> document will not support this extension for some time.  A peer that receives an
> identity binding but does not receive an `external_id_hash` extension MAY accept
> a TLS connection rather than fail a connection where the extension is absent.


> > Section 4.1
> > 
> >    another honest endpoint.  The attacker convinces the endpoint that
> >    their session has completed, and that the session with the other
> >    endpoint has succeeded.
> > 
> > Even with the benfit of the figure, I'm not sure I am properly
> > understanding the distinction between "completed" and "succeeded".  Is
> > the idea that the "completed" session finishes a DTLS handshake and then
> > immediately hangs up?  Or is this entirely at the signalling layer?
> reworded to:
> "The attacker convinces the first endpoint that their session with the attacker has
> been successfully established, but media is exchanged with the other honest
> endpoint.  The attacker permits the session with the other honest endpoint to
> complete only to the extent necessary to convince the other honest endpoint to
> participate in the attacked session."

Much better; thank you

> >                  For this reason, it might be necessary to permit the
> >    signaling from Patsy to reach Norma to allow Patsy to receive a call
> >    setup completion signal, such as a SIP ACK.  Once the second session
> >    is established, Mallory might cause DTLS packets sent by Norma to
> >    Patsy to be dropped.  It is likely that these DTLS packets will be
> >    discarded by Patsy as Patsy will already have a successful DTLS
> >    connection established.
> > 
> > nit: Is this "it is likely these packets would be discarded even if
> > Mallory lets them through"?
> Yes.  "However, if Mallory allows DTLS packets to pass, it is
> likely that Patsy will discard them as Patsy will already have a successful DTLS
> connection established."
> >    This attack creates an asymmetry in the beliefs about the identity of
> >    peers.  However, this attack is only possible if the victim (Norma)
> >    is willing to conduct two sessions nearly simultaneously, if the
> >    attacker (Mallory) is on the network path between the victims, and if
> >    the same certificate - and therefore SDP "fingerprint" attribute
> >    value - is used in both sessions.
> > 
> > This is the same certificate used by Norma in both sessions, right?
> Ack.  Fixed.
> > Section 4.3
> > 
> >    Where RTP and RTCP [RTP] are not multiplexed, it is possible that the
> >    two separate DTLS connections carrying RTP and RTCP can be switched.
> >    This is considered benign since these protocols are designed to be
> >    distinguishable.  RTP/RTCP multiplexing is advised to address this
> >    problem.
> > 
> > What does "switched" mean?  That Mallory could swap the data contents
> > around as an active MITM?
> Yes.  
> RTP and RTCP are usually multiplexed onto the same path today, where they need to be fully distinguishable.  This is by far the most common deployment model, but that doesn't really matter for the purposes of analysis.
> If RTP and RTCP use different DTLS connections, an attacker could switch port numbers (for instance) on the two and cause peers to believe that RTP is RTCP and vice versa.  That's highly unlikely to produce meaningful results SRTP (RFC 3711) provides key separation.  Key separation would be enough if it weren't for ludicrously short authentication tags in common use (we still see 32-bit tags; bits, not bytes!).  If key separation fails, the protocols are generally distinguishable, though not perfectly so where multiplexing is not chosen.

I'm going to insert "since" in "since SRTP provides key separation".
Thanks for clarifying/confirming, and I don't think we need any changes to
the text here.

> > If my understanding is correct, we should probably add a bit more text
> > here indicating the need for more validation than just the validation of
> > the TLS extension contents that this document describes.
> Yes, there are multiple validation steps.  This document assumes that the validation described in RFC 8122 (a=fingerprint) is always followed.  It also assumes that if you are using one of the identity assertions, you follow the respective validation processes.
> Added to both sections (the bracket part only for the first):
> Any validation performed of the `external_id_hash|external_session_id` extension is done in addition
> to the validation required by {{!FINGERPRINT}} [and any identity assertion
> definition].


> > Section 5
> > 
> >    In the absence of any higher-level concept of peer identity, the use
> >    of session identifiers does not prevent session concatenation if the
> >    attacker is able to copy the session identifier from one signaling
> >    session to another.  This kind of attack is prevented by systems that
> >    enable peer authentication such as WebRTC identity [WEBRTC-SEC] or
> >    SIP identity [SIP-ID].  However, session concatenation remains
> >    possible at higher layers: an attacker can establish two independent
> >    sessions and simply forward any data it receives from one into the
> >    other.
> > 
> > And in such a case the attacker has access to the media plaintext, too,
> > right?
> Yeah, but that doesn't expose the victims to the consequences of mistaken channel bindings.


> >    Use of the "external_session_id" does not guarantee that the identity
> >    of the peer at the TLS layer is the same as the identity of the
> >    signaling peer.  The advantage an attacker gains by concatenating
> >    sessions is limited unless it is assumed that signaling and TLS peers
> >    are the same.  If a secondary protocol uses the signaling channel
> >    with the assumption that the signaling and TLS peers are the same
> >    then that protocol is vulnerable to attack unless they also validate
> >    the identity of peers at both layers.
> > 
> > Is this paragraph describing a case like in
> > draft-ietf-rtcweb-security-arch, where we do send (and verify) identity
> > assertions at the signalling layer?  That is, the verification at the
> > IdP counts for validation at the "secondary protocol" layer and the
> > verification that the TLS extension matches the signalling constitutes
> > verification at the TLS layer, thereby achieving the validation "at both
> > layers"?
> WebRTC doesn't include validation at both layers in the way that this describes.
> Imagine that Mallory pretends that they own fingerprints and assertions from Norma when communicating with Patsy and (vice versa) pretends that they own the Patsy's values when communicating with Norma.  They can then forward packets.  That puts Mallory on the signaling path, but not the media path (aside from having to forward encrypted packets).  If Norma and Patsy were to assume that signaling and media peers were the same, that leads to problems.
> Neither Norma nor Patsy are under misapprehension of each other's identity, but they might not realize that they have an on-path attacker that can see (and modify) signaling.  If they were to send something derived from signaling in DTLS, that might fail.  If they were to send something derived from DTLS in signaling, that might leak to Mallory.

Thanks for helping flesh this out.  Looking back at the text I quoted from
the document, there are two things that I might want to change (still not
100% confident): (1) "unless it is assumed", which is potentially vague
about who/what is doing the assuming.  Perhaps "unless there is additional
data exchanged under the assumption that" or similar.  (2) "unless they also
validate the identity of peers at both layers".  This document describes
how to validate the identity of the peer at the TLS layer, but I wonder if
we want some extra reference/guidance on validating the peer's identity at
the signalling layer.

> >    It is important to note that multiple connections can be created
> >    within the same signaling session.  An attacker might concatenate
> >    only part of a session, choosing to terminate some connections (and
> >    optionally forward data) while arranging to have peers interact
> >    directly for other connections.  It is even possible to have
> >    different peers interact for each connection.  This means that the
> >    actual identity of the peer for one connection might differ from the
> >    peer on another connection.
> > 
> > How do or could the mitigations specified in this document address these
> > attacks?
> The point of this text is to make it clear that no guarantees transfer from media to signaling or between different connections.
> With the mitigations described AND identity assertions, you can be confident that the peer at the media layer is the peer that the identity assertion reports, even in the presence of attackers on the signaling path.
> With only fingerprints, the signaling system needs to have integrity, including safeguards on replacement of fingerprints by endpoints.
> I've added those notes as security considerations.