Re: [AVTCORE] Benjamin Kaduk's Discuss on draft-ietf-avtcore-multiplex-guidelines-11: (with DISCUSS and COMMENT)

"Roni Even (A)" <> Thu, 05 March 2020 14:49 UTC

From: "Roni Even (A)"
To: Benjamin Kaduk, The IESG
CC: Jonathan Lennox
Date: Thu, 05 Mar 2020 14:49:13 +0000

Hi Ben,
I am curious (but not necessarily needing to see in this document) where the security considerations regarding CNAME spoofing (where an attacker claims the CNAME of an existing source to attempt to be treated as part of the victim's output) are discussed.

Roni Even

> -----Original Message-----
> From: Benjamin Kaduk via Datatracker
> Sent: Thursday, March 05, 2020 12:36 AM
> To: The IESG
> Cc: Jonathan Lennox
> Subject: Benjamin Kaduk's Discuss on draft-ietf-avtcore-multiplex-guidelines-11:
> (with DISCUSS and COMMENT)
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-avtcore-multiplex-guidelines-11: Discuss
> When responding, please keep the subject line intact and reply to all email
> addresses included in the To and CC lines. (Feel free to cut this introductory
> paragraph, however.)
> Please refer to
> for more information about IESG DISCUSS and COMMENT positions.
> The document, along with other ballot positions, can be found here:
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> As Mirja maybe already noted, Section 3.4.3 says:
>    Section 8.3 of the RTP Specification [RFC3550] recommends using a
>    single SSRC space across all RTP sessions for layered coding.  Based
>    on the experience so far however, we recommend to use a solution with
>    explicit binding between the RTP streams that is agnostic to the used
>    SSRC values.  That way, solutions using multiple RTP streams in a
> This sounds an awful lot like we're trying to update the recommendations from
> RFC 3550, and looks like different text than was discussed in Mirja's ballot
> thread.  Let's discuss whether the formal Updates:
> mechanism is appropriate here or we should consider rewording.
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> Abstract, Introduction
> Do we consider SRTP to be included in discussion of "RTP"?
> Section 1
>    from a particular usage of the RTP multiplexing points.  The document
>    will provide some guidelines and recommend against some usages as
>    being unsuitable, in general or for particular purposes.
> If something is unsuitable in general, should the protocol feature be
> deprecated/removed?
> Section 2.1
>    RTP Session Group:  One or more RTP sessions that are used together
>       to perform some function.  Examples are multiple RTP sessions used
>       to carry different layers of a layered encoding.  In an RTP
>       Session Group, CNAMEs are assumed to be valid across all RTP
>       sessions, and designate synchronisation contexts that can cross
>       RTP sessions; i.e. SSRCs that map to a common CNAME can be assumed
>       to have RTCP SR timing information derived from a common clock
>       such that they can be synchronised for playout.
> I suggest expanding "RTCP SR timing" or providing a definition.
> Section 3.1
> It seems a little surprising to use simulcast as an example in the "needed to
> represent one media source" bullet and then have separate bullets for simulcast
> permutations.
>    sessions to group the RTP streams.  The choice suitable for one
>    reason, might not be the choice suitable for another reason.  The
> nit: is it the "reason" or the "situation"/"scenario" that is relevant here?
> Section 3.2
>    RTP streams.  Figure 1 outlines the process of demultiplexing
>    incoming RTP streams starting already at the socket representing
>    reception of one or transport flows, e.g. an UDP destination port.
> nit: "one or more"?
> I'd consider putting more arrowheads in the downward direction, though it's
> unclear if the resultant vertical expansion of the figure is worth it.
> Section 3.2.1
>    For RTP session separation within a single endpoint, RTP relies on
>    the underlying transport layer, and on the signalling to identify RTP
>    sessions in a manner that is meaningful to the application.  A single
>    endpoint can have one or more transport flows for the same RTP
>    session, and a single RTP session can therefore span multiple
>    transport layer flows even if all endpoints use a single transport
>    layer flow per endpoint for that RTP session.  The signalling layer
> nit: "therefore" seems misplaced; the relevant linkage in the logic seems to be
> that there could be one transport flow per endpoint pair (as we don't require
> multicast usage).
>    Independently if an endpoint has one or more IP addresses, a single
> nit: I'm not sure if "independently" is the right conjunctive adverb, but whatever
> is used it should have a comma after it.
> Section 3.2.2
>    Endpoints that are both RTP sender and RTP receiver use the same SSRC
>    in both roles.
> If I have multiple SSRCs as a sender, do I have freedom to vary amongst them
> when acting as an RTP receiver (or RTCP sender)?
>    SSRC values are unique across RTP sessions.  For the RTP
>    retransmission [RFC4588] case it is recommended to use explicit
>    binding of the source RTP stream and the redundancy stream, e.g.
>    using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid].
> Some indication of whether this recommendation is new in this document or
> "long-standing" might be worthwhile.
>    Note that RTP sequence number and RTP timestamp are scoped by the
>    SSRC and thus specific per RTP stream.
> And now I wonder about the behavior of these two in the retransmission case
> from the previous paragraph.  But that's likely off-topic for this document :)
>    An endpoint that generates more than one media type, e.g.  a
>    conference participant sending both audio and video, need not (and,
>    indeed, should not) use the same SSRC value across RTP sessions.
> I'm not sure I understand why the guidance on cross-session behavior is specific
> to the multi-media-type case.
>    RTCP compound packets containing the CNAME SDES item is the
>    designated method to bind an SSRC to a CNAME, effectively cross-
>    correlating SSRCs within and between RTP Sessions as coming from the
>    same endpoint.  The main property attributed to SSRCs associated with
>    the same CNAME is that they are from a particular synchronisation
>    context and can be synchronised at playback.
> I am curious (but not necessarily needing to see in this document) where the
> security considerations regarding CNAME spoofing (where an attacker claims
> the CNAME of an existing source to attempt to be treated as part of the victim's
> output) are discussed.
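
As a rough illustration of the CNAME binding quoted above, and of why an unauthenticated CNAME claim is a concern, here is a minimal Python sketch. The function name `group_by_cname`, the SSRC values, and the CNAME string are made up for the example. It models only the (SSRC, CNAME) pairs a receiver would learn from RTCP SDES items and does no real RTCP parsing.

```python
from collections import defaultdict

def group_by_cname(sdes_items):
    """Group SSRCs by the CNAME they claim in RTCP SDES items.

    sdes_items is an iterable of (ssrc, cname) pairs. The pairs are
    hypothetical stand-ins for already-parsed RTCP compound packets.
    """
    groups = defaultdict(set)
    for ssrc, cname in sdes_items:
        groups[cname].add(ssrc)
    return dict(groups)

# Two streams from the legitimate endpoint, plus a third SSRC whose
# sender simply claims the same CNAME.
reports = [
    (0x1111AAAA, "alice@example.invalid"),
    (0x2222BBBB, "alice@example.invalid"),
    (0x3333CCCC, "alice@example.invalid"),  # assumed attacker-controlled SSRC
]
groups = group_by_cname(reports)
print(sorted(hex(s) for s in groups["alice@example.invalid"]))
```

Nothing in the grouping step distinguishes the third SSRC from the first two, so any protection would presumably have to come from authenticating the RTCP packets themselves, which this sketch does not model.
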
> Section 3.2.4
>    The RTP payload type is scoped by the sending endpoint within an RTP
>    session.  PT has the same meaning across all RTP streams in an RTP
>    session.  All SSRCs sent from a single endpoint share the same
> I'd suggest "same meaning across all RTP streams from that sender", though
> given the previous (and next!) sentence it is probably not strictly necessary.
> Section 3.3
>    o  Does my communication peer support RTP as defined with multiple
>       SSRCs per RTP session?
> There's potentially some ambiguity about grouping/binding in this text.
>    gateway, for example a need to monitor the RTP streams.  Beware that
>    changing the stream transport characteristics in the translator, can
>    require thorough understanding of the application logic, specifically
>    any congestion control or media adaptation to ensure appropriate
>    media handling.
> While congestion control and media adaptation are important, they're hardly
> the only things that a middlebox might need to know about (but fail to
> implement properly, which is the point of this warning).  I'd suggest rephrasing
> to be a range/selection rather than drilling into specific points (e.g., "from
> congestion control to media adaptation or particular application-layer
> semantics").
>    Within the uses enabled by the RTP standard the point to point
>    topology can contain one to many RTP sessions with one to many media
>    sources per session, each having one or more RTP streams per media
>    source.
> micro-nit: "one to many", "one to many", "one or more" ruins the parallelism
> :)
> Section 3.4.3
>    o  Signalling based (SDP)
> "e.g., SDP", no?
>    An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to
>    bind related RTP streams to an endpoint or to a synchronization
>    context.  For applications with a single RTP stream per type (media,
>    source or redundancy stream), CNAME is sufficient for that purpose
>    independent if one or more RTP sessions are used.  However, some
> nit: "independent if" doesn't parse properly; maybe "independently of
> whether"?
>    independent if one or more RTP sessions are used.  However, some
>    applications choose not to use CNAME because of perceived complexity
>    or a desire not to implement RTCP and instead use the same SSRC value
>    to bind related RTP streams across multiple RTP sessions.  RTP
> [It's interesting to see this noted, given that we talk about how if you don't
> implement RTCP you're not actually using RTP, just the RTP packet formats; and
> how we discuss that reusing the same SSRC value across multiple RTP sessions
> can be risky.  That said, this should not discourage us from documenting what
> implementations actually do...]
> Section 3.4.4
>    There exist a number of Forward Error Correction (FEC) based schemes
>    for how to reduce the packet loss of the original streams.  Most of
> nit: I think this is either "mitigate packet loss" or "reduce lost data from a media
> stream", but "reduce packet loss" it is not.
>    Using multiple RTP sessions supports the case where some set of
>    receivers might not be able to utilise the FEC information.  By
>    placing it in a separate RTP session and if separating RTP sessions
>    on transport level, FEC can easily be ignored already on transport
>    level, without considering any RTP layer information.
> nit: "the transport level"
> Section 4.1.2
>    BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], it is possible for
>    the RTP translator to map the RTP streams between both sides using
>    some method, e.g. if the number and order of SDP "m=" lines between
>    both sides are the same.  There are also challenges with SSRC
> (There's nothing in SDP that requires that to be the case, though, this would
> merely be a "convenient property shared by the two applications'
> behavior"?)
> Section 4.1.3
>    For applications that use any security mechanism, e.g., in the form
>    of SRTP, the gateway needs to be able to decrypt and verify source
>    integrity of the incoming packets, and re-encrypt, integrity protect,
>    and sign the packets as peer in the other application's security
>    context.  This is necessary even if all that's needed is a simple
> Can you clarify what is meant by "sign the packets as peer" here?  Is it implying
> that the terminating gateway needs to have credentials so as to impersonate
> both "real" participants to the other?
> (Also, nit: "sign packets as the peer" might be a more grammatical wording, as
> "peer" needs an article.)
>    If one uses security functions, like SRTP, and as can be seen from
>    above, they incur both additional risk due to the requirement to have
>    the gateway in the security association between the endpoints (unless
>    the gateway is on the transport level), and additional complexities
>    in form of the decrypt-encrypt cycles needed for each forwarded
>    packet.  SRTP, due to its keying structure, also requires that each
> This sentence is pretty complicated.  Even in the first part, I'm not sure what
> "they" in "they incur both" refers to; it seems that the risk is to the
> participant(s) ("one") rather than the "security functions"
> themselves...
>    RTP session needs different master keys, as use of the same key in
>    two RTP sessions can for some ciphers result in two-time pads that
>    completely breaks the confidentiality of the packets.
> I'd suggest discussing this as "reuse of a one-time pad" rather than a "two-time
> pad".
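
To make the keystream-reuse concern concrete, here is a minimal Python sketch. It assumes two packets end up encrypted by XOR with an identical keystream, which is the situation the quoted text warns about. The keystream bytes, the payloads, and the helper `xor_bytes` are all invented for the example. No real SRTP key derivation is performed.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical 16-byte keystream, reused for two packets.
keystream = bytes(range(16))

p1 = b"audio frame 0001"  # payload in the first RTP session (made up)
p2 = b"video frame 0001"  # payload in the second RTP session (made up)

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# An observer who XORs the two ciphertexts cancels the keystream and
# obtains the XOR of the two plaintexts without knowing any key.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))
```

Once the keystream cancels out, known or guessable content in one packet directly reveals the corresponding bytes of the other. This is the sense in which reuse breaks confidentiality.
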
> Section 4.1.4
>    Endpoints that aren't updated to handle multiple streams following
>    these recommendations can have issues with participating in RTP
>    sessions containing multiple SSRCs within a single session, such as:
> Talking about endpoints being "updated [...] following these recommendations"
> also makes me wonder whether an Updates relationship to
> 3550 or other document(s) would be appropriate.
> Section 4.2.2
>       the, in most cases 2-3, additional flows.  However, packet loss
>       causes extra delays, at least 100 ms, which is the minimal
>       retransmission timer for ICE.
> Doesn't RFC 8445 say 500 ms, not 100?
>    Deep Packet Inspection and Multiple Streams:  Firewalls differ in how
>       deeply they inspect packets.  There exist some risk that deeply
>       inspecting firewalls will have similar legacy issues with multiple
>       SSRCs as some RTP stack implementations.
> Re "some risk", can we say that this has definitely been seen in the wild at least
> once?
> Section 4.3.1
>    only premium users are allowed to access.  The mechanism preventing a
>    receiver from getting the high quality stream can be based on the
>    stream being encrypted with a key that user can't access without
>    paying premium, using the key-management to limit access to the key.
> nit: there seems to be a missing word here ("paying a premium"?)
>    SRTP [RFC3711] has no special functions for dealing with different
>    sets of master keys for different SSRCs.  The key-management
>    functions have different capabilities to establish different sets of
>    keys, normally on a per-endpoint basis.  For example, DTLS-SRTP
>    [RFC5764] and Security Descriptions [RFC4568] establish different
>    keys for outgoing and incoming traffic from an endpoint.  This key
>    usage has to be written into the cryptographic context, possibly
>    associated with different SSRCs.
> I don't really understand what this paragraph is trying to say.
> Section 4.3.2
>    Transport translator-based sessions and multicast sessions, can
> This doesn't seem to match the terminology we used in § 4.1.2.
> (This terminology appears a couple other times, later.)
> Section 5.1
>    h.  If the applications need finer control over which session
>        participants that are included in different sets of security
>        associations, most key-management will have difficulties
>        establishing such a session.
> nit: the grammar is off, here (remove "that" and use "key-management
> techniques"?)
> Section 5.3
>    2.  The application can indicate its usage of the RTP streams on RTP
>        session level, in case multiple different usages exist.
> nit: is this "in case" (precautionary) or "in the case when"
> (descriptive)?
> Section 6
>    Transport Support Extensions:  When defining new RTP/RTCP extensions
> nit: should we swap the order of "Support" and "Extensions"?
> Section 11.1
> RFC 3830 does not feel like it needs to be normative.
> Appendix A
>    4.   Sending multiple streams in the same sequence number space makes
>         it impossible to determine which payload type, which stream a
>         packet loss relates to, and thus to which stream to potentially
>         apply packet loss concealment or other stream-specific loss
>         mitigation mechanisms.
> I don't think this parses properly (around "which payload type,")
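
The ambiguity that item 4 describes can be sketched in a few lines of Python. The stream names, sequence numbers, and the helper `missing_seqs` are hypothetical. Sequence number wrap-around is ignored to keep the example short.

```python
def missing_seqs(received):
    """Return the sequence numbers absent between the lowest and highest seen."""
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]

# Case 1: two streams share one sequence number space.
# Each tuple is (sequence number, stream the packet belonged to).
shared = [(1, "audio"), (2, "video"), (4, "audio"), (5, "video")]
lost_shared = missing_seqs(seq for seq, _ in shared)

# Case 2: each stream has its own sequence number space.
per_stream = {"audio": [1, 2, 3], "video": [1, 3]}
lost_per_stream = {name: missing_seqs(seqs) for name, seqs in per_stream.items()}

print(lost_shared)       # a gap is visible, but not which stream it hit
print(lost_per_stream)   # the gap is attributed to a specific stream
```

In the shared case the receiver learns only that packet 3 is missing. In the per-stream case the loss is tied to one stream, so concealment can be applied to that stream alone.
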
> Appendix B.1
>    One aspect of the existing signalling is that it is focused on RTP
>    sessions, or at least in the case of SDP the media description.
> nit: I think there's an extra or missing word here (around "the media
> description").
>    o  Bitrate/Bandwidth exist today only at aggregate or as a common
>       "any RTP stream" limit, unless either codec-specific bandwidth
>       limiting or RTCP signalling using TMMBR is used.
> Should we have a reference for TMMBR?
> Appendix B.3
>    RTP streams being transported in RTP has some particular usage in an
>    RTP application.  This usage of the RTP stream is in many
> nit: singular/plural mismatch "has"/"streams"