[AVTCORE] Benjamin Kaduk's Discuss on draft-ietf-avtcore-multiplex-guidelines-11: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Wed, 04 March 2020 22:36 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-avtcore-multiplex-guidelines@ietf.org, avtcore-chairs@ietf.org, avt@ietf.org, Jonathan Lennox <jonathan.lennox42@gmail.com>, jonathan.lennox42@gmail.com
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <158336138445.29463.15585769228284211900@ietfa.amsl.com>
Date: Wed, 04 Mar 2020 14:36:24 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/hqlLdWkU0P5l6FCa1YaTKt9FSn8>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-avtcore-multiplex-guidelines-11: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


As Mirja maybe already noted, Section 3.4.3 says:

   Section 8.3 of the RTP Specification [RFC3550] recommends using a
   single SSRC space across all RTP sessions for layered coding.  Based
   on the experience so far however, we recommend to use a solution with
   explicit binding between the RTP streams that is agnostic to the used
   SSRC values.  That way, solutions using multiple RTP streams in a

This sounds an awful lot like we're trying to update the recommendations
from RFC 3550, and looks like different text than was discussed in
Mirja's ballot thread.  Let's discuss whether the formal Updates:
mechanism is appropriate here or we should consider rewording.


Abstract, Introduction

Do we consider SRTP to be included in discussion of "RTP"?

Section 1

   from a particular usage of the RTP multiplexing points.  The document
   will provide some guidelines and recommend against some usages as
   being unsuitable, in general or for particular purposes.

If something is unsuitable in general, should the protocol feature be

Section 2.1

   RTP Session Group:  One or more RTP sessions that are used together
      to perform some function.  Examples are multiple RTP sessions used
      to carry different layers of a layered encoding.  In an RTP
      Session Group, CNAMEs are assumed to be valid across all RTP
      sessions, and designate synchronisation contexts that can cross
      RTP sessions; i.e. SSRCs that map to a common CNAME can be assumed
      to have RTCP SR timing information derived from a common clock
      such that they can be synchronised for playout.

I suggest expanding "RTCP SR timing" or providing a definition.

Section 3.1

It seems a little surprising to use simulcast as an example in the
"needed to represent one media source" bullet and then have separate
bullets for simulcast permutations.

   sessions to group the RTP streams.  The choice suitable for one
   reason, might not be the choice suitable for another reason.  The

nit: is it the "reason" or the "situation"/"scenario" that is relevant

Section 3.2

   RTP streams.  Figure 1 outlines the process of demultiplexing
   incoming RTP streams starting already at the socket representing
   reception of one or transport flows, e.g. an UDP destination port.

nit: "one or more"?

I'd consider putting more arrowheads in the downward direction, though
it's unclear if the resultant vertical expansion of the figure is worth
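
To make the demultiplexing steps concrete, here is a rough sketch (not taken
from the draft) of sorting packets that arrive on one transport flow first by
SSRC and then by payload type. The fixed header layout is the one from RFC
3550, Section 5.1; the function names and the two-level SSRC/PT grouping are
assumptions made for illustration only.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7);
    # then 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


def demultiplex(packets):
    """Group sequence numbers by SSRC, then by payload type (hypothetical helper)."""
    streams = {}
    for pkt in packets:
        hdr = parse_rtp_header(pkt)
        per_ssrc = streams.setdefault(hdr["ssrc"], {})
        per_ssrc.setdefault(hdr["payload_type"], []).append(hdr["sequence_number"])
    return streams
```

A real receiver would first pick the RTP session from the transport flow and
the signalling; the sketch starts after that step.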

Section 3.2.1

   For RTP session separation within a single endpoint, RTP relies on
   the underlying transport layer, and on the signalling to identify RTP
   sessions in a manner that is meaningful to the application.  A single
   endpoint can have one or more transport flows for the same RTP
   session, and a single RTP session can therefore span multiple
   transport layer flows even if all endpoints use a single transport
   layer flow per endpoint for that RTP session.  The signalling layer

nit: "therefore" seems misplaced; the relevant linkage in the logic
seems to be that there could be one transport flow per endpoint pair
(as we don't require multicast usage).

   Independently if an endpoint has one or more IP addresses, a single

nit: I'm not sure if "independently" is the right conjunctive adverb,
but whatever is used it should have a comma after it.

Section 3.2.2

   Endpoints that are both RTP sender and RTP receiver use the same SSRC
   in both roles.

If I have multiple SSRCs as a sender, do I have freedom to vary amongst
them when acting as an RTP receiver (or RTCP sender)?

   SSRC values are unique across RTP sessions.  For the RTP
   retransmission [RFC4588] case it is recommended to use explicit
   binding of the source RTP stream and the redundancy stream, e.g.
   using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid].

Some indication of whether this recommendation is new in this document
or "long-standing" might be worthwhile.

   Note that RTP sequence number and RTP timestamp are scoped by the
   SSRC and thus specific per RTP stream.

And now I wonder about the behavior of these two in the retransmission
case from the previous paragraph.  But that's likely off-topic for this
document :)

   An endpoint that generates more than one media type, e.g.  a
   conference participant sending both audio and video, need not (and,
   indeed, should not) use the same SSRC value across RTP sessions.

I'm not sure I understand why the guidance on cross-session behavior is
specific to the multi-media-type case.

   RTCP compound packets containing the CNAME SDES item is the
   designated method to bind an SSRC to a CNAME, effectively cross-
   correlating SSRCs within and between RTP Sessions as coming from the
   same endpoint.  The main property attributed to SSRCs associated with
   the same CNAME is that they are from a particular synchronisation
   context and can be synchronised at playback.

I am curious (but not necessarily needing to see in this document) where
the security considerations regarding CNAME spoofing (where an attacker
claims the CNAME of an existing source to attempt to be treated as part
of the victim's output) are discussed.
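
For concreteness, a minimal sketch of the binding step as I read it: a receiver
collects (session, SSRC, CNAME) triples learned from RTCP SDES and groups them
by CNAME. The function name and the input shape are hypothetical, chosen only
for illustration.

```python
from collections import defaultdict


def group_by_cname(sdes_bindings):
    """Map each CNAME to the set of (session_id, ssrc) pairs that claimed it.

    sdes_bindings: iterable of (session_id, ssrc, cname) triples, assumed to
    have been extracted from received RTCP SDES items.
    """
    contexts = defaultdict(set)
    for session_id, ssrc, cname in sdes_bindings:
        contexts[cname].add((session_id, ssrc))
    return dict(contexts)
```

Nothing in this grouping step checks who sent the SDES item, so any SSRC that
claims an existing CNAME lands in the same set; that is the spoofing concern I
have in mind above.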

Section 3.2.4

   The RTP payload type is scoped by the sending endpoint within an RTP
   session.  PT has the same meaning across all RTP streams in an RTP
   session.  All SSRCs sent from a single endpoint share the same

I'd suggest "same meaning across all RTP streams from that sender",
though given the previous (and next!) sentence it is probably not
strictly necessary.

Section 3.3

   o  Does my communication peer support RTP as defined with multiple
      SSRCs per RTP session?

There's potentially some ambiguity about grouping/binding in this text.

   gateway, for example a need to monitor the RTP streams.  Beware that
   changing the stream transport characteristics in the translator, can
   require thorough understanding of the application logic, specifically
   any congestion control or media adaptation to ensure appropriate
   media handling.

While congestion control and media adaptation are important, they're
hardly the only things that a middlebox might need to know about (but
fail to implement properly, which is the point of this warning).  I'd
suggest rephrasing to be a range/selection rather than drilling into
specific points (e.g., "from congestion control to media adaptation or
particular application-layer semantics").

   Within the uses enabled by the RTP standard the point to point
   topology can contain one to many RTP sessions with one to many media
   sources per session, each having one or more RTP streams per media

micro-nit: "one to many", "one to many", "one or more" ruins the parallelism


   o  Signalling based (SDP)

"e.g., SDP", no?

   An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to
   bind related RTP streams to an endpoint or to a synchronization
   context.  For applications with a single RTP stream per type (media,
   source or redundancy stream), CNAME is sufficient for that purpose
   independent if one or more RTP sessions are used.  However, some

nit: "independent if" doesn't parse properly; maybe "independently of

   independent if one or more RTP sessions are used.  However, some
   applications choose not to use CNAME because of perceived complexity
   or a desire not to implement RTCP and instead use the same SSRC value
   to bind related RTP streams across multiple RTP sessions.  RTP

[It's interesting to see this noted, given that we talk about how if you
don't implement RTCP you're not actually using RTP, just the RTP packet
formats; and how we discuss that reusing the same SSRC value across
multiple RTP sessions can be risky.  That said, this should not
discourage us from documenting what implementations actually do...]

Section 3.4.4

   There exist a number of Forward Error Correction (FEC) based schemes
   for how to reduce the packet loss of the original streams.  Most of

nit: I think this is either "mitigate packet loss" or "reduce lost data
from a media stream", but "reduce packet loss" it is not.

   Using multiple RTP sessions supports the case where some set of
   receivers might not be able to utilise the FEC information.  By
   placing it in a separate RTP session and if separating RTP sessions
   on transport level, FEC can easily be ignored already on transport
   level, without considering any RTP layer information.

nit: "the transport level"

Section 4.1.2

   BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], it is possible for
   the RTP translator to map the RTP streams between both sides using
   some method, e.g. if the number and order of SDP "m=" lines between
   both sides are the same.  There are also challenges with SSRC

(There's nothing in SDP that requires that to be the case, though, this
would merely be a "convenient property shared by the two applications'

Section 4.1.3

   For applications that use any security mechanism, e.g., in the form
   of SRTP, the gateway needs to be able to decrypt and verify source
   integrity of the incoming packets, and re-encrypt, integrity protect,
   and sign the packets as peer in the other application's security
   context.  This is necessary even if all that's needed is a simple

Can you clarify what is meant by "sign the packets as peer" here?  Is it
implying that the terminating gateway needs to have credentials so as to
impersonate both "real" participants to the other?
(Also, nit: "sign packets as the peer" might be a more grammatical
wording, as "peer" needs an article.)

   If one uses security functions, like SRTP, and as can be seen from
   above, they incur both additional risk due to the requirement to have
   the gateway in the security association between the endpoints (unless
   the gateway is on the transport level), and additional complexities
   in form of the decrypt-encrypt cycles needed for each forwarded
   packet.  SRTP, due to its keying structure, also requires that each

This sentence is pretty complicated.  Even in the first part, I'm not
sure what "they" in "they incur both" refers to...it seems that the risk
is to the participant(s) ("one") rather than the "security functions"

   RTP session needs different master keys, as use of the same key in
   two RTP sessions can for some ciphers result in two-time pads that
   completely breaks the confidentiality of the packets.

I'd suggest discussing this as "reuse of a one-time pad" rather than a
"two-time pad".

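To illustrate why keystream reuse is fatal, here is a toy sketch. It uses a
SHA-256 counter construction as a stand-in stream cipher (an assumption for
illustration, not SRTP's actual AES-CTR keystream derivation) and shows that
when the same keystream covers two packets, XORing the ciphertexts cancels the
keystream and leaves the XOR of the two plaintexts.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustrative stand-in only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"same master key in both sessions"
p1 = b"audio frame 0001"  # plaintext sent in session 1
p2 = b"video frame 0001"  # plaintext sent in session 2

c1 = xor(p1, keystream(key, len(p1)))
c2 = xor(p2, keystream(key, len(p2)))

# The keystream cancels out: an eavesdropper learns p1 XOR p2 without the key.
leaked = xor(c1, c2)
```

With `leaked` in hand, knowing or guessing either plaintext reveals the other.
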
Section 4.1.4

   Endpoints that aren't updated to handle multiple streams following
   these recommendations can have issues with participating in RTP
   sessions containing multiple SSRCs within a single session, such as:

Talking about endpoints being "updated [...] following these
recommendations" also makes me wonder whether an Updates relationship to
3550 or other document(s) would be appropriate.

Section 4.2.2

      the, in most cases 2-3, additional flows.  However, packet loss
      causes extra delays, at least 100 ms, which is the minimal
      retransmission timer for ICE.

Doesn't RFC 8445 say 500 ms, not 100?

   Deep Packet Inspection and Multiple Streams:  Firewalls differ in how
      deeply they inspect packets.  There exist some risk that deeply
      inspecting firewalls will have similar legacy issues with multiple
      SSRCs as some RTP stack implementations.

Re "some risk", can we say that this has definitely been seen in the
wild at least once?

Section 4.3.1

   only premium users are allowed to access.  The mechanism preventing a
   receiver from getting the high quality stream can be based on the
   stream being encrypted with a key that user can't access without
   paying premium, using the key-management to limit access to the key.

nit: there seems to be a missing word here ("paying a premium"?)

   SRTP [RFC3711] has no special functions for dealing with different
   sets of master keys for different SSRCs.  The key-management
   functions have different capabilities to establish different sets of
   keys, normally on a per-endpoint basis.  For example, DTLS-SRTP
   [RFC5764] and Security Descriptions [RFC4568] establish different
   keys for outgoing and incoming traffic from an endpoint.  This key
   usage has to be written into the cryptographic context, possibly
   associated with different SSRCs.

I don't really understand what this paragraph is trying to say.

Section 4.3.2

   Transport translator-based sessions and multicast sessions, can

This doesn't seem to match the terminology we used in § 4.1.2.
(This terminology appears a couple other times, later.)

Section 5.1

   h.  If the applications need finer control over which session
       participants that are included in different sets of security
       associations, most key-management will have difficulties
       establishing such a session.

nit: the grammar is off, here (remove "that" and use "key-management

Section 5.3

   2.  The application can indicate its usage of the RTP streams on RTP
       session level, in case multiple different usages exist.

nit: is this "in case" (precautionary) or "in the case when"

Section 6

   Transport Support Extensions:  When defining new RTP/RTCP extensions

nit: should we swap the order of "Support" and "Extensions"?

Section 11.1

RFC 3830 does not feel like it needs to be normative.

Appendix A

   4.   Sending multiple streams in the same sequence number space makes
        it impossible to determine which payload type, which stream a
        packet loss relates to, and thus to which stream to potentially
        apply packet loss concealment or other stream-specific loss
        mitigation mechanisms.

I don't think this parses properly (around "which payload type,")
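
What I take the item to mean can be sketched as follows: a gap in a sequence
number space identifies a lost packet, but only identifies the stream if each
stream has its own space. The helper name is hypothetical and the input is
assumed to be the received sequence numbers in arrival order without
reordering.

```python
def find_gaps(seqs):
    """Return the sequence numbers missing between consecutive received ones.

    Handles wrap-around of the 16-bit RTP sequence number.
    """
    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        gap = (cur - prev) % 65536
        missing.extend((prev + k) % 65536 for k in range(1, gap))
    return missing


# Per-stream sequence spaces: the gap is attributable to stream "A".
per_stream = {"A": [10, 11, 13], "B": [500, 501, 502]}
losses = {name: find_gaps(seqs) for name, seqs in per_stream.items()}

# One shared space for both streams: packet 12 is missing, but nothing says
# whether it belonged to "A" or "B".
shared = [(10, "A"), (11, "B"), (13, "A")]
shared_losses = find_gaps([seq for seq, _ in shared])
```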

Appendix B.1

   One aspect of the existing signalling is that it is focused on RTP
   sessions, or at least in the case of SDP the media description.

nit: I think there's an extra or missing word here (around "the media

   o  Bitrate/Bandwidth exist today only at aggregate or as a common
      "any RTP stream" limit, unless either codec-specific bandwidth
      limiting or RTCP signalling using TMMBR is used.

Should we have a reference for TMMBR?

Appendix B.3

   RTP streams being transported in RTP has some particular usage in an
   RTP application.  This usage of the RTP stream is in many

nit: singular/plural mismatch "has"/"streams"