Re: [AVTCORE] Benjamin Kaduk's Discuss on draft-ietf-avtcore-multiplex-guidelines-11: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Thu, 05 March 2020 21:15 UTC

Date: Thu, 05 Mar 2020 13:15:19 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: Magnus Westerlund <magnus.westerlund@ericsson.com>
Cc: "iesg@ietf.org" <iesg@ietf.org>, "avtcore-chairs@ietf.org" <avtcore-chairs@ietf.org>, "jonathan.lennox42@gmail.com" <jonathan.lennox42@gmail.com>, "draft-ietf-avtcore-multiplex-guidelines@ietf.org" <draft-ietf-avtcore-multiplex-guidelines@ietf.org>, "avt@ietf.org" <avt@ietf.org>
Message-ID: <20200305211519.GY98042@kduck.mit.edu>
References: <158336138445.29463.15585769228284211900@ietfa.amsl.com> <d4b939b79839e0ed42658bf8d72fb09fd3c15051.camel@ericsson.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <d4b939b79839e0ed42658bf8d72fb09fd3c15051.camel@ericsson.com>
User-Agent: Mutt/1.12.1 (2019-06-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/PcfHouj5jPKB8LVdn32kbtxF9kU>
Subject: Re: [AVTCORE] Benjamin Kaduk's Discuss on draft-ietf-avtcore-multiplex-guidelines-11: (with DISCUSS and COMMENT)
Precedence: list

On Thu, Mar 05, 2020 at 04:30:52PM +0000, Magnus Westerlund wrote:
> Hi Ben,
> 
> Thanks for the many detailed comments. Please see inline for responses and
> comments.
> 
> On Wed, 2020-03-04 at 14:36 -0800, Benjamin Kaduk via Datatracker wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-avtcore-multiplex-guidelines-11: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-avtcore-multiplex-guidelines/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > As Mirja maybe already noted, Section 3.4.3 says:
> > 
> >    Section 8.3 of the RTP Specification [RFC3550] recommends using a
> >    single SSRC space across all RTP sessions for layered coding.  Based
> >    on the experience so far however, we recommend to use a solution with
> >    explicit binding between the RTP streams that is agnostic to the used
> >    SSRC values.  That way, solutions using multiple RTP streams in a
> > 
> > This sounds an awful lot like we're trying to update the recommendations
> > from RFC 3550, and looks like different text than was discussed in
> > Mirja's ballot thread.  Let's discuss whether the formal Updates:
> > mechanism is appropriate here or we should consider rewording.
> 
> 
> I think maybe the simplest rewording is dropping the explicit reference to
> Section 8.3. It is a good recommendation to use an explicit mechanism that is
> independent of your RTP session structure.

I am mindful of the concern raised in the ongoing appeal against the IESG
regarding "updates/revises other documents by describing them as
inconsistent with the specification under review rather than enumerating
them", so we may want to just note here that "experience has found that an
explicit binding between the RTP streams, agnostic of SSRC values, behaves
well".

> AVTCORE should consider updating this part of the RTP specification. And that is
> a fairly short and straightforward document. 

Agreed.

> > 
> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > Abstract, Introduction
> > 
> > Do we consider SRTP to be included in discussion of "RTP"?
> 
> Yes, SRTP is actually "just" a profile of RTP or actually two (SAVP and SAVPF). 
> 
> > 
> > Section 1
> > 
> >    from a particular usage of the RTP multiplexing points.  The document
> >    will provide some guidelines and recommend against some usages as
> >    being unsuitable, in general or for particular purposes.
> > 
> > If something is unsuitable in general, should the protocol feature be
> > deprecated/removed?
> 
> One common unsuitable usage is to overload the payload type for other purposes.
> We can't deprecate the PT as that would mean that a receiver can't interpret the
> RTP Payload in the right way. This document does discuss why overloading it is
> unsuitable and provides alternatives for how to do this in a better way.
> 
> > 
> > Section 2.1
> > 
> >    RTP Session Group:  One or more RTP sessions that are used together
> >       to perform some function.  Examples are multiple RTP sessions used
> >       to carry different layers of a layered encoding.  In an RTP
> >       Session Group, CNAMEs are assumed to be valid across all RTP
> >       sessions, and designate synchronisation contexts that can cross
> >       RTP sessions; i.e. SSRCs that map to a common CNAME can be assumed
> >       to have RTCP SR timing information derived from a common clock
> >       such that they can be synchronised for playout.
> > 
> > I suggest expanding "RTCP SR timing" or providing a definition.
> 
> Yes, we can expand that to RTCP Sender Report timing. 
> > 
> > Section 3.1
> > 
> > It seems a little surprising to use simulcast as an example in the
> > "needed to represent one media source" bullet and then have separate
> > bullets for simulcast permutations.
> 
> Yes, that is a bit strange, let us look at what to do about that. 
> 
> > 
> >    sessions to group the RTP streams.  The choice suitable for one
> >    reason, might not be the choice suitable for another reason.  The
> > 
> > nit: is it the "reason" or the "situation"/"scenario" that is relevant
> > here?
> 
> Scenario I would say. 
> 
> > 
> > Section 3.2
> > 
> >    RTP streams.  Figure 1 outlines the process of demultiplexing
> >    incoming RTP streams starting already at the socket representing
> >    reception of one or transport flows, e.g. an UDP destination port.
> > 
> > nit: "one or more"?
> 
> Yes, from an RTP session perspective, multiple sockets / 5-tuple UDP flows etc
> can actually be used to provide RTP/RTCP packets. This becomes much more evident
> if you look at some of the topologies discussed in RFC 7667. 

I was just noting that "of one or transport" was missing a word ... but
this sounds reasonable, too :)

> So maybe there should be multiple arrows with packets and "Socket(s)" in the box
> to make this clearer. 
> 
> > 
> > I'd consider putting more arrowheads in the downward direction, though
> > it's unclear if the resultant vertical expansion of the figure is worth
> > it.
> 
> You mean into the PB and DEC boxes?  

Right.

> Let us try this, it would "only" add two more rows.
> 
> > 
> > Section 3.2.1
> > 
> >    For RTP session separation within a single endpoint, RTP relies on
> >    the underlying transport layer, and on the signalling to identify RTP
> >    sessions in a manner that is meaningful to the application.  A single
> >    endpoint can have one or more transport flows for the same RTP
> >    session, and a single RTP session can therefore span multiple
> >    transport layer flows even if all endpoints use a single transport
> >    layer flow per endpoint for that RTP session.  The signalling layer
> > 
> > nit: "therefore" seems misplaced; the relevant linkage in the logic
> > seems to be that there could be one transport flow per endpoint pair
> > (as we don't require multicast usage).
> 
> Yes, I think just deleting "therefore" appears to be without downsides.
> 
> 
> > 
> >    Independently if an endpoint has one or more IP addresses, a single
> > 
> > nit: I'm not sure if "independently" is the right conjunctive adverb,
> > but whatever is used it should have a comma after it.
> > 
> 
> Will look at it. 
> 
> > Section 3.2.2
> > 
> >    Endpoints that are both RTP sender and RTP receiver use the same SSRC
> >    in both roles.
> > 
> > If I have multiple SSRCs as a sender, do I have freedom to vary amongst
> > them when acting as an RTP receiver (or RTCP sender)?
> 
> So, they are all RTP receivers, and by default they all need to report on
> reception etc. and be RTCP senders. We have RFC 8108, which clarifies this
> handling. There is also a protocol extension that allows an optimization in
> this case, see draft-ietf-avtcore-rtp-multi-stream-optimisation (stuck in
> cluster 238).
> 
> 
> > 
> >    SSRC values are unique across RTP sessions.  For the RTP
> >    retransmission [RFC4588] case it is recommended to use explicit
> >    binding of the source RTP stream and the redundancy stream, e.g.
> >    using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid].
> > 
> > Some indication of whether this recommendation is new in this document
> > or "long-standing" might be worthwhile.
> 
> Yes, so this is new, as the RID mechanism is relatively new (a result of
> RTCWeb) and therefore cannot be generally expected to be available. We can
> indicate this.
> 
> > 
> >    Note that RTP sequence number and RTP timestamp are scoped by the
> >    SSRC and thus specific per RTP stream.
> > 
> > And now I wonder about the behavior of these two in the retransmission
> > case from the previous paragraph.  But that's likely off-topic for this
> > document :)
> 
> For RFC 4588 retransmission, the retransmissions are done on a separate stream,
> with the original sequence number indicated in the RTP payload.
> 
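To make the RFC 4588 point concrete, here is a minimal sketch of recovering the original sequence number (OSN) from a retransmission packet. It assumes the RFC 4588 payload layout, where a 2-octet OSN immediately follows the RTP header; the function name `parse_rtx_osn` is hypothetical, and RTP padding is deliberately not handled.

```python
import struct
from typing import Tuple


def parse_rtx_osn(packet: bytes) -> Tuple[int, int, bytes]:
    """Return (rtx_seq, original_seq, original_payload).

    Hypothetical helper: assumes an RFC 4588 retransmission packet,
    i.e. an RTP header followed by a 2-octet OSN and the original payload.
    RTP padding (P bit) is not handled in this sketch.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, _marker_pt, rtx_seq = struct.unpack_from("!BBH", packet, 0)
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")

    # Skip the fixed header plus any CSRC entries (4 octets each).
    offset = 12 + 4 * (first & 0x0F)

    # Skip the header extension if the X bit is set.
    if first & 0x10:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words

    if len(packet) < offset + 2:
        raise ValueError("no room for the OSN field")
    (original_seq,) = struct.unpack_from("!H", packet, offset)
    return rtx_seq, original_seq, packet[offset + 2:]
```

The retransmission stream keeps its own sequence number space (the first value returned), which is why the original sequence number has to travel inside the payload.
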
> > 
> >    An endpoint that generates more than one media type, e.g.  a
> >    conference participant sending both audio and video, need not (and,
> >    indeed, should not) use the same SSRC value across RTP sessions.
> > 
> > I'm not sure I understand why the guidance on cross-session behavior is
> > specific to the multi-media-type case.
> 
> No, it applies similarly to primary and redundancy streams. However, there
> might be legacy here.
> 
> > 
> >    RTCP compound packets containing the CNAME SDES item is the
> >    designated method to bind an SSRC to a CNAME, effectively cross-
> >    correlating SSRCs within and between RTP Sessions as coming from the
> >    same endpoint.  The main property attributed to SSRCs associated with
> >    the same CNAME is that they are from a particular synchronisation
> >    context and can be synchronised at playback.
> > 
> > I am curious (but not necessarily needing to see in this document) where
> > the security considerations regarding CNAME spoofing (where an attacker
> > claims the CNAME of an existing source to attempt to be treated as part
> > of the victim's output) are discussed.
> 
> So I think there are a number of security issues that might not be as well
> discussed as they should be. RTP unfortunately has quite vulnerable internals,
> and the fact that the most used security mechanisms don't provide true
> source authentication, only authentication to the group of keyed users, makes
> this potentially problematic.
> 
> RFC 7941 does mention this issue, but does not go further than saying that you
> deploy source authentication based on your needs. The PERC documents likely
> have some additional discussion of this. Middleboxes have an interesting role
> here as they can be used as a defense against this spoofing by tracking which
> endpoints own which SSRCs and refusing to let multiple endpoints inject SSRCs
> bound to the same SDES CNAME. However, there are of course some use cases, like
> FEC, where third-party actors need to add SSRCs with the same CNAME.
> 
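As a rough illustration of the CNAME binding discussed here, the sketch below groups SSRCs from several RTP sessions into synchronisation contexts keyed by CNAME. The function name, the tuple layout, and the example session labels are all assumptions made for illustration. Note that it performs no source authentication, so a spoofed CNAME would be grouped with the victim's streams, which is exactly the concern raised above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_by_cname(
    sdes_reports: Iterable[Tuple[str, int, str]]
) -> Dict[str, Set[Tuple[str, int]]]:
    """Group (session_id, ssrc) pairs by the CNAME they reported.

    Hypothetical helper: each input item is (session_id, ssrc, cname),
    as might be collected from RTCP SDES items. No authentication is
    done, so any sender claiming a CNAME lands in that CNAME's group.
    """
    contexts: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for session_id, ssrc, cname in sdes_reports:
        contexts[cname].add((session_id, ssrc))
    return dict(contexts)


# Example input: one endpoint sending audio and video in separate RTP
# sessions with different SSRCs, plus a second endpoint.
reports = [
    ("audio", 0x1111, "alice@example"),
    ("video", 0x2222, "alice@example"),
    ("audio", 0x3333, "bob@example"),
]
groups = group_by_cname(reports)
```
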
> 
> 
> > 
> > Section 3.2.4
> > 
> >    The RTP payload type is scoped by the sending endpoint within an RTP
> >    session.  PT has the same meaning across all RTP streams in an RTP
> >    session.  All SSRCs sent from a single endpoint share the same
> > 
> > I'd suggest "same meaning across all RTP streams from that sender",
> > though given the previous (and next!) sentence it is probably not
> > strictly necessary.
> 
> So there is a subtle thing here. Due to SDP Offer/Answer, even in star
> multi-party the configuration is actually done per receiver endpoint per RTP
> session. This can easily force RTP middleboxes to perform PT translation,
> because each endpoint in an RTP session actually has its own numbering scheme
> of equivalent PT configurations. But there is a point in actually stating that
> PTs are valid on the RTP session level, and that trying to configure them with
> an SSRC scope is not without issues and should be avoided.
> 
> 
> > 
> > Section 3.3
> > 
> >    o  Does my communication peer support RTP as defined with multiple
> >       SSRCs per RTP session?
> > 
> > There's potentially some ambiguity about grouping/binding in this text.
> > 
> >    gateway, for example a need to monitor the RTP streams.  Beware that
> >    changing the stream transport characteristics in the translator, can
> >    require thorough understanding of the application logic, specifically
> >    any congestion control or media adaptation to ensure appropriate
> >    media handling.
> > 
> > While congestion control and media adaptation are important, they're
> > hardly the only things that a middlebox might need to know about (but
> > fail to implement properly, which is the point of this warning).  I'd
> > suggest rephrasing to be a range/selection rather than drilling into
> > specific points (e.g., "from congestion control to media adaptation or
> > particular application-layer semantics").
> 
> I think it sounds reasonable; will see whether my co-authors have
> additional feedback.
> 
> > 
> >    Within the uses enabled by the RTP standard the point to point
> >    topology can contain one to many RTP sessions with one to many media
> >    sources per session, each having one or more RTP streams per media
> >    source.
> > 
> > micro-nit: "one to many", "one to many", "one or more" ruins the parallelism
> > :)
> 
> Thanks :-)
> 
> > 
> > 3.4.3
> > 
> >    o  Signalling based (SDP)
> > 
> > "e.g., SDP", no?
> 
> 
> Yes.
> > 
> >    An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to
> >    bind related RTP streams to an endpoint or to a synchronization
> >    context.  For applications with a single RTP stream per type (media,
> >    source or redundancy stream), CNAME is sufficient for that purpose
> >    independent if one or more RTP sessions are used.  However, some
> > 
> > nit: "independent if" doesn't parse properly; maybe "independently of
> > whether"?
> 
> Yes, whether should do it. 
> 
> > 
> >    independent if one or more RTP sessions are used.  However, some
> >    applications choose not to use CNAME because of perceived complexity
> >    or a desire not to implement RTCP and instead use the same SSRC value
> >    to bind related RTP streams across multiple RTP sessions.  RTP
> > 
> > [It's interesting to see this noted, given that we talk about how if you
> > don't implement RTCP you're not actually using RTP, just the RTP packet
> > formats; and how we discuss that reusing the same SSRC value across
> > multiple RTP sessions can be risky.  That said, this should not
> > discourage us from documenting what implementations actually do...]
> 
> So RTCP implementation is usually done today, but many early voice-only usages
> skipped it. So it is background on why things ended up like they are.
> 
> > 
> > Section 3.4.4
> > 
> >    There exist a number of Forward Error Correction (FEC) based schemes
> >    for how to reduce the packet loss of the original streams.  Most of
> > 
> > nit: I think this is either "mitigate packet loss" or "reduce lost data
> > from a media stream", but "reduce packet loss" it is not.
> 
> Yes, will address. 
> 
> > 
> >    Using multiple RTP sessions supports the case where some set of
> >    receivers might not be able to utilise the FEC information.  By
> >    placing it in a separate RTP session and if separating RTP sessions
> >    on transport level, FEC can easily be ignored already on transport
> >    level, without considering any RTP layer information.
> > 
> > nit: "the transport level"
> 
> Ok. 
> 
> > 
> > Section 4.1.2
> > 
> >    BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], it is possible for
> >    the RTP translator to map the RTP streams between both sides using
> >    some method, e.g. if the number and order of SDP "m=" lines between
> >    both sides are the same.  There are also challenges with SSRC
> > 
> > (There's nothing in SDP that requires that to be the case, though, this
> > would merely be a "convenient property shared by the two applications'
> > behavior"?)
> 
> So part is implied meaning of number and order of m= lines for particular
> applications / systems. This gets interesting when you interconnect such systems
> with a translator. 
> 
> > 
> > Section 4.1.3
> > 
> >    For applications that use any security mechanism, e.g., in the form
> >    of SRTP, the gateway needs to be able to decrypt and verify source
> >    integrity of the incoming packets, and re-encrypt, integrity protect,
> >    and sign the packets as peer in the other application's security
> >    context.  This is necessary even if all that's needed is a simple
> > 
> > Can you clarify what is meant by "sign the packets as peer" here?  Is it
> > implying that the terminating gateway needs to have credentials so as to
> > impersonate both "real" participants to the other?
> 
> Yes, they need to be a trusted party by both communicating peers in this case.
> Unless you use PERC all RTP level middleboxes will have to have access to the
> security context. 
> 
> > (Also, nit: "sign packets as the peer" might be a more grammatical
> > wording, as "peer" needs an article.)
> 
> Ok. 
> 
> > 
> >    If one uses security functions, like SRTP, and as can be seen from
> >    above, they incur both additional risk due to the requirement to have
> >    the gateway in the security association between the endpoints (unless
> >    the gateway is on the transport level), and additional complexities
> >    in form of the decrypt-encrypt cycles needed for each forwarded
> >    packet.  SRTP, due to its keying structure, also requires that each
> > 
> > This sentence is pretty complicated.  Even in the first part, I'm not
> > sure what "they" in "they incur both" refers to...it seems that the risk
> > is to the participant(s) ("one") rather than the "security functions"
> > themselves...
> 
> I think "they" equals gateways. Let us attempt to reformulate. 
> 
> > 
> >    RTP session needs different master keys, as use of the same key in
> >    two RTP sessions can for some ciphers result in two-time pads that
> >    completely breaks the confidentiality of the packets.
> > 
> > I'd suggest discussing this as "reuse of a one-time pad" rather than a
> > "two-time pad".
> 
> Ok
> 
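The "reuse of a one-time pad" failure can be sketched generically as below. This is not SRTP itself: it assumes only that the cipher XORs a keystream with the payload (as a counter-mode cipher does) and that the same keystream ends up being used for packets in two RTP sessions. All names and payloads are illustrative.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Assumption: the same key (and the same SSRC and packet index) in two
# sessions yields the same keystream for both packets.
keystream = os.urandom(16)

plaintext_a = b"audio frame 0001"  # packet in RTP session A
plaintext_b = b"video frame 0001"  # packet in RTP session B

ciphertext_a = xor_bytes(plaintext_a, keystream)
ciphertext_b = xor_bytes(plaintext_b, keystream)

# The keystream cancels out: an eavesdropper learns the XOR of the two
# plaintexts without ever seeing the key.
leak = xor_bytes(ciphertext_a, ciphertext_b)
assert leak == xor_bytes(plaintext_a, plaintext_b)

# Knowing (or guessing) one plaintext then reveals the other.
recovered_b = xor_bytes(leak, plaintext_a)
```
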
> > 
> > Section 4.1.4
> > 
> >    Endpoints that aren't updated to handle multiple streams following
> >    these recommendations can have issues with participating in RTP
> >    sessions containing multiple SSRCs within a single session, such as:
> > 
> > Talking about endpoints being "updated [...] following these
> > recommendations" also makes me wonder whether an Updates relationship to
> > 3550 or other document(s) would be appropriate.
> 
> So this is an informational document targeting application designers using
> RTP, and the intention is not really to change things in this document. There
> do exist a number of documents that will update RFC 3550. When Cluster 238 is
> processed, RFC 3550 will have a number of updating RFCs.
> 
> So, I still think the answer is no to having this document update RFC 3550.
> 
> > 
> > Section 4.2.2
> > 
> >       the, in most cases 2-3, additional flows.  However, packet loss
> >       causes extra delays, at least 100 ms, which is the minimal
> >       retransmission timer for ICE.
> > 
> > Doesn't RFC 8445 say 500 ms, not 100?
> 
> Yes, you're right. Probably wrote this during the discussion in the ICE WG and
> never reflected over the final consensus in the document. Will correct. 
> 
> > 
> >    Deep Packet Inspection and Multiple Streams:  Firewalls differ in how
> >       deeply they inspect packets.  There exist some risk that deeply
> >       inspecting firewalls will have similar legacy issues with multiple
> >       SSRCs as some RTP stack implementations.
> > 
> > Re "some risk", can we say that this has definitely been seen in the
> > wild at least once?
> 
> I can't point to a particular case off the top of my head. But I have not been
> much involved in the day-to-day running of RTP applications. However, I do
> assume this has been an issue. Let me check.
> 
> > 
> > Section 4.3.1
> > 
> >    only premium users are allowed to access.  The mechanism preventing a
> >    receiver from getting the high quality stream can be based on the
> >    stream being encrypted with a key that user can't access without
> >    paying premium, using the key-management to limit access to the key.
> > 
> > nit: there seems to be a missing word here ("paying a premium"?)
> 
> Yes. 
> 
> > 
> >    SRTP [RFC3711] has no special functions for dealing with different
> >    sets of master keys for different SSRCs.  The key-management
> >    functions have different capabilities to establish different sets of
> >    keys, normally on a per-endpoint basis.  For example, DTLS-SRTP
> >    [RFC5764] and Security Descriptions [RFC4568] establish different
> >    keys for outgoing and incoming traffic from an endpoint.  This key
> >    usage has to be written into the cryptographic context, possibly
> >    associated with different SSRCs.
> > 
> > I don't really understand what this paragraph is trying to say.
> 
> I think it tries to warn that there is a discrepancy between how SRTP is
> defined, the implementations' capabilities when combined with particular
> key-management, and the full set of RTP's flexibilities to create an RTP
> Session.
> 
> Will see if we can clarify this a bit. 

Thanks.

> > 
> > Section 4.3.2
> > 
> >    Transport translator-based sessions and multicast sessions, can
> > 
> > This doesn't seem to match the terminology we used in § 4.1.2.
> > (This terminology appears a couple other times, later.)
> 
> These are RFC 7667 terms. We likely should use the RFC 7667 short hand to make
> that clearer. 
> 
> > 
> > Section 5.1
> > 
> >    h.  If the applications need finer control over which session
> >        participants that are included in different sets of security
> >        associations, most key-management will have difficulties
> >        establishing such a session.
> > 
> > nit: the grammar is off, here (remove "that" and use "key-management
> > techniques"?)
> 
> Ok. 
> 
> > 
> > Section 5.3
> > 
> >    2.  The application can indicate its usage of the RTP streams on RTP
> >        session level, in case multiple different usages exist.
> > 
> > nit: is this "in case" (precautionary) or "in the case when"
> > (descriptive)?
> 
> The latter, as each RTP session can be bound to a particular usage.
> 
> > 
> > Section 6
> > 
> >    Transport Support Extensions:  When defining new RTP/RTCP extensions
> > 
> > nit: should we swap the order of "Support" and "Extensions"?
> 
> Or "Extensions for Transport Support". Will address. 
> 
> > 
> > Section 11.1
> > 
> > RFC 3830 does not feel like it needs to be normative.
> 
> Agree, its usage is an example. 
> 
> > 
> > Appendix A
> > 
> >    4.   Sending multiple streams in the same sequence number space makes
> >         it impossible to determine which payload type, which stream a
> >         packet loss relates to, and thus to which stream to potentially
> >         apply packet loss concealment or other stream-specific loss
> >         mitigation mechanisms.
> > 
> > I don't think this parses properly (around "which payload type,")
> 
> Will attempt to address.
> 
> > 
> > Appendix B.1
> > 
> >    One aspect of the existing signalling is that it is focused on RTP
> >    sessions, or at least in the case of SDP the media description.
> > 
> > nit: I think there's an extra or missing word here (around "the media
> > description").
> 
> Yes, something is strange. 
> 
> maybe
> 
> ... or in the case of SDP, the media descriptions concept. 

That looks better, yes.

Thanks,

Ben

> > 
> >    o  Bitrate/Bandwidth exist today only at aggregate or as a common
> >       "any RTP stream" limit, unless either codec-specific bandwidth
> >       limiting or RTCP signalling using TMMBR is used.
> > 
> > Should we have a reference for TMMBR?
> 
> Yes, it is defined in RFC 5104. 
> 
> > 
> > Appendix B.3
> > 
> >    RTP streams being transported in RTP has some particular usage in an
> >    RTP application.  This usage of the RTP stream is in many
> > 
> > nit: singular/plural mismatch "has"/"streams"
> 
> Thanks. 
> 
> 
> 
> 
> Cheers
> 
> Magnus Westerlund 
> 
> 
> ----------------------------------------------------------------------
> Networks, Ericsson Research
> ----------------------------------------------------------------------
> Ericsson AB                 | Phone  +46 10 7148287
> Torshamnsgatan 23           | Mobile +46 73 0949079
> SE-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
> ----------------------------------------------------------------------
> 
>
