Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text"

Bernard Aboba <bernard.aboba@gmail.com> Fri, 04 December 2020 22:54 UTC

References: <CAOW+2duJwBizifn94qcRfpZ6cqRjRVyueyoofox0AWjkcJm02g@mail.gmail.com> <68866CAE-C81B-4C23-9DB5-CA8B57C1E3DC@brianrosen.net> <CAOW+2dt88EX1bj27zurn7XX-Ct24CFi_5SRyGObvGjwEDRuR_A@mail.gmail.com>
In-Reply-To: <CAOW+2dt88EX1bj27zurn7XX-Ct24CFi_5SRyGObvGjwEDRuR_A@mail.gmail.com>
From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Fri, 04 Dec 2020 14:54:00 -0800
Message-ID: <CAOW+2dtwOEG6=OEQarQTxQKnUkBAKCCArQXZQUP_QTTbALK1iw@mail.gmail.com>
To: IETF AVTCore WG <avt@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000dd705505b5ab5a80"
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/WJ5Yb5Qm-LrQXcIYxaOeLZD1i-c>
Subject: Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text"

Here are my comments.

Overall, I would suggest placing the terminology and some of the overview
material earlier in the document.  Also, some clarification of the desired
behavior of SFUs might be helpful.

Abstract


   Enhancements for RFC 4103 real-time text mixing is provided in this

[BA] "is provided" -> "are provided"

   document, suitable for a centralized conference model that enables
   source identification and source switching.  The intended use is for

[BA] "source switching" is not clear in this context. The document
refers to the mixing model in which the conferencing server sends
a single SSRC to participants with contributors in the CSRC field.
But an SFU will just forward packets to a participant, who will see
multiple SSRCs (and no CSRCs). Since the SSRCs may be modified, this
can also be construed as "source switching".
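
To make the distinction concrete, here is a minimal Python sketch (not
taken from the draft) that reads the SSRC and CSRC list from the fixed RTP
header defined in RFC 3550. The helper names build_rtp_header and
parse_rtp_sources and the SSRC values are made up for illustration. With a
mixer, the receiver sees the mixer's SSRC plus the contributor in the CSRC
list; with an SFU, it sees the contributor's SSRC and an empty CSRC list.

```python
import struct


def build_rtp_header(ssrc, csrcs=(), payload_type=100, seq=1, timestamp=0):
    """Build a bare RTP header (no payload); hypothetical helper for the demo."""
    first = (2 << 6) | len(csrcs)      # version 2, no padding/extension, CC
    second = payload_type & 0x7F       # marker bit 0, 7-bit payload type
    header = struct.pack("!BBHII", first, second, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_sources(packet):
    """Return (ssrc, csrc_list) from an RTP packet's header."""
    if len(packet) < 12:
        raise ValueError("RTP header is at least 12 bytes")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = packet[0] & 0x0F              # CSRC count: low 4 bits of octet 0
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for its CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from("!%dI" % cc, packet, 12))
    return ssrc, csrcs


MIXER_SSRC = 0x11111111   # illustrative values only
ALICE_SSRC = 0xAAAAAAAA

# Mixer model: one SSRC (the mixer), contributor carried in the CSRC list.
mixed = parse_rtp_sources(build_rtp_header(MIXER_SSRC, [ALICE_SSRC]))

# SFU model: the contributor's own SSRC is forwarded, no CSRC list.
forwarded = parse_rtp_sources(build_rtp_header(ALICE_SSRC))
```

An endpoint that keys its presentation on the CSRC list alone would see no
source information in the SFU case, which is why the abstract should say
which model "source switching" refers to.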

   real-time text mixers and multi-party-aware participant endpoints.

[BA] It is not clear to me what "multi-party-aware" means here. Do you
mean that the endpoint can distinguish between sources based on SSRC, or
CSRC, or both?

   The specified mechanism build on the standard use of the CSRC list in

[BA] "build" -> "builds"

   the RTP packet for source identification.  The method makes use of
   the same "text/t140" and "text/red" formats as for two-party
   sessions.

   A capability exchange is specified so that it can be verified that a
   participant can handle the multi-party coded real-time text stream.
   The capability is indicated by use of a media attribute "rtt-mixer".

   The document updates RFC 4103[RFC4103]

   A specifications of how a mixer can format text for the case when the

[BA] "A specifications" -> "Specification"



1.  Introduction

   RFC 4103[RFC4103] specifies use of RFC 3550 RTP [RFC3550] for

[BA] A better citation format is "RTP Payload for Text Conversation"
[RFC4103]

   transmission of real-time text (RTT) and the "text/t140" format.  It
   also specifies a redundancy format "text/red" for increased
   robustness.  RFC 4102 [RFC4102] registers the "text/red" format.
   Regional regulatory requirements specify provision of real-time text
   in multi-party calls.

[BA] Reference?

   Another requirement is that the mixing procedure must not introduce
   delays in the text streams that are experienced disturbing the real-

[BA] "experienced" -> "experienced,"

   time experience of the receiving users.


   Real-time text mixers for multi-party sessions therefore need to
   insert the source of each transmitted group of text from a conference
   participant so that the text can be transmitted interleaved with text
   groups from different sources in the rate they are created.  This
   enables the text groups to be presented by endpoints in suitable
   grouping with other text from the same source.  The presentation can
   then be arranged so that text from different sources can be presented
   in real-time and easily read while it is possible for a reading user
   to also perceive approximately when the text was created in real time
   by the different parties.  The transmission and mixing is intended to
   be done in a general way so that presentation can be arranged in a
   layout decided by the endpoint.

[BA] By "insert the source" I assume that you mean "indicate the
contributing SSRC within the CSRC field", correct?

   There are existing implementations of RFC 4103 without the updates
   from this document.  These will not be able to receive and present
   real-time text mixed for multi-party aware endpoints.

[BA] By "implementations", you mean "conferencing server implementations",
correct? Presumably these implementations do not support mixing, so how do
they operate?

   A negotiation mechanism is therefore needed for verification if the
   parties are able to handle a multi-party coded stream and agreeing on
   using that method.

[BA] Can you clarify what "multi-party coded stream" means, exactly? Does
this imply a single SSRC with multiple CSRCs? Or would multiple SSRCs
without CSRCs also qualify? It would perhaps help to have the terminology
section moved up prior to

   A fall-back mixing procedure is also needed for cases when the
   negotiation result indicates that a receiving endpoint is not capable
   of handling the mixed format.  This method is called the mixing
   procedure for multi-party unaware endpoints.  The fall-back method is
   naturally not expected to meet all performance requirements placed on
   the mixing procedure for multi-party aware endpoints.

[BA] A clarification of the behavior of "multi-party unaware endpoints"
would be helpful. Do these just ignore the SSRC and CSRC fields entirely?

   The document updates RFC 4103[RFC4103] by introducing an attribute
   for indicating capability for the multi-party mixing case and rules
   for source indications and source switching.

1.1.  Selected solution and considered alternative


[BA] I think you need to insert the terminology section before this
section.

   A number of alternatives were considered when searching an efficient
   and easily implemented multi-party method for real-time text.  This
   section explains a few of them briefly.





Hellstrom                  Expires 26 May 2021                  [Page 5]

Internet-Draft    RTP-mixer format for multi-party RTT     November 2020


   One RTP stream per source, sent in the same RTP session with
   "text/red" format.
      From some points of view, use of multiple RTP streams, one for
      each source, sent in the same RTP session, called the RTP
      translator model in [RFC3550], would be efficient, and use exactly

[BA] I would suggest referencing "RTP Topologies" [RFC7667]. BTW, a single
SSRC for each source could also be used in the Selective Forwarding
Middlebox topology (RFC 7667, Section 3.7), not just in the Translator
topology (RFC 7667, Section 3.5).

      the same packet format as [RFC4103], the same payload type and a
      simple SDP declaration.  However, the RTP implementation in both
      mixers and endpoints need to support multiple streams in the same
      RTP session in order to use this mechanism.  For best deployment
      opportunity, it should be possible to upgrade existing endpoint
      solutions to be multi-party aware with a reasonable effort.  There
      is currently a lack of support for multi-stream RTP in certain
      implementation technologies.  This fact made this solution not
      selected for inclusion in this document.

[BA] I would suggest that this limitation of existing implementations be
introduced earlier in the document, perhaps summarizing it in terminology.

Note that the use of a mixer, as opposed to an SFU, assumes that the
conferencing unit has access to the payload (e.g. the payload is not E2E
encrypted).


   RTT transport in WebRTC
      Transport of real-time text in the WebRTC technology is specified
      to use the WebRTC data channel in
      [I-D.ietf-mmusic-t140-usage-data-channel].  That spcification

[BA] "spcification" -> "specification"

      contains a section briefly describing its use in multi-party
      sessions.  The focus of this document is RTP transport.
      Therefore, even if the WebRTC transport provides good multi-party
      performance, it is just mentioned in this document in relation to
      providing gateways with multi-party capabilities between RTP and
      WebRTC technologies.

1.2.  Nomenclature

[BA] "Nomenclature" -> "Terminology"?

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
   mixer, RTP-translator are explained in [RFC3550]

   The term "T140block" is defined in RFC 4103 [RFC4103] to contain one
   or more T.140 code elements.






   "TTY" stands for a text telephone type used in North America.

   "WebRTC" stands for web based communication specified by W3C and
   IETF.

   "DTLS-SRTP" stands for security specified in RFC 5764 [RFC5764].

   "multi-party aware" stands for an endpoint receiving real-time text
   from multiple sources through a common conference mixer being able to
   present the text in real-time separated by source and presented so
   that a user can get an impression of the approximate relative timing
   of text from different parties.

   "multi-party unaware" stands for an endpoint not itself being able to
   separate text from different sources when received through a common
   conference mixer.

[BA] I would suggest that this section be moved up earlier in the document,
such as Section 1.1.


1.3.  Intended application

2.  Overview over the two specified solutions

[BA] "over" -> "of"

[BA] Overall, this section feels like it should appear earlier in the
document, since it provides an introduction to what it contains. Perhaps
it should be placed just after terminology (suggested to be moved to 1.1)
and before the material currently in Section 1.1?

   This section contains a brief introduction of the two methods
   specified in this document.

3.1.  Offer/answer considerations

   RFC 4103[RFC4103] specifies use of RFC 3550 RTP[RFC3550], and a
   redundancy format "text/red" for increased robustness of real-time
   text transmission.  This document updates RFC 4103[RFC4103] by
   introducing a capability negotiation for handling multi-party real-
   time text, a way to indicate the source of transmitted text, and
   rules for efficient timing of the transmissions interleaved from
   different sources.

   The capability negotiation is based on use of the sdp media attribute
   "rtt-mixer".

   Both parties shall indicate their capability in a session setup or
   modification, and evaluate the capability of the counterpart.

   The syntax is as follows:
      "a=rtt-mixer"

[BA] It might be useful to state explicitly what happens if the Offer and
Answer do not both include "a=rtt-mixer", and also how an SFU that
forwards but doesn't mix should Offer or Answer.
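
One possible reading of the negotiation, sketched in Python: the
multi-party format is used only when both the Offer and the Answer carry
the attribute in the text media section, and anything else falls back to
two-party handling. The function names are illustrative, and the fallback
for the missing-attribute case is an assumption drawn from Section 3.2,
not something this section states outright.

```python
def has_rtt_mixer(sdp):
    """True if an m=text section of the SDP carries a=rtt-mixer."""
    in_text_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # Attributes that follow belong to this media section.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == "a=rtt-mixer":
            return True
    return False


def multiparty_negotiated(offer, answer):
    """Assumed rule: both sides must indicate the capability."""
    return has_rtt_mixer(offer) and has_rtt_mixer(answer)


OFFER = """m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
a=rtt-mixer
"""

# Answer from a hypothetical device that does not indicate the capability.
ANSWER_WITHOUT = """m=text 14000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
"""

ANSWER_WITH = ANSWER_WITHOUT + "a=rtt-mixer\n"

both_sides = multiparty_negotiated(OFFER, ANSWER_WITH)
one_side = multiparty_negotiated(OFFER, ANSWER_WITHOUT)
```

Under this reading, an SFU that only forwards would leave the attribute
out and the session would be treated as two-party, which may or may not be
what the draft intends.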

3.2.  Actions depending on capability negotiation result

   A transmitting party SHALL send text according to the multi-party
   format only when the negotiation for this method was successful and
   when the CC field in the RTP packet is set to 1.  In all other cases,
   the packets SHALL be populated and interpreted as for a two-party
   session.

   A party which has negotiated the "rtt-mixer" sdp media attribute MUST
   populate the CSRC-list and format the packets according to Section 3
   if it acts as an rtp-mixer and sends multi-party text.

[BA] Presumably "party" here only refers to the conferencing server, not
participants, correct?

   A party which has negotiated the "rtt-mixer" sdp media attribute MUST
   interpret the contents of the "CC" field the CSRC-list and the
   packets according to Section 3 in received rtp packets in the
   corresponding RTP stream.

   A party not performing as a mixer MUST not include the CSRC list.

[BA] "not performing as a mixer MUST not" -> "not acting as a mixer MUST NOT"
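
Taken together, the rules quoted in this section can be sketched as below.
This is only an illustration of how I read the text; the function names
are hypothetical, and the assumption that a mixer lists exactly one source
per packet comes from the "CC field ... is set to 1" wording.

```python
def csrc_list_for_sender(is_mixer, negotiated, source_ssrc):
    """CSRC list a sender would place in an outgoing packet."""
    if is_mixer and negotiated:
        # A mixer that negotiated rtt-mixer identifies the text source.
        return [source_ssrc]
    # A party not acting as a mixer MUST NOT include the CSRC list.
    return []


def packet_format_for_receiver(negotiated, cc):
    """How a receiver would interpret a packet, given the CC field."""
    if negotiated and cc == 1:
        return "multi-party"
    # In all other cases the packet is handled as in a two-party session.
    return "two-party"
```

For example, a participant endpoint (is_mixer=False) always sends an empty
CSRC list, so its packets are interpreted as two-party even when
"rtt-mixer" was negotiated.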


3.5.  Keep-alive

   After that, the transmitter SHALL send keep-alive traffic to the
   receiver(s) at regular intervals when no other traffic has occurred
   during that interval, if that is decided for the actual connection.
   Recommendations for keep-alive can be found in [RFC6263].

[BA] Reading this paragraph, I'm unclear whether it is advocating an
alternative to the [RFC6263] recommendation (e.g. RTP/RTCP mux). Note
that RFC 7675 consent freshness checks can also help here.


3.8.  Do not send received text to the originating source

   Text received to a mixer from a participant SHOULD NOT be included in

[BA] "received to" -> "sent to"

   transmission from the mixer to that participant.
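
As a small illustration of this rule (a sketch with made-up participant
names, not code from the draft), the mixer's fan-out for one received text
group simply leaves out the participant it came from:

```python
def recipients_for(source, participants):
    """Participants that should receive text originating from `source`."""
    return [p for p in participants if p != source]


conference = ["alice", "bob", "carol"]   # hypothetical participants
targets = recipients_for("alice", conference)
```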

3.21.  Security for session control and media

   Security SHOULD be applied on both session control and media.  In

[BA] By "Security" I assume you mean SIP over TLS for signalling and
SRTP for media? It might be useful to say this explicitly (since E2E
security is also possible).

   applications where legacy endpoints without security may exist, a
   negotiation SHOULD be performed to decide if security by encryption
   will be applied.  If no other security solution is mandated for the
   application, then RFC 8643 OSRTP [RFC8643] SHOULD be applied to
   negotiate SRTP media security with DTLS.  Most SDP examples below are
   for simplicity expressed without the security additions.  The
   principles (but not all details) for applying DTLS-SRTP [RFC5764]
   security is shown in a couple of the following examples.

3.22.  SDP offer/answer examples

   This sections shows some examples of SDP for session negotiation of
   the real-time text media in SIP sessions.  Audio is usually provided
   in the same session, and sometimes also video.  The examples only
   show the part of importance for the real-time text media.

     Offer example for "text/red" format and multi-party support:

           m=text 11000 RTP/AVP 100 98
           a=rtpmap:98 t140/1000
           a=rtpmap:100 red/1000
           a=fmtp:100 98/98/98
           a=rtt-mixer

[BA] To be clear: an SFU that forwards but does not mix should Answer
without "a=rtt-mixer", even if the Offer indicates the ability to support
multiple streams per session?
Also, an SFU should not Offer "a=rtt-mixer" if it doesn't support mixing?

      Answer example  from a multi-party capable device
           m=text 14000 RTP/AVP 100 98
           a=rtpmap:98 t140/1000
           a=rtpmap:100 red/1000
           a=fmtp:100 98/98/98
           a=rtt-mixer

      Offer example for "text/red" format including multi-party
      and security:
            a=fingerprint: (fingerprint1)
            m=text 11000 RTP/AVP 100 98
            a=rtpmap:98 t140/1000
            a=rtpmap:100 red/1000
            a=fmtp:100 98/98/98
            a=rtt-mixer

   The "fingerprint" is sufficient to offer DTLS-SRTP, with the media
   line still indicating RTP/AVP.

   Note: For brevity, the entire value of the SDP fingerprint attribute
   is not shown in this and the following example.