Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text"
Bernard Aboba <bernard.aboba@gmail.com> Fri, 04 December 2020 22:54 UTC
From: Bernard Aboba <bernard.aboba@gmail.com>
Date: Fri, 04 Dec 2020 14:54:00 -0800
Message-ID: <CAOW+2dtwOEG6=OEQarQTxQKnUkBAKCCArQXZQUP_QTTbALK1iw@mail.gmail.com>
To: IETF AVTCore WG <avt@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/WJ5Yb5Qm-LrQXcIYxaOeLZD1i-c>
Here are my comments. Overall, I would suggest placing terminology and some overview material earlier in the document. Also, some clarifications of desired behavior by SFUs might be helpful.

Abstract

Enhancements for RFC 4103 real-time text mixing is provided in this

[BA] "is provided" -> "are provided"

document, suitable for a centralized conference model that enables source identification and source switching. The intended use is for

[BA] "source switching" is not clear in this context. The document refers to the mixing model in which the conferencing server sends a single SSRC to participants with contributors in the CSRC field. But an SFU will just forward packets to a participant, who will see multiple SSRCs (and no CSRCs). Since the SSRCs may be modified, this can also be construed as "source switching".

real-time text mixers and multi-party-aware participant endpoints.

[BA] It is not clear to me what "multi-party-aware" means here. Do you mean that the endpoint can distinguish between sources based on SSRC, or CSRC, or both?

The specified mechanism build on the standard use of the CSRC list in

[BA] "build" -> "builds"

the RTP packet for source identification. The method makes use of the same "text/t140" and "text/red" formats as for two-party sessions.

A capability exchange is specified so that it can be verified that a participant can handle the multi-party coded real-time text stream. The capability is indicated by use of a media attribute "rtt-mixer".

The document updates RFC 4103[RFC4103]

A specifications of how a mixer can format text for the case when the

[BA] "A specifications" -> "Specification"

1. Introduction

RFC 4103[RFC4103] specifies use of RFC 3550 RTP [RFC3550] for

[BA] A better citation format is "RTP Payload for Text Conversation" [RFC4103]

transmission of real-time text (RTT) and the "text/t140" format. It also specifies a redundancy format "text/red" for increased robustness. RFC 4102 [RFC4102] registers the "text/red" format.
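As an illustration of the SSRC versus CSRC distinction raised in the comments on the Abstract, here is a minimal sketch (not taken from the draft) that reads the SSRC and the CSRC list from the RTP fixed header defined in RFC 3550. The function name parse_rtp_sources is hypothetical. A packet from a mixer would be expected to carry the mixer's SSRC plus the original source in the CSRC list, while a packet forwarded by an SFU would typically carry only an SSRC and an empty CSRC list.

```python
import struct
from typing import List, Tuple

RTP_FIXED_HEADER_LEN = 12  # bytes, per the RFC 3550 fixed header layout


def parse_rtp_sources(packet: bytes) -> Tuple[int, List[int]]:
    """Return (ssrc, csrc_list) read from an RTP packet header.

    Hypothetical helper for illustration only; it ignores header
    extensions and the payload.
    """
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    cc = first_byte & 0x0F  # CSRC count, the low 4 bits of the first byte
    (ssrc,) = struct.unpack("!I", packet[8:12])
    end = RTP_FIXED_HEADER_LEN + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, packet[RTP_FIXED_HEADER_LEN:end]))
    return ssrc, csrcs
```

With this helper, an endpoint that keys presentation on the CSRC list sees the mixer case as one SSRC with one contributing source, and the SFU case as several distinct SSRCs with empty CSRC lists.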
Regional regulatory requirements specify provision of real-time text in multi-party calls.

[BA] Reference?

Another requirement is that the mixing procedure must not introduce delays in the text streams that are experienced disturbing the real-

[BA] "experienced" -> "experienced,"

time experience of the receiving users.

Real-time text mixers for multi-party sessions therefore need to insert the source of each transmitted group of text from a conference participant so that the text can be transmitted interleaved with text groups from different sources in the rate they are created. This enables the text groups to be presented by endpoints in suitable grouping with other text from the same source. The presentation can then be arranged so that text from different sources can be presented in real-time and easily read while it is possible for a reading user to also perceive approximately when the text was created in real time by the different parties. The transmission and mixing is intended to be done in a general way so that presentation can be arranged in a layout decided by the endpoint.

[BA] By "insert the source" I assume that you mean "indicate the contributing SSRC within the CSRC field", correct?

There are existing implementations of RFC 4103 without the updates from this document. These will not be able to receive and present real-time text mixed for multi-party aware endpoints.

[BA] By "implementations", you mean "conferencing server implementations", correct? Presumably these implementations do not support mixing, so how do they operate?

A negotiation mechanism is therefore needed for verification if the parties are able to handle a multi-party coded stream and agreeing on using that method.

[BA] Can you clarify what "multi-party coded stream" means, exactly? Does this imply a single SSRC with multiple CSRCs? Or would multiple SSRCs without CSRCs also qualify?
It would perhaps help to have the terminology section moved up prior to

A fall-back mixing procedure is also needed for cases when the negotiation result indicates that a receiving endpoint is not capable of handling the mixed format. This method is called the mixing procedure for multi-party unaware endpoints. The fall-back method is naturally not expected to meet all performance requirements placed on the mixing procedure for multi-party aware endpoints.

[BA] A clarification of the behavior of "multi-party unaware endpoints" would be helpful. Do these just ignore the SSRC and CSRC fields entirely?

The document updates RFC 4103[RFC4103] by introducing an attribute for indicating capability for the multi-party mixing case and rules for source indications and source switching.

1.1. Selected solution and considered alternative

[BA] I think you need to insert the terminology section before this section.

A number of alternatives were considered when searching an efficient and easily implemented multi-party method for real-time text. This section explains a few of them briefly.

Hellstrom Expires 26 May 2021 [Page 5]
Internet-Draft RTP-mixer format for multi-party RTT November 2020

One RTP stream per source, sent in the same RTP session with "text/red" format.

From some points of view, use of multiple RTP streams, one for each source, sent in the same RTP session, called the RTP translator model in [RFC3550], would be efficient, and use exactly

[BA] I would suggest referencing "RTP Topologies" [RFC7667]. BTW, a single SSRC for each source could also be used in the Selective Forwarding Middlebox topology (Section 3.7), not just the Translator topology, described in RFC 7667 Section 3.5.

the same packet format as [RFC4103], the same payload type and a simple SDP declaration. However, the RTP implementation in both mixers and endpoints need to support multiple streams in the same RTP session in order to use this mechanism.
For best deployment opportunity, it should be possible to upgrade existing endpoint solutions to be multi-party aware with a reasonable effort. There is currently a lack of support for multi-stream RTP in certain implementation technologies. This fact made this solution not selected for inclusion in this document.

[BA] I would suggest that this limitation of existing implementations be introduced earlier in the document, perhaps summarizing it in terminology. Note that the use of a mixer, as opposed to an SFU, assumes that the conferencing unit has access to the payload (e.g. the payload is not E2E encrypted).

RTT transport in WebRTC

Transport of real-time text in the WebRTC technology is specified to use the WebRTC data channel in [I-D.ietf-mmusic-t140-usage-data-channel]. That spcification

[BA] "spcification" -> "specification"

contains a section briefly describing its use in multi-party sessions. The focus of this document is RTP transport. Therefore, even if the WebRTC transport provides good multi-party performance, it is just mentioned in this document in relation to providing gateways with multi-party capabilities between RTP and WebRTC technologies.

1.2. Nomenclature

[BA] "Nomenclature" -> "Terminology"?

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-mixer, RTP-translator are explained in [RFC3550]

The term "T140block" is defined in RFC 4103 [RFC4103] to contain one or more T.140 code elements.

Hellstrom Expires 26 May 2021 [Page 7]
Internet-Draft RTP-mixer format for multi-party RTT November 2020

"TTY" stands for a text telephone type used in North America.

"WebRTC" stands for web based communication specified by W3C and IETF.

"DTLS-SRTP" stands for security specified in RFC 5764 [RFC5764].

"multi-party aware" stands for an endpoint receiving real-time text from multiple sources through a common conference mixer being able to present the text in real-time separated by source and presented so that a user can get an impression of the approximate relative timing of text from different parties.

"multi-party unaware" stands for an endpoint not itself being able to separate text from different sources when received through a common conference mixer.

[BA] I would suggest that this section be moved up earlier in the document, such as Section 1.1.

1.3. Intended application

2. Overview over the two specified solutions

[BA] "over" -> "of"

[BA] Overall, this section feels like it should appear earlier in the document, since it provides an introduction to what it contains. Perhaps it should be placed just after terminology (suggested to be moved to 1.1) and before the material currently in Section 1.1?

This section contains a brief introduction of the two methods specified in this document.

3.1. Offer/answer considerations

RFC 4103[RFC4103] specifies use of RFC 3550 RTP[RFC3550], and a redundancy format "text/red" for increased robustness of real-time text transmission. This document updates RFC 4103[RFC4103] by introducing a capability negotiation for handling multi-party real-time text, a way to indicate the source of transmitted text, and rules for efficient timing of the transmissions interleaved from different sources.

The capability negotiation is based on use of the sdp media attribute "rtt-mixer".

Both parties shall indicate their capability in a session setup or modification, and evaluate the capability of the counterpart.

The syntax is as follows:

   "a=rtt-mixer"

[BA] Might be useful to explicitly state what happens if both offer and answer don't include "a=rtt-mixer" and also how an SFU that forwards but doesn't mix should Offer or Answer.

3.2. Actions depending on capability negotiation result

A transmitting party SHALL send text according to the multi-party format only when the negotiation for this method was successful and when the CC field in the RTP packet is set to 1. In all other cases, the packets SHALL be populated and interpreted as for a two-party session.

A party which has negotiated the "rtt-mixer" sdp media attribute MUST populate the CSRC-list and format the packets according to Section 3 if it acts as an rtp-mixer and sends multi-party text.

[BA] Presumably "party" here only refers to the conferencing server, not participants, correct?

A party which has negotiated the "rtt-mixer" sdp media attribute MUST interpret the contents of the "CC" field the CSRC-list and the packets according to Section 3 in received rtp packets in the corresponding RTP stream.

A party not performing as a mixer MUST not include the CSRC list.

[BA] "not performing as a mixer MUST not" -> "not acting as a mixer MUST NOT"

3.5. Keep-alive

After that, the transmitter SHALL send keep-alive traffic to the receiver(s) at regular intervals when no other traffic has occurred during that interval, if that is decided for the actual connection. Recommendations for keep-alive can be found in [RFC6263].

[BA] Reading this paragraph, I'm unclear whether it is advocating an alternative to the [RFC6263] recommendation (e.g. RTP/RTCP mux). Also note that RFC 7675 consent checks can also help here.

3.8. Do not send received text to the originating source

Text received to a mixer from a participant SHOULD NOT be included in

[BA] "received to" -> "sent to"

transmission from the mixer to that participant.

3.21. Security for session control and media

Security SHOULD be applied on both session control and media. In

[BA] By "Security" I assume you mean SIP over TLS for signalling and SRTP for media? It might be useful to say this explicitly (since E2E security is also possible).
applications where legacy endpoints without security may exist, a negotiation SHOULD be performed to decide if security by encryption will be applied. If no other security solution is mandated for the application, then RFC 8643 OSRTP [RFC8643] SHOULD be applied to negotiate SRTP media security with DTLS. Most SDP examples below are for simplicity expressed without the security additions. The principles (but not all details) for applying DTLS-SRTP [RFC5764] security is shown in a couple of the following examples.

3.22. SDP offer/answer examples

This sections shows some examples of SDP for session negotiation of the real-time text media in SIP sessions. Audio is usually provided in the same session, and sometimes also video. The examples only show the part of importance for the real-time text media.

Offer example for "text/red" format and multi-party support:

   m=text 11000 RTP/AVP 100 98
   a=rtpmap:98 t140/1000
   a=rtpmap:100 red/1000
   a=fmtp:100 98/98/98
   a=rtt-mixer

[BA] To be clear: an SFU that forwards but does not mix should Answer without "a=rtt-mixer", even if the Offer indicates the ability to support multiple streams per session? Also, an SFU should not Offer "a=rtt-mixer" if it doesn't support mixing?

Answer example from a multi-party capable device

   m=text 14000 RTP/AVP 100 98
   a=rtpmap:98 t140/1000
   a=rtpmap:100 red/1000
   a=fmtp:100 98/98/98
   a=rtt-mixer

Offer example for "text/red" format including multi-party and security:

   a=fingerprint: (fingerprint1)
   m=text 11000 RTP/AVP 100 98
   a=rtpmap:98 t140/1000
   a=rtpmap:100 red/1000
   a=fmtp:100 98/98/98
   a=rtt-mixer

The "fingerprint" is sufficient to offer DTLS-SRTP, with the media line still indicating RTP/AVP.

Note: For brevity, the entire value of the SDP fingerprint attribute is not shown in this and the following example.
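To make the question about offer/answer outcomes concrete, here is a minimal sketch of how an implementation might evaluate the "rtt-mixer" negotiation. It assumes that the multi-party format is usable only when both the offer and the answer carry "a=rtt-mixer" in the text media section, which is the reading the comments above ask the draft to state explicitly. The function names has_rtt_mixer and multiparty_negotiated are hypothetical and do not come from the draft.

```python
def has_rtt_mixer(sdp: str) -> bool:
    """Return True if an m=text section in the SDP carries a=rtt-mixer.

    Hypothetical helper: a simple line scan, not a full SDP parser.
    """
    in_text_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # A new media section starts; track whether it is the text media.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == "a=rtt-mixer":
            return True
    return False


def multiparty_negotiated(offer: str, answer: str) -> bool:
    """Assumption: negotiation succeeds only if both sides indicate rtt-mixer."""
    return has_rtt_mixer(offer) and has_rtt_mixer(answer)


OFFER = """m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
a=rtt-mixer
"""

# An answer from a device that does not indicate the capability,
# for example an SFU that forwards but does not mix.
ANSWER_WITHOUT = """m=text 14000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
"""
```

Under this assumption, multiparty_negotiated(OFFER, ANSWER_WITHOUT) is False, so both sides would fall back to populating and interpreting packets as for a two-party session.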