Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text" - SFU

Gunnar Hellström <gunnar.hellstrom@ghaccess.se> Thu, 10 December 2020 16:49 UTC

From: Gunnar Hellström <gunnar.hellstrom@ghaccess.se>
To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: IETF AVTCore WG <avt@ietf.org>
Date: Thu, 10 Dec 2020 17:49:17 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/EAlALSiS4eIuSVSo7oRUiPHtZ2E>

After looking a bit more into the multi-stream issue, I want to scale back 
the ambition I announced below.

We should not put too many different methods in one draft. That would make 
it too complicated to require or claim compliance with them.

So I want to back off to merely mentioning the possible benefits of 
multi-stream RTP and SFU. I want to indicate that it would be possible to 
negotiate between the specified RTP-mixer method and other multi-party 
aware methods, but say that specifying any such method in more detail is 
for further study.

This conclusion also matches what is needed for references from NENA 
specifications.


/Gunnar

On 2020-12-09 at 20:48, Gunnar Hellström wrote:
>
> Bernard,
>
> I am editing the draft to include a specification of the use of multiple 
> RTP streams in one RTP session to each participant. There seem to be 
> so many ways to negotiate and use such a topology that I do not think I 
> should dictate one specific way. I therefore suggest that the 
> sections about that topology be kept general, leaving the exact 
> specification to implementation time.
>
> The other methods (the RTP-mixer method and the method 
> for multi-party unaware endpoints) are intended to be sufficiently 
> specified to result in interoperability, e.g. between emergency 
> services and mobile services.
>
> I do not see the same possibility for the multi-stream method. It is 
> likely that RTT will be included in an environment where the other 
> media and the application call for the use of specific RTP topologies 
> and specific ways to negotiate their use. Therefore I intend to write 
> a short description with general statements and references to the 
> considerations from the RTP-mixer method that should also be valid for 
> the RTP multi-stream methods.
>
> I also intend to check and adjust the use of terms such as "multi-party 
> aware" to make clear when it means one or both multi-party aware methods.
>
> A couple of further notes inline. I got some help to understand more:
>
> On 2020-12-08 at 23:45, Gunnar Hellström wrote:
>>
>> Bernard, some proposed conclusions below:
>>
>> On 2020-12-08 at 20:10, Bernard Aboba wrote:
>>> Gunnar asked:
>>>
>>> "Questions:
>>>
>>> 1. Is 3GPP TS 26.114 Annex T an example of sdp for the kind of SFU
>>> described in RFC 7667 section 3.7, with one media section per stream in
>>> a lower number of streams than participants? If not, how does a normal
>>> sdp look for an SFU?
>>>
>>> [BA] Annex T appears to use payload multiplexing (one payload type 
>>> per stream) for simulcast, which is not commonly deployed.
>>> It is more common to use SSRC multiplexing (one SSRC per stream), 
>>> all on the same payload type.  Here are examples of what simulcast 
>>> SDP looks like in various browsers:
>>> A playground for Simulcast without an SFU - webrtcHacks 
>>> <https://webrtchacks.com/a-playground-for-simulcast-without-an-sfu/>
>>>
>> [GH] Thanks. That example was with Bundle and WebRTC. If that is the 
>> environment you imagine, we will not be allowed to use RTP transport 
>> for RTT. Instead, RTT uses the data channel as specified in RFC 8865. 
>> Only audio and video use RTP.
>>
>> But I assume that there could be use of the SFU technology also for 
>> traditional SIP. What we need to consider for the current draft is 
>> then if an sdp indicating capability for use of an SFU or some other 
>> form of multi-stream RTP will have something characteristic for that 
>> capability. E.g. an m-section for each stream.
>>
>> I think it is quite likely that it must be visible in the sdp that a 
>> party has multi-stream RTP capability.
>>
>> Then the "rtt-mix" attribute is sufficient to indicate capability for 
>> the RTP-mixer based multi-party solution, and we only need to tell 
>> that other solutions may be implemented.
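
As an illustration only, the capability check implied above could look like the following Python sketch. It scans an SDP body for a text media section that carries the "rtt-mix" attribute. The example offer, its addresses and payload type numbers, and the function names are made up for this sketch; only the attribute name comes from this thread.

```python
def text_sections(sdp: str):
    """Yield each m=text media section of an SDP body as a list of lines."""
    section = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if section and section[0].startswith("m=text"):
                yield section
            section = [line]          # a new media section starts here
        elif section is not None:
            section.append(line)      # attribute line of the current section
    if section and section[0].startswith("m=text"):
        yield section


def supports_rtt_mix(sdp: str) -> bool:
    """True if any text media section carries the a=rtt-mix attribute."""
    return any("a=rtt-mix" in sec for sec in text_sections(sdp))


# Hypothetical offer, for illustration only.
OFFER = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
a=rtt-mix
"""

print(supports_rtt_mix(OFFER))
```

A party without the attribute in its text section would be treated as not supporting the RTP-mixer method. How any other multi-party method would be signalled is left open here, as in the text above.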
>>
>> I need to check wording so that the draft does not preclude other 
>> multi-party RTP based RTT solutions.
>>
> [GH] I will just assume that the sdp makes it visible that a 
> multi-party session with RTP multi-stream topology, or the capability 
> to use it, is offered, and that a party that is not capable of that 
> method can indicate this in an offer or answer. I will not say how.
>>
>>>
>>>
>>> 2. RFC 7667 section 3.7 seems to tell that the sequence number is
>>> regenerated in sequence by the middlebox. Is that right? That would make
>>> it not possible to detect if total loss of text occurred. Recovery can
>>> be done based on timestamp analysis, but not detection of unrecoverable
>>> loss.
>>>
>>> [BA] The sequence number needs to be regenerated when an SFU 
>>> switches between simulcast streams sent by a participant, sending 
>>> only a single stream onward to the viewer.  This is needed since 
>>> each simulcast stream has its own sequence number space, and 
>>> endpoints typically do not support receiving simulcast.
>>
>> There are regulatory requirements to detect and mark cases of 
>> suspected text loss. So, if there is packet loss from a participant 
>> to the middlebox, that sequence gap must not simply be ignored, 
>> resulting in an unbroken sequence number series after the SFU. Can 
>> the sequence number from the source be copied to the transmission 
>> from the SFU?
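
To make the requirement concrete, here is a rough Python sketch of how a receiver might classify a sequence number gap. It assumes in-order arrival without duplicates and two redundant generations per packet; both are assumptions of this sketch, not statements from the thread, and the function name is hypothetical.

```python
SEQ_MOD = 1 << 16          # RTP sequence numbers are 16 bits and wrap around
MISSING_TEXT = "\ufffd"    # replacement character, usable as a loss mark


def classify_gap(prev_seq: int, seq: int, redundancy: int = 2) -> str:
    """Classify the step from prev_seq to seq.

    'none'        - no packet is missing
    'recoverable' - the missing packets are covered by redundant generations
    'lost'        - more packets are missing than redundancy can cover
    """
    missing = (seq - prev_seq - 1) % SEQ_MOD
    if missing == 0:
        return "none"
    return "recoverable" if missing <= redundancy else "lost"


def loss_mark(prev_seq: int, seq: int, redundancy: int = 2) -> str:
    """Return the mark to insert into the displayed text, if any."""
    return MISSING_TEXT if classify_gap(prev_seq, seq, redundancy) == "lost" else ""
```

This only works if the sequence numbers seen by the receiver still show the gaps that arose before the middlebox, which is the point of the question above.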
>>
>> Does not video also have that need?
>>
> [GH] During active transmission from one source, the outgoing sequence 
> number needs to step up at the same rate as the sequence numbers of the 
> incoming packets, so that gaps arising on the leg from the transmitter 
> to the middlebox are revealed to, and can be acted on by, the receiving 
> party. So the offset adjustment is only intended to make the sequence 
> look consecutive even if there was a pause in transmission from a 
> source. Then it works for total loss detection for RTT.
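
The behaviour described here can be sketched as follows in Python. This is only a sketch of the idea under stated assumptions, not the procedure of RFC 7667: the class and method names are hypothetical, and it assumes the middlebox knows when forwarding resumes after a pause or a source switch and that the last packet before a pause arrived in order.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


class SeqRewriter:
    """Rewrites sequence numbers for one outgoing stream of a middlebox."""

    def __init__(self, first_out_seq: int = 0):
        self.next_out = first_out_seq % SEQ_MOD  # value that continues the series
        self.offset = None                       # out_seq - in_seq while active

    def resume(self) -> None:
        """Call when forwarding restarts after a pause or a source switch."""
        self.offset = None

    def forward(self, in_seq: int) -> int:
        """Map an incoming sequence number to the outgoing one."""
        if self.offset is None:
            # Offset adjustment: make the first packet after a pause
            # continue the outgoing series without a gap.
            self.offset = (self.next_out - in_seq) % SEQ_MOD
        # During active transmission the offset is constant, so a gap on
        # the incoming leg shows up as the same gap on the outgoing leg.
        out_seq = (in_seq + self.offset) % SEQ_MOD
        self.next_out = (out_seq + 1) % SEQ_MOD
        return out_seq


rw = SeqRewriter(first_out_seq=1000)
# Incoming packet 0 was lost before reaching the middlebox.
print([rw.forward(s) for s in (65534, 65535, 1, 2)])
```

In this run the outgoing series skips one number where the incoming packet was lost, so the receiver can still detect the loss, while a call to resume() before the next source makes the series continue without a gap.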
>>
>>>
>>> However an RTT sender would not send simulcast (e.g. multiple 
>>> versions of the same text stream).  So the SFU would potentially 
>>> forward only a single stream per participant, with no CSRCs. The 
>>> question is whether a "multi-party aware endpoint" would be prepared 
>>> to receive multiple SSRCs (and no CSRCs), with each SSRC 
>>> representing a single RTT source.
>> In order to have interoperability, the "rtt-mix" attribute must only 
>> mean the RTP-mixer based solution. Both mixers and endpoints could 
>> also have multi-stream capability, but the intention to open a 
>> multi-party session with multi-stream needs to have its own 
>> characteristics.
> [GH] I will differentiate the terms, most likely so that "multi-party 
> aware" means either of the two methods, RTP-mixer and multi-stream RTP, 
> with specific wording for each method where only one of them is 
> meant.
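
The difference between the two multi-party aware methods, as seen by a receiver, can be sketched in Python like this. The sketch assumes that a mixer puts the originating source first in the CSRC list and that an SFU forwards one SSRC per source with no CSRCs, as in the scenario discussed above; the function names are hypothetical.

```python
from typing import Dict, Sequence


def text_source(ssrc: int, csrcs: Sequence[int] = ()) -> int:
    """Return the identifier of the participant the text belongs to.

    RTP-mixer method:  the packet SSRC is the mixer, the CSRC is the source.
    Multi-stream RTP:  no CSRC is present, the SSRC itself is the source.
    """
    return csrcs[0] if csrcs else ssrc


def append_text(buffers: Dict[int, str], ssrc: int,
                csrcs: Sequence[int], text: str) -> None:
    """Append received text to the presentation buffer of its source."""
    src = text_source(ssrc, csrcs)
    buffers[src] = buffers.get(src, "") + text


buffers: Dict[int, str] = {}
append_text(buffers, 0xAAAA, [0x1111], "Hi")      # via a mixer with SSRC 0xAAAA
append_text(buffers, 0x2222, [], "Hello")         # forwarded by an SFU
append_text(buffers, 0xAAAA, [0x1111], " there")  # same source via the mixer
print(buffers)
```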
>>>
>>> 3. How is the source conveyed? There is apparently an SSRC to SSRC
>>> mapping taking place. Is there a lower number of SSRCs used in
>>> transmission from the middlebox than the total number of participants?
>>> If so, how is switch of source indicated? By RTCP?
>>>
>>> [BA] I am suggesting a scenario where there would be one SSRC per 
>>> source.  The SFU could perhaps replace the SSRC received from a 
>>> participant with an SSRC of its own, but each text stream would have 
>>> a unique SSRC up to the maximum participant limit.
> [GH] But the mapping between NAME and SSRC is lost when RTCP SDES is 
> just passed along while the SSRC is replaced. I assume that there are 
> methods to sort that out by some signaling that I don't know of. Since 
> I will not be specific, I will not go into such detail.
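
One conceivable way to keep the association, shown here only as a Python sketch, is for the middlebox to apply the same SSRC mapping to forwarded SDES items as to the RTP packets. Whether real SFUs do this is not confirmed in this thread. The class and method names are hypothetical, and SSRCs are allocated sequentially only for readability.

```python
from typing import Dict, Tuple


class SsrcRewriter:
    """Keeps one outgoing SSRC per incoming SSRC for a middlebox."""

    def __init__(self, first_out_ssrc: int = 0x10000001):
        self._next = first_out_ssrc
        self._out_for: Dict[int, int] = {}

    def out_ssrc(self, in_ssrc: int) -> int:
        """Return the outgoing SSRC for a source, allocating it on first use."""
        if in_ssrc not in self._out_for:
            self._out_for[in_ssrc] = self._next
            self._next += 1
        return self._out_for[in_ssrc]

    def rewrite_sdes(self, in_ssrc: int, name: str) -> Tuple[int, str]:
        """Return the SDES NAME item as it would be sent on to receivers."""
        return self.out_ssrc(in_ssrc), name


rw = SsrcRewriter(first_out_ssrc=100)
print(rw.rewrite_sdes(0xAAAA, "Alice"))
```

With such a table the receiver sees the NAME attached to the same SSRC that carries the text, so the label for each text stream stays correct.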
>>>
>>>
>> RTT is specific in that no data flows when the user does not send any 
>> information. In the other media there is always content. 
> [GH] So in a way it can be said to be "SFU by nature". I might 
> mention that. It eases switching. In the normal case no switching will 
> be needed. All simultaneous senders can likely always be allowed to 
> transmit on to the receivers.
>>
>>>
>>> If we find that it is likely possible to use the SFU, then I would
>>> anyway think that we just include general hints of its use in this
>>> draft, and let further work detail it. Would that be an acceptable
>>> conclusion?
> [GH]I am continuing on that line.
>>>
>>> [BA] My question was whether a "multi-party aware" endpoint could 
>>> support an SFU (SSRCs but no CSRCs), and how the negotiation would 
>>> work (whether it was supported or not).  If it can work and you can 
>>> indicate how it would be done (or how it would fail if it cannot be 
>>> done), that would be fine with me.
>>> "
>>
>> I need another example of sdp for an SFU session establishment in a 
>> traditional SIP environment to be able to answer.
>>
> [GH] I changed my mind. My general approach does not need any more sdp 
> examples.
>>
> Thanks,
>
> Gunnar
>
>>
>> Thanks,
>>
>> Gunnar
>>
>>>
>>> On Mon, Dec 7, 2020 at 3:03 PM Gunnar Hellström 
>>> <gunnar.hellstrom@ghaccess.se> wrote:
>>>
>>>     Bernard, I indicated that I would like to discuss the SFU.
>>>
>>>     You said: "Also, some clarifications of desired behavior by SFUs
>>>     might be helpful." and mentioned SFU in a couple of other comments.
>>>
>>>     I like the idea to have a possibility to use E2E encryption. I also
>>>     imagined that with separate RTP streams it would be possible to
>>>     better detect if unrecoverable loss of text appears or not after
>>>     some lost packets. But I get unsure if that is possible when I read
>>>     section 3.7 of RFC 7667.
>>>
>>>     I need to understand the SFU description in RFC 7667 better to
>>>     judge if it really can carry RFC 2198 coded text with maintained
>>>     possibility to detect unrecoverable loss.
>>>
>>>     Before IETF 108, the draft had both an RTP multi-stream method
>>>     and an RTP-single stream mixer method (and also the method for
>>>     multi-party unaware endpoints). Then we decided to move on without
>>>     the multi-stream method. It is a bit inconsistent to bring it in
>>>     again, but I can see the possibility to arrange for E2E encryption
>>>     to be a good reason. The earlier multi-stream method was based on
>>>     the RTP-translator and required the mixer to recover from packet
>>>     loss and insertion of marks for unrecoverable loss, and therefore
>>>     could not be used for E2E encryption. Therefore it might be valid
>>>     to rethink the earlier decision.
>>>
>>>     Questions:
>>>
>>>     1. Is 3GPP TS 26.114 Annex T an example of sdp for the kind of SFU
>>>     described in RFC 7667 section 3.7, with one media section per
>>>     stream in a lower number of streams than participants? If not, how
>>>     does a normal sdp look for an SFU?
>>>
>>>     2. RFC 7667 section 3.7 seems to tell that the sequence number is
>>>     regenerated in sequence by the middlebox. Is that right? That
>>>     would make it not possible to detect if total loss of text
>>>     occurred. Recovery can be done based on timestamp analysis, but
>>>     not detection of unrecoverable loss.
>>>
>>>     3. How is the source conveyed? There is apparently an SSRC to SSRC
>>>     mapping taking place. Is there a lower number of SSRCs used in
>>>     transmission from the middlebox than the total number of
>>>     participants? If so, how is switch of source indicated? By RTCP?
>>>
>>>
>>>     If we find that it is likely possible to use the SFU, then I would
>>>     anyway think that we just include general hints of its use in this
>>>     draft, and let further work detail it. Would that be an acceptable
>>>     conclusion?
>>>
>>>
>>>     Regards
>>>
>>>     Gunnar
>>>
>>>     -- 
>>>
>>>     Gunnar Hellström
>>>     GHAccess
>>>     gunnar.hellstrom@ghaccess.se
>>>
>> -- 
>> Gunnar Hellström
>> GHAccess
>> gunnar.hellstrom@ghaccess.se
> -- 
> Gunnar Hellström
> GHAccess
> gunnar.hellstrom@ghaccess.se

-- 
Gunnar Hellström
GHAccess
gunnar.hellstrom@ghaccess.se