Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text" - SFU

Gunnar Hellström <gunnar.hellstrom@ghaccess.se> Fri, 11 December 2020 09:54 UTC

From: Gunnar Hellström <gunnar.hellstrom@ghaccess.se>
To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: IETF AVTCore WG <avt@ietf.org>
References: <CAOW+2duJwBizifn94qcRfpZ6cqRjRVyueyoofox0AWjkcJm02g@mail.gmail.com> <68866CAE-C81B-4C23-9DB5-CA8B57C1E3DC@brianrosen.net> <CAOW+2dt88EX1bj27zurn7XX-Ct24CFi_5SRyGObvGjwEDRuR_A@mail.gmail.com> <CAOW+2dtwOEG6=OEQarQTxQKnUkBAKCCArQXZQUP_QTTbALK1iw@mail.gmail.com> <58a73f79-60ad-442c-3162-d2cd52f025fe@ghaccess.se> <b8784aa8-a5ef-544c-7315-c64767211387@ghaccess.se> <CAOW+2duqBWJrq8ihp9Of+4JfYcgJeAJw8tG9T7QDn_6kxC3hOA@mail.gmail.com> <48539ee9-ba2f-ff8e-b71f-f3cf64b5d59d@ghaccess.se> <bd1ee906-b759-028f-9108-0109fce490c7@ghaccess.se> <8d7f249a-3e22-602f-8d3a-ef339146d2dd@ghaccess.se>
Message-ID: <a69770ad-3793-47cb-0aaa-db21ceec9da4@ghaccess.se>
Date: Fri, 11 Dec 2020 10:54:25 +0100
In-Reply-To: <8d7f249a-3e22-602f-8d3a-ef339146d2dd@ghaccess.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/Xb4nVNy1h44dpI-1eQ62IFO1Vio>
Subject: Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text" - SFU

Bernard,

I have prepared modifications to satisfy your comments about Keep-alive 
and SFU.

For keep-alive, section 3.3 is proposed to be:
"3.3.  Keep-alive

    After that, the transmitter SHALL send keep-alive traffic to the
    receiver(s) at regular intervals when no other traffic has occurred
    during that interval, if that is decided for the actual connection.
    It is RECOMMENDED to use the keep-alive solution from [RFC6263].  The
    consent check of [RFC7675] is a possible alternative if it is used
    anyway for other reasons."
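
As an illustration of the idle condition in the proposed text, here is a
minimal Python sketch. The class name KeepAliveTimer, its methods, and
the 15-second interval used in the example are illustrative assumptions,
not part of the draft. The keep-alive packet itself is not shown; its
format is left to [RFC6263].

```python
import time


class KeepAliveTimer:
    """Decides when a keep-alive is due on an otherwise idle stream.

    Hypothetical helper for illustration; not taken from the draft.
    """

    def __init__(self, interval_s, now=time.monotonic):
        self.interval_s = interval_s
        self._now = now              # injectable clock, eases testing
        self._last_sent = now()

    def on_packet_sent(self):
        # Any outgoing traffic (text or keep-alive) restarts the interval.
        self._last_sent = self._now()

    def keepalive_due(self):
        # True only when nothing at all was sent during the last interval.
        return self._now() - self._last_sent >= self.interval_s


# Example with a fake clock so the behaviour is deterministic.
clock = [0.0]
timer = KeepAliveTimer(15.0, now=lambda: clock[0])
clock[0] = 15.0
if timer.keepalive_due():
    timer.on_packet_sent()   # a real sender would transmit here
```

The point of the sketch is only that regular text packets reset the same
timer, so keep-alives are sent only during silence.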

For SFU, I propose that the paragraph on SFU among the considered
solutions in section 1.2 be:

"

    Multiple RTP streams, one per participant.
       One RTP stream per source would be sent in the same RTP session
       with the "text/red" format.  From some points of view, use of
       multiple RTP streams, one for each source, sent in the same RTP
       session would be efficient, and would use exactly the same packet
       format as [RFC4103] and the same payload type.  A couple of
       relevant scenarios using multiple RTP-streams are specified in
       "RTP Topologies" [RFC7667].  One possibility of special interest
       is the Selective Forwarding Middlebox (SFM) topology specified in
       RFC 7667 section 3.7 that could enable end to end encryption.  In
       contrast to audio and video, real-time text is only transmitted
       when the users actually transmit information.  Thus an SFM
       solution would not need to exclude any party from transmission
       under normal conditions.  In order to allow the mixer to convey
       the packets with the payload preserved and encrypted, an SFM
       solution would need to act on some specific characteristics of the
       "text/red" format.  The redundancy headers are part of the
       payload, so the receiver would need to just assume that the
       payload type number in the redundancy header is for "text/t140".
       The characters per second parameter (CPS) would need to act per
       stream.  The relation between the SSRC and the source would need
       to be conveyed in some specified way, e.g. in the CSRC. Recovery
       and loss detection would preferably be based on sequence number
       gap detection.  Thus sequence number gaps in the incoming stream
       to the mixer would need to be reflected in the stream to the
       participant and no new gaps created by the mixer.  However, the
       RTP implementation in both mixers and endpoints needs to support
       multiple streams in the same RTP session in order to use this
       mechanism.  For best deployment opportunity, it should be possible
       to upgrade existing endpoint solutions to be multi-party aware
       with a reasonable effort.  There is currently a lack of support
       for multi-stream RTP in certain implementation technologies.  This
       fact made this solution only briefly mentioned in this document as
       an option for further study.

"

(I wonder if I should delete "in the same RTP session", because it seems 
that the SFM description in RFC 7667 does not specify that the streams 
to the participants must be in the same session.)   ????
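
The sequence number requirement in the proposed SFM text (gaps on the
incoming leg are reflected to the participant, and the mixer creates no
new gaps) can be sketched as below. This is a sketch under assumptions:
the names GapPreservingForwarder and missing_between are hypothetical,
one outgoing stream per source is assumed, and packets are assumed to
arrive in order. It is not a description of any existing SFM.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


class GapPreservingForwarder:
    """Rewrites sequence numbers per source with a fixed offset.

    Because the offset never changes for a source, a gap on the
    incoming leg shows up as the same gap on the outgoing leg.
    """

    def __init__(self):
        self._offset = {}  # source SSRC -> offset added to incoming numbers

    def forward_seq(self, ssrc, incoming_seq, first_outgoing_seq=0):
        if ssrc not in self._offset:
            # Fix the offset on the first packet seen from this source.
            self._offset[ssrc] = (first_outgoing_seq - incoming_seq) % SEQ_MOD
        return (incoming_seq + self._offset[ssrc]) % SEQ_MOD


def missing_between(prev_seq, new_seq):
    """Packets missing between two sequence numbers received in order."""
    return (new_seq - prev_seq - 1) % SEQ_MOD


# Two packets (incoming 0 and 1) are lost before reaching the middlebox.
fwd = GapPreservingForwarder()
out = [fwd.forward_seq(0x1234, s, first_outgoing_seq=100)
       for s in (65534, 65535, 2, 3)]
gap = missing_between(out[1], out[2])  # the receiver still sees the gap
```

With the payload left untouched, the receiver can then apply its normal
redundancy recovery and mark text as possibly lost only when the gap is
larger than the redundancy can cover.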

In order for the specified negotiation not to preclude the addition of
other multi-party methods, e.g. an SFU-based one, I suggest moving the
offer/answer section from section 3 to section 2 and wording it so that
other multi-party methods are possible.
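
A minimal sketch of how an endpoint might test for the mixer capability
while staying open to other methods. The attribute name "rtt-mix" is the
one used in this thread; the exact SDP line syntax, the function name
offers_rtt_mix, and the example SDP are assumptions for illustration.

```python
def offers_rtt_mix(sdp: str) -> bool:
    """True if any text media section carries an 'rtt-mix' attribute.

    Absence of the attribute says nothing about other multi-party
    methods; those would need their own indication.
    """
    in_text_section = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # Attributes that follow belong to this media section.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and (
            line == "a=rtt-mix" or line.startswith("a=rtt-mix:")
        ):
            return True
    return False


example_offer = (
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "m=text 11000 RTP/AVP 100 98\r\n"
    "a=rtpmap:98 t140/1000\r\n"
    "a=rtpmap:100 red/1000\r\n"
    "a=rtt-mix\r\n"
)
mixer_capable = offers_rtt_mix(example_offer)
```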

I also propose changing the name of the main method to the
"RTP-mixer based multi-party method".

With that, I hope that I have answered all WGLC comments to 
satisfaction, and I have a corresponding version -12 of the draft prepared.

Regards

Gunnar

-- 
Gunnar Hellström
GHAccess
gunnar.hellstrom@ghaccess.se

On 2020-12-10 at 17:49, Gunnar Hellström wrote:
>
> After looking a bit more into the multi-stream issue, I want to reduce
> the ambition I announced below.
>
> We should not put too many varied methods in one draft. It would then
> be too complicated to require or state compliance with them.
>
> So, I want to back off to just mentioning the possible benefits of
> multi-stream RTP and SFU. I want to indicate that it would be possible
> to negotiate between the specified RTP-mixer method and other
> multi-party aware methods, but say that it is for further study to
> specify any such method in more detail.
>
> This conclusion also matches the need for referencing from NENA
> specifications.
>
>
> /Gunnar
>
> On 2020-12-09 at 20:48, Gunnar Hellström wrote:
>>
>> Bernard,
>>
>> I am editing the draft to include a specification of the use of
>> multiple RTP streams in one RTP session to each participant. There
>> seem to be so many ways to negotiate and use such a topology that I do
>> not think I should dictate one specific way. Therefore I suggest that
>> the sections about that topology be made general, leaving the exact
>> specification to implementation time.
>>
>> The other methods (the RTP-mixer method and the method for
>> multi-party unaware endpoints) are intended to be sufficiently
>> specified to result in interoperability, e.g. between emergency
>> services and mobile services.
>>
>> I do not see the same possibility for the multi-stream method. It is
>> likely that RTT will be included in an environment where the other
>> media and the application call for the use of specific RTP topologies
>> and specific ways to negotiate their use. Therefore I intend to write
>> a short description with general statements and references to the
>> considerations from the RTP-mixer method that should also be valid
>> for the RTP multi-stream methods.
>>
>> I also intend to check and adjust the use of terms such as
>> "multi-party aware" to make clear when it means one or both
>> multi-party aware methods.
>>
>> A couple of further notes inline. I got some help to understand more:
>>
>> On 2020-12-08 at 23:45, Gunnar Hellström wrote:
>>>
>>> Bernard, some proposed conclusions below:
>>>
>>> On 2020-12-08 at 20:10, Bernard Aboba wrote:
>>>> Gunnar asked:
>>>>
>>>> "Questions:
>>>>
>>>> 1. Is 3GPP TS 26.114 Annex T an example of sdp for the kind of SFU
>>>> described in RFC 7667 section 3.7, with one media section per stream in
>>>> a lower number of streams than participants? If not, how does a normal
>>>> sdp look for an SFU?
>>>>
>>>> [BA] Annex T appears to use payload multiplexing (one payload type 
>>>> per stream) for simulcast, which is not commonly deployed.
>>>> It is more common to use SSRC multiplexing (one SSRC per stream), 
>>>> all on the same payload type.  Here are examples of what simulcast 
>>>> SDP looks like in various browsers:
>>>> A playground for Simulcast without an SFU - webrtcHacks 
>>>> <https://webrtchacks.com/a-playground-for-simulcast-without-an-sfu/>
>>>>
>>> [GH] Thanks. That example was with Bundle and WebRTC. If that is the
>>> environment you imagine, we will not be allowed to use RTP transport
>>> for RTT. Instead, the data channel usage for RTT in RFC 8865 applies.
>>> Only audio and video use RTP.
>>>
>>> But I assume that there could be use of the SFU technology also for 
>>> traditional SIP. What we need to consider for the current draft is 
>>> then if an sdp indicating capability for use of an SFU or some other 
>>> form of multi-stream RTP will have something characteristic for that 
>>> capability. E.g. an m-section for each stream.
>>>
>>> I think it is quite likely that it must be visible in the sdp that a 
>>> party has multi-stream RTP capability.
>>>
>>> Then the "rtt-mix" attribute is sufficient to indicate capability 
>>> for the RTP-mixer based multi-party solution, and we only need to 
>>> tell that other solutions may be implemented.
>>>
>>> I need to check wording so that the draft does not preclude other 
>>> multi-party RTP based RTT solutions.
>>>
>> [GH]I will just assume that it is visible in the sdp that a 
>> multi-party session with RTP multi-stream topology is offered, or a 
>> capability to use it. And that a party who is not capable of that 
>> method can indicate that fact in an offer or answer. I will not say how.
>>>
>>>>
>>>>
>>>> 2. RFC 7667 section 3.7 seems to tell that the sequence number is
>>>> regenerated in sequence by the middlebox. Is that right? That would
>>>> make it not possible to detect if total loss of text occurred.
>>>> Recovery can be done based on timestamp analysis, but not detection
>>>> of unrecoverable loss.
>>>>
>>>> [BA] The sequence number needs to be regenerated when an SFU 
>>>> switches between simulcast streams sent by a participant, sending 
>>>> only a single stream onward to the viewer.  This is needed since 
>>>> each simulcast stream has its own sequence number space, and 
>>>> endpoints typically do not support receiving simulcast.
>>>
>>> There are regulatory requirements to detect and mark cases of
>>> suspected text loss. So, if there is packet loss from a participant
>>> to the middlebox, that sequence gap must not simply be ignored,
>>> resulting in an unbroken sequence number series after the SFU. Can
>>> the sequence number from the source be copied to the transmission
>>> from the SFU?
>>>
>>> Does not video also have that need?
>>>
>> [GH] During active transmission from one source, the sequence number
>> needs to step up at the same rate as the sequence numbers of incoming
>> packets, so that gaps from the leg from the transmitter to the
>> middlebox can be revealed and acted on by the receiving party. So,
>> the offset adjustment is only intended to make the sequence look
>> consecutive even if there was a pause in transmission from a source.
>> Then it works for total loss detection for RTT.
>>>
>>>>
>>>> However an RTT sender would not send simulcast (e.g. multiple 
>>>> versions of the same text stream).  So the SFU would potentially 
>>>> forward only a single stream per participant, with no CSRCs. The 
>>>> question is whether a "multi-party aware endpoint" would be 
>>>> prepared to receive multiple SSRCs (and no CSRCs), with each SSRC 
>>>> representing a single RTT source.
>>> In order to have interoperability, the "rtt-mix" attribute must only
>>> mean the RTP-mixer based solution. Both mixers and endpoints could
>>> also have multi-stream capability, but the intention to open a
>>> multi-party session with multi-stream needs to have its own
>>> characteristics.
>> [GH]I will differentiate the terms. Most likely so that multi-party 
>> aware means any of the two methods RTP-mixer and multi-stream RTP. 
>> And then specific wording for each method when there is talk about 
>> only one method.
>>>>
>>>> 3. How is the source conveyed? There is apparently an SSRC to SSRC
>>>> mapping taking place. Is there a lower number of SSRCs used in
>>>> transmission from the middlebox than the total number of participants?
>>>> If so, how is switch of source indicated? By RTCP?
>>>>
>>>> [BA] I am suggesting a scenario where there would be one SSRC per 
>>>> source.  The SFU could perhaps replace the SSRC received from a 
>>>> participant with an SSRC of its own, but each text stream would 
>>>> have a unique SSRC up to the maximum participant limit.
>> [GH] But the mapping between NAME and SSRC is lost when RTCP SDES is
>> just sent along with a replaced SSRC. I assume that there are methods
>> to sort that out by some signaling that I don't know. Since I will
>> not be specific, I will not go into such detail.
>>>>
>>>>
>>> RTT is specific in that no data flows when the user does not send
>>> any information. In the other media there is always content.
>> [GH] So in a way it can be said to be "SFU by nature".   I might 
>> mention that. It eases switching. In the normal case no switching 
>> will be needed. All simultaneous senders can likely always be allowed 
>> to transmit on to the receivers.
>>>
>>>>
>>>> If we find that it is likely possible to use the SFU, then I would
>>>> anyway think that we just include general hints of its use in this
>>>> draft, and let further work detail it. Would that be an acceptable
>>>> conclusion?
>> [GH]I am continuing on that line.
>>>>
>>>> [BA] My question was whether a "multi-party aware" endpoint could 
>>>> support an SFU (SSRCs but no CSRCs), and how the negotiation would 
>>>> work (whether it was supported or not).  If it can work and you can 
>>>> indicate how it would be done (or how it would fail if it cannot be 
>>>> done), that would be fine with me.
>>>> "
>>>
>>> I need another example of sdp for an SFU session establishment in a 
>>> traditional SIP environment to be able to answer.
>>>
>> [GH] I changed my mind. My general approach does not need any more 
>> sdp examples.
>>>
>> Thanks,
>>
>> Gunnar
>>
>>>
>>> Thanks,
>>>
>>> Gunnar
>>>
>>>>
>>>> On Mon, Dec 7, 2020 at 3:03 PM Gunnar Hellström 
>>>> <gunnar.hellstrom@ghaccess.se 
>>>> <mailto:gunnar.hellstrom@ghaccess.se>> wrote:
>>>>
>>>>     Bernard, I indicated that I would like to discuss the SFU.
>>>>
>>>>     You said: "Also, some clarifications of desired behavior by
>>>>     SFUs might be helpful." and mentioned SFU in a couple of other
>>>>     comments.
>>>>
>>>>     I like the idea to have a possibility to use E2E encryption. I
>>>>     also imagined that with separate RTP streams it would be
>>>>     possible to better detect if unrecoverable loss of text appears
>>>>     or not after some lost packets. But I get unsure if that is
>>>>     possible when I read section 3.7 of RFC 7667.
>>>>
>>>>     I need to understand the SFU description in RFC 7667 better to
>>>>     judge if it really can carry RFC 2198 coded text with maintained
>>>>     possibility to detect unrecoverable loss.
>>>>
>>>>     Before IETF 108, the draft had both an RTP multi-stream method
>>>>     and an RTP-single stream mixer method (and also the method for
>>>>     multi-party unaware endpoints). Then we decided to move on
>>>>     without the multi-stream method. It is a bit inconsistent to
>>>>     bring it in again, but I can see the possibility to arrange for
>>>>     E2E encryption to be a good reason. The earlier multi-stream
>>>>     method was based on the RTP-translator and required the mixer to
>>>>     recover from packet loss and to insert marks for unrecoverable
>>>>     loss, and therefore could not be used for E2E encryption.
>>>>     Therefore it might be valid to rethink the earlier decision.
>>>>
>>>>     Questions:
>>>>
>>>>     1. Is 3GPP TS 26.114 Annex T an example of sdp for the kind of SFU
>>>>     described in RFC 7667 section 3.7, with one media section per
>>>>     stream in a lower number of streams than participants? If not,
>>>>     how does a normal sdp look for an SFU?
>>>>
>>>>     2. RFC 7667 section 3.7 seems to tell that the sequence number is
>>>>     regenerated in sequence by the middlebox. Is that right? That
>>>>     would make it not possible to detect if total loss of text
>>>>     occurred. Recovery can be done based on timestamp analysis, but
>>>>     not detection of unrecoverable loss.
>>>>
>>>>     3. How is the source conveyed? There is apparently an SSRC to SSRC
>>>>     mapping taking place. Is there a lower number of SSRCs used in
>>>>     transmission from the middlebox than the total number of
>>>>     participants? If so, how is switch of source indicated? By RTCP?
>>>>
>>>>
>>>>     If we find that it is likely possible to use the SFU, then I would
>>>>     anyway think that we just include general hints of its use in this
>>>>     draft, and let further work detail it. Would that be an acceptable
>>>>     conclusion?
>>>>
>>>>
>>>>     Regards
>>>>
>>>>     Gunnar
>>>>
>>>>     -- 
>>>>
>>>>     Gunnar Hellström
>>>>     GHAccess
>>>>     gunnar.hellstrom@ghaccess.se <mailto:gunnar.hellstrom@ghaccess.se>
>>>>