Re: [AVTCORE] WG Last Call: "RTP-mixer formatting of multi-party Real-time text"

Gunnar Hellström <gunnar.hellstrom@ghaccess.se> Mon, 07 December 2020 14:55 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>, IETF AVTCore WG <avt@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/2HhJ4n_cpFpE33tFz20J-63BIX4>

Bernard, thanks for the comments.

On 2020-12-04 at 23:54, Bernard Aboba wrote:
> Here are my comments.
>
> Overall, I would suggest placing terminology and some overview 
> material earlier in the document.  Also,
> some clarifications of desired behavior by SFUs might be helpful.
>
[GH] Done, and SFU is briefly included as a possible option. I may need 
to check consistency after that.

111111111111111111111111111111111111111111111111111111111111

> Abstract
>
>
>    Enhancements for RFC 4103 real-time text mixing is provided in this
>
> [BA] "is provided" -> "are provided"

[GH] Accepted

222222222222222222222222222222222222222222222222222222222222222

>
>    document, suitable for a centralized conference model that enables
>    source identification and source switching.  The intended use is for
>
> [BA] "source switching" is not clear in this context. The document
> refers to the mixing model in which the conferencing server sends
> a single SSRC to participants with contributors in the CSRC field.
> But an SFU will just forward packets to a participant, who will see
> multiple SSRCs (and no CSRCs). Since the SSRCs may be modified, this
> can also be construed as "source switching".

[GH] Yes, "source switching" is used in a number of places to mean how 
rapidly the mixer can forward a packet received from one source while it 
is currently sending packets for another source. Different transmission 
mechanisms result in different delays. RFC 4103 has requirements 
intended to limit the load on the network and devices, by requiring that 
the packet interval should not be less than 300 ms. Three packets must 
also be sent with the last text from one source before a packet from 
another source can be sent. So it would take 900 ms (three intervals of 
300 ms) from receiving new text from a source other than the current one 
before it can be sent. The current draft reduces that time to 100 ms.

The 100 ms minimum packet interval is set in order to give a reasonable 
balance between load and performance. Do you find that to be a suitable 
value? If it is, an SFU would also be specified to obey that limit.
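
As a rough illustration of the timing arithmetic above, here is a minimal 
Python sketch. The function name is hypothetical, and the second call 
simply assumes that the 100 ms figure corresponds to one packet interval 
of 100 ms; this mail does not spell out how the draft arrives at it.

```python
def switch_delay_ms(packet_interval_ms: int, packets_before_switch: int) -> int:
    """Time before text from a new source can be sent.

    The mixer has to send `packets_before_switch` packets for the current
    source, spaced `packet_interval_ms` apart, before it may switch.
    """
    return packet_interval_ms * packets_before_switch


# RFC 4103 default as described above: three packets at 300 ms intervals.
rfc4103_delay = switch_delay_ms(300, 3)

# Assumption for illustration only: a single 100 ms interval.
draft_delay = switch_delay_ms(100, 1)

print(rfc4103_delay, draft_delay)
```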

I propose to modify wording where "source switching" is used.

In the abstract,  I suggest changing

"but the mixer source switching performance is limited when using the 
default transmission ..."

to

"but the performance of the mixer when giving turns for the different 
sources to transmit is limited when using the default transmission..."

In the next paragraph of the abstract, I suggest changing from

" enables
    source identification and source switching."

to

"enables source identification and rapidly interleaved transmission of 
text from different sources."

In the last paragraph of Section 1, I suggest changing from

"rules for source indications and source switching."

to

"rules for source indications and interleaving of text from different 
sources."

In 1.1, I suggest changing from

"Source switching"

to

"Transmission of packets with text from different sources"

In 4.2, I suggest changing from

"the time for source switching is depending on"

to

"the time for switching from transmission of text from one source to 
transmission of text from another source is depending on"

With that, all occurrences of "source switching" are replaced.

3333333333333333333333333333333333333333333333333333


>
>    real-time text mixers and multi-party-aware participant endpoints.
>
> [BA] It is not clear to me what "multi-party-aware" means here. Do you
> mean that the endpoint can distinguish between sources based on SSRC, or
> CSRC, or both?

[GH] That first sentence does not require any specific technology. The 
next one hints at the technology used in this draft. We need the term, 
so I will check that it is defined, but I suggest replacing it here, from

  "multi-party-aware participant endpoints."

to

"participant endpoints capable of providing an efficient presentation or 
other treatment of a multi-party real-time text session"

444444444444444444444444444444444444444444444444444444444444444


>
>    The specified mechanism build on the standard use of the CSRC list in
>
> [BA] "build" -> "builds"

[GH]Accepted

55555555555555555555555555555555555555555555555555555555555555

>
>    the RTP packet for source identification.  The method makes use of
>    the same "text/t140" and "text/red" formats as for two-party
>    sessions.
>
>    A capability exchange is specified so that it can be verified that a
>    participant can handle the multi-party coded real-time text stream.
>    The capability is indicated by use of a media attribute "rtt-mixer".
>
>    The document updates RFC 4103[RFC4103]
>
>    A specifications of how a mixer can format text for the case when the
>
> [BA] "A specifications" -> "Specification"
>
[GH]Accepted

6666666666666666666666666666666666666666666666666666666

>
> 1.  Introduction
>
>    RFC 4103[RFC4103] specifies use of RFC 3550 RTP [RFC3550] for
>
> [BA] A better citation format is "RTP Payload for Text Conversation" 
> [RFC4103]
>
[GH]Accepted

777777777777777777777777777777777777777777777777777777777

>    transmission of real-time text (RTT) and the "text/t140" format.  It
>    also specifies a redundancy format "text/red" for increased
>    robustness.  RFC 4102 [RFC4102] registers the "text/red" format.
>    Regional regulatory requirements specify provision of real-time text
>    in multi-party calls.
>
> [BA] Reference?

[GH] Brian asked me to delete the reference to regulatory requirements. 
I am checking with the originator of the sentence whether I should agree 
to delete it before I create the references.

Standards corresponding to the references are, for example,

ETSI EN 301 549

and NENA STA 010.

If you really want the references to the regulatory requirements behind 
these standards, I would need to reference the European Accessibility 
Act and the European Public Procurement directive and some more, but I 
think that is very unusual in IETF documents and should not be done.

8888888888888888888888888888888888888888888888888

>
>    Another requirement is that the mixing procedure must not introduce
>    delays in the text streams that are experienced disturbing the real-
>
> [BA] "experienced" -> "experienced,"

[GH] I suggest instead changing to "experienced to be disturbing the 
real-time experience of the receiving users."
By the way, is "of" good, or should it be "for"?

999999999999999999999999999999999999999999999999999999

>
>    time experience of the receiving users.
>
>
>    Real-time text mixers for multi-party sessions therefore need to
>    insert the source of each transmitted group of text from a conference
>    participant so that the text can be transmitted interleaved with text
>    groups from different sources in the rate they are created.  This
>    enables the text groups to be presented by endpoints in suitable
>    grouping with other text from the same source. The presentation can
>    then be arranged so that text from different sources can be presented
>    in real-time and easily read while it is possible for a reading user
>    to also perceive approximately when the text was created in real time
>    by the different parties.  The transmission and mixing is intended to
>    be done in a general way so that presentation can be arranged in a
>    layout decided by the endpoint.
>
> [BA] By "insert the source" I assume that you mean "indicate the 
> contributing
> SSRC within the CSRC field", correct?

[GH]Yes, but I wanted to state the functional requirements first here 
before talking about how it is done.  But I see that the wording is a 
bit influenced by earlier versions of the draft. The "therefore" is 
confusing. I suggest changing from

"Real-time text mixers for multi-party sessions therefore need to insert the
source of each transmitted group of text from a conference participant 
so that the text can be transmitted interleaved with text groups from 
different sources .."

to

"Real-time text mixers for multi-party sessions need to include the
source with each transmitted group of text from a conference participant 
so that the text can be transmitted interleaved with text groups from 
different sources..."
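
To make the "how it is done" part concrete: in the mixer model discussed 
here, the source is carried in the CSRC list of the RTP header. Below is a 
minimal Python sketch of reading it, assuming the fixed header layout 
from RFC 3550 and ignoring header extensions and payload. The function 
name and the example SSRC/CSRC values are hypothetical.

```python
import struct


def parse_rtp_sources(packet: bytes):
    """Return (ssrc, csrc_list) from an RTP packet header (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = first_byte & 0x0F              # CSRC count, low 4 bits of byte 0
    end = 12 + 4 * cc                   # CSRC list follows the fixed header
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    ssrc = struct.unpack("!I", packet[8:12])[0]
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:end]))
    return ssrc, csrcs


# Example: a mixer (SSRC 0x11111111) forwarding text from one participant
# (CSRC 0x22222222), with CC = 1 and payload type 100.
header = struct.pack("!BBHII", 0x80 | 1, 100, 1, 0, 0x11111111)
packet = header + struct.pack("!I", 0x22222222) + b"hello"
mixer_ssrc, sources = parse_rtp_sources(packet)
```

A receiver could then group the received text by the values in `sources` 
when presenting it.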

1010101010101010101010101010101010101010101010

>
>    There are existing implementations of RFC 4103 without the updates
>    from this document.  These will not be able to receive and present
>    real-time text mixed for multi-party aware endpoints.
>
> [BA] By "implementations", you mean "conferencing server 
> implementations", correct?
> Presumably these implementations do not support mixing, so how do they 
> operate?

[GH]No, I mean endpoint implementations.  The mixer and the endpoint 
both need to use the specification in the draft.

I suggest changing to:

"There are existing implementations of RFC 4103 in endpoints without the 
updates"

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

>
>    A negotiation mechanism is therefore needed for verification if the
>    parties are able to handle a multi-party coded stream and agreeing on
>    using that method.
>
> [BA] Can you clarify what "multi-party coded stream" means, exactly? Does
> this imply a single SSRC with multiple CSRCs? Or would multiple SSRCs 
> without
> CSRCs also qualify? It would perhaps help to have the terminology 
> section moved
> up prior to

[GH] At this point it is still a functional requirement. The selected 
solution comes in the next section.

To be clearer, I suggest changing

from:

"   A negotiation mechanism is therefore needed for verification if the
    parties are able to handle a multi-party coded stream and agreeing on
    using that method."

to:

"A negotiation mechanism is therefore needed for verification if the 
parties are able to handle a common method for multi-party transmission 
and agreeing on using that method."

1212121212121212121212121212121212121212121212121212

>
>    A fall-back mixing procedure is also needed for cases when the
>    negotiation result indicates that a receiving endpoint is not capable
>    of handling the mixed format.  This method is called the mixing
>    procedure for multi-party unaware endpoints.  The fall-back method is
>    naturally not expected to meet all performance requirements placed on
>    the mixing procedure for multi-party aware endpoints.
>
> [BA] A clarification of the behavior of "multi-party unaware 
> endpoints" would
> be helpful. Do these just ignore the SSRC and CSRC fields entirely?

[GH] Yes, at the presentation level they are not specified to distinguish 
between SSRC or CSRC values.  I suggest this explanation:

"Multi-party unaware endpoints may present all received text as if it 
came from the same source regardless of any accompanying source 
indication coded in fields in the packet.  The fall-back method is 
called the mixing procedure for multi-party unaware endpoints."

131313131313131313131313131313131313131313131313

>
>    The document updates RFC 4103[RFC4103] by introducing an attribute
>    for indicating capability for the multi-party mixing case and rules
>    for source indications and source switching.
>
> 1.1.  Selected solution and considered alternative
>
>
> [BA] I think you need to insert the terminology section before this
> section.

[GH]Accepted

1414141414141414141414141414141414141414141

>
>    A number of alternatives were considered when searching an efficient
>    and easily implemented multi-party method for real-time text.  This
>    section explains a few of them briefly.
>
>
>
>
>
> Hellstrom                  Expires 26 May 2021           [Page 5]
> Internet-Draft    RTP-mixer format for multi-party RTT     November 2020
>
>
>    One RTP stream per source, sent in the same RTP session with
>    "text/red" format.
>       From some points of view, use of multiple RTP streams, one for
>       each source, sent in the same RTP session, called the RTP
>       translator model in [RFC3550], would be efficient, and use exactly
>
> [BA] I would suggest referencing "RTP Topologies" [RFC7667]. BTW, a single
> SSRC for each source could also be used in the Selective Forwarding
> Middlebox topology (Section 3.7), not just the Translator topology,
> described in RFC 7667 Section 3.5.

[GH] Yes, RFC 7667 is referenced in many places in 
draft-hellstrom-avtcore-multi-party-rtt-solutions when explaining 
different potential solutions. I will include a reference in this draft 
as well.

Proposed revised text in the "selected solutions...." section:

"From some points of view, use of multiple RTP streams, one for each 
source, sent in the same RTP session would be efficient, and would use 
exactly the same packet format as [RFC4103] and the same payload type.
    A couple of relevant scenarios using multiple RTP streams are 
specified in "RTP Topologies" [RFC7667]. One possibility is the 
RTP-translator model specified in RFC 7667 section 3.5, with the benefit 
of easily detecting which source had unrecoverable loss. Another is the 
Selective Forwarding Middlebox topology specified in RFC 7667 section 
3.7, which could enable end-to-end encryption."

I also suggest changing the conclusion in that section to:

"This fact made this solution only briefly mentioned as an option in 
this document."

1515151515151515151515151515151515151515151515151515

>
>       the same packet format as [RFC4103], the same payload type and a
>       simple SDP declaration.  However, the RTP implementation in both
>       mixers and endpoints need to support multiple streams in the same
>       RTP session in order to use this mechanism. For best deployment
>       opportunity, it should be possible to upgrade existing endpoint
>       solutions to be multi-party aware with a reasonable effort.  There
>       is currently a lack of support for multi-stream RTP in certain
>       implementation technologies.  This fact made this solution not
>       selected for inclusion in this document.
>
> [BA] I would suggest that this limitation of existing implementations be
> introduced earlier in the document, perhaps summarizing it in terminology.
>
> Note that the use of a mixer, as opposed to an SFU, assumes that the 
> conferencing
> unit has access to the payload (e.g. the payload is not E2E encrypted).

[GH] It is urgent to implement the multi-party aware method. In earlier 
versions of the draft, a method with one RTP stream per source was 
specified, but investigations into the possibility of implementing it 
have shown that a large part of the installed base of endpoints would 
require massive work to upgrade the RTP implementations to support it. 
That is unfortunate, because the multi-stream solution makes it easier 
to detect from which source text was lost and, as you point out, may 
make it possible to use end-to-end encryption. Another problem with the 
multi-stream solution is that it is not evident how it would be signaled 
with SDP in a simple way. The examples I have seen have used one 
m-section per possible stream. This is very different from the RTP-mixer 
solution, where one m-section is sufficient and new participants are 
announced just by a conference notification.

However, it is important that RTT makes use of similar technologies as 
the other media, so if SFU is becoming popular with E2E encryption, it 
would be good to have specified how RTT can be implemented in that 
environment.

Do you know if SFU can be used in the SIP centralized conference model, 
RFC 4353 et al.? What would the SDP look like for that case?

Since the discussions during the year have concluded that we need the 
RTP-mixer based solution, I suggest that we continue on that track with 
the current draft, but take a moment to consider whether an SFU solution 
would need another negotiation by an SDP attribute. If so, we can change 
the SDP attribute from a single one without values to an attribute with 
two values.

But maybe an SFU-capable endpoint can be recognized by some other 
characteristics. Will the middlebox make a re-Invite per added stream, 
and notice that the SFU-capable endpoint will answer positively on the 
addition of the stream? If so, no more negotiation is needed.

I think the abstract is a good place to tell that we concentrate on the 
RTP-mixer solution.

I suggest adding this to the abstract:

"Solutions using multiple RTP streams in the same RTP session were 
considered, as they could have some benefits over the RTP-mixer model, 
but the possibility to implement the solution in a wide range of 
existing RTP implementations made the RTP-mixer model be selected."

161616161616161616161616161616161616161616161616161616161

>
>    RTT transport in WebRTC
>       Transport of real-time text in the WebRTC technology is specified
>       to use the WebRTC data channel in
>       [I-D.ietf-mmusic-t140-usage-data-channel]. That spcification
>
> [BA] "spcification" -> "specification"

[GH] Accepted

17171717171717171717171717171717171717171717171717171717

>
>       contains a section briefly describing its use in multi-party
>       sessions.  The focus of this document is RTP transport.
>       Therefore, even if the WebRTC transport provides good multi-party
>       performance, it is just mentioned in this document in relation to
>       providing gateways with multi-party capabilities between RTP and
>       WebRTC technologies.
>
> 1.2.  Nomenclature
>
> [BA] "Nomenclature" -> "Terminology"?

[GH] Accepted

181818181818181818181818181818181818181818181818181818181

>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>    document are to be interpreted as described in [RFC2119].
>
>    The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
>    mixer, RTP-translator are explained in [RFC3550]
>
>    The term "T140block" is defined in RFC 4103 [RFC4103] to contain one
>    or more T.140 code elements.
>
>    "TTY" stands for a text telephone type used in North America.
>
>    "WebRTC" stands for web based communication specified by W3C and
>    IETF.
>
>    "DTLS-SRTP" stands for security specified in RFC 5764 [RFC5764].
>
>    "multi-party aware" stands for an endpoint receiving real-time text
>    from multiple sources through a common conference mixer being able to
>    present the text in real-time separated by source and presented so
>    that a user can get an impression of the approximate relative timing
>    of text from different parties.
>
>    "multi-party unaware" stands for an endpoint not itself being able to
>    separate text from different sources when received through a common
>    conference mixer.
>
> [BA] I would suggest that this section be moved up earlier in the 
> document,
> such as Section 1.1.

[GH] Accepted

19191919191919191919191919191919191919191919191

> 1.3.  Intended application
>
> 2.  Overview over the two specified solutions
> [BA] "over" -> "of"

[GH]Accepted

20202020202020202020202020202020202020202020202020

> [BA] Overall, this section feels like it should appear earlier in the
> document, since it provides an introduction to what it contains. Perhaps
> it should be placed just after terminology (suggested to be moved to 
> 1.1) and before the material currently in Section 1.1?

[GH] I think it has a good logical place where it is. The solutions are 
briefly mentioned in the abstract and the introduction, but the real 
specification starts here.

Possibly the "selected solutions and considered alternatives" could be 
shortened to get more rapidly to the technical core of the document.

No action at the moment.

21212121212121212121212121212121212121212121212

>
>    This section contains a brief introduction of the two methods
>    specified in this document.
>
> 3.1.  Offer/answer considerations
>
>    RFC 4103[RFC4103] specifies use of RFC 3550 RTP[RFC3550], and a
>    redundancy format "text/red" for increased robustness of real-time
>    text transmission.  This document updates RFC 4103[RFC4103] by
>    introducing a capability negotiation for handling multi-party real-
>    time text, a way to indicate the source of transmitted text, and
>    rules for efficient timing of the transmissions interleaved from
>    different sources.
>
>    The capability negotiation is based on use of the sdp media attribute
>    "rtt-mixer".
>
>    Both parties shall indicate their capability in a session setup or
>    modification, and evaluate the capability of the counterpart.
>
>    The syntax is as follows:
>       "a=rtt-mixer"
>
> [BA] Might be useful to explicitly state what happens if both offer 
> and answer don't include "a=rtt-mixer" and also
> how an SFU that forwards but doesn't mix should Offer or Answer.

[GH] I have seen an example of an SDP exchange for SFU in 3GPP TS 26.114 
Annex T. Do you regard that as typical, with multiple m-sections for the 
number of streams to support?

This might need more discussion for how to handle the SFU option in 
general, but I propose to include the following paragraphs in 3.2

   "      A party which has not successfully completed the negotiation 
of the "rtt-mixer" sdp media attribute MUST NOT transmit packets 
interleaved from different sources in the same RTP stream as specified 
in section 3. If the party is a mixer and did declare the "rtt-mixer" 
sdp media attribute, it SHOULD perform the procedure for multi-party 
unaware endpoints. If the party is not a mixer, it SHOULD transmit 
according to [RFC4103].

       Capability for another real-time text multi-party method, e.g. 
based on the Selective Forwarding RTP topology specified in [RFC7667] 
section 3.7 MAY be signaled in SDP simultaneously with the "rtt-mixer" 
sdp media attribute. The answering party SHOULD select which method to 
use and answer accordingly.
       "

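The first proposed paragraph for 3.2 can be sketched in Python roughly as 
follows. This is only an illustration under assumptions: the helper names 
are hypothetical, the SDP handling is a naive line scan rather than a real 
SDP parser, and "successful negotiation" is taken to mean that both offer 
and answer carry "a=rtt-mixer" in a text m-section.

```python
def has_rtt_mixer(sdp: str) -> bool:
    """True if a text m-section in the SDP carries the rtt-mixer attribute."""
    in_text_section = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media section starts; only text sections are of interest.
            in_text_section = line.startswith("m=text ")
        elif in_text_section and line == "a=rtt-mixer":
            return True
    return False


def mixer_transmit_mode(offer: str, answer: str) -> str:
    """Pick the mixing procedure a mixer would use toward this endpoint."""
    if has_rtt_mixer(offer) and has_rtt_mixer(answer):
        return "multi-party aware"
    # Negotiation not completed: fall back, do not interleave sources.
    return "multi-party unaware fallback"


AWARE_SDP = """m=text 11000 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98
a=rtt-mixer
"""

# Same media description without the attribute, as a legacy endpoint might send.
LEGACY_SDP = AWARE_SDP.replace("a=rtt-mixer\n", "")

print(mixer_transmit_mode(AWARE_SDP, AWARE_SDP))
print(mixer_transmit_mode(AWARE_SDP, LEGACY_SDP))
```

A party that is not a mixer would, per the proposed text, keep 
transmitting according to RFC 4103 in the fallback case.
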
2222222222222222222222222222222222222222222222222222222222222222222222222222

>
> 3.2.  Actions depending on capability negotiation result
>
>    A transmitting party SHALL send text according to the multi-party
>    format only when the negotiation for this method was successful and
>    when the CC field in the RTP packet is set to 1. In all other cases,
>    the packets SHALL be populated and interpreted as for a two-party
>    session.
>
>    A party which has negotiated the "rtt-mixer" sdp media attribute MUST
>    populate the CSRC-list and format the packets according to Section 3
>    if it acts as an rtp-mixer and sends multi-party text.
>
> [BA] Presumably "party" here only refers to the conferencing server, not
> participants, correct?

[GH] No, it is intended to apply to both. It says "if acting as a mixer" 
or similar before all statements.

2323232323232323232323232323232323232323232323232323232323

>
>    A party which has negotiated the "rtt-mixer" sdp media attribute MUST
>    interpret the contents of the "CC" field the CSRC-list and the
>    packets according to Section 3 in received rtp packets in the
>    corresponding RTP stream.
>
>    A party not performing as a mixer MUST not include the CSRC list.
>
> [BA] "not performing as a mixer MUST not" -> "not acting as a mixer 
> MUST NOT"
>
[GH]Accepted

2424242424242424242424242424242424244242424424244242

>
> 3.5.  Keep-alive
>
>    After that, the transmitter SHALL send keep-alive traffic to the
>    receiver(s) at regular intervals when no other traffic has occurred
>    during that interval, if that is decided for the actual connection.
>    Recommendations for keep-alive can be found in [RFC6263].
>
> [BA] Reading this paragraph, I'm unclear whether it is advocating an
> alternative to the [RFC6263] recommendation (e.g. RTP/RTCP mux). Also
> note that RFC 7675 consent checks can also help here.

[GH] It is not meant to advocate an alternative to RFC 6263. Do you 
suggest a change to "SHOULD use the recommendation from RFC 6263"? But 
at the same time you propose mentioning the RFC 7675 consent checks.

I suggest changing the last sentence to "It is RECOMMENDED to use the 
keep-alive solution from [RFC6263].", but I can accept proposals for how 
to also mention the consent check alternative.


(Some RTT implementations use transmission of BOM characters as 
keep-alive but that is not recommended, because it can cause false 
character loss indications. The initial transmission of BOM mentioned in 
3.4 is however important for opening RTP and NAT paths.)

252525252525252525252525252525252525252525525252525525525252

>
>
> 3.8.  Do not send received text to the originating source
>
>    Text received to a mixer from a participant SHOULD NOT be included in
>
> [BA] "received to" -> "sent to"

[GH] Already changed to "received by" as proposed by Brian.

26262626262626262626262626262626262626262626626262662

>    transmission from the mixer to that participant.
>
> 3.21.  Security for session control and media
>
>    Security SHOULD be applied on both session control and media.  In
>
> [BA] By "Security" I assume you mean SIP over TLS for signalling and
> SRTP for media? It might be useful to say this explicitly (since E2E
> security is also possible).

[GH] I suggest changing the beginning of 3.21 to:

" Security SHOULD be applied by use of SIP over TLS by default according 
to [RFC5630] section 3.1.3 on session control level and by default 
using DTLS-SRTP [RFC5764] on media level. "

2828282828282828282828282828282828282828282828282828

>
>    applications where legacy endpoints without security may exist, a
>    negotiation SHOULD be performed to decide if security by encryption
>    will be applied.  If no other security solution is mandated for the
>    application, then RFC 8643 OSRTP [RFC8643] SHOULD be applied to
>    negotiate SRTP media security with DTLS.  Most SDP examples below are
>    for simplicity expressed without the security additions.  The
>    principles (but not all details) for applying DTLS-SRTP [RFC5764]
>    security is shown in a couple of the following examples.
>
> 3.22.  SDP offer/answer examples
>
>    This sections shows some examples of SDP for session negotiation of
>    the real-time text media in SIP sessions.  Audio is usually provided
>    in the same session, and sometimes also video. The examples only
>    show the part of importance for the real-time text media.
>
>      Offer example for "text/red" format and multi-party support:
>
>            m=text 11000 RTP/AVP 100 98
>            a=rtpmap:98 t140/1000
>            a=rtpmap:100 red/1000
>            a=fmtp:100 98/98/98
>            a=rtt-mixer
>
> [BA] To be clear: an SFU that forwards but does not mix should Answer 
> without "a=rtt-mixer", even if the Offer indicates the ability to 
> support multiple streams per session?
> Also, an SFU should not Offer "a=rtt-mixer" if it doesn't support mixing?

[GH] I think we need to discuss the inclusion of the SFU based method 
more. How thoroughly should we describe that method?  3GPP TS 26.114 
Annex T seems to have good examples of SDP for SFU negotiation. Do you 
regard these examples as typical for SFU negotiation? How is the shift 
of source to use a stream in the SFU usually signaled?

As a start anyway, I suggest including the following at the end of the 
first paragraph in 3.22.

"The examples relate to the single RTP stream mixing for multi-party 
aware endpoints and for multi-party unaware endpoints.

Multi-party RTT can also be provided through a Selective Forwarding 
Unit (SFU). In that case, the offer will have multiple m-sections for 
RTT, and an answer acknowledging the use of an SFU would accept a number 
of m-sections for RTT. The offer may also contain the "rtt-mixer" sdp 
media attribute for the main RTT media when the offeror has capability 
for both multi-party methods, while an answer selecting to use SFU MUST 
NOT include the "rtt-mixer" sdp media attribute. "

292929292929292929292929292929292929292929292929292929

>
>       Answer example  from a multi-party capable device
>            m=text 14000 RTP/AVP 100 98
>            a=rtpmap:98 t140/1000
>            a=rtpmap:100 red/1000
>            a=fmtp:100 98/98/98
>            a=rtt-mixer
>
>       Offer example for "text/red" format including multi-party
>       and security:
>             a=fingerprint: (fingerprint1)
>             m=text 11000 RTP/AVP 100 98
>             a=rtpmap:98 t140/1000
>             a=rtpmap:100 red/1000
>             a=fmtp:100 98/98/98
>             a=rtt-mixer
>
>    The "fingerprint" is sufficient to offer DTLS-SRTP, with the media
>    line still indicating RTP/AVP.
>
>    Note: For brevity, the entire value of the SDP fingerprint attribute
>    is not shown in this and the following example.
>
>
Thanks,

Gunnar


-- 
Gunnar Hellström
GHAccess
gunnar.hellstrom@ghaccess.se