Re: [AVTCORE] Opsdir last call review of draft-ietf-avtcore-multi-party-rtt-mix-14

Gunnar Hellström <> Tue, 27 April 2021 21:58 UTC


Thank you for your review.

I will provide initial reactions here, and follow up with more exact 
proposals for changes.

On 2021-04-27 at 12:25, Jürgen Schönwälder via Datatracker wrote:
> Reviewer: Jürgen Schönwälder
> Review result: Has Nits
> I am by no means an expert in this area so please take this into
> account while reading my comments...
> Content comments:
> * The document assumes "human" communication, i.e., where text
>    originates at a speed of a human and politeness is used to resolve
>    concurrency conflicts. This seems to be a fair assumption for the
>    considered use cases but what happens if this assumption is not met?
>    Can systems or RTP mixers detect and handle such situations
>    gracefully or is the idea that any resulting "jerkiness" must be
>    accepted if senders misbehave?

[GH] Receivers are expected to declare their reception speed 
capability in characters per second (measured over a 10-second period). 
A high capability allows a smooth flow of text from many participants 
simultaneously. Section 9 on congestion describes what can be done if 
the load gets too high: the mixer can discard text, and if it has any 
means of detecting who the main contributor is (e.g. the SDP "content" 
attribute), it can avoid discarding text from that participant.
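
As a rough illustration of that discard logic (not taken from the 
draft), a mixer could keep a per-receiver character budget over a 
sliding 10-second window and hold back part of it for the main 
contributor. The names ReceiverBudget, offer, reserve and protected are 
hypothetical, and dropping a whole text chunk at once is a 
simplification made for this sketch.

```python
from collections import deque


class ReceiverBudget:
    """Sliding-window character budget for one receiver (illustrative only)."""

    def __init__(self, cps, window_s=10.0, reserve=0):
        self.limit = int(cps * window_s)  # max characters per window
        self.window_s = window_s
        self.reserve = reserve            # characters held back for protected sources
        self.sent = deque()               # (timestamp, character count) per forwarded chunk
        self.total = 0                    # characters forwarded within the current window

    def _expire(self, now):
        # Forget chunks that have fallen out of the window.
        while self.sent and now - self.sent[0][0] >= self.window_s:
            _, count = self.sent.popleft()
            self.total -= count

    def offer(self, now, source, text, protected=frozenset()):
        """Return the text to forward, or "" if the chunk is discarded."""
        self._expire(now)
        # Protected sources may use the whole budget; others must leave the reserve.
        ceiling = self.limit if source in protected else self.limit - self.reserve
        if self.total + len(text) > ceiling:
            return ""
        self.sent.append((now, len(text)))
        self.total += len(text)
        return text


budget = ReceiverBudget(cps=3, reserve=10)        # 30 characters per 10 s
main = frozenset({"alice"})                       # assumed main contributor
first = budget.offer(0.0, "bob", "x" * 20, main)  # fits below the reserve
second = budget.offer(1.0, "carol", "hi", main)   # would eat into the reserve
third = budget.offer(1.0, "alice", "hello", main) # main contributor gets through
later = budget.offer(10.0, "carol", "hi", main)   # window has moved on
```

With this split the receiver's declared rate is never exceeded, at the 
cost of dropping text from other participants earlier than strictly 
necessary.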

This is of course not nice. But it is equally hard to pick out any 
specific voice in a mixed audio channel with many participants, and the 
same goes for video when the video carries real information.

So we are in good company with the other media in the problem that there 
are no really good solutions to an information overload situation.

In most cases the typing participants will send a sentence or two and 
then stop and read or listen to the others. In such situations the 
overload will be sorted out after a short while even without the 
participants being very polite.

Do you want me to elaborate on this in section 9 on congestion?

> * The solution does not provide end-to-end security since the mixer
>    must be trusted to have access to the texts in order to do the mixing.
>    This is mentioned in the security considerations and in section 2
>    where alternatives are considered. The reason to not select a
>    solution providing end-to-end security is given in section 1.2. Is
>    there work planned to address this issue, i.e., to complement this
>    solution with a solution providing end-to-end security?

[GH] There is another individual draft 
"draft-hellstrom-avtcore-multi-party-rtt-solutions", intended to 
document design choices behind the reviewed draft. It discusses the 
end-to-end security topic among other things but does not solve it. If 
there are requests for it, that work could be continued to provide 
specific solutions. But I would like to hear the request first.

Maybe it is more urgent to specify how RFC 8865 "real-time text in 
WebRTC" can be used in a multi-party setting with end-to-end security.

I would prefer to let the discussion of the topic in the reviewed draft 
be sufficient for now.

> * Perhaps the recommendation in section 4.2.6 that the mixing method
>    for multi-party unaware endpoints is not RECOMMENDED to be used
>    should be repeated in the security considerations? It seems there
>    are serious limitations, in particular also related to the creation
>    of a presentation that can make it impossible to detect masquerade
>    attacks. Yes, masquerading is mentioned but from an outside security
>    point of view it feels like there was a strong security solution
>    that was discarded due to lack of implementation support, there is a
>    somewhat OK solution (but not able to provide end-to-end security),
>    and there is a pretty ugly solution to accommodate endpoints with no
>    support for the other solution. If this is a fair summary, perhaps
>    explaining this clearly in the security considerations would be a
>    good thing.
[GH] Yes, good point. I will compose a proposal.
> * I am confused about Figures 5 and 6 since the mixed identities of
>    the sources are once shown in square brackets and once in
>    parenthesis. Are labels like [Alice] or [Bob] not inserted by the
>    mixer? If so, why would the format on the endpoint be different? Is
>    the idea that endpoints try to parse the mixed text in order to
>    render it differently? Or was the idea to show that different mixers
>    can use different styles to generate labels, i.e., I should not
>    really compare Figure 5 and 6?

[GH] The figures are meant to be comparable. And yes, I caused 
confusion by letting the mixer create labels with brackets in figure 5 
but with parentheses in figure 6. In figure 6 the brackets are inserted 
by the receiving terminal, in a way that has become quite common in RTT 
implementations, while the parentheses come from the mixer. Alice is the 
local user. Her text is merged locally and therefore gets its label 
assigned locally.

I will change this so that the label framing is consistent, and add 
some words about the labels and their framing.

> Editorial comments:
> * I suggest to cite [T140] when you first refer to it in the
>    Introduction:
>    OLD
>     A requirement related to multi-party sessions from the presentation
>     level standard T.140 for real-time text is: "The display of text from
>    NEW
>     A requirement related to multi-party sessions from the presentation
>     level standard T.140 [T140] for real-time text is: "The display of text from
[GH] In the previous review I was advised to delete many such standard 
names before the reference, but you are right that it would probably be 
good to use the name once.
> * as defined -> are defined and missing full stop
>    OLD
>     The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
>     mixer, RTP-translator as defined in [RFC3550]
>    NEW
>     The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
>     mixer, RTP-translator are defined in [RFC3550].
[GH] Yes, will do.
> * Add reference(s) to WebRTC in the terminology section?
[GH] Yes, will do.




Gunnar Hellström