Re: [AVTCORE] Opsdir last call review of draft-ietf-avtcore-multi-party-rtt-mix-14

Gunnar Hellström <> Wed, 28 April 2021 11:44 UTC

I have submitted a new version, acting on your comments.

The IETF datatracker status page for this draft is:

There is also an HTML version available at:

A diff from the previous version is available at:

Here is a summary of what I did:

Actions on review comments from Jürgen Schönwälder:

    Added a bit more about congestion situations, stating that they are
    expected to be very rare.

    Added an explanation of the differences in security between the
    conference-aware and the conference-unaware case to the security section.

    Presentation examples with source labels made less confusing, and

    Reference to T.140 inserted at the first mention of T.140.

    Reference to RFC 8825 inserted to explain WebRTC.

    Nit in wording in terminology section adjusted.

I hope this satisfies your comments.


Den 2021-04-27 kl. 23:58, skrev Gunnar Hellström:
> Thank you for your review.
> I will provide initial reactions here, and follow up with more exact 
> proposals for changes.
> Den 2021-04-27 kl. 12:25, skrev Jürgen Schönwälder via Datatracker:
>> Reviewer: Jürgen Schönwälder
>> Review result: Has Nits
>> I am by no means an expert in this area so please take this into
>> account while reading my comments...
>> Content comments:
>> * The document assumes "human" communication, i.e., where text
>>    originates at a speed of a human and politeness is used to resolve
>>    concurrency conflicts. This seems to be a fair assumption for the
>>    considered use cases but what happens if this assumption is not met?
>>    Can systems or RTP mixers detect and handle such situations
>>    gracefully or is the idea that any resulting "jerkiness" must be
>>    accepted if senders misbehave?
> [GH] The receivers are expected to declare their reception speed 
> capability in characters per second (over a 10 second period). A high 
> capability will allow smooth flow of text from many participants 
> simultaneously. The congestion section 9 tells what can be done if the 
> load gets too high. The mixer can then discard text, and if it has any 
> means to detect who is the main contributor, it can avoid discarding 
> text from that participant (e.g. by the SDP "content" attribute).
> This is of course not nice. But it is equally not nice to try to hear 
> any specific voice in a mixed audio channel with many participants. 
> The same applies to video in cases where the video carries real information.
> So we are in good company with the other media in the problem that 
> there are no really good solutions to an information overload situation.
> In most cases the typing participants will send a sentence or two and 
> then stop and read or listen to the others. In such situations the 
> overload will be sorted out after a short while even without the 
> participants being very polite.
> Do you want me to elaborate more about this in the congestion chapter 9?
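
As an illustration only (not taken from the draft), the behaviour described above could be sketched as follows. It assumes a receiver that has declared a characters-per-second capability measured over a 10 second window, and a mixer that knows which source is the main contributor. The names CpsBudget, forward and main_source are hypothetical, and the default of 30 characters per second is an assumption based on the RFC 4103 "cps" parameter.

```python
from collections import deque


class CpsBudget:
    """Sliding-window character budget for one receiver (hypothetical helper)."""

    def __init__(self, cps=30, window_s=10.0):
        self.limit = cps * window_s      # max characters allowed per window
        self.window_s = window_s
        self.sent = deque()              # (timestamp, char_count) pairs

    def _expire(self, now):
        # Forget transmissions that have fallen out of the window.
        while self.sent and now - self.sent[0][0] >= self.window_s:
            self.sent.popleft()

    def used(self, now):
        """Characters sent to the receiver within the current window."""
        self._expire(now)
        return sum(n for _, n in self.sent)

    def try_send(self, text, now):
        """Record the text and return True if it fits in the budget."""
        if self.used(now) + len(text) > self.limit:
            return False
        self.sent.append((now, len(text)))
        return True


def forward(budget, pending, main_source, now):
    """Forward pending (source, text) items, serving the main contributor first.

    Returns (forwarded, discarded) lists of (source, text) items.
    """
    # Stable sort: items from main_source come first, arrival order is kept.
    ordered = sorted(pending, key=lambda item: item[0] != main_source)
    forwarded, discarded = [], []
    for source, text in ordered:
        target = forwarded if budget.try_send(text, now) else discarded
        target.append((source, text))
    return forwarded, discarded
```

Serving the main contributor first means that, when the window is nearly full, text from other participants is what gets discarded. A real mixer would more likely queue text briefly before discarding anything; the sketch only shows the budget check and the priority order.
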
>> * The solution does not provide end-to-end security since the mixer
>>    must be trusted to have access to the texts in order to do the mixing.
>>    This is mentioned in the security considerations and in section 2
>>    where alternatives are considered. The reason to not select a
>>    solution providing end-to-end security is given in section 1.2. Is
>>    there work planned to address this issue, i.e., to complement this
>>    solution with a solution providing end-to-end security?
> [GH] There is another individual draft 
> "draft-hellstrom-avtcore-multi-party-rtt-solutions", intended to 
> document design choices behind the reviewed draft. It discusses the 
> end-to-end security topic among other things but does not solve it. If 
> there are requests for it, that work could be continued to provide 
> specific solutions. But I would like to hear the request first.
> Maybe it is more urgent to specify how RFC 8865 "real-time text in 
> WebRTC" can be used in a multi-party setting with end-to-end security.
> I would prefer to let the discussion of the topic in the reviewed 
> draft be sufficient for now.
>> * Perhaps the recommendation in section 4.2.6 that the mixing method
>>    for multi-party unaware endpoints is not RECOMMENDED to be used
>>    should be repeated in the security considerations? It seems there
>>    are serious limitations, in particular also related to the creation
>>    of a presentation that can make it impossible to detect masquerade
>>    attacks. Yes, masquerading is mentioned but from an outside security
>>    point of view it feels like there was a strong security solution
>>    that was discarded due to lack of implementation support, there is a
>>    somewhat OK solution (but not able to provide end-to-end security),
>>    and there is a pretty ugly solution to accommodate endpoints with no
>>    support for the other solution. If this is a fair summary, perhaps
>>    explaining this clearly in the security considerations would be a
>>    good thing.
> [GH] Yes, good point. I will compose a proposal.
>> * I am confused about Figures 5 and 6 since the mixed identities of
>>    the sources are once shown in square brackets and once in
>>    parenthesis. Are labels like [Alice] or [Bob] not inserted by the
>>    mixer? If so, why would the format on the endpoint be different? Is
>>    the idea that endpoints try to parse the mixed text in order to
>>    render it differently? Or was the idea to show that different mixers
>>    can use different styles to generate labels, i.e., I should not
>>    really compare Figure 5 and 6?
> [GH] The figures should be possible to compare. And, yes, I have 
> caused confusion by letting the mixer create labels with brackets in 
> figure 5 but with parentheses in figure 6. In figure 6 the brackets 
> are inserted by the receiving terminal in a way that has become quite 
> common in RTT implementations, but the parentheses come from the 
> mixer. Alice is the local user. Her text is merged locally and 
> therefore gets the label assigned locally.
> I will change this so that the type of label framing is consistent and 
> insert some words about the labels and their framing.
>> Editorial comments:
>> * I suggest to cite [T140] when you first refer to it in the
>>    Introduction:
>>    OLD
>>     A requirement related to multi-party sessions from the presentation
>>     level standard T.140 for real-time text is: "The display of text from
>>    NEW
>>     A requirement related to multi-party sessions from the presentation
>>     level standard T.140 [T140] for real-time text is: "The display of text from
> [GH] In the previous review, I got a recommendation to delete many 
> such standard names before the reference, but you are right that it 
> would probably be good to use it once.
>> * as defined -> are defined and missing full stop
>>    OLD
>>     The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
>>     mixer, RTP-translator as defined in [RFC3550]
>>    NEW
>>     The terms SDES, CNAME, NAME, SSRC, CSRC, CSRC list, CC, RTCP, RTP-
>>     mixer, RTP-translator are defined in [RFC3550].
> [GH] Yes, will do.
>> * Add reference(s) to WebRTC in the terminology section?
> [GH] Yes, will do.
> Thanks,
> Gunnar
Gunnar Hellström