Re: [AVTCORE] Review of draft-ietf-avtcore-multi-party-rtt-mix

Gunnar Hellström <> Tue, 03 November 2020 17:00 UTC

Hi Brian, thanks for the good comments.

I want to start by answering this comment:
"8. I don’t remember why you have to send the redundant text before 
switching sources. You might want to explain why in 3.11. It’s obvious 
you would like to do Particpant 1, initial, Participant 2, initial, 
Participant 1, redundant 1, Participant 2 redundant 2, Participant 1, 
redundant 2, Participant 2, redundant 2."

Good point. So far I have only specified using the sequence number gap to detect packet loss and to decide whether redundant data in the latest received packet needs to be recovered. For that to work, the lost packets must come from the same source as the packet from which we pick the redundant data.
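As an illustration of the gap-based detection described above, here is a minimal sketch. It assumes packets arrive in order, that RTP sequence numbers are 16 bits wide and wrap around, and that each packet carries up to two redundant generations. The function names are hypothetical and not taken from the draft.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def lost_packets(last_seq: int, new_seq: int) -> int:
    """Number of packets missing between two packets received in order."""
    return (new_seq - last_seq - 1) % SEQ_MOD


def generations_to_recover(last_seq: int, new_seq: int, max_generations: int = 2) -> int:
    """How many redundant generations of the new packet are needed.

    Assumption: the lost packets came from the same source as the new
    packet, which is what the current rule in the draft relies on.
    """
    return min(lost_packets(last_seq, new_seq), max_generations)
```

With this approach a gap larger than two packets cannot be fully repaired from redundancy, and the receiver has no way to tell which source the lost packets belonged to.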

Your idea to let sources shift more freely is good, and it can be satisfied if we look at both the sequence number gap and the timestamp offsets stored together with the redundant data in the packets.

The specification needs to be changed in sections 3.11 Source switching, 3.14 Timer offset fields, 3.18.3 extraction, and 3.22 Packet sequence example.

In 3.11 the rule should be that when data is available for transmission from more than one source, a packet interval of 100 ms is recommended, and the packet to be composed and sent should carry data from the source that has the oldest unsent data received in the mixer. All unsent data from that source shall be put in the packet as primary. Any earlier sent data from the same source shall be entered in the packet as redundant data, as before, with timer offsets corresponding to when that data was sent as primary. This is done in two generations.
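The proposed 3.11 rule could be sketched roughly as follows. This is only an illustration under assumptions: times are in milliseconds (matching a 1000 Hz text clock), each source keeps its unsent text in arrival order, and the names SourceState and compose_packet as well as the dictionary layout of a packet are hypothetical. Packets that carry only redundant data after a source falls silent are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class SourceState:
    # (arrival_ms, text) received in the mixer but not yet sent, oldest first
    unsent: list = field(default_factory=list)
    # (sent_ms, text) for the last two primaries sent, oldest first
    sent: list = field(default_factory=list)


def compose_packet(sources: dict, now_ms: int):
    """Compose one packet for the source holding the oldest unsent data."""
    waiting = {sid: s for sid, s in sources.items() if s.unsent}
    if not waiting:
        return None  # nothing new to send in this interval
    sid = min(waiting, key=lambda k: waiting[k].unsent[0][0])
    state = waiting[sid]

    # All unsent data from the chosen source goes in as primary.
    primary = "".join(text for _, text in state.unsent)
    state.unsent.clear()

    # Earlier primaries from the same source go in as redundancy, each with
    # an offset back to the time it was originally sent as primary.
    redundant = [(now_ms - sent_ms, text) for sent_ms, text in state.sent]

    # Keep two generations for the next packet from this source.
    state.sent = (state.sent + [(now_ms, primary)])[-2:]
    return {"source": sid, "timestamp": now_ms,
            "primary": primary, "redundant": redundant}
```

Calling compose_packet once per 100 ms interval lets sources alternate freely, because the redundancy of each source travels only in later packets from that same source.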

In 3.14 it shall be said that at reception, the primary data shall be retrieved. Then the timestamp of the latest received data from the same source is evaluated against the calculated timestamps of the first and second redundant data (obtained by subtracting the timestamp offset from the packet timestamp). If any redundant data is younger, then that redundant data shall be retrieved and entered before the primary data among the received data from that source.

This algorithm may be simplified by running it only if there has been any packet loss since the latest reception from the same source as the last packet.
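The receiver side of the 3.14 rule could look roughly like the sketch below. It is an illustration only: the packet is a hypothetical dictionary with "source", "timestamp", "primary" and "redundant" entries, where "redundant" lists (timestamp_offset, text) pairs with the oldest generation first. Timestamp wrap-around is ignored, and recovering redundancy from the very first packet of a source is an assumption, since that edge case is still open. The check is run on every packet, without the simplification mentioned above.

```python
def extract_text(packet: dict, last_ts_by_source: dict) -> str:
    """Return the text from this packet that has not been received before."""
    sid = packet["source"]
    last_ts = last_ts_by_source.get(sid)  # None if nothing received yet

    recovered = []
    for offset, text in packet["redundant"]:  # oldest generation first
        # Time at which this redundant text was originally sent as primary.
        original_ts = packet["timestamp"] - offset
        # Younger than the latest data we hold from this source means the
        # packet that carried it as primary was lost.
        if last_ts is None or original_ts > last_ts:
            recovered.append(text)

    last_ts_by_source[sid] = packet["timestamp"]
    # Recovered text is entered before the primary data.
    return "".join(recovered) + packet["primary"]
```

Because the comparison is made per source, packets from other sources that arrive in between do not disturb the recovery.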

I need to check edge cases, e.g. when the loss is at the beginning or edge of a pause, and other odd situations. I also need to decide what to do on the very rare occasions when many users create text simultaneously.

But in general I think this is doable and creates a better flow of text through the mixer.



On 2020-11-02 at 22:22, Brian Rosen wrote:
> I have reviewed -multi-party-mix-09 and have the following comments:
> 1. Section 1 Introduction provides the solution, in detail.  I think it would be better to have a high level description of the problem, and then introduce the solution in a subsequent section.  I would also like to see a little text on what “mixing” for rtt means.
> 2. In the Selected solution and considered alternatives section, the first alternative (One RTP stream per source in same RTP session), please make it clearer that the clients have to support this, not just the mixer, and that’s the issue.
> 3. I don’t really understand the text at the end of that section where it describes when multiple typers send text simultaneously.  ISTM that RTT behaves just like voice in that turn taking is often jumbled when more than one user responds to the previous user.  It takes time to recognize it, and for agreement on who has the “floor”.
> 4.  Need a better heading for “Specified Solutions”.  Actually, I wonder if you could delete this whole section, and add a definition for “multiparty aware” and “multiparty unaware” in 1.2.  Some of this text might move to 3 and 4.2, but that’s okay
> 5. If you did remove Section 2, you could rename Section 3 to Procedures for Multi-party aware mixing
> 6. The first two paragraphs of 3.1 are O/A text.  The last 3 aren’t - they are how parties act after O/A
> 7. In 3.3, it says “As soon as a participant is known to participate in a session and being available for text reception, a Unicode BOM character SHALL be sent to it according to the procedures in this section.”  I think it is unclear who sends this to whom.  I believe based on the text in 3.8, that the participant sends BOM to the mixer, because it says “and deletion of 'BOM' characters from each participant”.  Does the participant send it to the mixer?  Does the mixer send it to the new participant?  To existing participants?
> 8. I don’t remember why you have to send the redundant text before switching sources.  You might want to explain why in 3.11.  It’s obvious you would like to do Particpant 1, initial, Participant 2, initial, Participant 1, redundant 1, Participant 2 redundant 2, Participant 1, redundant 2, Participant 2, redundant 2.
> 9. Might want to add text before 3.18.1 that says why you want to send CSRC length 0.
> 10. Typo “evenso” in 3.19
> 11. Not sure I understand what “a negotiation between security and no security SHOULD be applied. ” (3.20) is supposed to mean.
> 12. Showing examples of SHA-1 may be realistic, but not desirable.
> 13. Possible discussion of malicious participants in Security Considerations.  Not much can be done, other than require secure signaling and media (so authentication can be relied upon).

Gunnar Hellström