Re: [AVTCORE] Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)
Gunnar Hellström <gunnar.hellstrom@ghaccess.se> Thu, 21 May 2020 07:39 UTC
To: Yong Xin <Yong.Xin@radisys.com>, "avt@ietf.org" <avt@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/2H1PRQgG_jO5XT9V2M--w6Hd0XQ>

Dear Yong,

Thanks for a good question.

The draft you are asking about has been replaced by this one:
https://datatracker.ietf.org/doc/draft-ietf-avtcore-multi-party-rtt-mix/
It has been modified at the point of your question, partly because of
the issue you saw in the draft you looked at.

More follows inline.

On 2020-05-21 at 00:28, Yong Xin wrote:
> Dear Mr. Hellstrom,
>
> I have a question about how to use the RTT mixer (rtt-mix) method
> with the "text/red" format for multi-party call handling, as defined
> in your IETF draft
> https://tools.ietf.org/html/draft-hellstrom-avtcore-multi-party-rtt-source-03.
>
> 4. Use of fields in the RTP packets
>
> RFC 4103 [RFC4103] specifies use of RFC 3550 RTP [RFC3550], and a
> redundancy format "text/red" for increased robustness. This
> specification updates RFC 4102 [RFC4102] and RFC 4103 [RFC4103] by
> introducing a rule for populating and using the CSRC-list in the RTP
> packet in order to enhance the performance in multi-party RTT
> sessions.
>
> When transmitted from a mixer, the first member in the CSRC-list
> SHALL contain the SSRC of the source of the primary T140block in the
> packet. The second and further members in the CSRC-list SHALL
> contain the SSRC of the source of the first, second, etc redundant
> generations of T140blocks included in the packet. (The recommended
> level of redundancy is to use one primary and two redundant
> generations of T140blocks.) In some cases, a primary or redundant
> T140block is empty, but is still represented by a member in the
> redundancy header. For such cases, the corresponding CSRC-list
> member MUST also be included.
>
> The CC field SHALL show the number of members in the CSRC list.
>
> Note: This specification departs from section 4 of RFC 2198 [RFC2198]
> which associates the whole of the CSRC-list with the primary data and
> assumes that the same list applies to reconstructed redundant data.
> In the present specification a T140block is associated with exactly
> one CSRC list member as described above. Also RFC 2198 [RFC2198]
> anticipates infrequent change to CSRCs; implementers should be aware
> that the order of the CSRC-list according to this specification will
> vary during transitions between transmission from the mixer of text
> originated by different participants.
>
> The picture below shows a typical RTP packet with multi-party RTT
> contents and coding according to the present specification.
>
>  0                   1                   2                   3
>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |V=2|P|X| CC=3  |M|  "RED" PT   |   sequence number of primary  |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |               timestamp of primary encoding "P"               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |           synchronization source (SSRC) identifier            |
> +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
> |      CSRC list member 1 = SSRC of source of "P"               |
> |      CSRC list member 2 = SSRC of source of "R1"              |
> |      CSRC list member 3 = SSRC of source of "R2"              |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |1|   T140 PT   |  timestamp offset of "R2" | "R2" block length |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |1|   T140 PT   |  timestamp offset of "R1" | "R1" block length |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |0|   T140 PT   | "R2" T.140 encoded redundant data             |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------+
> |               |  "R1" T.140 encoded redundant data    |       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+       +-+-+-+
> |              "P" T.140 encoded primary data             |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Figure 1: text/red packet with sources indicated in the CSRC-list.
>
> At every transmission time, the mixer can use the primary data block
> to send new text from one source, but new text from other sources
> will have to wait in their queue for their turn. I assume the next
> source is chosen in round-robin fashion. The default text
> transmission interval is 300 ms, which means that text from other
> sources has to wait in the queue for at least 300 ms before it can
> be transmitted. I can see you have recommended reducing the
> transmission interval from 300 ms to 100 ms to reduce this delay,
> but in a large conference where every participant is typing
> simultaneously, the waiting time in the queue becomes longer. For
> example, in a 10-party conference, even with a 100 ms transmission
> interval, new text from the last participant will wait
> 9 x 100 ms = 900 ms before being sent. This delay will be too long
> for some emergency services. Increasing the redundancy level only
> helps recovery from more consecutive packet losses; it does not
> reduce this delay. So it looks to me as if this method is not ideal
> for large conferences. Is my understanding correct? Has this issue
> been discussed at an IETF meeting before? Do you have any
> recommendation for solving this problem?

Your understanding is correct.

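As a rough illustration of the arithmetic in the question (a sketch under stated assumptions, not something taken from either draft): if the mixer serves queued sources in strict round-robin order and each packet can carry new text from a fixed number of sources, the last source served waits for the earlier rounds to pass. The function name `worst_case_wait_ms` and the parameter `sources_per_packet` are hypothetical names introduced here for illustration.

```python
import math


def worst_case_wait_ms(simultaneous_sources: int,
                       interval_ms: int = 300,
                       sources_per_packet: int = 1) -> int:
    """Extra queueing delay for the last source served.

    Assumes strict round-robin service and that every source has new
    text waiting at the same instant (the worst case in the question).
    """
    if simultaneous_sources < 1 or sources_per_packet < 1:
        raise ValueError("counts must be at least 1")
    # Number of transmission rounds needed to serve every source once.
    rounds = math.ceil(simultaneous_sources / sources_per_packet)
    # The first round goes out immediately; later rounds wait one
    # transmission interval each.
    return (rounds - 1) * interval_ms


# The example from the question: 10 parties, 100 ms interval,
# one source of new text per packet.
print(worst_case_wait_ms(10, interval_ms=100))  # 9 x 100 ms = 900
```

Under this simple model the wait grows linearly with the number of simultaneous typists when only one source fits in a packet, and it shrinks when a packet can carry new text from several sources.
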
There is a discussion of various ways to arrange the mixing in another
draft:
https://tools.ietf.org/html/draft-hellstrom-avtcore-multi-party-rtt-solutions
It is slightly outdated now, but it contains reasoning about
performance and other aspects of the different solutions.

The current draft, which replaces the one you read, specifies a packet
format that can carry new text from up to 16 simultaneous text sources.
It is possible to send text from more simultaneously sending users, but
then there will be a short delay for some of them. The delay for 1-16
simultaneous texters will vary between 0 and 300 milliseconds.

Even in a large conference, in most cases only one participant will be
sending real-time text at a time, occasionally two or three. It is the
same as for voice, or for sign language in video: it is unmanageable
for the participants to perceive media from many sources
simultaneously. I agree that for text, the opportunities are a bit
better than for audio and video. The text at least stays on screen and,
in a well-arranged display, remains readable, so the participants can
catch up on reading if many were sending simultaneously.

You mention the emergency call with 10 participants as an example
where a delay of 900 ms would be a risk. In the type of emergency call
I think of, where one person in an emergency calls the emergency number
and gets connected to an emergency call taker, I can only imagine, in
very unusual cases, up to maybe 5 participants, most often taking turns
nicely in sending text. They could for example be the user, the call
taker, a language translator, a first responder and an expert in
chemical hazards. The simultaneous typing that may occur will be, for
example, the user adding more information while the first responder
types instructions for how to handle the case. The others will in most
cases wait for their turn.

The more common emergency call will have three participants: the
calling user, the call taker and a first responder or other agent. The
two people in the service know how to take turns, so there will be a
maximum of two participants typing simultaneously.

I can imagine a completely different kind of emergency conference,
where people call in to report accidents they have seen, to check
whether they are already being handled, and to get reports about
ongoing emergencies. If it is at all realistic to set up such a service
as a conference call, there would indeed be small delays before some
text is presented. However, the 900 ms in your example is about the
time a person normally takes to type a word, and the person supposed to
act on all these text streams may need to finish reading another source
before moving on to respond or to look at the new source. That will
always take more than one second. So even here, the replaced draft
would result in good performance. And this is not what is meant by an
emergency call.

Maybe there will be some other applications with unmanaged conferences
with real-time text where a lot of simultaneous typing will occur.
Therefore I moved to specifying for up to 16 simultaneous sources.
There are also both human and regulatory requirements saying that
real-time text MUST NOT be delayed more than 500 ms or 1 second
(depending on which document you read, and where the delay is
measured). So that should be obeyed for normal cases.

In the replaced draft you refer to, the format is called "text/red"
just as in RFC 4103, and it is negotiated by an SDP attribute. I got
indications off-list that this would not be allowed: the change in the
use of the CSRC list from what is stated in RFC 2198 would be too big.
Therefore I needed to call it a new format, "text/rex", and negotiate
it by payload types in the m-line.

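
To make the CSRC-list usage of the replaced draft concrete, here is a minimal sketch of how a receiver could map CSRC-list members to the primary and redundant generations, following the rule quoted from section 4 of draft-hellstrom-avtcore-multi-party-rtt-source-03 (first member is the source of "P", the next ones are the sources of "R1", "R2", and so on). The function name `csrc_by_generation` and the payload type 100 in the example are hypothetical; only the fixed RTP header layout from RFC 3550 is relied on, and the redundancy headers and T.140 data are not parsed.

```python
import struct


def csrc_by_generation(packet: bytes) -> dict:
    """Map "P", "R1", "R2", ... to the CSRC-list members of an RTP packet.

    Illustrative only: follows the per-generation CSRC rule of the
    replaced draft, not the RFC 2198 interpretation of the CSRC list.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    version = packet[0] >> 6      # top two bits of the first octet
    cc = packet[0] & 0x0F         # CSRC count, low four bits
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    end = 12 + 4 * cc             # CSRC list follows the 12-octet header
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    labels = ["P"] + [f"R{i}" for i in range(1, cc)]
    return dict(zip(labels, csrcs))


# Example header: V=2, P=0, X=0, CC=3 gives a first octet of 0x83.
header = struct.pack("!BBHII", 0x83, 100, 1, 0, 0xAAAA0000)
csrc_list = struct.pack("!3I", 0x11, 0x22, 0x33)
print(csrc_by_generation(header + csrc_list))
```

Because each member is tied to one generation, the order of the list changes whenever the mixer switches between participants, which is the departure from RFC 2198 noted in the quoted draft text.
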
When I realized that I needed to take that step, it was also natural
to improve the format to be able to carry more text without
introducing delays.

Do you agree that the current draft,
draft-ietf-avtcore-multi-party-rtt-mix-01, solves your concerns?

Thanks,
Gunnar

> Thanks,
>
> Yong

-- 
Gunnar Hellström
GHAccess
gunnar.hellstrom@ghaccess.se