Re: [AVTCORE] Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)

Yong Xin <Yong.Xin@radisys.com> Fri, 22 May 2020 23:31 UTC

From: Yong Xin <Yong.Xin@radisys.com>
To: Gunnar Hellström <gunnar.hellstrom@ghaccess.se>
CC: "avt@ietf.org" <avt@ietf.org>
Thread-Topic: Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)
Thread-Index: AdYu9djRhXB/Vlx3RPC7kBNomgolYQATRPoAACIX3yAAFHBuAAAags+g
Date: Fri, 22 May 2020 23:31:30 +0000
Message-ID: <MWHPR0801MB3674E1617DF2EB5ABA1E438B9CB40@MWHPR0801MB3674.namprd08.prod.outlook.com>
References: <SN4PR0801MB36806FF9CD2538E08E95CDCD9CB60@SN4PR0801MB3680.namprd08.prod.outlook.com> <8e29faf8-2abc-d7d2-6551-8c2fcfee9545@ghaccess.se> <MWHPR0801MB3674D15E6F2011F31323FFF69CB40@MWHPR0801MB3674.namprd08.prod.outlook.com> <ebc55345-21e4-67e0-5161-70bb8941599c@ghaccess.se>
In-Reply-To: <ebc55345-21e4-67e0-5161-70bb8941599c@ghaccess.se>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/dAmTyU9MjIpXpVxKa7kZoE5lG3k>
Subject: Re: [AVTCORE] Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)

Hi Gunnar,

Thanks for the clarification. See my responses inline.

Regards,
Yong

From: Gunnar Hellström <gunnar.hellstrom@ghaccess.se>
Sent: Friday, May 22, 2020 2:41 AM
To: Yong Xin <Yong.Xin@radisys.com>; avt@ietf.org
Subject: Re: Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)


Hi Yong,

Thanks for good questions,
On 2020-05-22 at 02:53, Yong Xin wrote:
Dear Hellstrom,

Thanks for the quick response. The latest spec does address my concern. I have some follow-up questions:


  *   The new payload format "text/rex" can be used with or without redundancy. When redundancy is used, the mixer has to use the same redundancy level when transmitting text from multiple sources. If different parties in the same conference have negotiated different redundancy levels, the mixer has to pick the lowest level to use, right?

No. There are two sides of this answer:

The mixer should do separate mixing for each recipient, using the redundancy level agreed with each recipient. This is also because users do not want to see their own transmitted text being received from the mixer. Their own text is displayed locally by the endpoints. If the recipient does not support "text/rex", the mixer also needs to do the mixing for multi-party unaware endpoints using the "text/red" format described in section 13.2.

[YX] Understood, the text mixer is providing N-1 mixing, similar to audio mixing, so users never receive their own transmitted text from the mixer.
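
A minimal sketch of the N-1 mixing described above, assuming a three-party conference with hypothetical participant names A, B and C. It only shows which sources feed each recipient's mix; packet formats and redundancy are left out.

```python
def sources_for(recipient, participants):
    """Return the participants whose text is mixed for `recipient`.

    N-1 mixing: everyone except the recipient, whose own text is
    already displayed locally by the endpoint.
    """
    return [p for p in participants if p != recipient]


# Hypothetical conference used only for illustration.
participants = ["A", "B", "C"]
mix_plan = {r: sources_for(r, participants) for r in participants}

for recipient, sources in mix_plan.items():
    print(f"to {recipient}: text from {sources}")
```

Because each recipient gets its own mix, the mixer keeps one outgoing stream per recipient rather than one stream for the whole conference.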

And the mixer must recover from loss in reception from each source and create a queue of clean text from each source before composing the packets for transmission. The mixer cannot just resend received packet contents with redundancy, because the recovery mechanism requires the sequence number gaps for loss detection, and the mixer must create its own sequence number series in the transmission.
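
The recovery step can be sketched roughly as follows. This is a simplified model, not the draft's algorithm: each received packet is represented as a source sequence number plus a list of text blocks (oldest redundant generation first, primary last), 16-bit sequence number wrap-around is ignored, and the class name, the starting value 5000 and the one-block-per-packet output are assumptions made for illustration.

```python
import itertools


class SourceReceiver:
    """Per-source receive state: rebuilds clean text from redundant packets."""

    def __init__(self):
        self.last_seq = None  # highest source sequence number consumed so far
        self.clean = []       # recovered text blocks, in order, no duplicates

    def on_packet(self, seq, blocks):
        """blocks: oldest redundant generation first, primary last."""
        first_seq = seq - (len(blocks) - 1)
        for block_seq, text in zip(range(first_seq, seq + 1), blocks):
            # A block is new only if its sequence number was never consumed.
            is_new = self.last_seq is None or block_seq > self.last_seq
            if is_new and text:
                self.clean.append(text)
        if self.last_seq is None or seq > self.last_seq:
            self.last_seq = seq


rx = SourceReceiver()
rx.on_packet(10, ["", "", "He"])
# Packet 11 (["", "He", "ll"]) is lost; the gap 10 -> 12 reveals the loss.
rx.on_packet(12, ["He", "ll", "o"])
recovered = "".join(rx.clean)

# The mixer numbers its own transmissions independently of the source.
mixer_seq = itertools.count(5000)
outgoing = [(next(mixer_seq), block) for block in rx.clean]
print(recovered, outgoing)
```

The point of the sketch is that the source's sequence numbers are consumed on reception and never forwarded, which is why received packets cannot simply be resent.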

[YX] I agree with what you said here. I think I was a little confused when reading the following paragraph in section 3 of the spec. Let me give an example: there is a 3-party conference and all participants (A, B, C) are conference-aware RTT terminals that support the text/rex packet format. Users A, B and C negotiate different redundancy levels of 2, 3 and 1 respectively. When the mixer transmits text (sourced from B and C) to user A, what number of redundant generations should the mixer use in the transmitted packet? Is it 2 or 1?
   The number of redundant generations of T140blocks to include in
   transmitted packets SHALL be deducted from the SDP negotiation.  It
   SHOULD be set to the minimum of the number declared by the receiver
   and the transmitter.  The same number of redundant generations MUST
   be used for all sources in the transmissions.  The number of
   generations sent to a receiver SHALL be the same during the whole
   session unless it is modified by session renegotiation.
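
One possible reading of the quoted paragraph, combined with the earlier statement that the mixer uses the level agreed with each recipient, can be sketched as below. The mixer's own declared value (2 here) is an assumption, as are the function and variable names; the thread does not state it.

```python
def generations_to_send(mixer_declared, receiver_declared):
    """Minimum of what the transmitter and the receiver declared."""
    return min(mixer_declared, receiver_declared)


# Redundancy levels from the example: A=2, B=3, C=1.
declared = {"A": 2, "B": 3, "C": 1}
MIXER_DECLARED = 2  # assumption: not given in the thread

per_recipient = {
    name: generations_to_send(MIXER_DECLARED, level)
    for name, level in declared.items()
}
print(per_recipient)
```

Under this reading, the number used toward A depends only on the negotiation between the mixer and A, and applies to text from both B and C in packets sent to A.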

  *   But in case one party has negotiated "text/rex" without any redundancy, does that mean the mixer has to turn off the redundancy for this conference? Does the mixer need to change the redundancy level up and down dynamically as users join or leave the conference? Does the mixer need to send a re-INVITE to renegotiate the redundancy level with the other parties when such a change happens?
From the logic above, the answer to these questions is: no. I realize that an explanation should be inserted at the beginning of section 3, "Actions at transmission by the mixer", to clarify that the source for transmission from the mixer is clean text in separate queues, regardless of which format or protocol was used in the individual receptions.

[YX] This is related to the above question. Some clarification in the spec would be helpful.



  *   In section 12, I noticed the 150 cps recommendation is still there and has been made the default value for the new packet format, but the transmission interval is back to 300 ms (the recommended interval was 100 ms in the old spec). I guess with the new packet format, the shorter transmission interval is no longer required.
The transmission interval is mentioned in two paragraphs in section 3. One saying that the default is 300 ms, the other saying:

"For multi-party operation, it is RECOMMENDED that the mixer sends a packet to each receiver as soon as text has been received from a source as long as the maximum number of characters per second indicated by the recipient is not exceeded, and also the number of packets sent per second to a recipient is kept under a specified number.  This number SHALL be 10 if no other limit is applied for the application.  The intention is to keep the latency introduced by the mixer low."

This is intended to create a balance between low latency and protection against bursty packet loss. Even if the latency requirements from real-time text users are much lower than from audio and video users, low latency is appreciated, and end-to-end latency of over 2 seconds creates conversation problems. Therefore, the rule in this paragraph about when to transmit will self-regulate to a packet interval of about 100 ms once about 3 sources are typing simultaneously.

The 300 ms default assures that the remaining redundancy transmissions will be sent even shortly after all sources have stopped typing.
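
The self-regulation described above can be illustrated with a small simulation. It rests on assumptions that are not in the draft text: each source delivers text to the mixer every 300 ms, the sources are evenly staggered, and only the cap of 10 packets per second (a 100 ms minimum gap) is modeled, not the characters-per-second limit. Function names are hypothetical.

```python
def arrivals(n_sources, period_ms=300, duration_ms=1200):
    """Arrival times at the mixer for evenly staggered sources."""
    step = period_ms // n_sources
    return sorted(
        i * step + k * period_ms
        for i in range(n_sources)
        for k in range(duration_ms // period_ms)
    )


def mixer_send_times(arrival_times, min_gap_ms=100):
    """Send as soon as text arrives, but at most one packet per min_gap_ms."""
    sent = []
    for t in sorted(arrival_times):
        if sent and t <= sent[-1]:
            continue  # rides along in a packet that is already scheduled
        sent.append(t if not sent else max(t, sent[-1] + min_gap_ms))
    return sent


def gaps(times):
    """Set of distinct intervals between consecutive transmissions."""
    return {b - a for a, b in zip(times, times[1:])}


for n in (1, 3, 6):
    print(n, "sources ->", gaps(mixer_send_times(arrivals(n))), "ms")
```

With one source the packet interval stays at 300 ms; with three or more evenly staggered sources it settles at the 100 ms floor set by the 10 packets per second cap.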

[YX] So are you saying the recommended transmission interval in the new spec is still 100 ms, and the 300 ms is actually the time that covers 3 transmissions (one primary transmission plus 2 redundant transmissions, assuming redundancy level 2)? Is my understanding correct? I guess this would be better clarified in the new spec. The old spec is much clearer on this definition.

However, this algorithm may make the protection against bursty loss weaker than with a steady 300 ms interval. With between 17 and 32 simultaneous typing users, the latency caused by the mixer will be around 300 ms and then passes both regulatory and human requirements.

Now, even if it passes the requirements, 32 is a very unrealistic number of simultaneous typing users. In audio conferences it is only possible to perceive one source at a time well. The benefit of enabling more is just for noticing that someone else wants to say something. Requirements for this work are collected in draft-hellstrom-avtcore-multi-party-rtt-solutions-00, and there the performance requirements are set to be valid for up to 5 simultaneously transmitting users, with the delay caused by the mixer being less than 500 ms. I think we should design for these figures.

  *   And you mentioned these characteristics provide for smooth flow of text with acceptable latency from at least 32 sources simultaneously. Since the new packet format can support up to 16 sources per packet, the text from 32 sources will have to be transmitted in turn. If my calculation is correct, with a 300 ms transmission interval and redundancy level 2, it will take 900 ms (one primary + 2 redundant) for the mixer to switch from the first 16 sources to the next 16 sources, so the delay is about 900 ms. Is this the acceptable latency you have in mind?
See discussions above. Maybe regulators need to say how many simultaneous users the requirements are for. I think 5 is a high and good figure even if the discussion above indicates 32 to be possible.
[YX] Yes, I agree. At least for emergency services, I don't see a multi-party use case that requires more than 5 parties.
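
The 900 ms figure in the question above can be reproduced as follows. The sketch encodes the question's own assumption that the mixer completes the primary and all redundant transmissions for one group of 16 sources before switching to the next group; the reply does not confirm that the mixer works this way, and the function name is hypothetical.

```python
import math


def worst_case_wait_ms(n_sources, per_packet=16, interval_ms=300,
                       redundant_generations=2):
    """Wait before the last group of sources gets its first transmission."""
    groups = math.ceil(n_sources / per_packet)
    # One primary plus the redundant transmissions for each group.
    per_group_ms = interval_ms * (1 + redundant_generations)
    return (groups - 1) * per_group_ms


for n in (5, 16, 32):
    print(n, "sources ->", worst_case_wait_ms(n), "ms")
```

For the 5 simultaneous sources proposed as the design target, everything fits in one packet and this particular wait does not arise.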



  *   There have been quite a few updates to the spec in the last couple of months. When do you expect this IETF draft to be finalized and approved?

Yes, I hope for a short period of discussions like yours, which I appreciate, to sort out the main principles, and then a period of the different levels of last calls and refinement editing. Even if one would hope for more rapid progress, a realistic milestone for sending it to the IESG has been set for February 2021. It would be good if the various organisations that would reference it in their specifications (such as 3GPP, ATIS, NENA, ETSI etc.) would take a look now and assess whether it is agreeable and on the right track.

[YX] Thanks for your effort on this.

Thanks

Gunnar


Regards,
Yong

From: Gunnar Hellström <gunnar.hellstrom@ghaccess.se>
Sent: Thursday, May 21, 2020 12:39 AM
To: Yong Xin <Yong.Xin@radisys.com>; avt@ietf.org
Subject: Re: Question on multi-party RTT handling (draft-hellstrom-avtcore-multi-party-rtt-source-03)


Dear  Yong,

Thanks for a good question.

The draft you are asking about has been replaced by this one:

https://datatracker.ietf.org/doc/draft-ietf-avtcore-multi-party-rtt-mix/

and it has been modified at the point of your question, partly because of the issue you saw in the draft you looked at. More follows inline.
On 2020-05-21 at 00:28, Yong Xin wrote:
Dear Mr. Hellstrom,

I have a question about how to use the RTT mixer (rtt-mix) method with the "text/red" format for multi-party call handling, as defined in your IETF draft https://tools.ietf.org/html/draft-hellstrom-avtcore-multi-party-rtt-source-03.
  4. Use of fields in the RTP packets

   RFC 4103 [RFC4103] specifies use of RFC 3550 RTP [RFC3550], and a
   redundancy format "text/red" for increased robustness.  This
   specification updates RFC 4102 [RFC4102] and RFC 4103 [RFC4103] by
   introducing a rule for populating and using the CSRC-list in the RTP
   packet in order to enhance the performance in multi-party RTT
   sessions.

   When transmitted from a mixer, the first member in the CSRC-list
   SHALL contain the SSRC of the source of the primary T140block in the
   packet.  The second and further members in the CSRC-list SHALL
   contain the SSRC of the source of the first, second, etc redundant
   generations of T140blocks included in the packet. ( the recommended
   level of redundancy is to use one primary and two redundant
   generations of T140blocks.)  In some cases, a primary or redundant
   T140block is empty, but is still represented by a member in the
   redundancy header.  For such cases, the corresponding CSRC-list
   member MUST also be included.

   The CC field SHALL show the number of members in the CSRC list.

   Note: This specification departs from section 4 of RFC 2198 [RFC2198]
   which associates the whole of the CSRC-list with the primary data and
   assumes that the same list applies to reconstructed redundant data.
   In the present specification a T140block is associated with exactly
   one CSRC list member as described above.  Also RFC 2198 [RFC2198]
   anticipates infrequent change to CSRCs; implementers should be aware
   that the order of the CSRC-list according to this specification will
   vary during transitions between transmission from the mixer of text
   originated by different participants.

   The picture below shows a typical RTP packet with multi-party RTT
   contents and coding according to the present specification.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X| CC=3  |M|  "RED" PT   |   sequence number of primary  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               timestamp of primary encoding "P"               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |  CSRC list member 1 = SSRC of source of "P"                   |
      |  CSRC list member 2 = SSRC of source of "R1"                  |
      |  CSRC list member 3 = SSRC of source of "R2"                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1|   T140 PT   |  timestamp offset of "R2" | "R2" block length |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1|   T140 PT   |  timestamp offset of "R1" | "R1" block length |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |0|   T140 PT   | "R2" T.140 encoded redundant data             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+---------------+
      |   |  "R1" T.140 encoded redundant data        |               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         +-+-+-+
      |              "P" T.140 encoded primary data             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 1: text/red packet with sources indicated in the CSRC-list.
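
As a rough illustration of the CSRC rule quoted above from the replaced draft, the sketch below packs the fixed RTP header (RFC 3550 layout) followed by a CSRC list with one member per T140block: primary first, then the first and second redundant generations. The payload type 100, the SSRC values and the function name are hypothetical; the redundancy headers and text payload are not built here.

```python
import struct


def rtp_header(pt, seq, timestamp, ssrc, csrcs, marker=False):
    """Pack the RTP fixed header plus the CSRC list (RFC 3550 layout)."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRC members")
    byte0 = (2 << 6) | len(csrcs)            # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (pt & 0x7F)  # M bit and payload type
    return struct.pack(
        f"!BBHII{len(csrcs)}I",
        byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc, *csrcs,
    )


# Hypothetical SSRC values for the mixer and two sources.
SSRC_MIXER, SSRC_B, SSRC_C = 0x11111111, 0x22222222, 0x33333333

# Primary and first redundant generation from B, second redundant from C.
header = rtp_header(pt=100, seq=7, timestamp=123456, ssrc=SSRC_MIXER,
                    csrcs=[SSRC_B, SSRC_B, SSRC_C])
print(header.hex())
```

The CSRC order changes whenever the mixer switches sources, which is the departure from RFC 2198 that the quoted note warns implementers about.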

At every transmission time, the mixer can use the primary data block to send new text from one source, but new text from other sources has to wait in the queue for its turn. I assume the next source is chosen in a round-robin fashion. The default text transmission interval is 300 ms, which means text from other sources has to wait in the queue for at least 300 ms before it can be transmitted. I can see you have recommended reducing the transmission interval from 300 ms to 100 ms to reduce this delay, but in a large conference where every participant is typing simultaneously, the waiting time in the queue becomes longer. For example, in a 10-party conference, even with a 100 ms transmission interval, new text from the last participant will wait 9 x 100 ms = 900 ms before it is sent. This delay will be too long for some emergency services. Increasing the redundancy level only helps to recover from more consecutive packet loss; it does not help to reduce this delay. So it looks to me like this method is not ideal for large conferences. Is my understanding correct? Has this issue been discussed at an IETF meeting before? Do you have any recommendation for solving this problem?
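
The queueing delay in the question above can be written out as a short calculation. It assumes, as the question does, strict round-robin service with one new source per transmission and every source having text queued at the same moment; the function name is hypothetical.

```python
def round_robin_wait_ms(n_sources, interval_ms):
    """Wait before each source's first transmission in one round."""
    return [position * interval_ms for position in range(n_sources)]


waits_100 = round_robin_wait_ms(10, 100)
waits_300 = round_robin_wait_ms(10, 300)
print("100 ms interval, last source waits", max(waits_100), "ms")
print("300 ms interval, last source waits", max(waits_300), "ms")
```

The wait grows linearly with the number of simultaneous typists, so shortening the interval only scales the problem down rather than removing it.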

Your understanding is correct. There is discussion about various ways to arrange the mixing in another draft: https://tools.ietf.org/html/draft-hellstrom-avtcore-multi-party-rtt-solutions

It is slightly outdated now, but it contains reasoning about performance and other aspects of different solutions.

The current draft replacing the one you read, specifies a packet format that enables new text from up to 16 simultaneous text sources. It is possible to send text from more simultaneous sending users, but then there will be a short delay for some. The delay for 1-16 simultaneous texters will vary between 0 and 300 milliseconds.

Even in a large conference, it will in most cases be only one participant sending real-time text, but occasionally two or three. It will be as for voice or for sign language in video: It will be unmanageable for the participants to perceive media from many sources simultaneously. I agree that for text, the opportunities are a bit better than for audio and video. The text at least stays and is readable in a well arranged display where the participants can catch up reading if there were many sending simultaneously.

You mention an emergency call with 10 participants as an example where a delay of 900 ms would be a risk. In the type of emergency call I am thinking of, where one person in an emergency calls the emergency number and gets a connection with an emergency call taker, I can only imagine there being, in very unusual cases, up to maybe 5 participants, most often taking turns nicely in sending text.

It can for example be the user, the call taker, a language translator, a first responder and an expert in chemical danger. The simultaneous typing that may occur will be e.g. the user coming with more information while the first responder types some instructions for how to handle the case. The others will in most cases wait for their turn.

The more common emergency call will have three participants: The calling user, the call taker and a first responder or other agent. And then the two people in the service know how to take turns. So it will be a maximum of two participants typing simultaneously.

I can imagine a completely different kind of emergency conference, where people call in to report accidents they have seen, check whether they are already being handled, and get reports about ongoing emergencies. If it is at all realistic to set up such a service as a conference call, there would indeed be small delays before some text is presented. However, the 900 ms in your example is about the time a person normally takes to type a word, and the person supposed to act on all these text streams may need to finish reading another source before moving on to respond or look at the new source. That will always take more than one second. So even here, the replaced draft would result in good performance. And this is not what is meant by an emergency call.

Maybe there will be some other applications with unmanaged conferences with real-time text where a lot of simultaneous typing will occur.  Therefore I moved to specifying for up to 16 simultaneous sources.

There are also both human and regulatory requirements saying that real-time text MUST not be delayed more than 500 ms or 1 second (depending on what document you read, and where the delay is measured.) So that should be obeyed for normal cases.

In the replaced draft you refer to, the format is called "text/red" just as in RFC 4103, and is negotiated by an SDP attribute. I got indications off-list that this would not be allowed: the change in the use of the CSRC list from what is stated in RFC 2198 would be too big. Therefore I needed to call it a new format, "text/rex", and negotiate it by payload types in the m-line. When I realized that I needed to take that step, it was also natural to improve the format to be able to carry more text without introducing delays.

Do you agree that the current draft, draft-ietf-avtcore-multi-party-rtt-mix-01 (https://datatracker.ietf.org/doc/draft-ietf-avtcore-multi-party-rtt-mix/), solves your concerns?

Thanks,

Gunnar





Thanks,
Yong


--

Gunnar Hellström

GHAccess

gunnar.hellstrom@ghaccess.se
