Re: [MMUSIC] draft-holmberg-mmusic-t140-usage-data-channel - multi-party

Gunnar Hellström <> Wed, 28 August 2019 18:26 UTC

Hi Paul, see comment at the end,

On 2019-08-28 at 16:49, Paul Kyzivat wrote:
> On 8/27/19 2:57 PM, Gunnar Hellström wrote:
>> Hi Paul,
>> Please see inline,
>> On 2019-08-27 at 20:39, Paul Kyzivat wrote:
>>> On 8/26/19 3:59 PM, Gunnar Hellström wrote:
>>>> 4. A multi-party server S, combining a number of sources into one 
>>>> call to a participant A, with real-time text from each other 
>>>> participant (B,C,...) communicated in just one T140 data channel 
>>>> between S and A. There is a need to indicate source for each 
>>>> T140block sent to A. We currently have no way specified for that. 
>>>> An extension of T.140 could do it.
>>> Is there a reason to ever do this?
>>> In audio and video the "mixing" actually does something
>>> irreversible. It takes real work to do, and doing so reduces
>>> the bandwidth considerably compared with what is required to
>>> transmit the individual streams.
>>> For RTT none of that is true. There is very little impact in 
>>> bandwidth or processing in transmitting them all as separate channels.
>>> So why not just say "don't do that"?
>> Yes, interesting and realistic thought. It would likely be the best 
>> choice for many practical cases.
>> I am not sure, however, how it would work in a huge conference with
>> hundreds of participants, some of whom occasionally ask for the
>> floor and send a bit of RTT. A server using one data channel per
>> participant would either have to establish an enormous number of
>> T140 data channels in advance, ready to forward whatever is
>> received, or take the effort to establish a new T140 data channel to
>> all users at the moment a user gets the floor, and then possibly
>> close it again.
>> What do you think about that case?
> Initially each user would only have one channel. Then he gets another 
> one added each time there is a new speaker who hasn't spoken before. I 
> don't know if there would be formal floor control, or if everyone 
> would always be allowed to talk. Formal floor control would provide an 
> early hint to establish the new channels, possibly preventing delays. 
> But adding a channel isn't an expensive operation and delaying 
> messages until it is done shouldn't be a problem.
> But is this a realistic issue? Do RTT conferences with hundreds of 
> speakers happen in practice?

It is realistic to the same degree as the use of video and audio in
multi-party multimedia conferences.

It is not realistic to let hundreds of participants talk simultaneously,
only one or a very small number. It is realistic to let one at a time have
the floor and speak, sending the audio to all the others, and have the
opportunity to hand over the floor to someone else with a loose or strict
protocol.

There is very little use for transmitting live video from hundreds of
participants simultaneously, but it is realistic to show a small number of
users and be prepared to switch who is seen.

Similarly, it is not realistic to let hundreds of participants present
new RTT simultaneously, only one or a very small number. It is realistic to
let a few at a time transmit, sending the RTT to all the others. Anyone
should have the opportunity to transmit RTT, controlled by no protocol,
a loose protocol, or a strict protocol.

RTT is sadly often characterized as a medium for accessibility and is
therefore expected to be used only in special cases. This is wrong from
three points of view:

1. If used for accessibility, in cases when one or more users have little
or no use of audio or video for language communication because of a
disability, then it is very important that all other participants can
use the same medium, so that communication can go directly to or from the
one(s) who need RTT. This is the whole idea of accessibility: users
are not forced into limited technology corners, but are able to
participate anywhere with the same technology as others and have full
and equal participation. (Sometimes a translating service between
different modalities is still needed so that everybody is able to
use the modality they prefer.)

2. In any multimedia conference there are reasons to communicate
something in text. The three real-time media (audio, video and RTT) are
needed together for alternating or simultaneous use: talk, show and
write for complete and efficient communication. In that scenario, text
messaging is a slow tool, causing delays and a risk of losing the
viewers' interest. When anyone starts typing RTT, it is possible
to follow the thoughts as they are expressed in text
right from the start, whereas with messaging the typing user needs to
complete the message first and risks entering it far too late to be
relevant in the discussion.

3. Automatic subtitling by speech-to-text services, and even language
translation before presentation, would be useless without a form of RTT.
It is now realistic to use such services to enhance multimedia
conferences and make the contents available to people with other
languages, people in noisy areas, and deaf or hard-of-hearing users.
This is a rapidly increasing use. In this case too, it is realistic
only with a few actively transmitting users, while the result can be of
interest to distribute to many.

That said, I agree that the usage that will develop most rapidly
will likely be for meetings with no more than five participants having
audio, video and RTT available, and most commonly only three taking turns
sending RTT. Some of these calls will be in emergency services and
interpreting (or relay) services, while there is also a need for
user-to-user conferences.

I would also like to know which MCU model is most realistic for these 
applications. Is it the one with separate text channels per active 
source opened and closed when there is a need, or is it the single text 
channel with in-line identification of the source for each new data 
channel message?
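
To make the two alternatives concrete, here is a minimal Python sketch of how a server might forward text to one participant under each model. It is only an illustration under stated assumptions: the class names, the forward() method and the (source, text) tuple used as an in-line label are all hypothetical, and no label format is currently specified for T.140.

```python
from dataclasses import dataclass, field


@dataclass
class PerSourceChannels:
    """Model 1 (hypothetical): one text channel per active source."""
    channels: dict = field(default_factory=dict)  # source -> texts sent on its channel
    opened: int = 0                               # channels opened so far

    def forward(self, source: str, text: str) -> None:
        if source not in self.channels:
            # Stands in for negotiating a new T140 data channel on demand.
            self.channels[source] = []
            self.opened += 1
        self.channels[source].append(text)


@dataclass
class SingleLabelledChannel:
    """Model 2 (hypothetical): one channel, every message labelled with its source."""
    messages: list = field(default_factory=list)

    def forward(self, source: str, text: str) -> None:
        # The (source, text) tuple is an assumed stand-in for an in-line label.
        self.messages.append((source, text))


def demo():
    """Forward the same three text chunks from B and C under both models."""
    per_source, single = PerSourceChannels(), SingleLabelledChannel()
    for source, text in [("B", "Hi"), ("C", "Hello"), ("B", " all")]:
        per_source.forward(source, text)
        single.forward(source, text)
    return per_source, single
```

In the first model the receiver learns the source from which channel the text arrives on, at the cost of one channel setup per new source. In the second, the channel count stays at one, but every message needs a label that both ends agree on.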



>     Thanks,
>     Paul