Re: [Slim] Modality preference

Gunnar Hellström <> Tue, 20 June 2017 12:32 UTC

To: Brian Rosen <>,
From: Gunnar Hellström <>
Date: Tue, 20 Jun 2017 14:32:28 +0200

On 2017-06-19 at 22:29, Brian Rosen wrote:

>> On 2017-06-19 at 15:31, Brian Rosen wrote:
>>> We’ve been working on supporting emergency services with multiple media for some time in the NENA Next Generation 9-1-1 project.
>>> At least in the U.S., all media offered should be accepted.  The language preference will be negotiated, and based on the result, an interpreting service may be bridged into the call.
You have no basis for that decision if you do not have a preference 
grading between media; without it, you cannot see which language is 
really the preferred one.
>>> As such, a modality preference is not seen as valuable, and in fact, could be considered harmful, as the call taker should be trained to provide an initial answer in all media negotiated and react to the user’s actions thereafter.  An important characteristic of emergency services when viewed from the lens of the slim work is that the mechanisms have to work for the small percentage of cases where the normal mechanisms fail.  In this case, it would be use of a device with language and media preferences labeled by a user who isn’t the usual user of the device when an emergency call is placed.  You do NOT want to assume that the labeled preferences are actually the real preferences.  You bias your responses by the preferences, but you have to allow for variation when the call is actually answered.  So, answering in all offered media is something we would do, regardless of what the highest media preference was.
>> [GH] That sounds good in theory, but it may turn out to be just as 
>> harmful as you think obeying a modality preference would be.
>> Answering in all modalities may leave the caller uncertain about 
>> which modality is best for continuing the conversation.
>> And it can cause delays and waste resources if you want to answer 
>> with sign language in all calls indicating sign language competence. 
>> The indication of sign language competence can be for the case I 
>> have repeated many times: a hearing person competent in sign 
>> language but preferring spoken language. In order to avoid having 
>> everyday calls with deaf signing persons invoke a video relay 
>> service, it is of value for this person to specify sign language 
>> competence, but at a lower preference level than spoken language.
>> Having that sign language competence indication cause a PSAP to 
>> answer with sign language will cause delays and resource usage, 
>> unless the first sign language answer is only a recording.
> You put English first and ASL second on the video stream.   The right 
> thing happens.
Putting spoken English first in the video media specification would 
indicate that you regard it as important to see the speaker and to be 
seen yourself when you talk English. That is rarely the case, so this 
would be a contrived way to specify modality preference, because what 
you would really want to specify is that you can do very well with 
spoken English in the audio media, and that ASL in video is a usable 
alternative at a lower preference if the other party wants to use it.

And if written English were your first preference and ASL second, then 
this trick does not work, because we cannot express written language in 
video. We have agreed that non-signed language in video means spoken.
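For illustration, the kind of offer I mean might look like the 
following SDP sketch, using the language attributes from 
draft-ietf-slim-negotiating-human-language ("en" is spoken English, 
"ase" the language tag for American Sign Language; addresses and 
payload types are placeholder values, and the exact attribute names 
may differ between draft versions). It expresses spoken English on the 
audio stream with ASL on video as the alternative, rather than listing 
English first in the video media:

```
v=0
o=- 0 0 IN IP4 198.51.100.1
s=-
c=IN IP4 198.51.100.1
t=0 0
m=audio 49170 RTP/AVP 0
a=hlang-send:en
a=hlang-recv:en
m=video 51372 RTP/AVP 31
a=hlang-send:ase
a=hlang-recv:ase
```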

>> If you had the opportunity to use modality preference, I would think 
>> that you would get a better success rate and more satisfied users if 
>> you first answered in the most preferred modality, and were prepared 
>> to retry with less favoured modalities a few seconds later in order 
>> to cater for the few cases with a borrowed phone.
> I don’t think so.  Please try to find a better example.  I can’t think 
> of one.
A person who wants to speak and receive written language would like to 
get an answer that clearly shows that the other party selected to 
receive spoken language and send text. Without the modality preference 
indication, the answering party has no basis for providing such an 
answer.
>>> Another reason why media preference is not really wanted is that emergency services can use other media even when the user doesn’t prefer to use them.  An audio feed for a deaf caller can be helpful to identify what is happening around the caller.  A video feed for a blind caller (if the device really made the offer) could similarly be useful.
>> [GH] This is not a reason not to want modality preference.
>> As said initially, we are not discussing negotiating the enabling of 
>> media. We are only specifying guidance for which language and 
>> modality to use initially in the call.
>> (We have a good parallel in RFC 4796, the SDP 'content' attribute, 
>> where it is said: "'content' values are just informative at the 
>> offer/answer model level.") Maybe we should include a clear 
>> statement like that in the SLIM real-time drafts.
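For comparison, RFC 4796 attaches a purely informative label to a 
media stream like this (port and payload type are placeholder values):

```
m=video 51372 RTP/AVP 31
a=content:sl
```

Here 'sl' marks the stream as carrying sign language; the receiver may 
act on the label, but it does not affect offer/answer negotiation. 
That is the same informative role we intend for the modality 
preference indication.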
>>> So, I think this is not a useful capability for emergency services and if it was in the offer, I’d tend to want to ignore it and accept all media, with an initial greeting in the preferred language.
>> Even if that would be feasible for the emergency service case, you 
>> need to consider that the communication device also needs to be 
>> useful in the everyday call situation. In that situation, the 
>> answering party will not normally have the capability to answer in 
>> three modalities in parallel. A modality preference indication is 
>> essential for the call to start smoothly, with an answer in a 
>> modality and language that the caller is willing to receive, and a 
>> modality preference indication to the caller is essential for the 
>> caller to start using the most suitable modality.
> First of all, this thread is specifically about emergency services. 
>  I’m not objecting to the follow on work you are proposing.
Good, I hope we can find sufficient interest to complete it.
>  I just don’t think it’s useful or even desirable for emergency services.
In the emergency service projects I have participated in, we have seen 
the modality preference negotiation as useful and needed, and we have 
been sorry not to have any completed standard to implement.

So, we just see this differently. I do not see all parts of 
draft-ietf-slim-negotiating-human-language as important, but I have 
participated and contributed with the goal of getting the whole draft 
into shape for approval.
> I also think that if you offer 3 modalities, you can handle all 3 
> simultaneously.  That is the way real systems actually work.  My video 
> conference systems are now all 3-modality systems.  They offer video, 
> audio and text at answering time.  It’s normal to start with audio, 
> but some people actually announce themselves with text if they join 
> late.  Everything seems to work fine.  If an ASL caller joins, they 
> might initially be greeted via audio, but the response in ASL would 
> happen immediately, with smooth transitions by all parties.
>> So, please accept that there are good motivations for a modality 
>> preference indication. You promised me a review of a separate draft 
>> for this functionality.
>> As you may have seen, I first created two drafts with indications 
>> integrated into draft-ietf-slim-negotiating-human-language, and then 
>> a more self-contained alternative based on the SDP grouping 
>> framework. I would also appreciate your assessment of which 
>> alternative to prefer and proceed with. I prefer 
>> draft-hellstrom-language-grouping; it is more consistent and has no 
>> known interop problems.
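For readers unfamiliar with the SDP grouping framework (RFC 5888) that 
draft-hellstrom-language-grouping builds on: it ties media lines 
together by their 'mid' values, along these lines (the "LANG" 
semantics token below is purely illustrative, not the draft's actual 
syntax; ports and payload types are placeholders):

```
a=group:LANG 1 2
m=audio 49170 RTP/AVP 0
a=mid:1
m=video 51372 RTP/AVP 31
a=mid:2
```

The appeal of this approach is that grouping is an established, widely 
implemented SDP mechanism, which is why I expect fewer interop 
problems with it.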
> I will review the drafts soon.

Gunnar Hellström
+46 708 204 288