Re: [Slim] Extended functionality for the real-time language negotiation

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Wed, 15 March 2017 06:46 UTC

Authorized-sender: gunnar.hellstrom@omnitor.se
References: <084a066e-ea68-d614-58e1-08c904f477ea@omnitor.se> <60797269-4dad-5f48-3184-b8fbca42c30c@realtimetext.org> <FFABE6D6-316E-40E1-B923-4C44A05F39B7@brianrosen.net> <ab03fe35-048d-be00-5a7f-bb9268d0fefb@omnitor.se> <a339b1c2-e493-0dba-28f7-77ee499f5042@omnitor.se>
Cc: "slim@ietf.org" <slim@ietf.org>
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Message-ID: <fba735e2-1351-7acc-3e79-b4ae949533e4@omnitor.se>
Date: Wed, 15 Mar 2017 07:46:01 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <a339b1c2-e493-0dba-28f7-77ee499f5042@omnitor.se>
Content-Type: multipart/alternative; boundary="------------C99DB21BA5D3A7F687C0EB1D"
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/wR7iFD_gEINQh3RMGRFnj5kzGvA>
Subject: Re: [Slim] Extended functionality for the real-time language negotiation
Precedence: list

This is an amended overview of possible solutions to the functionality
extensions for the slim real-time draft.

It includes the alternative of using the language subtag -t- to indicate
transformed language and thereby enable simultaneity.

The selected solutions can be included in the current draft or specified
in a separate draft. If they are specified in a separate draft, the
current draft needs to be checked and possibly adjusted so that it allows
the selected syntax of the extensions.

The letters e), f), etc. refer to the summary of issues from LC.

*Indication of preference between media, and of simultaneous versus
alternative languages.*

Discussion:
Issue e) says that there is a need to be able to indicate which of a set
of language/media indications are more preferred alternatives than others.
Examples are:
1. User A wants to receive written English in text and can, as a less
preferred alternative, accept spoken English. An answering party B who
can use text will then respond with written text, giving good
satisfaction, while another answering user C without text capability
will answer in spoken English, which still gives a reasonably
successful call.
Without this indication, the first answering party B might have seen the
spoken and written alternatives as equally preferred and answered with
spoken English, resulting in less satisfied users.
2. A prefers to receive spoken language and can accept to receive
text. When answering party B can use spoken language, that preference
will be satisfied; otherwise written language will be used.
3. A prefers to use spoken language in both directions and can accept to
use sign language in video in both directions. Answering party B has a
clear indication of why both spoken and signed language are indicated
and can answer according to its capabilities, trying to satisfy the
preference for spoken language.
4. A prefers to use sign language in both directions and can accept to
use written language in both directions. Sign language users will use
sign language; others will use text.
5. A prefers to send sign language and receive text (deaf-blind user) and
can accept to send text. In a call with a person with similar
preferences, text will be used both ways; otherwise sign language is
used one way and text the other.

etc.

Issue f) requires a way to indicate use of captioning and other
situations where simultaneous use of languages in different modalities
is needed:

1. Preference for hearing spoken language and simultaneously reading
written language in text (captioning). This can now be provided
automatically in some settings, and also traditionally by a manned
service.
2. Preference for hearing spoken language and simultaneously seeing the
speaker in video (lip-reading). Easily and naturally provided once
the need is known.
3. Preference for seeing sign language and simultaneously hearing spoken
language in audio (for multiple users at the terminal). One of the
streams is provided by an interpreter.
4. Preference for hearing spoken language and simultaneously viewing
written language in video (captioning, if we accept specifying text as
an overlay on video; otherwise it is the same as number 1).

Some of these are acceptable even if just one of the language/media
combinations can be provided, although getting both together is much
preferred. In other cases it is essential to get both simultaneously.
The indication needs to differentiate between these two strengths of
the preference for getting the languages together.

Alternative coding proposals:
1. Preference between modalities
1.1 Based on draft -08, add an asterisk as the last character of an
attribute value to mean lower preference for a language/media
combination than the one(s) without an asterisk.
Example where audio and text are alternatives and text is preferred:
m=audio
a=hlang-recv:en*
m=text
a=hlang-recv:en
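
To make the asterisk rule in 1.1 concrete, here is a minimal Python sketch of how a receiver might read it. The function names are made up for illustration, and it assumes an hlang value is a space-separated list of language tags; neither is taken from the draft.

```python
def parse_hlang(sdp):
    """Return (media, direction, language_tag, lower_preference) tuples."""
    results = []
    media = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # The media type is the first token of the m= line.
            media = line[2:].split()[0]
        elif line.startswith("a=hlang-") and media is not None:
            name, _, value = line[2:].partition(":")
            direction = name[len("hlang-"):]  # "send" or "recv"
            for tag in value.split():
                # Assumption: a trailing asterisk marks lower preference.
                lower = tag.endswith("*")
                results.append((media, direction, tag.rstrip("*"), lower))
    return results


def preferred(entries):
    """Keep the entries without an asterisk; fall back to all of them."""
    top = [e for e in entries if not e[3]]
    return top or entries


EXAMPLE = """m=audio
a=hlang-recv:en*
m=text
a=hlang-recv:en
"""

if __name__ == "__main__":
    for entry in preferred(parse_hlang(EXAMPLE)):
        print(entry)
```

An answering party with text capability would pick from preferred(); one without it would fall back to the entries marked with an asterisk.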

1.2. Change to the Accept-Language syntax and let the q-values have
scope over the whole SDP.

Example where sign language is preferred over text:

m=video 51372 RTP/AVP 31 32
a=hlang-recv:ase;q=0.9
a=hlang-send:ase;q=0.9

m=text 49250 RTP/AVP 98 99
a=hlang-send:en;q=0.5,*;q=0.1
a=hlang-recv:en;q=0.5,*;q=0.1
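
As a rough illustration of q-values with scope over the whole SDP, the Python sketch below collects every language in one direction across all media sections and sorts them by q. The helper names are hypothetical, and the default of q=1.0 when no q is given is borrowed from HTTP Accept-Language usage rather than from the draft.

```python
def parse_q_list(value):
    """Parse 'en;q=0.5,*;q=0.1' into [(tag, q), ...]; q defaults to 1.0."""
    items = []
    for part in value.split(","):
        tag, _, params = part.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key == "q" and val:
                q = float(val)
        items.append((tag.strip(), q))
    return items


def rank_offers(sdp, direction):
    """Return (q, media, tag) for one direction, highest q first."""
    ranked = []
    media = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media = line[2:].split()[0]
        elif line.startswith("a=hlang-" + direction + ":") and media:
            value = line.split(":", 1)[1]
            ranked += [(q, media, tag) for tag, q in parse_q_list(value)]
    # The sort is stable, so equal q-values keep their SDP order.
    ranked.sort(key=lambda item: -item[0])
    return ranked


EXAMPLE = """m=video 51372 RTP/AVP 31 32
a=hlang-recv:ase;q=0.9
a=hlang-send:ase;q=0.9
m=text 49250 RTP/AVP 98 99
a=hlang-send:en;q=0.5,*;q=0.1
a=hlang-recv:en;q=0.5,*;q=0.1
"""

if __name__ == "__main__":
    for q, media, tag in rank_offers(EXAMPLE, "recv"):
        print(q, media, tag)
```

Because the ranking ignores which m= section a language came from, sign language in video ends up ahead of English in text, which is the effect intended by 1.2.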

1.3. Introduce a new a=modality attribute on media level, with
parameters: <modality>, <direction>, <preference>

example:

m=text
a=modality:written,recv,hi
a=hlang-recv:en*
m=audio
a=modality:spoken,recv,med
a=hlang-recv:en*

2. Preference for simultaneous languages vs alternative languages:

2.1. Based on draft -08, add another notation to the use of the
asterisk, e.g. an optional character to be used together with or without
the asterisk to mark media that are wanted together. (ugly) example:

m=audio
a=hlang-recv:en*$c
m=text
a=hlang-recv:en$c

The $ is a simultaneity indication, the c is a grouping indicator
telling that all modalities marked with the $c are wanted together. (we
might be able to restrict the indication to just one set of languages
that are wanted simultaneously.)

2.2. Use the Accept-Language syntax for the hlang attributes and add the
usage rules that q-values with less than .1 difference mean languages
with a preference to be used together. Higher differences indicate that
they are alternatives. Thereby it is both possible to indicate
simultaneity and preference if the simultaneity cannot be satisfied.

m=audio 51372 RTP/AVP 0
a=hlang-recv:ase;q=0.5
a=hlang-send:ase;q=0.5

m=text 49250 RTP/AVP 98 99
a=hlang-send:en;q=0.51,*;q=0.1
a=hlang-recv:en;q=0.51,*;q=0.1

The q-value differences are within 0.1, so there is a preference for
getting both together, but if that is not possible, text is preferred.
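
The usage rule in 2.2 could be evaluated along the lines of the Python sketch below. The function names and the tuple layout are assumptions for illustration, and the data simply mirrors the values in the example. The q-values are compared as integer thousandths, because a plain floating-point comparison would treat 0.6 and 0.5 as less than 0.1 apart.

```python
from itertools import combinations


def to_thousandths(q):
    """Convert a q-value string such as '0.51' to an integer (510)."""
    return round(float(q) * 1000)


def simultaneous_pairs(offers, threshold="0.1"):
    """Find language pairs in different media whose q-values differ by
    less than the threshold; offers are (media, tag, q_string) tuples."""
    limit = to_thousandths(threshold)
    pairs = []
    for a, b in combinations(offers, 2):
        different_media = a[0] != b[0]
        close = abs(to_thousandths(a[2]) - to_thousandths(b[2])) < limit
        if different_media and close:
            pairs.append((a[:2], b[:2]))
    return pairs


def fallback(offers):
    """If simultaneity cannot be provided, the highest q-value wins."""
    return max(offers, key=lambda o: to_thousandths(o[2]))[:2]


# Values copied from the example above (one direction only).
OFFERS = [("audio", "ase", "0.5"), ("text", "en", "0.51"), ("text", "*", "0.1")]

if __name__ == "__main__":
    print(simultaneous_pairs(OFFERS))
    print(fallback(OFFERS))
```

Here the audio and text entries form one simultaneous pair, and the text entry is the fallback because its q-value is marginally higher.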

2.3. Add to the new a=modality attribute from solution 1.3 a fourth,
optional parameter [simultaneity] whose value is any single letter,
indicating a preference for having that modality simultaneously with
another modality indicated with the same value in the [simultaneity]
parameter. Without this parameter, the modalities are alternatives.
Use this solution together with solution 1.3.

Example: Indicate that written English and spoken English are desired
together, but the call shall not be denied if that combination is not
possible; in that case written English is preferred.

m=text
a=modality:written,recv,hi,d
a=hlang-recv:en
m=audio
a=modality:spoken,recv,hi,d
a=hlang-recv:en*

The "d" is a grouping identifier.
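
A possible way to read the a=modality attribute with the optional grouping letter is sketched below in Python. The dictionary keys and function names are invented for illustration; only the comma-separated parameter order <modality>, <direction>, <preference>, [simultaneity] comes from the proposal.

```python
from collections import defaultdict


def parse_modality(value):
    """Split 'written,recv,hi,d' into its parameters; the fourth is optional."""
    parts = [p.strip() for p in value.split(",")]
    modality, direction, preference = parts[:3]
    group = parts[3] if len(parts) > 3 else None
    return {
        "modality": modality,
        "direction": direction,
        "preference": preference,
        "group": group,
    }


def group_simultaneous(attrs):
    """Map each grouping letter to the modalities wanted together.
    Modalities without a letter are plain alternatives and are left out."""
    groups = defaultdict(list)
    for attr in attrs:
        if attr["group"] is not None:
            groups[attr["group"]].append(attr["modality"])
    return {g: m for g, m in groups.items() if len(m) > 1}


if __name__ == "__main__":
    attrs = [
        parse_modality("written,recv,hi,d"),
        parse_modality("spoken,recv,hi,d"),
    ]
    print(group_simultaneous(attrs))
```

With the values from the example, both modalities carry the letter "d" and are therefore reported as one group that is wanted together.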

2.4. Use the -t- subtag for transformed content, defined in RFC 6497, on
a language indication to show that this language can be provided or is
desired together with a language in another modality. Use this
indication together with solution 1.1.

Example: Indicate that written English and spoken English are desired
together and that written English is expected to be transformed, but the
call shall not be denied if that combination is not possible; in that
case written English is preferred.

m=text
a=hlang-recv:en-t-en
m=audio
a=hlang-recv:en*

The -t- indicated with the -recv direction shall not be understood to
mean that the indicated language needs to be transformed. It is just an
expectation that enables it to be provided simultaneously.
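
Detecting the -t- extension in a language tag could look like the Python sketch below. It is a simplified reading of the RFC 6497 tag layout, not a full BCP 47 parser: the function names are hypothetical, and it assumes the source language ends at the next single-character subtag or at a letter-plus-digit field key.

```python
def split_transform(tag):
    """Return (language, source); source is None without a -t- extension."""
    subtags = tag.lower().rstrip("*").split("-")
    for i, sub in enumerate(subtags):
        if i == 0:
            continue
        if sub == "x":
            break  # everything after -x- is private use
        if sub == "t":
            language = "-".join(subtags[:i])
            source = []
            for s in subtags[i + 1:]:
                # Stop at the next singleton or at a field key such as "m0".
                if len(s) == 1 or (len(s) == 2 and s[1].isdigit()):
                    break
                source.append(s)
            return language, "-".join(source) or None
    return "-".join(subtags), None


def wants_simultaneous(tag):
    """Under solution 2.4, a -t- extension signals a language that can be
    provided or is desired together with one in another modality."""
    return split_transform(tag)[1] is not None


if __name__ == "__main__":
    for tag in ("en-t-en", "en*"):
        print(tag, split_transform(tag), wants_simultaneous(tag))
```

The trailing asterisk from solution 1.1 is stripped before the tag is examined, so the two notations can be combined as the proposal suggests.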

My judgement of the alternatives:
I have a slight preference for solutions 1.2 and 2.2 because they are
logically cleanest, but they require that we accept the LC review
proposal to move to the Accept-Language syntax.
1.1 and 2.4 are easily added to the syntax of the current draft and have
sufficient functionality for most cases.
1.3 and 2.3 cause more work and longer SDP.
2.1 is ugly and kept here just for reference.

Gunnar

--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288
