Re: [Slim] Issue 43: How to know the modality of a language indication?

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Sun, 15 October 2017 21:22 UTC

To: Paul Kyzivat <pkyzivat@alum.mit.edu>, Bernard Aboba <bernard.aboba@gmail.com>
Cc: slim@ietf.org
From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Date: Sun, 15 Oct 2017 23:22:08 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/HroshQyWv3CnNRqRPktbij8DuvM>
Subject: Re: [Slim] Issue 43: How to know the modality of a language indication?

On 2017-10-15 at 21:27, Paul Kyzivat wrote:
> On 10/15/17 1:49 PM, Bernard Aboba wrote:
>> Paul said:
>>
>> "For the software to know must mean that it will behave differently 
>> for a tag that represents a sign language than for one that 
>> represents a spoken or written language. What is it that it will do 
>> differently?"
>>
>> [BA] In terms of behavior based on the signed/non-signed distinction, 
>> in -17 the only reference appears to be in Section 5.4, stating that 
>> certain combinations are not defined in the document (but that 
>> definition of those combinations was out of scope):
>
> I'm asking whether this is a distinction without a difference. I'm not 
> asking whether this makes a difference in the *protocol*, but whether 
> in the end it benefits the participants in the call in any way. 
<GH>Good point; I was about to make a similar comment earlier today. 
The difference it makes for an application to "know" what modality a 
language tag represents in the position where it is used seems to 
matter only for imagined functions that are out of scope for the 
protocol specification.
> For instance:
>
> - does it help the UA to decide how to alert the callee, so that the
>   callee can better decide whether to accept the call or instruct the
>   UA about how to handle the call?
<GH>Yes, for a regular human user-to-user call, the result of the 
negotiation must be presented to the participants, so that they can 
start the call with an agreed language and modality.
That presentation could be exactly the description from the language 
tag registry, in which case no "knowledge" is needed from the 
application. But it is more likely that the application has its own 
string for presenting the negotiated language and modality, and that is 
what will be shown. Even then, the string is found by a table lookup 
from language tag to language name, so no real knowledge is needed.
We have said many times that the way the application tells the user the 
result of the negotiation is out of scope for the draft, but it is good 
to discuss it and know that it can be done.
A similar mechanism is also needed for configuring the user's language 
preference profile, discussed further below.
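
A minimal sketch in Python of such a tag-to-name lookup (the table and 
function name are hypothetical; the tags are real BCP 47 tags):

# Hypothetical display table; a real application would carry localized
# strings for the subset of the registry it cares about.
DISPLAY_NAMES = {
    "en":  "English (spoken or written)",
    "sv":  "Swedish (spoken or written)",
    "ase": "American Sign Language",
    "swl": "Swedish Sign Language",
}

def announce(tag):
    # Pure table lookup; no notion of modality is needed.
    return DISPLAY_NAMES.get(tag, tag)

print(announce("ase"))   # -> American Sign Language
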
>
> - does it allow the UA to make a decision whether to accept the media?
<GH>No, the media should be accepted regardless of the result of the 
language negotiation.
>
> - can the UA use this information to change how to render the media?
<GH>Yes. For the specialized text notation of sign language that we 
have discussed but currently placed out of scope, a very special 
rendering application is needed. The modality would be recognized by a 
script subtag on a sign language tag used in text media. However, I 
think it would be best to also use a specific text subtype, so that the 
rendering can be controlled by invoking a "codec" for that rendering.
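
A minimal sketch of that idea in Python (the renderer names are 
hypothetical, "Sgnw" is the ISO 15924 subtag for SignWriting, and the 
whole case is out of scope of the draft):

def pick_text_renderer(tag):
    # Language tags are case-insensitive; look for a script subtag
    # indicating a sign language notation such as SignWriting.
    subtags = tag.lower().split("-")
    if len(subtags) > 1 and subtags[1] == "sgnw":
        return "signwriting-renderer"   # the special rendering "codec"
    return "plain-text-renderer"

print(pick_text_renderer("ase-Sgnw"))   # -> signwriting-renderer
print(pick_text_renderer("en"))         # -> plain-text-renderer
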
>
> And if there is something like this, will the UA be able to do this 
> generically based on whether the media is sign language or not, or 
> will the UA need to already understand *specific* sign language tags?
<GH>Applications will need localized versions of the names of the 
different sign languages, and also of spoken and written languages, to 
be used when setting preferences and when announcing the result of the 
negotiation. It might be overkill to have such localized names for all 
languages in the IANA language registry, so the application will need 
to handle localized names for a subset of the registry. With good 
design, however, this is just an automatic translation between a 
language tag and a corresponding name, so it does not in fact require 
any "knowledge" of what modality is used with each language tag.
The application can ask for the configuration:
"Which languages do you want to offer to send in video?"
"Which languages do you want to offer to send in text?"
"Which languages do you want to offer to send in audio?"
"Which languages do you want to be prepared to receive in video?"
"Which languages do you want to be prepared to receive in text?"
"Which languages do you want to be prepared to receive in audio?"

And for each question, provide a list of language names to select from. 
When a selection is made, the corresponding language tag is placed in 
the profile used for negotiation, as sketched below.
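
A minimal sketch in Python of turning the six answers into per-media 
attributes (the profile structure is hypothetical, and I assume the 
draft's hlang-send / hlang-recv SDP attribute names):

# Hypothetical preference profile built from the six questions above.
profile = {
    "audio": {"send": ["en", "sv"], "recv": ["en", "sv"]},
    "text":  {"send": ["en"],       "recv": ["en"]},
    "video": {"send": ["ase"],      "recv": ["ase"]},
}

def hlang_attributes(medium):
    # Assumes the hlang-send / hlang-recv attribute names from the draft.
    p = profile[medium]
    return ["a=hlang-send:" + " ".join(p["send"]),
            "a=hlang-recv:" + " ".join(p["recv"])]

print("\n".join(hlang_attributes("video")))
# a=hlang-send:ase
# a=hlang-recv:ase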

If the application presents the whole IANA language registry to the 
user for each question, then there is a risk that the user by mistake 
selects a language that requires a different modality than the question 
was about. If the application is to limit the lists provided for each 
question, then it will need some knowledge of which language tags suit 
each modality (and medium).
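
A minimal sketch in Python of such a limitation (the set of sign 
language tags is an illustrative subset, not a complete list, and that 
set is exactly the "knowledge" in question):

# Illustrative subset only; a real application would derive this from
# the IANA registry or a maintained table.
SIGN_LANGUAGE_TAGS = {"ase", "bfi", "fsl", "swl"}

def selectable(medium, all_tags):
    if medium == "video":
        return [t for t in all_tags if t in SIGN_LANGUAGE_TAGS]
    return [t for t in all_tags if t not in SIGN_LANGUAGE_TAGS]

print(selectable("video", ["en", "ase", "sv", "swl"]))   # ['ase', 'swl']
print(selectable("audio", ["en", "ase", "sv", "swl"]))   # ['en', 'sv']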


>
> E.g., A UA serving a deaf person might automatically introduce a sign 
> language interpreter into an incoming audio-only call. If the incoming 
> call has both audio and video then the video *might* be for conveying 
> sign language, or not. If not then the UA will still want to bring in 
> a sign language interpreter. But is knowing the call generically 
> contains sign language sufficient to decide against bringing in an 
> interpreter? Or must that depend on it being a sign language that the 
> user can use? If the UA is configured for all the specific sign 
> languages that the user can deal with then there is no need to 
> recognize other sign languages generically.
<GH>We are talking about specific language tags here and knowing what 
modality they are used for. The user needs to specify which sign 
languages they prefer to use. The callee application can be made to 
look for gaps between what the caller offers and what the callee can 
accept, deduce from that which type of conversion, and between which 
languages, is needed, and invoke that conversion as a relay service. 
That invocation can be made completely table-driven, with translation 
profiles for the available relay services. But it is more likely that 
it is done by having some knowledge of which languages are sign 
languages and which are spoken languages, and sending the call to a 
relay service to sort out whether it can handle the translation.
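
A minimal sketch in Python of the table-driven variant (the relay 
table, URIs and function name are hypothetical):

# Hypothetical table of relay services keyed by pairs of languages they
# can translate between (caller language, callee language).
RELAY_SERVICES = {
    ("ase", "en"): "sip:vrs@relay.example.com",
    ("swl", "sv"): "sip:vrs-se@relay.example.com",
}

def find_relay(caller_langs, callee_langs):
    if caller_langs & callee_langs:
        return None                      # direct match, no relay needed
    for (a, b), uri in RELAY_SERVICES.items():
        if a in caller_langs and b in callee_langs:
            return uri
    return None

print(find_relay({"ase"}, {"en"}))       # -> sip:vrs@relay.example.com
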
>
>
So the answer is: no, the application does not really need any 
knowledge of which modality a language tag represents in the position 
where it is used. If the user chooses to indicate very rare language 
tags for a medium, then a match will simply become very unlikely.

Where does this discussion take us? Should we modify section 5.4 again?

Thanks
Gunnar
>     Thanks,
>     Paul
>
>>       5.4
>> <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>.
>>       Undefined Combinations
>>
>>
>>
>>     The behavior when specifying a non-signed language tag for a video
>>     media stream, or a signed language tag for an audio or text media
>>     stream, is not defined in this document.
>>
>>     The problem of knowing which language tags are signed and which are
>>     not is out of scope of this document.
>>
>>
>>
>> On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat <pkyzivat@alum.mit.edu 
>> <mailto:pkyzivat@alum.mit.edu>> wrote:
>>
>>     On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>
>>         Paul,
>>         On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>
>>             On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>
>>                 Gunnar said:
>>
>>                 "Applications not implementing such specific notations
>>                 may use the following simple deductions.
>>
>>                 - A language tag in audio media is supposed to indicate
>>                 spoken modality.
>>
>>                 [BA] Even a tag with "Sign Language" in the
>>                 description??
>>
>>                 - A language tag in text media is supposed to
>>                 indicate written modality.
>>
>>                 [BA] If the tag has "Sign Language" in the description,
>>                 can this document really say that?
>>
>>                 - A language tag in video media is supposed to indicate
>>                 visual sign language modality except for the case when
>>                 it is supposed to indicate a view of a speaking person
>>                 mentioned in section 5.2 characterized by the exact same
>>                 language tag also appearing in an audio media
>>                 specification.
>>
>>                 [BA] It seems like an over-reach to say that a spoken
>>                 language tag in video media should instead be
>>                 interpreted as a request for Sign Language.  If this
>>                 were done, would it always be clear which Sign Language
>>                 was intended?  And could we really assume that both
>>                 sides, if negotiating a spoken language tag in video
>>                 media, were really indicating the desire to sign?  It
>>                 seems like this could easily result in
>>                 interoperability failure.
>>
>>
>>             IMO the right way to indicate that two (or more) media
>>             streams are conveying alternative representations of the
>>             same language content is by grouping them with a new
>>             grouping attribute. That can tie together an audio with a
>>             video and/or text. A language tag for sign language on the
>>             video stream then clarifies to the recipient that it is sign
>>             language. The grouping attribute by itself can indicate that
>>             these streams are conveying language.
>>
>>         <GH>Yes, and that is proposed in
>>         draft-hellstrom-slim-modality-grouping    with two kinds of
>>         grouping: One kind of grouping to tell that two or more
>>         languages in different streams are alternatives with the same
>>         content and a priority order is assigned to them to guide the
>>         selection of which one to use during the call. The other kind of
>>         grouping telling that two or more languages in different streams
>>         are desired together with the same language content but
>>         different modalities ( such as the use for captioned telephony
>>         with the same content provided in both speech and text, or sign
>>         language interpretation where you see the interpreter, or
>>         possibly spoken language interpretation with the languages
>>         provided in different audio streams ). I hope that that draft
>>         can be progressed. I see it as a needed complement to the pure
>>         language indications per media.
>>
>>
>>     Oh, sorry. I did read that draft but forgot about it.
>>
>>         The discussion in this thread is more about how an application
>>         would easily know that e.g. "ase" is a sign language and "en" is
>>         a spoken (or written) language, and also a discussion about what
>>         kinds of languages are allowed and indicated by default in each
>>         media type. It was not at all about falsely using language tags
>>         in the wrong media type as Bernard understood my wording. It was
>>         rather a limitation to what modalities are used in each media
>>         type and how to know the modality with cases that are not
>>         evident, e.g. "application" and "message" media types.
>>
>>
>>     What do you mean by "know"? Is it for the *UA* software to know, or
>>     for the human user of the UA to know? Presumably a human user that
>>     cares will understand this if presented with the information in some
>>     way. But typically this isn't presented to the user.
>>
>>     For the software to know must mean that it will behave differently
>>     for a tag that represents a sign language than for one that
>>     represents a spoken or written language. What is it that it will do
>>     differently?
>>
>>              Thanks,
>>              Paul
>>
>>
>>         Right now we have returned to a very simple rule: we define only
>>         use of spoken language in audio media, written language in text
>>         media and sign language in video media.
>>         We have discussed other use, such as a view of a speaking person
>>         in video, text overlay on video, a sign language notation in
>>         text media, written language in message media, written language
>>         in WebRTC data channels, sign written and spoken in bucket media
>>         maybe declared as application media. We do not define these
>>         cases. They are just not defined, not forbidden. They may be
>>         defined in the future.
>>
>>         My proposed wording in section 5.4 got too many
>>         misunderstandings so I gave up with it. I think we can live with
>>         5.4 as it is in version -16.
>>
>>         Thanks,
>>         Gunnar
>>
>>
>>
>>             (IIRC I suggested something along these lines a long time
>>             ago.)
>>
>>                  Thanks,
>>                  Paul
>>
>>             _______________________________________________
>>             SLIM mailing list
>>             SLIM@ietf.org <mailto:SLIM@ietf.org>
>>             https://www.ietf.org/mailman/listinfo/slim
>>             <https://www.ietf.org/mailman/listinfo/slim>
>>
>>
>>
>>     _______________________________________________
>>     SLIM mailing list
>>     SLIM@ietf.org <mailto:SLIM@ietf.org>
>>     https://www.ietf.org/mailman/listinfo/slim
>>     <https://www.ietf.org/mailman/listinfo/slim>
>>
>>
>

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288