Re: [Slim] Issue 43: How to know the modality of a language indication?

Paul Kyzivat <pkyzivat@alum.mit.edu> Sun, 15 October 2017 21:58 UTC

To: Gunnar Hellström <gunnar.hellstrom@omnitor.se>, Bernard Aboba <bernard.aboba@gmail.com>
Cc: slim@ietf.org
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <f72b4f64-8675-341b-8c30-7024f4297098@alum.mit.edu>
Date: Sun, 15 Oct 2017 17:58:06 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/iVSua2iyp03unO8y62OBaLtiuP8>
Subject: Re: [Slim] Issue 43: How to know the modality of a language indication?

On 10/15/17 5:22 PM, Gunnar Hellström wrote:
> On 2017-10-15 at 21:27, Paul Kyzivat wrote:
>> On 10/15/17 1:49 PM, Bernard Aboba wrote:
>>> Paul said:
>>>
>>> "For the software to know must mean that it will behave differently 
>>> for a tag that represents a sign language than for one that 
>>> represents a spoken or written language. What is it that it will do 
>>> differently?"
>>>
>>> [BA] In terms of behavior based on the signed/non-signed distinction, 
>>> in -17 the only reference appears to be in Section 5.4, stating that 
>>> certain combinations are not defined in the document (but that 
>>> definition of those combinations was out of scope):
>>
>> I'm asking whether this is a distinction without a difference. I'm not 
>> asking whether this makes a difference in the *protocol*, but whether 
>> in the end it benefits the participants in the call in any way. 
> <GH>Good point; I was on my way to making a similar comment earlier 
> today. The difference it makes for applications to "know" what modality 
> a language tag represents in its used position seems to matter only for 
> imagined functions that are out of scope for the protocol specification.
>> For instance:
>>
>> - does it help the UA to decide how to alert the callee, so that the
>>   callee can better decide whether to accept the call or instruct the
>>   UA about how to handle the call?
> <GH>Yes, for a regular human user-to-user call, the result of the 
> negotiation must be presented to the participants, so that they can 
> start the call with an agreed language and modality.

*Today* we don't do this. We leave it to the end users to negotiate the 
language they will use by other means - typically in-band, by informal 
negotiation. Most UAs have no provision for presenting the specified 
language to the end user.

This of course will also work for deaf users with sign language, in the 
absence of any interpreters in the call. Where it doesn't work is when 
there is a deaf user and a hearing user, and video isn't present at both 
ends. But in that case there isn't much hope.

> That presentation could be exactly the description from the language tag 
> registry, and then no "knowledge" is needed from the application. But it 
> is more likely that the application has its own string for presenting 
> the negotiated language and modality, so that is what will be shown. It 
> is still just a table lookup from language tag to language-name string, 
> so no real knowledge is needed.

Clearly presenting the raw language tags won't be useful to most users. 
So some sort of translation is needed. Such a hypothetical translation 
table could also have properties, such as whether the language is a sign 
language. Hence this doesn't seem to be a problem we need to solve.
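
For illustration only, such a table can be as simple as the sketch below 
(the layout and names are invented; the tags come from the IANA registry, 
but the display strings and the "signed" flag would come from the UA's 
own localization data, not from the protocol):

    # Hypothetical UA-side table: language tag -> localized display name plus
    # a flag saying whether the tag denotes a sign language. Nothing here is
    # defined by the draft; it is purely local configuration data.
    LANGUAGE_DISPLAY = {
        "en":  ("English", False),
        "es":  ("Spanish", False),
        "ase": ("American Sign Language", True),
        "bfi": ("British Sign Language", True),
    }

    def describe(tag):
        """Turn a negotiated language tag into something presentable to a user."""
        name, signed = LANGUAGE_DISPLAY.get(tag, (tag, None))
        if signed is None:
            return tag                          # unknown tag: show it raw
        return name + (" (signed)" if signed else "")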

> We have said many times that the way the application tells the user the 
> result of the negotiation is out of scope for the draft, but it is good 
> to discuss and know that it can be done.
> A similar mechanism is also needed for configuration of the user's 
> language preference profile further discussed below.
>>
>> - does it allow the UA to make a decision whether to accept the media?
> <GH>No, the media should be accepted regardless of the result of the 
> language negotiation.
>>
>> - can the UA use this information to change how to render the media?
> <GH>Yes, for the specialized text notation of sign language that we have 
> discussed but currently placed out of scope, a very special rendering 
> application is needed. The modality would be recognized by a script 
> subtag on a sign language tag used in text media. However, I think it 
> would be best to also use a specific text subtype, so that the rendering 
> can be controlled by invoking a "codec" for that rendering.

But won't the rendering also be in some way language specific? If so, 
the device will need to be configured for the specific languages it 
supports. Hence again a generic mechanism isn't needed.

>> And if there is something like this, will the UA be able to do this 
>> generically based on whether the media is sign language or not, or 
>> will the UA need to already understand *specific* sign language tags?
> <GH>Applications will need to have localized versions of the names for 
> the different sign languages, and also for spoken and written languages, 
> to be used when setting preferences and announcing the results of the 
> negotiation. It might be overkill to have such localized names for all 
> languages in the IANA language registry, so the application will need to 
> be able to handle localized names for a subset of the registry. With 
> good design, however, this is just an automatic translation between a 
> language tag and a corresponding name, so it does not in fact require 
> any "knowledge" of what modality is used with each language tag.

My earlier comment applies to the above.

> The application can ask for the configuration:
> "Which languages do you want to offer to send in video"
> "Which languages do you want to offer to send in text"
> "Which languages do you want to offer to send in audio"
> "Which languages do you want to be prepared to receive in video"
> "Which languages do you want to be prepared to receive in text"
> "Which languages do you want to be prepared to receive in audio"
> 
> And for each question provide a list of language names to select from. 
> When the selection is made, the corresponding language tag is placed in 
> the profile for negotiation.

Hence, there need not be any *generic* mechanism, since this is based on 
configuration.
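
As a sketch of what that configuration boils down to (the structure and 
values below are invented; the only point is that the profile is just 
per-media, per-direction lists of tags that the user picked):

    # Hypothetical preference profile built from the six questions above.
    # Keys are (media, direction); values are ordered lists of language tags.
    # No modality logic is involved -- the user chose the lists.
    PROFILE = {
        ("audio", "send"): ["en"],
        ("text",  "send"): ["en"],
        ("video", "send"): ["ase"],
        ("audio", "recv"): ["en"],
        ("text",  "recv"): ["en", "es"],
        ("video", "recv"): ["ase"],
    }

    def tags_for(media, direction):
        # The UA copies these straight into the language attributes it offers
        # for that media stream; it never has to ask what modality a tag is.
        return PROFILE.get((media, direction), [])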

> If the application provides the whole IANA language registry to the user 
> for each question, then there is a possibility that the user by mistake 
> selects a language that requires a different modality than the question 
> was about. If the application is to limit the lists provided for each 
> question, then it will need some kind of knowledge about which language 
> tags suit each modality (and media).
> 
> 
>>
>> E.g., a UA serving a deaf person might automatically introduce a sign 
>> language interpreter into an incoming audio-only call. If the incoming 
>> call has both audio and video then the video *might* be for conveying 
>> sign language, or not. If not then the UA will still want to bring in 
>> a sign language interpreter. But is knowing the call generically 
>> contains sign language sufficient to decide against bringing in an 
>> interpreter? Or must that depend on it being a sign language that the 
>> user can use? If the UA is configured for all the specific sign 
>> languages that the user can deal with then there is no need to 
>> recognize other sign languages generically.
> <GH>We are talking about specific language tags here and knowing what 
> modality they are used for. The user needs to specify which sign 
> languages they prefer to use. The callee application can be made to look 
> for gaps between what the caller offers and what the callee can accept, 
> deduce from that what type of conversion and which languages are needed, 
> and invoke that as a relay service. That invocation can be made 
> completely table driven, with corresponding translation profiles for the 
> available relay services. But it is more likely that it is done by 
> having some knowledge about which languages are sign languages and which 
> are spoken languages, and then sending the call to a relay service to 
> sort out whether it can handle the translation.
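
That kind of gap check can indeed stay table driven. A rough sketch (the 
relay URIs and translation pairs below are invented for illustration; no 
such service directory is defined anywhere):

    # Hypothetical: given the tags each side can use, find a relay service
    # whose translation pairs cover the gap.
    RELAY_SERVICES = {
        "sip:ase-relay@example.net": [("ase", "en")],   # ASL <-> English
        "sip:bfi-relay@example.net": [("bfi", "en")],   # BSL <-> English
    }

    def pick_relay(caller_tags, callee_tags):
        """Return a relay URI that can translate between the two sides, if any."""
        for uri, pairs in RELAY_SERVICES.items():
            for a, b in pairs:
                if (a in caller_tags and b in callee_tags) or \
                   (b in caller_tags and a in callee_tags):
                    return uri
        return None   # no table match: hand off to a default relay to sort it out
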
>>
>>
> So, the answer is - no, the application does not really need any 
> knowledge about which modality a language tag represents in its used 
> position. If the user chooses to indicate very rare language tags for a 
> medium, then a match will just become very unlikely.
> 
> Where does this discussion take us? Should we modify section 5.4 again?

Frankly, I see no need for section 5.4.

	Thanks,
	Paul

> Thanks
> Gunnar
>>     Thanks,
>>     Paul
>>
>>>       5.4
>>> <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>. 
>>>
>>>       Undefined Combinations
>>>
>>>
>>>
>>>     The behavior when specifying a non-signed language tag for a video
>>>     media stream, or a signed language tag for an audio or text media
>>>     stream, is not defined in this document.
>>>
>>>     The problem of knowing which language tags are signed and which are
>>>     not is out of scope of this document.
>>>
>>>
>>>
>>> On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat <pkyzivat@alum.mit.edu 
>>> <mailto:pkyzivat@alum.mit.edu>> wrote:
>>>
>>>     On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>>
>>>         Paul,
>>>         On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>>
>>>             On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>>
>>>                 Gunnar said:
>>>
>>>                 "Applications not implementing such specific notations
>>>                 may use the following simple deductions.
>>>
>>>                 - A language tag in audio media is supposed to indicate
>>>                 spoken modality.
>>>
>>>                 [BA] Even a tag with "Sign Language" in the
>>>                 description??
>>>
>>>                 - A language tag in text media is supposed to indicate
>>>                 written modality.
>>>
>>>                 [BA] If the tag has "Sign Language" in the description,
>>>                 can this document really say that?
>>>
>>>                 - A language tag in video media is supposed to indicate
>>>                 visual sign language modality except for the case when
>>>                 it is supposed to indicate a view of a speaking person
>>>                 mentioned in section 5.2 characterized by the exact same
>>>                 language tag also appearing in an audio media
>>>                 specification.
>>>
>>>                 [BA] It seems like an over-reach to say that a spoken
>>>                 language tag in video media should instead be
>>>                 interpreted as a request for Sign Language.  If this
>>>                 were done, would it always be clear which Sign Language
>>>                 was intended?  And could we really assume that both
>>>                 sides, if negotiating a spoken language tag in video
>>>                 media, were really indicating the desire to sign?  It
>>>                 seems like this could easily result in interoperability
>>>                 failure.
>>>
>>>
>>>             IMO the right way to indicate that two (or more) media
>>>             streams are conveying alternative representations of the
>>>             same language content is by grouping them with a new
>>>             grouping attribute. That can tie together an audio with a
>>>             video and/or text. A language tag for sign language on the
>>>             video stream then clarifies to the recipient that it is sign
>>>             language. The grouping attribute by itself can indicate that
>>>             these streams are conveying language.
>>>
>>>         <GH>Yes, and that is proposed in
>>>         draft-hellstrom-slim-modality-grouping, with two kinds of
>>>         grouping: one kind to tell that two or more languages in
>>>         different streams are alternatives with the same content, with a
>>>         priority order assigned to them to guide the selection of which
>>>         one to use during the call; and another kind to tell that two or
>>>         more languages in different streams are desired together, with
>>>         the same language content but different modalities (such as
>>>         captioned telephony with the same content provided in both
>>>         speech and text, sign language interpretation where you see the
>>>         interpreter, or possibly spoken language interpretation with the
>>>         languages provided in different audio streams). I hope that
>>>         draft can be progressed. I see it as a needed complement to the
>>>         pure language indications per media.
>>>
>>>
>>>     Oh, sorry. I did read that draft but forgot about it.
>>>
>>>         The discussion in this thread is more about how an application
>>>         would easily know that e.g. "ase" is a sign language and "en" is
>>>         a spoken (or written) language, and also about what kinds of
>>>         languages are allowed and indicated by default in each media
>>>         type. It was not at all about falsely using language tags in the
>>>         wrong media type, as Bernard understood from my wording. It was
>>>         rather about limiting which modalities are used in each media
>>>         type, and how to know the modality in cases that are not
>>>         evident, e.g. the "application" and "message" media types.
>>>
>>>
>>>     What do you mean by "know"? Is it for the *UA* software to know, or
>>>     for the human user of the UA to know? Presumably a human user that
>>>     cares will understand this if presented with the information in some
>>>     way. But typically this isn't presented to the user.
>>>
>>>     For the software to know must mean that it will behave differently
>>>     for a tag that represents a sign language than for one that
>>>     represents a spoken or written language. What is it that it will do
>>>     differently?
>>>
>>>              Thanks,
>>>              Paul
>>>
>>>
>>>         Right now we have returned to a very simple rule: we define only
>>>         the use of spoken language in audio media, written language in
>>>         text media, and sign language in video media.
>>>         We have discussed other uses, such as a view of a speaking
>>>         person in video, text overlay on video, a sign language notation
>>>         in text media, written language in message media, written
>>>         language in WebRTC data channels, and signed, written, and
>>>         spoken language in bucket media, maybe declared as application
>>>         media. We do not define these cases. They are just not defined,
>>>         not forbidden. They may be defined in the future.
>>>
>>>         My proposed wording for section 5.4 caused too many
>>>         misunderstandings, so I gave up on it. I think we can live with
>>>         5.4 as it is in version -16.
>>>
>>>         Thanks,
>>>         Gunnar
>>>
>>>
>>>
>>>             (IIRC I suggested something along these lines a long time
>>>             ago.)
>>>
>>>                  Thanks,
>>>                  Paul
>>>