Re: [Slim] Issue 43: How to know the modality of a language indication?

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Tue, 24 October 2017 14:51 UTC

To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: slim@ietf.org, Paul Kyzivat <pkyzivat@alum.mit.edu>

Bernard and all,

Yes, I agree with your reasoning, and I also understand how we ended up
with a limiting section 5.4. We were mainly thinking of a globally
interoperable multimedia communication system and wanted a high chance of
a language and modality match between callers and callees. Limiting the
choices then seems to increase the opportunities for a match. But the RFC
can also be used in much smaller application areas, where it can be
valuable to differentiate between less common combinations of media and
languages. The current wording of 5.4 would discourage such applications
from using the mechanism, even though they could handle it through an
internal agreement on which media/language combinations are relevant and
on what extra information is used to distinguish otherwise ambiguous
combinations (for example, a tag for written or spoken language in video,
which may need further information to be resolved).

So, here is yet another proposal for Issue #43 and section 5.4, more
informative and less restrictive than the previous version.

-----Old text----


      5.4 Undefined Combinations



    The behavior when specifying a non-signed language tag for a video
    media stream, or a signed language tag for an audio or text media
    stream, is not defined in this document.

    The problem of knowing which language tags are signed and which are
    not is out of scope of this document.

-----New text------------
5.4 Media, Language, and Modality Indications

The combination of language tags and other information in the media
descriptions should be composed so that the intended modality can be
deduced by the negotiating parties. For general use, with the best
opportunity of finding matching languages, it is recommended to use the
most apparent combinations of language tags and media: sign language
tags in video media, spoken language tags in audio media, and written
language tags in text media. The examples in this specification are all
drawn from this set of three obvious language/media/modality combinations.

The following explains some factors to consider when combining language
tags, media types, and other media description information to identify
the intended language modality.

A specific sign language can be identified by checking the IANA Language
Subtag Registry according to BCP 47 [RFC5646]: the language subtag
appears in at least two entries in the registry, once with the Type
field "language" and once with the Type field "extlang" combined with
the Prefix field value "sgn".

A generic indication of sign language competence or preference, without
specifying exactly which sign language, can be given by using the value
"sgn" as the language tag in the corresponding "hlang" attribute.

Sign language communication in its usual visual modality is most often
conveyed in a "video" media stream. Application-specific use may appear
in other media, such as "message" and "application". Certain textual
notations of sign language may appear in the "text" media stream.

A specific spoken or written language can be identified by finding that
the language subtag exists in the IANA Language Subtag Registry according
to BCP 47 [RFC5646] with the Type field "language", and that no entry for
the same subtag exists with the Type field "extlang" and the value "sgn"
in the Prefix field. The spoken modality is usually conveyed in an
"audio" media stream. The written modality in real time is usually
conveyed in a "text" media stream.

Use of a language subtag for a written or spoken language in media
streams other than "text" or "audio" requires further indications or
application agreements to identify the modality. A number of such further
indications are available, and new ones may be added by further work. Use
of the written modality in a media stream other than "text" may be
distinguished by a script subtag in the language tag, where that is
appropriate. Use for sending a visual view of a speaking person may be
indicated by the value "speaker" in an SDP content attribute according to
RFC 4796 [RFC4796] in a "video" media stream or in another media stream
carrying video (e.g. "message" or "application").

Use of the written modality in a media stream other than "text" may, for
languages whose script subtag is suppressed, be distinguished by any
other appropriate notation or application agreement. One appropriate
notation may be a media subtype specific to the intended modality.
----------------------------------End of new text----------------------------------
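
To make the registry check in the new text concrete, here is a rough,
non-normative Python sketch (illustration only, not intended for the
draft text). It implements the rule above: a subtag denotes a sign
language if it appears in the IANA Language Subtag Registry both with
Type "language" and with Type "extlang" and Prefix "sgn". It assumes a
local copy of the registry file (record-jar format, records separated by
"%%"); the file name in the examples is just a placeholder.

----- Sketch: identifying sign language subtags (Python) -----
def parse_registry(path):
    """Yield each registry record as a dict: field name -> list of values."""
    with open(path, encoding="utf-8") as f:
        record, last_field = {}, None
        for raw in f:
            line = raw.rstrip("\n")
            if line == "%%":                             # record separator
                if record:
                    yield record
                record, last_field = {}, None
            elif line.startswith(" ") and last_field:    # folded continuation line
                record[last_field][-1] += " " + line.strip()
            elif ":" in line:
                field, _, value = line.partition(":")
                last_field = field.strip()
                record.setdefault(last_field, []).append(value.strip())
        if record:                                       # last record has no trailing %%
            yield record

def is_sign_language_subtag(subtag, registry_path):
    """True if 'subtag' denotes a sign language per the rule in the new 5.4."""
    subtag = subtag.lower()
    as_language = as_sgn_extlang = False
    for rec in parse_registry(registry_path):
        if rec.get("Subtag", [""])[0].lower() != subtag:
            continue
        if rec.get("Type") == ["language"]:
            as_language = True
        if rec.get("Type") == ["extlang"] and "sgn" in rec.get("Prefix", []):
            as_sgn_extlang = True
    return as_language and as_sgn_extlang

# Expected results with a current copy of the registry:
#   is_sign_language_subtag("ase", "language-subtag-registry")  -> True
#   is_sign_language_subtag("en",  "language-subtag-registry")  -> False
----- End of sketch -----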
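
A similar non-normative sketch of the modality reasoning in the new text,
given the media type, whether the tag is a sign language (for example
from the registry check), whether it carries a script subtag, and any SDP
content attribute value (RFC 4796). The "undefined" result stands for the
cases the new text leaves to further indications or application
agreement; the function and its parameter names are illustrative only.

----- Sketch: inferring the intended modality (Python) -----
def infer_modality(media, is_signed, has_script_subtag=False, content_attr=None):
    """Return the modality an implementation could infer, or "undefined"."""
    if is_signed:
        # Sign language is normally visual; a textual sign language notation in
        # "text" media would need a dedicated subtype or application agreement.
        return "signed (visual)" if media == "video" else "undefined"
    if media == "audio":
        return "spoken"
    if media == "text":
        return "written"
    # Other media ("video", "message", "application", ...) need further signals:
    if content_attr == "speaker":             # a=content:speaker, RFC 4796
        return "spoken (view of a speaking person)"
    if has_script_subtag:                     # e.g. written language overlaid on video
        return "written"
    return "undefined"                        # needs further indication or agreement

# Examples:
#   infer_modality("video", is_signed=True)                          -> "signed (visual)"
#   infer_modality("video", is_signed=False, has_script_subtag=True) -> "written"
#   infer_modality("video", is_signed=False, content_attr="speaker") -> "spoken (view of a speaking person)"
----- End of sketch -----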

Gunnar

On 2017-10-24 at 03:51, Bernard Aboba wrote:
> Thanks for suggesting a way forward, Gunnar. I too would like to get 
> Issue 43 resolved so we can move forward in the process.
> Please send your thoughts to the mailing list (preferably before the 
> October 30 submission deadline so we can spin a new draft version).
>
> In thinking about the issue, a question Paul asked has stuck in my 
> mind:  What difference does it/should it make?
>
> Let us presume that the user agents are configured to signal the 
> language preferences.
>
> In a pure peer-to-peer case (me calling you, no intermediaries), I 
> configure my UA to indicate a preference for Mandarin on the text 
> modality, French sign language on the audio modality and Swahili on 
> the video modality.
>
> You (knowing my background and weird sense of humour) after being 
> notified of my preference, realize I am kidding and agree to accept 
> the call, knowing that whatever language preference you indicate, we 
> will most likely communicate in English since I do not speak Mandarin 
> or Swahili and am barely conversant in French, let alone French sign 
> language.
>
> Would this scenario have worked out better with rules that mandated 
> that my odd choices for audio and video languages be labelled 
> "undefined"? I think not.
>
> In a scenario where the call is between me and a call center (not a 
> PSAP) my flippant UA configuration might result in the call being 
> rejected due to a lack of Mandarin, Swahili or French sign language 
> resources within the call center.  But it's not clear that labelling 
> my odd choices as "undefined" should play a role in that decision.  
> For example, if the call center did have someone who spoke Swahili, 
> and connected me to them (perhaps under the theory that my declared 
> preference might indicate an ability to lip-read Swahili), this might 
> have improved the chance of communication had my UA configuration been 
> based on genuine expertise rather than a warped sense of humour.
>
> In other words, it is not clear to me how Section 5.4's discussion of 
> scope improves or clarifies the situation in any way - and there is 
> some possibility that it could cause problems.
>
> On Mon, Oct 23, 2017 at 2:17 PM, Gunnar Hellström 
> <gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>> wrote:
>
>     Issue #43 is the only issue we have left now. I do not want to see
>     the discussion stop again until we have a solution on it that
>     seems acceptable.
>
>     Section 5.4 seems to be a good place to handle Issue #43.
>
>     Currently, section 5.4 is more aimed at limiting what kind of
>     coding for languages and modalities are acceptable.
>
>     Some viewpoints said that such limitations are not needed and that
>     5.4 can be deleted.
>
>     I think we can do something in between these extremes. We can
>     introduce explanations for what is required from an acceptable
>     coding of a combination of media, languages, directions, and other
>     parameters and explain basic ways to assess what is the resulting
>     modality, and also explain that the more common media and language
>     combinations that are used, the higher chance there is for a
>     match. Thus unusual combinations are discouraged but not forbidden
>     as long as the modality can be assessed from them. They can be
>     used in specific applications.
>
>     I might continue tomorrow with wording proposal for the reasoning
>     above, hoping that we can close issue #43 and the discussions
>     around 5.4 soon.
>
>     /Gunnar
>
>
>     On 2017-10-17 at 11:02, Gunnar Hellström wrote:
>>
>>     An even more general way to express what section 5.4 tries to say is:
>>
>>     ------------------------------------------------------------------------------------------------------------------------------------------
>>
>>     5.4 Combinations of Language tags and Media descriptions
>>
>>     The combination of Language tags and other information in the
>>     media descriptions should be made so that the intended modality
>>     can be concluded by the negotiating parties.
>>
>>
>>     ----------------------------------------------------------------------------------------
>>
>>     That makes us not need to investigate what is possible today, and
>>     what further attributes or coding rules may be added in the future.
>>
>>     We have a risk that implementers start using some insufficient
>>     coding, that can cause interop issues. But instead we do not need
>>     to limit valid use that we just have not thought about by saying
>>     that specific combinations are out of scope or not defined. It is
>>     up to implementers to check that the combinations they use result
>>     in unambiguous modality.
>>
>>     And it opens for use of possible new attributes e.g. 
>>     a=modality:spoken  or a=modality:written etc, to complement the
>>     undefined case when a non-signed language tag without script
>>     subtag is used in video media, and also for explaining any use of
>>     m=application or m=message media in interactive communication.
>>
>>     It does not really answer Issue #43 by explaining HOW to assess
>>     the modality easily, but it requires the implementers to make
>>     sure that it is possible.
>>
>>     And deducing the intended modality is the key to successful
>>     negotiation and communication.
>>
>>     Do you think this would be clear enough, or do we need to go into
>>     what clear cases we have?
>>
>>     Gunnar
>>
>>
>>
>>     On 2017-10-17 at 00:21, Gunnar Hellström wrote:
>>>     On 2017-10-16 at 01:21, Bernard Aboba wrote:
>>>>     Paul said:
>>>>
>>>>     ""- can the UA use this information to change how to render the
>>>>     media?"
>>>>
>>>>     [BA] If the video is used for signing, an application might
>>>>     infer an encoder preference for frame rate over resolution
>>>>     (e.g. in WebRTC, RTCRtpParameters.degradationPreference =
>>>>     "maintain-framerate" )
>>>     <GH>Right, that is a valid example of how real "knowledge" of
>>>     the modality can be used by the application.
>>>
>>>
>>>     And, as a response on issue #43,
>>>
>>>     A simple way is to say
>>>
>>>     Video media descriptions shall only contain sign language tags
>>>     Audio media descriptions shall only contain language tags for
>>>     spoken language
>>>     Text media descriptions shall only contain language tags for
>>>     written language
>>>     Use of other media descriptions such as message and application
>>>     with language indications require other specifications on how to
>>>     assess the modality for non-signed languages.
>>>
>>>     The current 5.4 does not mention our main problem with the
>>>     language tags, that there is no difference on them if we mean
>>>     use for spoken language or written language. We should have made
>>>     better efforts to solve that problem long ago, but we have not.
>>>
>>>     5.4 can be modified to specify the simple limited case and the
>>>     problems that block us from specifying other cases:
>>>
>>>
>>>           5.4
>>>           <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>.
>>>           Media and modality Combination problems
>>>
>>>
>>>
>>>
>>>         The problem of indicating a language tag for the view of a speaking person in a video stream is out of scope for this document.
>>>
>>>         The problem of indicating a language tag for use of written language coded as a component in a video stream is out of scope for this document.
>>>
>>>         The use of language tags for negotiation of languages in other media than audio, video and text is not defined in this document.
>>>
>>>         The problem of knowing which language tags are signed and which are not can be deduced
>>>         from the IANA language tag registry. How this is done is out of scope of this document.
>>>
>>>     --------------------------------------------------------
>>>
>>>
>>>     But if we want to allow more cases, we need to consider the
>>>     following complications:
>>>
>>>
>>>     1. to assess if a language represents a Sign Language, the
>>>     application can look for the word "sign" in the description in
>>>     the  IANA language registry or a copy thereof as Randall already
>>>     indicated.
>>>
>>>     2. For written languages used as a text component in a video
>>>     stream, it is possible to code this for languages requiring a
>>>     script subtag, but not for languages with suppressed script subtags
>>>
>>>     3. We have also discussed proposals for how to code written
>>>     language in video stream for languages not requiring a script
>>>     subtag, but not got acceptance for our proposals. So we need to
>>>     say that that is currently undefined.
>>>
>>>     4. We also discussed how to code a view of a speaking person in
>>>     video and said that that could be done by using the
>>>     "definitively not written" script subtag on a non-signed
>>>     language tag in video. But that was not appreciated by the
>>>     language experts. Another option was to not allow written
>>>     language overlayed on video, and that is the lately used option.
>>>     ( up to version -16 or so)
>>>
>>>     5. For talking and hearing audio media, we only have that case
>>>     for language-tags in Audio. So that is easy to code and assess.
>>>
>>>     6. For written language in text media, a check can be made about
>>>     if "sign" is part of the language tag description, and if not,
>>>     it is a written language.
>>>
>>>     7. For signed language in text media, a check can be made about
>>>     if "sign" is part of the language tag description, and if it is,
>>>     it is a signed language in text notation. (extremely unusual)
>>>
>>>     8. For use with language tags in other media than audio, video
>>>     and text, there is a need for a description on how to assess the
>>>     modality, especially for non-signed languages before it is used.
>>>
>>>
>>>     We can construct a section 5.4 to describe this situation, but I
>>>     doubt that it is worth the effort.
>>>
>>>
>>>>
>>>>     See:
>>>>     https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#dom-rtcrtpparameters-degradationpreference
>>>>     <https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#dom-rtcrtpparameters-degradationpreference>
>>>>
>>>>     On Sun, Oct 15, 2017 at 2:22 PM, Gunnar Hellström
>>>>     <gunnar.hellstrom@omnitor.se
>>>>     <mailto:gunnar.hellstrom@omnitor.se>> wrote:
>>>>
>>>>         On 2017-10-15 at 21:27, Paul Kyzivat wrote:
>>>>
>>>>             On 10/15/17 1:49 PM, Bernard Aboba wrote:
>>>>
>>>>                 Paul said:
>>>>
>>>>                 "For the software to know must mean that it will
>>>>                 behave differently for a tag that represents a sign
>>>>                 language than for one that represents a spoken or
>>>>                 written language. What is it that it will do
>>>>                 differently?"
>>>>
>>>>                 [BA] In terms of behavior based on the
>>>>                 signed/non-signed distinction, in -17 the only
>>>>                 reference appears to be in Section 5.4, stating
>>>>                 that certain combinations are not defined in the
>>>>                 document (but that definition of those combinations
>>>>                 was out of scope):
>>>>
>>>>
>>>>             I'm asking whether this is a distinction without a
>>>>             difference. I'm not asking whether this makes a
>>>>             difference in the *protocol*, but whether in the end it
>>>>             benefits the participants in the call in any way.
>>>>
>>>>         <GH>Good point, I was on my way to make a similar comment
>>>>         earlier today. The difference it makes for applications to
>>>>         "know" what modality a language tag represents in its used
>>>>         position seems to be only for imagined functions that are
>>>>         out of scope for the protocol specification.
>>>>
>>>>             For instance:
>>>>
>>>>             - does it help the UA to decide how to alert the
>>>>             callee, so that the
>>>>               callee can better decide whether to accept the call
>>>>             or instruct the
>>>>               UA about how to handle the call?
>>>>
>>>>         <GH>Yes, for a regular human user -to-user call, the result
>>>>         of the negotiation must be presented to the participants,
>>>>         so that they can start the call with a language and
>>>>         modality that is agreed.
>>>>         That presentation could be exactly the description from the
>>>>         language tag registry, and then no "knowledge" is needed
>>>>         from the application. But it is more likely that the
>>>>         application has its own string for presentation of the
>>>>         negotiated language and modality. So that will be
>>>>         presented. But it is still found by a table lookup between
>>>>         language tag and string for a language name, so no real
>>>>         knowledge is needed.
>>>>         We have said many times that the way the application tells
>>>>         the user the result of the negotiation is out of scope for
>>>>         the draft, but it is good to discuss and know that it can
>>>>         be done.
>>>>         A similar mechanism is also needed for configuration of the
>>>>         user's language preference profile further discussed below.
>>>>
>>>>
>>>>             - does it allow the UA to make a decision whether to
>>>>             accept the media?
>>>>
>>>>         <GH>No, the media should be accepted regardless of the
>>>>         result of the language negotiation.
>>>>
>>>>
>>>>             - can the UA use this information to change how to
>>>>             render the media?
>>>>
>>>>         <GH>Yes, for the specialized text notation of sign language
>>>>         we have discussed but currently placed out of scope, a very
>>>>         special rendering application is needed. The modality would
>>>>         be recognized by a script subtag to a sign language tag
>>>>         used in text media. However, I think that would be best to
>>>>         also use it with a specific text subtype, so that the
>>>>         rendering can be controlled by invocation of a "codec" for
>>>>         that rendering.
>>>>
>>>>
>>>>             And if there is something like this, will the UA be
>>>>             able to do this generically based on whether the media
>>>>             is sign language or not, or will the UA need to already
>>>>             understand *specific* sign language tags?
>>>>
>>>>         <GH>Applications will need to have localized versions of
>>>>         the names for the different sign languages and also for
>>>>         spoken languages and written languages, to be used in
>>>>         setting of preferences and announcing the results of the
>>>>         negotiation. It might be overkill to have such localized
>>>>         names for all languages in the IANA language registry, so
>>>>         it will need to be able to handle localized names of a
>>>>         subset of the registry. With good design however, this is
>>>>         just an automatic translation between a language tag and a
>>>>         corresponding name, so it does in fact not require any
>>>>         "knowledge" of what modality is used with each language tag.
>>>>         The application can ask for the configuration:
>>>>         "Which languages do you want to offer to send in video"
>>>>         "Which languages do you want to offer to send in text"
>>>>         "Which languages do you want to offer to send in audio"
>>>>         "Which languages do you want to be prepared to receive in
>>>>         video"
>>>>         "Which languages do you want to be prepared to receive in text"
>>>>         "Which languages do you want to be prepared to receive in
>>>>         audio"
>>>>
>>>>         And for each question provide a list of language names to
>>>>         select from. When the selection is made, the corresponding
>>>>         language tag is placed in the profile for negotiation.
>>>>
>>>>         If the application provides the whole IANA language
>>>>         registry to the user for each question, then there is a
>>>>         possibility that the user by mistake selects a language
>>>>         that requires another modality than the question was about.
>>>>         If the application shall limit the lists provided for each
>>>>         question, then it will need a kind of knowledge about which
>>>>         language tags suit each modality (and media)
>>>>
>>>>
>>>>
>>>>             E.g., A UA serving a deaf person might automatically
>>>>             introduce a sign language interpreter into an incoming
>>>>             audio-only call. If the incoming call has both audio
>>>>             and video then the video *might* be for conveying sign
>>>>             language, or not. If not then the UA will still want to
>>>>             bring in a sign language interpreter. But is knowing
>>>>             the call generically contains sign language sufficient
>>>>             to decide against bringing in an interpreter? Or must
>>>>             that depend on it being a sign language that the user
>>>>             can use? If the UA is configured for all the specific
>>>>             sign languages that the user can deal with then there
>>>>             is no need to recognize other sign languages generically.
>>>>
>>>>         <GH>We are talking about specific language tags here and
>>>>         knowing what modality they are used for. The user needs to
>>>>         specify which sign languages they prefer to use. The callee
>>>>         application can be made to look for gaps between what the
>>>>         caller offers and what the callee can accept, and from that
>>>>         deduce which type and languages for a conversion that is
>>>>         needed, and invoke that as a relay service. That invocation
>>>>         can be made completely table driven and have corresponding
>>>>         translation profiles for available relay services. But it
>>>>         is more likely that it is done by having some knowledge
>>>>         about which languages are sign languages and which are
>>>>         spoken languages and sending the call to the relay service
>>>>         to try to sort out if they can handle the translation.
>>>>
>>>>
>>>>
>>>>         So, the answer is - no, the application does not really
>>>>         have any knowledge about which modality a language tag
>>>>         represents in its used position. If the user selects to
>>>>         indicate very rare language tag indications for a media,
>>>>         then a match will just become very unlikely.
>>>>
>>>>         Where does this discussion take us? Should we modify
>>>>         section 5.4 again?
>>>>
>>>>         Thanks
>>>>         Gunnar
>>>>
>>>>                 Thanks,
>>>>                 Paul
>>>>
>>>>                       5.4
>>>>                 <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4
>>>>                 <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>>.
>>>>                       Undefined Combinations
>>>>
>>>>
>>>>
>>>>                     The behavior when specifying a non-signed
>>>>                 language tag for a video
>>>>                     media stream, or a signed language tag for an
>>>>                 audio or text media
>>>>                     stream, is not defined in this document.
>>>>
>>>>                     The problem of knowing which language tags are
>>>>                 signed and which are
>>>>                     not is out of scope of this document.
>>>>
>>>>
>>>>
>>>>                 On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat
>>>>                 <pkyzivat@alum.mit.edu
>>>>                 <mailto:pkyzivat@alum.mit.edu>
>>>>                 <mailto:pkyzivat@alum.mit.edu
>>>>                 <mailto:pkyzivat@alum.mit.edu>>> wrote:
>>>>
>>>>                     On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>>>
>>>>                         Paul,
>>>>                         On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>>>
>>>>                             On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>>>
>>>>                                 Gunnar said:
>>>>
>>>>                                 "Applications not implementing such
>>>>                 specific notations
>>>>                                 may use the following simple
>>>>                 deductions.
>>>>
>>>>                                 - A language tag in audio media is
>>>>                 supposed to indicate
>>>>                                 spoken modality.
>>>>
>>>>                                 [BA] Even a tag with "Sign
>>>>                 Language" in the description??
>>>>
>>>>                                 - A language tag in text media is
>>>>                 supposed to indicate                 written modality.
>>>>
>>>>                                 [BA] If the tag has "Sign Language"
>>>>                 in the description,
>>>>                                 can this document really say that?
>>>>
>>>>                                 - A language tag in video media is
>>>>                 supposed to indicate
>>>>                                 visual sign language modality
>>>>                 except for the case when
>>>>                                 it is supposed to indicate a view
>>>>                 of a speaking person
>>>>                                 mentioned in section 5.2
>>>>                 characterized by the exact same
>>>>                                 language tag also appearing in an
>>>>                 audio media specification.
>>>>
>>>>                                 [BA] It seems like an over-reach to
>>>>                 say that a spoken
>>>>                                 language tag in video media should
>>>>                 instead be
>>>>                                 interpreted as a request for Sign
>>>>                 Language.  If this
>>>>                                 were done, would it always be clear
>>>>                 which Sign Language
>>>>                                 was intended?  And could we really
>>>>                 assume that both
>>>>                                 sides, if negotiating a spoken
>>>>                 language tag in video
>>>>                                 media, were really indicating the
>>>>                 desire to sign?  It
>>>>                                 seems like this could easily result
>>>>                 interoperability
>>>>                                 failure.
>>>>
>>>>
>>>>                             IMO the right way to indicate that two
>>>>                 (or more) media
>>>>                             streams are conveying alternative
>>>>                 representations of the
>>>>                             same language content is by grouping
>>>>                 them with a new
>>>>                             grouping attribute. That can tie
>>>>                 together an audio with a
>>>>                             video and/or text. A language tag for
>>>>                 sign language on the
>>>>                             video stream then clarifies to the
>>>>                 recipient that it is sign
>>>>                             language. The grouping attribute by
>>>>                 itself can indicate that
>>>>                             these streams are conveying language.
>>>>
>>>>                         <GH>Yes, and that is proposed in
>>>>                 draft-hellstrom-slim-modality-grouping with two
>>>>                 kinds of
>>>>                         grouping: One kind of grouping to tell that
>>>>                 two or more
>>>>                         languages in different streams are
>>>>                 alternatives with the same
>>>>                         content and a priority order is assigned to
>>>>                 them to guide the
>>>>                         selection of which one to use during the
>>>>                 call. The other kind of
>>>>                         grouping telling that two or more languages
>>>>                 in different streams
>>>>                         are desired together with the same language
>>>>                 content but
>>>>                         different modalities ( such as the use for
>>>>                 captioned telephony
>>>>                         with the same content provided in both
>>>>                 speech and text, or sign
>>>>                         language interpretation where you see the
>>>>                 interpreter, or
>>>>                         possibly spoken language interpretation
>>>>                 with the languages
>>>>                         provided in different audio streams ). I
>>>>                 hope that that draft
>>>>                         can be progressed. I see it as a needed
>>>>                 complement to the pure
>>>>                         language indications per media.
>>>>
>>>>
>>>>                     Oh, sorry. I did read that draft but forgot
>>>>                 about it.
>>>>
>>>>                         The discussion in this thread is more about
>>>>                 how an application
>>>>                         would easily know that e.g. "ase" is a sign
>>>>                 language and "en" is
>>>>                         a spoken (or written) language, and also a
>>>>                 discussion about what
>>>>                         kinds of languages are allowed and
>>>>                 indicated by default in each
>>>>                         media type. It was not at all about falsely
>>>>                 using language tags
>>>>                         in the wrong media type as Bernard
>>>>                 understood my wording. It was
>>>>                         rather a limitation to what modalities are
>>>>                 used in each media
>>>>                         type and how to know the modality with
>>>>                 cases that are not
>>>>                         evident, e.g. "application" and "message"
>>>>                 media types.
>>>>
>>>>
>>>>                     What do you mean by "know"? Is it for the *UA*
>>>>                 software to know, or
>>>>                     for the human user of the UA to know?
>>>>                 Presumably a human user that
>>>>                     cares will understand this if presented with
>>>>                 the information in some
>>>>                     way. But typically this isn't presented to the
>>>>                 user.
>>>>
>>>>                     For the software to know must mean that it will
>>>>                 behave differently
>>>>                     for a tag that represents a sign language than
>>>>                 for one that
>>>>                     represents a spoken or written language. What
>>>>                 is it that it will do
>>>>                     differently?
>>>>
>>>>                              Thanks,
>>>>                              Paul
>>>>
>>>>
>>>>                         Right now we have returned to a very simple
>>>>                 rule: we define only
>>>>                         use of spoken language in audio media,
>>>>                 written language in text
>>>>                         media and sign language in video media.
>>>>                         We have discussed other use, such as a view
>>>>                 of a speaking person
>>>>                         in video, text overlay on video, a sign
>>>>                 language notation in
>>>>                         text media, written language in message
>>>>                 media, written language
>>>>                         in WebRTC data channels, sign written and
>>>>                 spoken in bucket media
>>>>                         maybe declared as application media. We do
>>>>                 not define these
>>>>                         cases. They are just not defined, not
>>>>                 forbidden. They may be
>>>>                         defined in the future.
>>>>
>>>>                         My proposed wording in section 5.4 got too many
>>>>                         misunderstandings so I gave up with it. I
>>>>                 think we can live with
>>>>                         5.4 as it is in version -16.
>>>>
>>>>                         Thanks,
>>>>                         Gunnar
>>>>
>>>>
>>>>
>>>>                             (IIRC I suggested something along these
>>>>                 lines a long time ago.)
>>>>
>>>>                                  Thanks,
>>>>                                  Paul
>>>>
>>>>                 _______________________________________________
>>>>                             SLIM mailing list
>>>>                 SLIM@ietf.org <mailto:SLIM@ietf.org>
>>>>                 <mailto:SLIM@ietf.org <mailto:SLIM@ietf.org>>
>>>>                 https://www.ietf.org/mailman/listinfo/slim
>>>>                 <https://www.ietf.org/mailman/listinfo/slim>
>>>>                            
>>>>                 <https://www.ietf.org/mailman/listinfo/slim
>>>>                 <https://www.ietf.org/mailman/listinfo/slim>>
>>>>
>>>>
>>>>
>>>>                     _______________________________________________
>>>>                     SLIM mailing list
>>>>                 SLIM@ietf.org <mailto:SLIM@ietf.org>
>>>>                 <mailto:SLIM@ietf.org <mailto:SLIM@ietf.org>>
>>>>                 https://www.ietf.org/mailman/listinfo/slim
>>>>                 <https://www.ietf.org/mailman/listinfo/slim>
>>>>                     <https://www.ietf.org/mailman/listinfo/slim
>>>>                 <https://www.ietf.org/mailman/listinfo/slim>>
>>>>
>>>>
>>>>
>>>>
>>>>         -- 
>>>>         -----------------------------------------
>>>>         Gunnar Hellström
>>>>         Omnitor
>>>>         gunnar.hellstrom@omnitor.se
>>>>         <mailto:gunnar.hellstrom@omnitor.se>
>>>>         +46 708 204 288 <tel:%2B46%20708%20204%20288>
>>>>
>>>>
>>>
>>>     -- 
>>>     -----------------------------------------
>>>     Gunnar Hellström
>>>     Omnitor
>>>     gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>
>>>     +46 708 204 288
>>
>>     -- 
>>     -----------------------------------------
>>     Gunnar Hellström
>>     Omnitor
>>     gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>
>>     +46 708 204 288
>>
>>
>>     _______________________________________________
>>     SLIM mailing list
>>     SLIM@ietf.org <mailto:SLIM@ietf.org>
>>     https://www.ietf.org/mailman/listinfo/slim
>>     <https://www.ietf.org/mailman/listinfo/slim>
>
>     -- 
>     -----------------------------------------
>     Gunnar Hellström
>     Omnitor
>     gunnar.hellstrom@omnitor.se <mailto:gunnar.hellstrom@omnitor.se>
>     +46 708 204 288
>
>

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288