Re: [Slim] Issue 43: How to know the modality of a language indication?

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Tue, 17 October 2017 09:03 UTC

From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
To: Bernard Aboba <bernard.aboba@gmail.com>
Cc: Paul Kyzivat <pkyzivat@alum.mit.edu>, slim@ietf.org
Date: Tue, 17 Oct 2017 11:02:51 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/kxN_VHGMMjqRakYKwxxP2fDZ5gc>
Subject: Re: [Slim] Issue 43: How to know the modality of a language indication?

An even more general way to express what section 5.4 tries to say is:

------------------------------------------------------------------------------------------------------------------------------------------

5.4 Combinations of Language tags and Media descriptions

Language tags and other information in the media descriptions should 
be combined so that the negotiating parties can deduce the intended 
modality.


----------------------------------------------------------------------------------------

That spares us from investigating what is possible today and what 
further attributes or coding rules may be added in the future.

There is a risk that implementers start using some insufficient coding, 
which can cause interoperability issues. On the other hand, we do not 
need to limit valid uses we simply have not thought of by saying that 
specific combinations are out of scope or not defined. It is up to 
implementers to check that the combinations they use result in an 
unambiguous modality.

It also opens the door for possible new attributes, e.g. 
a=modality:spoken or a=modality:written, to complement the undefined 
case when a non-signed language tag without a script subtag is used in 
video media, and to explain any use of m=application or m=message media 
in interactive communication.
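
For illustration only, a rough sketch of how such a hypothetical 
attribute could resolve the otherwise ambiguous case (a=modality does 
not exist today, and the language attribute name is assumed to be 
hlang-send as in recent draft versions; ports and payload types are 
arbitrary):

   m=video 51372 RTP/AVP 99
   a=hlang-send:en
   a=modality:written

With the extra attribute, the answering party could conclude that "en" 
in video means written English (e.g. text overlaid on the video) rather 
than a view of a speaking person, without any registry lookup.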

It does not really answer issue #43 by explaining HOW to assess the 
modality easily, but it requires implementers to make sure that doing 
so is possible.

And deducing the intended modality is the key to successful negotiation 
and communication.

Do you think this would be clear enough, or do we need to go into which 
clear cases we have?

Gunnar



On 2017-10-17 at 00:21, Gunnar Hellström wrote:
> On 2017-10-16 at 01:21, Bernard Aboba wrote:
>> Paul said:
>>
>> ""- can the UA use this information to change how to render the media?"
>>
>> [BA] If the video is used for signing, an application might infer an 
>> encoder preference for frame rate over resolution (e.g. in WebRTC, 
>> RTCRtpParameters.degradationPreference = "maintain-framerate" )
> <GH>Right, that is a valid example of how real "knowledge" of the 
> modality can be used by the application.
>
>
> And, as a response on issue #43,
>
> A simple way is to say
>
> Video media descriptions shall only contain sign language tags.
> Audio media descriptions shall only contain language tags for spoken 
> language.
> Text media descriptions shall only contain language tags for written 
> language.
> Use of other media descriptions, such as message and application, with 
> language indications requires other specifications on how to assess the 
> modality for non-signed languages.
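>
> As a minimal SDP sketch of these simple rules (attribute name assumed 
> to be hlang-send as in recent draft versions, ports and payload types 
> arbitrary):
>
>    m=audio 49200 RTP/AVP 0
>    a=hlang-send:en
>    m=video 49170 RTP/AVP 99
>    a=hlang-send:ase
>    m=text 49400 RTP/AVP 98
>    a=hlang-send:en
>
> Each tag's modality then follows directly from the media type: spoken 
> English in audio, American Sign Language in video, written English in 
> text.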
>
> The current 5.4 does not mention our main problem with the language 
> tags: there is nothing in them that differs between use for spoken 
> language and use for written language. We should have made better 
> efforts to solve that problem long ago, but we have not.
>
> 5.4 can be modified to specify the simple limited case and the 
> problems that block us from specifying other cases:
>
>
>       5.4. Media and Modality Combination Problems
>       <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>
>
>     The problem of indicating a language tag for the view of a speaking person in a video stream is out of scope for this document.
>
>     The problem of indicating a language tag for use of written language coded as a component in a video stream is out of scope for this document.
>
>     The use of language tags for negotiation of languages in media other than audio, video, and text is not defined in this document.
>
>     Which language tags are signed and which are not can be deduced
>     from the IANA language tag registry. How this is done is out of scope of this document.
>
> --------------------------------------------------------
>
>
> But if we want to allow more cases, we need to consider the following 
> complications:
>
>
> 1. To assess whether a language tag represents a sign language, the 
> application can look for the word "sign" in its Description field in 
> the IANA language subtag registry, or a copy thereof, as Randall 
> already indicated.
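>
> For example, the registry record for American Sign Language reads, 
> abbreviated:
>
>    Type: language
>    Subtag: ase
>    Description: American Sign Language
>
> so a simple substring match on "Sign Language" in the Description 
> field is enough to classify the tag as signed.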
>
> 2. For written languages used as a text component in a video stream, 
> it is possible to code this for languages requiring a script subtag, 
> but not for languages with a suppressed script subtag.
>
> 3. We have also discussed proposals for how to code written language 
> in a video stream for languages not requiring a script subtag, but our 
> proposals did not gain acceptance. So we need to say that this is 
> currently undefined.
>
> 4. We also discussed how to code a view of a speaking person in video 
> and said that it could be done by using the "definitively not written" 
> script subtag on a non-signed language tag in video. But that was not 
> appreciated by the language experts. Another option was to not allow 
> written language overlaid on video, and that is the option used lately 
> (up to version -16 or so).
>
> 5. For spoken language in audio media, that is the only case we have 
> for language tags in audio. So it is easy to code and assess.
>
> 6. For written language in text media, a check can be made whether 
> "sign" is part of the language tag description; if it is not, it is a 
> written language.
>
> 7. For signed language in text media, a check can be made whether 
> "sign" is part of the language tag description; if it is, it is a 
> signed language in text notation (extremely unusual).
>
> 8. For language tags used in media other than audio, video, and text, 
> there is a need for a description of how to assess the modality, 
> especially for non-signed languages, before such use.
>
>
> We can construct a section 5.4 to describe this situation, but I doubt 
> that it is worth the effort.
>
>
>>
>> See: 
>> https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#dom-rtcrtpparameters-degradationpreference
>>
>> On Sun, Oct 15, 2017 at 2:22 PM, Gunnar Hellström 
>> <gunnar.hellstrom@omnitor.se> wrote:
>>
>>     On 2017-10-15 at 21:27, Paul Kyzivat wrote:
>>
>>         On 10/15/17 1:49 PM, Bernard Aboba wrote:
>>
>>             Paul said:
>>
>>             "For the software to know must mean that it will behave
>>             differently for a tag that represents a sign language
>>             than for one that represents a spoken or written
>>             language. What is it that it will do differently?"
>>
>>             [BA] In terms of behavior based on the signed/non-signed
>>             distinction, in -17 the only reference appears to be in
>>             Section 5.4, stating that certain combinations are not
>>             defined in the document (but that definition of those
>>             combinations was out of scope):
>>
>>
>>         I'm asking whether this is a distinction without a
>>         difference. I'm not asking whether this makes a difference in
>>         the *protocol*, but whether in the end it benefits the
>>         participants in the call in any way.
>>
>>     <GH>Good point, I was on my way to make a similar comment earlier
>>     today. The difference it makes for applications to "know" what
>>     modality a language tag represents in its used position seems to
>>     be only for imagined functions that are out of scope for the
>>     protocol specification.
>>
>>         For instance:
>>
>>         - does it help the UA to decide how to alert the callee, so
>>         that the
>>           callee can better decide whether to accept the call or
>>         instruct the
>>           UA about how to handle the call?
>>
>>     <GH>Yes, for a regular human user-to-user call, the result of
>>     the negotiation must be presented to the participants, so that
>>     they can start the call with a language and modality that is agreed.
>>     That presentation could be exactly the description from the
>>     language tag registry, and then no "knowledge" is needed from the
>>     application. But it is more likely that the application has its
>>     own string for presentation of the negotiated language and
>>     modality. So that will be presented. But it is still found by a
>>     table lookup between language tag and string for a language name,
>>     so no real knowledge is needed.
>>     We have said many times that the way the application tells the
>>     user the result of the negotiation is out of scope for the draft,
>>     but it is good to discuss and know that it can be done.
>>     A similar mechanism is also needed for configuration of the
>>     user's language preference profile further discussed below.
>>
>>
>>         - does it allow the UA to make a decision whether to accept
>>         the media?
>>
>>     <GH>No, the media should be accepted regardless of the result of
>>     the language negotiation.
>>
>>
>>         - can the UA use this information to change how to render the
>>         media?
>>
>>     <GH>Yes, for the specialized text notation of sign language we
>>     have discussed but currently placed out of scope, a very special
>>     rendering application is needed. The modality would be recognized
>>     by a script subtag to a sign language tag used in text media.
>>     However, I think that would be best to also use it with a
>>     specific text subtype, so that the rendering can be controlled by
>>     invocation of a "codec" for that rendering.
>>
>>
>>         And if there is something like this, will the UA be able to
>>         do this generically based on whether the media is sign
>>         language or not, or will the UA need to already understand
>>         *specific* sign language tags?
>>
>>     <GH>Applications will need to have localized versions of the
>>     names for the different sign languages and also for spoken
>>     languages and written languages, to be used in setting of
>>     preferences and announcing the results of the negotiation. It
>>     might be overkill to have such localized names for all languages
>>     in the IANA language registry, so it will need to be able to
>>     handle localized names of a subset of the registry. With good
>>     design however, this is just an automatic translation between a
>>     language tag and a corresponding name, so it does in fact not
>>     require any "knowledge" of what modality is used with each
>>     language tag.
>>     The application can ask for the configuration:
>>     "Which languages do you want to offer to send in video"
>>     "Which languages do you want to offer to send in text"
>>     "Which languages do you want to offer to send in audio"
>>     "Which languages do you want to be prepared to receive in video"
>>     "Which languages do you want to be prepared to receive in text"
>>     "Which languages do you want to be prepared to receive in audio"
>>
>>     And for each question provide a list of language names to select
>>     from. When the selection is made, the corresponding language tag
>>     is placed in the profile for negotiation.
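>>
>>     As a sketch (attribute name assumed to be hlang-recv as in recent
>>     draft versions): if the user selects "American Sign Language" as a
>>     language to receive in video, the profile entry could end up in
>>     the offer as
>>
>>        m=video 51372 RTP/AVP 99
>>        a=hlang-recv:ase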
>>
>>     If the application provides the whole IANA language registry to
>>     the user for each question, then there is a possibility that the
>>     user by mistake selects a language that requires another modality
>>     than the question was about. If the application shall limit the
>>     lists provided for each question, then it will need a kind of
>>     knowledge about which language tags suit each modality (and media)
>>
>>
>>
>>         E.g., A UA serving a deaf person might automatically
>>         introduce a sign language interpreter into an incoming
>>         audio-only call. If the incoming call has both audio and
>>         video then the video *might* be for conveying sign language,
>>         or not. If not then the UA will still want to bring in a sign
>>         language interpreter. But is knowing the call generically
>>         contains sign language sufficient to decide against bringing
>>         in an interpreter? Or must that depend on it being a sign
>>         language that the user can use? If the UA is configured for
>>         all the specific sign languages that the user can deal with
>>         then there is no need to recognize other sign languages
>>         generically.
>>
>>     <GH>We are talking about specific language tags here and knowing
>>     what modality they are used for. The user needs to specify which
>>     sign languages they prefer to use. The callee application can be
>>     made to look for gaps between what the caller offers and what the
>>     callee can accept, and from that deduce which type of conversion
>>     and which languages are needed, and invoke that as a relay
>>     service. That invocation can be made completely table driven and
>>     have corresponding translation profiles for available relay
>>     services. But it is more likely that it is done by having some
>>     knowledge about which languages are sign languages and which are
>>     spoken languages and sending the call to the relay service to try
>>     to sort out if they can handle the translation.
>>
>>
>>
>>     So, the answer is - no, the application does not really have any
>>     knowledge about which modality a language tag represents in its
>>     used position. If the user chooses to indicate very rare language
>>     tags for a medium, then a match will just become very
>>     unlikely.
>>
>>     Where does this discussion take us? Should we modify section 5.4
>>     again?
>>
>>     Thanks
>>     Gunnar
>>
>>             Thanks,
>>             Paul
>>
>>                   5.4. Undefined Combinations
>>             <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-17#section-5.4>
>>
>>
>>
>>                 The behavior when specifying a non-signed language
>>             tag for a video
>>                 media stream, or a signed language tag for an audio
>>             or text media
>>                 stream, is not defined in this document.
>>
>>                 The problem of knowing which language tags are signed
>>             and which are
>>                 not is out of scope of this document.
>>
>>
>>
>>             On Sun, Oct 15, 2017 at 10:13 AM, Paul Kyzivat
>>             <pkyzivat@alum.mit.edu> wrote:
>>
>>                 On 10/15/17 2:24 AM, Gunnar Hellström wrote:
>>
>>                     Paul,
>>                     On 2017-10-15 at 01:19, Paul Kyzivat wrote:
>>
>>                         On 10/14/17 2:03 PM, Bernard Aboba wrote:
>>
>>                             Gunnar said:
>>
>>                             "Applications not implementing such
>>             specific notations
>>                             may use the following simple deductions.
>>
>>                             - A language tag in audio media is
>>             supposed to indicate
>>                             spoken modality.
>>
>>                             [BA] Even a tag with "Sign Language" in
>>             the description??
>>
>>                             - A language tag in text media is
>>             supposed to indicate written modality.
>>
>>                             [BA] If the tag has "Sign Language" in
>>             the description,
>>                             can this document really say that?
>>
>>                             - A language tag in video media is
>>             supposed to indicate
>>                             visual sign language modality except for
>>             the case when
>>                             it is supposed to indicate a view of a
>>             speaking person
>>                             mentioned in section 5.2 characterized by
>>             the exact same
>>                             language tag also appearing in an audio
>>             media specification.
>>
>>                             [BA] It seems like an over-reach to say
>>             that a spoken
>>                             language tag in video media should instead be
>>                             interpreted as a request for Sign
>>             Language.  If this
>>                             were done, would it always be clear which
>>             Sign Language
>>                             was intended?  And could we really assume
>>             that both
>>                             sides, if negotiating a spoken language
>>             tag in video
>>                             media, were really indicating the desire
>>             to sign?  It
>>                             seems like this could easily result in
>>                             interoperability failure.
>>
>>
>>                         IMO the right way to indicate that two (or
>>             more) media
>>                         streams are conveying alternative
>>             representations of the
>>                         same language content is by grouping them
>>             with a new
>>                         grouping attribute. That can tie together an
>>             audio with a
>>                         video and/or text. A language tag for sign
>>             language on the
>>                         video stream then clarifies to the recipient
>>             that it is sign
>>                         language. The grouping attribute by itself
>>             can indicate that
>>                         these streams are conveying language.
>>
>>                     <GH>Yes, and that is proposed in
>>                     draft-hellstrom-slim-modality-grouping with two
>>             kinds of
>>                     grouping: One kind of grouping to tell that two
>>             or more
>>                     languages in different streams are alternatives
>>             with the same
>>                     content and a priority order is assigned to them
>>             to guide the
>>                     selection of which one to use during the call.
>>             The other kind of
>>                     grouping telling that two or more languages in
>>             different streams
>>                     are desired together with the same language
>>             content but
>>                     different modalities ( such as the use for
>>             captioned telephony
>>                     with the same content provided in both speech and
>>             text, or sign
>>                     language interpretation where you see the
>>             interpreter, or
>>                     possibly spoken language interpretation with the
>>             languages
>>                     provided in different audio streams ). I hope
>>             that that draft
>>                     can be progressed. I see it as a needed
>>             complement to the pure
>>                     language indications per media.
>>
>>
>>                 Oh, sorry. I did read that draft but forgot about it.
>>
>>                     The discussion in this thread is more about how
>>             an application
>>                     would easily know that e.g. "ase" is a sign
>>             language and "en" is
>>                     a spoken (or written) language, and also a
>>             discussion about what
>>                     kinds of languages are allowed and indicated by
>>             default in each
>>                     media type. It was not at all about falsely using
>>             language tags
>>                     in the wrong media type as Bernard understood my
>>             wording. It was
>>                     rather a limitation to what modalities are used
>>             in each media
>>                     type and how to know the modality with cases that
>>             are not
>>                     evident, e.g. "application" and "message" media
>>             types.
>>
>>
>>                 What do you mean by "know"? Is it for the *UA*
>>             software to know, or
>>                 for the human user of the UA to know? Presumably a
>>             human user that
>>                 cares will understand this if presented with the
>>             information in some
>>                 way. But typically this isn't presented to the user.
>>
>>                 For the software to know must mean that it will
>>             behave differently
>>                 for a tag that represents a sign language than for
>>             one that
>>                 represents a spoken or written language. What is it
>>             that it will do
>>                 differently?
>>
>>                          Thanks,
>>                          Paul
>>
>>
>>                     Right now we have returned to a very simple rule:
>>             we define only
>>                     use of spoken language in audio media, written
>>             language in text
>>                     media and sign language in video media.
>>                     We have discussed other use, such as a view of a
>>             speaking person
>>                     in video, text overlay on video, a sign language
>>             notation in
>>                     text media, written language in message media,
>>             written language
>>                     in WebRTC data channels, sign written and spoken
>>                     in WebRTC data channels, and signed, written, and spoken
>>             language in bucket media,
>>                     maybe declared as application media. We do not
>>                     cases. They are just not defined, not forbidden.
>>             They may be
>>                     defined in the future.
>>
>>                     My proposed wording in section 5.4 caused too many
>>                     misunderstandings, so I gave up on it. I think
>>             we can live with
>>                     5.4 as it is in version -16.
>>
>>                     Thanks,
>>                     Gunnar
>>
>>
>>
>>                         (IIRC I suggested something along these lines
>>             a long time ago.)
>>
>>                              Thanks,
>>                              Paul
>>
>>                         _______________________________________________
>>                         SLIM mailing list
>>                         SLIM@ietf.org
>>                         https://www.ietf.org/mailman/listinfo/slim
>>
>>
>>
>>                 _______________________________________________
>>                 SLIM mailing list
>>                 SLIM@ietf.org
>>                 https://www.ietf.org/mailman/listinfo/slim
>>
>>
>>
>>
>>     -- 
>>     -----------------------------------------
>>     Gunnar Hellström
>>     Omnitor
>>     gunnar.hellstrom@omnitor.se
>>     +46 708 204 288
>>
>>
>
> -- 
> -----------------------------------------
> Gunnar Hellström
> Omnitor
> gunnar.hellstrom@omnitor.se
> +46 708 204 288

-- 
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288