Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Gunnar Hellström <gunnar.hellstrom@omnitor.se> Mon, 20 November 2017 16:08 UTC

From: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
To: "Phillips, Addison" <addison@lab126.com>, Bernard Aboba <bernard.aboba@gmail.com>
Cc: "slim@ietf.org" <slim@ietf.org>
Message-ID: <d04c79be-ba92-bece-9d3e-2439186b18f8@omnitor.se>
Date: Mon, 20 Nov 2017 17:07:43 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/LZkrR1hXT9QWmXrLK9AHFWUWyCo>
Subject: Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Addison, thanks for the guidance,

On 2017-11-20 at 00:07, Phillips, Addison wrote:
>
> A few points.
>
> 1. Sign languages do not necessarily use the subtag ‘sgn’. In fact, 
> most sign languages have subtags that have nothing to do with the 
> ‘sgn’ subtag. They are not ‘extlang’ and do not have a macrolanguage 
> of ‘sgn’. For example, here is the record for American Sign Language:
>
> %%
> Type: language
> Subtag: ase
> Description: American Sign Language
> Added: 2009-07-29
> %%
>
<GH> But all sign languages also appear in the registry as type 
"extlang" with Prefix value "sgn", e.g., for American Sign Language:

%%
Type: extlang
Subtag: ase
Description: American Sign Language
Added: 2009-07-29
Preferred-Value: ase
Prefix: sgn
%%

Therefore, the procedure I have proposed for assessing whether a 
language is a sign language is to search the registry for this 
combination. The procedure could be formalized using the "extlang form" 
recommendations in RFC 5646 Section 4.5 on canonicalization (applied 
when evaluating a tag, not when sending it).
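
For illustration only, the check could look like the following minimal 
Python sketch. It assumes the registry file has already been fetched 
from the IANA URL and follows the layout of the records above (records 
separated by "%%", "Field: value" lines, continuation lines starting 
with whitespace). The function names are hypothetical, and SAMPLE is an 
abbreviated excerpt, not the full registry.

```python
from collections import defaultdict

# Abbreviated registry excerpt: the 'ase' entries are the ones quoted in
# this thread; the 'en' entry is trimmed to a few fields.
SAMPLE = """\
%%
Type: language
Subtag: ase
Description: American Sign Language
Added: 2009-07-29
%%
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: extlang
Subtag: ase
Description: American Sign Language
Added: 2009-07-29
Preferred-Value: ase
Prefix: sgn
%%
"""


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split registry text into records. Each field maps to a list of values,
    because fields such as Description and Prefix may repeat."""
    records: list[dict[str, list[str]]] = []
    fields: dict[str, list[str]] = defaultdict(list)
    last = None
    for line in text.splitlines() + ["%%"]:  # sentinel flushes the last record
        if line.strip() == "%%":
            if fields:
                records.append(dict(fields))
            fields, last = defaultdict(list), None
        elif not line.strip():
            continue
        elif line[:1] in (" ", "\t") and last is not None:
            # Folded continuation line: append to the previous field value.
            fields[last][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last].append(value.strip())
    return records


def sign_language_subtags(records: list[dict[str, list[str]]]) -> set[str]:
    """Subtags registered both with Type 'language' and with Type 'extlang'
    plus Prefix 'sgn' (the combination proposed for section 5.4)."""
    languages = {r["Subtag"][0].lower() for r in records
                 if r.get("Type") == ["language"] and "Subtag" in r}
    sgn_extlangs = {r["Subtag"][0].lower() for r in records
                    if r.get("Type") == ["extlang"] and "Subtag" in r
                    and "sgn" in [p.lower() for p in r.get("Prefix", [])]}
    return languages & sgn_extlangs


def is_sign_language(tag: str, sign_subtags: set[str]) -> bool:
    primary = tag.lower().split("-")[0]
    # A primary subtag of 'sgn' (e.g. 'sgn-ase' in extlang form, or older
    # tags such as 'sgn-US') already indicates a sign language.
    if primary == "sgn":
        return True
    # Otherwise the primary subtag must be one of the registry-derived
    # sign language subtags, e.g. 'ase' (extlang form: 'sgn-ase').
    return primary in sign_subtags


if __name__ == "__main__":
    subtags = sign_language_subtags(parse_registry(SAMPLE))
    for tag in ("ase", "sgn-ase", "sgn-US", "en-US", "de"):
        print(tag, is_sign_language(tag, subtags))
```

Note that the sketch relies on every sign language having the extlang 
record, which RFC 5646 Section 3.4 states only as a SHOULD.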

> 2. Sign languages may have the words “Sign Language” in one of their 
> Description fields (notice that I say “one of”). I’m not sure that all 
> sign languages have this. Someone would have to ask the ISO 639 folks 
> if there are any outliers.
>
<GH> Yes, but the "Prefix: sgn" rule above is much better suited to a 
procedure for determining the modality. The Description string could 
perhaps be used only when telling the human parties what the result of 
the negotiation was.
>
> 3. Suppress-Script is an advisory field in the registry. It does not 
> cause or require that the script subtag be omitted. It is also an 
> incomplete bit of documentation: many languages that fit the criteria 
> for Suppress-Script do not have the field in the registry. Use or 
> non-use of the script subtag is not a very reliable indicator of 
> language modality on its own. Tags like “en-US” or “de” function well 
> for identifying the language of various materials and I caution folks 
> that, in my experience, arcane rules about the specialized use of 
> subtags are likely to be ignored.
>
> Overall, my suggestion would be: if you need to deal with modality, 
> don’t use (existing) language subtags for it. Encode metadata to make 
> things explicit. Use private use subtags or an extension for it if you 
> must. But don’t provide super-special meaning for subtags. Personally, 
> I tend to think you should encode modality on a different level. 
> Someone who is hard of hearing and low vision has different needs from 
> a blind user who has different needs from a deaf person with limited 
> mobility… etc.
>
<GH> From the discussion above, we can distinguish sign languages, and 
thereby signed modality, without using any extra subtags in the attributes.

But we cannot reliably distinguish between the sound of a spoken 
language and a view of a speaking person without relying on subtler 
indications, such as which media subtypes are specified. That takes us 
too far from the request in issue #43 for a simple way for applications 
to assess the modality.

So, in summary, I suggest that we describe the following in section 5.4:
1. The three obvious cases: written modality in text media, spoken 
modality in audio media, and signed modality in video media (including 
an indication of how a sign language is identified).
I think it is important to make the reader aware that we have a 
solution only for these three obvious cases. That justifies keeping 
section 5.4, but with far less detail than my latest proposal.

2. A statement that other cases can be defined by specific applications 
or by further work.
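
As a rough sketch (Python, for illustration only), the three cases in 
item 1 amount to a small decision table. The function name and the 
returned strings are hypothetical; the set of sign language subtags is 
assumed to come from a registry lookup like the one described above, and 
None stands for the cases that item 2 leaves to applications or further 
work.

```python
from typing import Optional


def infer_modality(media: str, tag: str, sign_subtags: set[str]) -> Optional[str]:
    """Return 'written', 'spoken' or 'signed' for the three obvious cases,
    or None when these rules do not define the modality."""
    primary = tag.lower().split("-")[0]
    # Treat the tag as a sign language if its primary subtag is 'sgn' or one
    # of the sign language subtags derived from the registry (assumed input).
    signed = primary == "sgn" or primary in sign_subtags
    media = media.lower()
    if signed and media == "video":
        return "signed"
    if not signed and media == "audio":
        return "spoken"
    if not signed and media == "text":
        return "written"
    # Everything else (e.g. a spoken-language tag in video, or message,
    # application or multiplexed media) is left undefined here.
    return None


if __name__ == "__main__":
    signs = {"ase"}  # assumed result of the registry lookup
    for media, tag in [("audio", "en-US"), ("text", "de"), ("video", "ase"),
                       ("video", "en-US"), ("application", "sgn-ase")]:
        print(media, tag, "->", infer_modality(media, tag, signs))
```

Returning None rather than guessing keeps the ambiguous cases explicit, 
which matches the proposal to leave them to specific applications.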



Thanks,
Gunnar
>
> Addison
>
> *From:*SLIM [mailto:slim-bounces@ietf.org] *On Behalf Of *Bernard Aboba
> *Sent:* Saturday, November 18, 2017 9:25 PM
> *To:* Gunnar Hellström <gunnar.hellstrom@omnitor.se>
> *Cc:* slim@ietf.org
> *Subject:* Re: [Slim] Moving forward on 
> draft-ietf-slim-negotiating-human-language
>
> Gunnar said:
>
> "I earlier thought that an application needed to look into the 
> language subtag Description to find the word "sign" there in the text 
> string. That is not a good solution."
>
> [BA] Agreed. Also, as indicated in RFC 5646 Section 4.1.2 and the IANA 
> registry 
> (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), 
> sign languages may not always have a subtag of 'sgn':
>
>    Sign languages share a mode of communication rather than a linguistic
>    heritage.  There are many sign languages that have developed
>    independently, and the subtag 'sgn' indicates only the presence of a
>    sign language.  A number of sign languages also had grandfathered
>    tags registered for them during the RFC 3066 era.  For example, the
>    grandfathered tag "sgn-US" was registered to represent 'American Sign
>    Language' specifically, without reference to the United States.  This
>    is still valid, but deprecated: a document in American Sign Language
>    can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the
>    language called 'American Sign Language').
>
> Gunnar also said:
>
> "A specific sign language can be identified by its existence in the IANA
> registry of language subtags according to BCP 47 [RFC5646], and by finding
> that the language subtag occurs in at least two entries in the
> registry, once with the Type field "language" and once with the Type
> field "extlang" combined with the Prefix field value "sgn".
>
> So that should be the response to Dale's request to easily decide if a 
> language tag is for a sign language. "
>
> [BA]  Looking at the IANA registry, having a Type field "extlang" 
> combined with the Prefix field value 'sgn' seems to be used as an 
> indicator of a sign language.  Do you think we can rely on this? 
> Currently this is only a SHOULD in RFC 5646 Section 3.4:
>
>             3.  Sign languages SHOULD have an 'extlang' record with a
>                 'Prefix' of 'sgn'.
> "My wording proposal starts with the obvious cases: a non-signed 
> language tag in audio media is spoken, and a non-signed language tag 
> in text media is written."
> [BA] Assuming your suggested approach allows us to reliably determine 
> non-sign languages, this seems solid.
> Gunnar further said:
> "But for other cases, like in video or message or application or 
> multiplexed media, other indications must be used to
> understand the intended modality... I wish for the ambiguous cases we 
> could use the script subtag -zxxx to indicate
> spoken modality and a real script subtag..."
> [BA] This is where I become uneasy, because without an explicit 
> mechanism such as script subtags, there is the potential for ambiguity.
> Trying to reduce that ambiguity via heuristics could turn out 
> to be a bad idea, compared with proceeding more cautiously
> by leaving behavior undefined for now and revisiting the situation 
> later when we understand the problem better. For example:
> "Use for sending of a visual view of a speaking person may be 
> indicated by the value "speaker" in an SDP
> Content attribute according to RFC 4796 [RFC4796] in a "video" media 
> stream or another media carrying video (e.g. "message" or "application")."
> [BA] There are quite a few potential corner cases here. For example, 
> if "en-US" language is included in an offer within a video m-line, 
> should the answerer assume this implies a willingness to lip read US 
> English if the value "speaker" is in the Content attribute?  What can 
> be assumed if a value other than "speaker" is in the Content 
> attribute? Might that represent something entirely different, such as 
> the desire to receive captioning in US English? If so, how could the 
> Offerer indicate both the capability of lipreading and the ability to 
> handle captions? And what happens if the Answerer doesn't mimic the 
> Content attribute in the Offer? Seems like there are some potential 
> "gotchas" here.
> "Use of written modality in another media stream than
> "text", may be discriminated by use of a script subtag in the language
> tag, where that is appropriate."
> [BA] What if the language in question has script subtags suppressed? 
> What if the Offerer includes a script subtag in the video m-line but 
> also "speaker" in the Content attribute? Again, there could be quite a 
> few corner cases lurking here.
>
> On Sat, Nov 18, 2017 at 2:33 PM, Gunnar Hellström 
> <gunnar.hellstrom@omnitor.se> wrote:
>
>     Thanks Bernard for pushing for closing the last open issue.
>
>     On 2017-11-18 at 19:46, Bernard Aboba wrote:
>
>         At this point, only a single Issue (43) remains open on
>         draft-ietf-slim-negotiating-human-language:
>
>         https://trac.ietf.org/trac/slim/ticket/43
>
>         This relates to the modality of a language indication.
>
>         Currently, Gunnar has suggested a modification to the text of
>         Section 5.4 in order to address the issue:
>
>         https://mailarchive.ietf.org/arch/msg/slim/A4b6Wpgh0Z0zpXKqpwF9bfdW35g
>
>         Can WG participants review this suggested change, so that we
>         can determine how to move forward?
>
>         Currently, Section 5.4 states that:
>
>            The problem of knowing which language tags are signed and which are
>            not is out of scope of this document.
>
>     I earlier thought that an application needed to look into the
>     language subtag Description to find the word "sign" there in the
>     text string. That is not a good solution. But when studying the
>     topic again in RFC 5646 I found that there is a consistent
>     machine-implementable way to assess if a language subtag is for a
>     sign language.
>     Therefore I included this text in the latest proposal for section
>     5.4 at the link that Bernard provided:
>
>     "
>     A specific sign language can be identified by its existence in the
>     IANA registry of language subtags according to BCP 47 [RFC5646], and
>     by finding that the language subtag occurs in at least two entries
>     in the registry, once with the Type field "language" and once with
>     the Type field "extlang" combined with the Prefix field value "sgn".
>     "
>
>     So that should be the response to Dale's request to easily decide
>     if a language tag is for a sign language.
>
>     Worse is the next topic in issue 43: to assess whether a language
>     tag is for the spoken or the written modality of a language.
>     My wording proposal starts with the obvious cases: a non-signed
>     language tag in audio media is spoken, and a non-signed language
>     tag in text media is written. But for other cases, like video,
>     message, application or multiplexed media, other indications
>     must be used to understand the intended modality. The proposed
>     text mentions a few, and leaves it to applications to decide which
>     mechanisms to use for such cases. I wish that for the ambiguous
>     cases we could use the script subtag -zxxx to indicate spoken
>     modality, and a real script subtag to indicate written modality
>     even on language subtags where script subtags are suppressed,
>     because that would resolve issue 43 nicely and make section 5.4
>     much shorter and clearer. But we have had resistance against that
>     solution.
>
>     The proposed text might be a bit long and detailed. I am prepared
>     to agree on a shortened version if there are any proposals. I
>     think, though, that content for 5.4 along the lines of my proposal
>     is what satisfies issue 43 and also the recent comments that
>     section 5.4 is too restrictive.
>
>     /Gunnar
>
>
>         _______________________________________________
>         SLIM mailing list
>         SLIM@ietf.org
>         https://www.ietf.org/mailman/listinfo/slim
>
>
>
>     -- 
>     -----------------------------------------
>     Gunnar Hellström
>     Omnitor
>     gunnar.hellstrom@omnitor.se
>     +46 708 204 288
>
>
>
