Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

"Phillips, Addison" <addison@lab126.com> Sun, 19 November 2017 23:07 UTC

From: "Phillips, Addison" <addison@lab126.com>
To: Bernard Aboba <bernard.aboba@gmail.com>, Gunnar Hellström <gunnar.hellstrom@omnitor.se>
CC: "slim@ietf.org" <slim@ietf.org>
Thread-Topic: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language
Date: Sun, 19 Nov 2017 23:07:23 +0000
Message-ID: <f75ade4b0f3740c9af26ec274e9857e1@EX13D08UWB002.ant.amazon.com>
References: <CAOW+2dsZtuciPiKMfif=ZmUqBcUd9TyYtL5gPYDp7ZfLOHHDBA@mail.gmail.com> <6ebf2b8a-8699-27c1-87af-41acab4cb940@omnitor.se> <CAOW+2duq9qkXBy8S+a_GSpmPwypMGLfYL3V9ZZfkrDraSA+S1w@mail.gmail.com>
In-Reply-To: <CAOW+2duq9qkXBy8S+a_GSpmPwypMGLfYL3V9ZZfkrDraSA+S1w@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/slim/0-00cjlg9Q1uXvfvevaDObGqp10>
Subject: Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language
List-Id: Selection of Language for Internet Media <slim.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/slim>, <mailto:slim-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/slim/>
List-Post: <mailto:slim@ietf.org>
List-Help: <mailto:slim-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/slim>, <mailto:slim-request@ietf.org?subject=subscribe>

A few points.


1. Sign languages do not necessarily use the subtag ‘sgn’. In fact, most sign languages have subtags that have nothing to do with the ‘sgn’ subtag. They are not ‘extlang’ and do not have a macrolanguage of ‘sgn’. For example, here is the record for American Sign Language:

%%
Type: language
Subtag: ase
Description: American Sign Language
Added: 2009-07-29
%%



2. Sign languages may have the words “Sign Language” in one of their Description fields (notice that I say “one of”). I’m not sure that all sign languages have this. Someone would have to ask the ISO 639 folks if there are any outliers.

3. Suppress-Script is an advisory field in the registry. It does not cause or require that the script subtag be omitted. It is also an incomplete bit of documentation: many languages that fit the criteria for Suppress-Script do not have the field in the registry. Use or non-use of the script subtag is not a very reliable indicator of language modality on its own. Tags like “en-US” or “de” function well for identifying the language of various materials, and I caution folks that, in my experience, arcane rules about the specialized use of subtags are likely to be ignored.

Overall, my suggestion would be: if you need to deal with modality, don’t use (existing) language subtags for it. Encode metadata to make things explicit. Use private use subtags or an extension for it if you must, but don’t give super-special meaning to existing subtags. Personally, I tend to think you should encode modality at a different level: someone who is hard of hearing and has low vision has different needs from a blind user, who in turn has different needs from a deaf person with limited mobility, and so on.
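If modality really must travel inside the tag, the private-use extension is the one place BCP 47 reserves for application-specific conventions. A minimal Python sketch, assuming a made-up convention where "-x-spoken" or "-x-written" is appended by agreement between the endpoints (these values are invented for illustration and are not registered or defined anywhere):

    # Hypothetical illustration only: carry modality in a private-use subtag.
    # Everything after the "x" singleton is private by definition in BCP 47;
    # the values "spoken" and "written" are invented for this sketch.

    def split_private_use(tag):
        """Return (public part, list of private-use subtags) of a BCP 47 tag."""
        subtags = tag.split("-")
        lowered = [s.lower() for s in subtags]
        if "x" in lowered:
            i = lowered.index("x")
            return "-".join(subtags[:i]), subtags[i + 1:]
        return tag, []

    print(split_private_use("en-US-x-spoken"))  # ('en-US', ['spoken'])
    print(split_private_use("ase"))             # ('ase', [])

Even then, the same information is arguably better carried as separate, explicit metadata alongside the tag, which is the first option above.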

Addison

From: SLIM [mailto:slim-bounces@ietf.org] On Behalf Of Bernard Aboba
Sent: Saturday, November 18, 2017 9:25 PM
To: Gunnar Hellström <gunnar.hellstrom@omnitor.se>
Cc: slim@ietf.org
Subject: Re: [Slim] Moving forward on draft-ietf-slim-negotiating-human-language

Gunnar said:

"I earlier thought that an application needed to look into the language subtag Description to find the word "sign" there in the text string. That is not a good solution."

[BA] Agreed. Also, as indicated in RFC 5646 Section 4.1.2 and the IANA registry (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), sign languages may not always have a subtag of 'sgn':


   Sign languages share a mode of communication rather than a linguistic
   heritage.  There are many sign languages that have developed
   independently, and the subtag 'sgn' indicates only the presence of a
   sign language.  A number of sign languages also had grandfathered
   tags registered for them during the RFC 3066 era.  For example, the
   grandfathered tag "sgn-US" was registered to represent 'American Sign
   Language' specifically, without reference to the United States.  This
   is still valid, but deprecated: a document in American Sign Language
   can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the
   language called 'American Sign Language').

Gunnar also said:


"A specific sign language can be identified by its existence in the IANA

registry of language subtags according to BCP 47 [RFC5646] , and finding

that the language subtag is found at least in two entries in the

registry, once with the Type field "language" and once with the Type

field "extlang" combined with the Prefix field value "sgn".

So that should be the response on Dales request to easily decide if a language tag is for a sign language. "

[BA] Looking at the IANA registry, having a Type field "extlang" combined with the Prefix field value 'sgn' seems to be used as an indicator of a sign language. Do you think we can rely on this? Currently this is only a SHOULD in RFC 5646 Section 3.4:


   3.  Sign languages SHOULD have an 'extlang' record with a 'Prefix' of 'sgn'.
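For concreteness, here is a minimal Python sketch of the check under discussion, assuming a locally downloaded copy of the registry file linked earlier (this is an illustration, not text from the draft): treat a primary language subtag as a sign language exactly when the registry also contains an 'extlang' record for it with a 'Prefix' of 'sgn'.

    # Sketch: decide whether a primary language subtag denotes a sign language
    # by checking for an 'extlang' record with Prefix 'sgn' in the IANA
    # language-subtag-registry (records are separated by lines containing '%%').
    # Assumes the registry file has been saved locally; not a definitive
    # implementation of anything in the draft.

    def parse_registry(path):
        """Yield each registry record as a dict mapping field name to a list of values."""
        record = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line == "%%":
                    if record:
                        yield record
                    record = {}
                elif ":" in line and not line.startswith(" "):
                    key, _, value = line.partition(":")
                    record.setdefault(key.strip(), []).append(value.strip())
            if record:
                yield record

    def sign_language_subtags(path):
        """Primary subtags that also appear as Type: extlang with Prefix: sgn."""
        return {
            r["Subtag"][0]
            for r in parse_registry(path)
            if r.get("Type") == ["extlang"] and "sgn" in r.get("Prefix", [])
        }

    signed = sign_language_subtags("language-subtag-registry")
    print("ase" in signed)  # expected True (American Sign Language)
    print("en" in signed)   # expected False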





"My wording proposal starts with the obvious cases: a non-signed language tag in audio media is spoken, and a non-signed language tag in text media is written."



[BA] Assuming your suggested approach allows us to reliably determine non-sign languages, this seems solid.
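To make that concrete, a minimal Python sketch of the rule as proposed (the is_sign_language callable is assumed to be backed by a registry check like the one sketched above; the mapping itself is Gunnar's proposed wording, not settled draft text):

    # Sketch of the obvious-case rule: a non-signed language tag in "audio"
    # media is spoken, in "text" media it is written; other media types are
    # left undetermined here.

    def modality(media, language_tag, is_sign_language):
        """Return "signed", "spoken", "written", or None (undetermined)."""
        primary = language_tag.split("-")[0].lower()
        if is_sign_language(primary):
            return "signed"
        if media == "audio":
            return "spoken"
        if media == "text":
            return "written"
        return None  # video/message/application: needs other indications

    print(modality("audio", "en-US", lambda s: s in {"ase", "bfi"}))  # spoken
    print(modality("text", "en-US", lambda s: s in {"ase", "bfi"}))   # written
    print(modality("video", "ase", lambda s: s in {"ase", "bfi"}))    # signed

The None branch is exactly the gap the rest of this thread is about.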



Gunnar further said:



"But for other cases, like in video or message or application or multiplexed media, other indications must be used to

understand the intended modality... I wish for the ambiguous cases we could use the script subtag -zxxx to indicate

spoken modality and a real script subtag..."



[BA] This is where I become uneasy, because without an explicit mechanism such as script subtags, there is the potential for ambiguity.

Trying to address reduce that ambiguity via heuristics could turn out to be a bad idea, compared with proceeding more cautiously

by leaving behavior undefined for now and revisiting the situation later when we understand the problem better. For example:



"Use for sending of a visual view of a speaking person may be indicated by the value "speaker" in an SDP

Content attribute according to RFC 4796 [RFC4796] in a "video" media stream or another media carrying video (e.g. "message" or "application")."

[BA] There are quite a few potential corner cases here. For example, if "en-US" language is included in an offer within a video m-line, should the answerer assume this implies a willingness to lip read US English if the value "speaker" is in the Content attribute?  What can be assumed if a value other than "speaker" is in the Content attribute? Might that represent something entirely different, such as the desire to receive captioning in US English? If so, how could the Offerer indicate both the capability of lipreading and the ability to handle captions? And what happens if the Answerer doesn't mimic the Content attribute in the Offer? Seems like there are some potential "gotchas" here.

"Use of written modality in another media stream than

"text", may be discriminated by use of a script subtag in the language

tag, where that is appropriate."

[BA] What if the language in question has script subtags suppressed? What if the Offerer includes a script subtag in the video m-line but also "speaker" in the Content attribute? Again, there could be quite a few corner cases lurking here.
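On the first question: whether a script subtag would normally be present at all is itself registry data. A small Python sketch that collects the Suppress-Script values, reusing a parse_registry() helper of the kind shown earlier (illustration only):

    # Sketch: languages whose registry record carries Suppress-Script would
    # not normally be tagged with a script subtag at all, so a rule that keys
    # written modality off the presence of a script subtag has nothing to key
    # on for them. parse_registry() is the helper sketched earlier.

    def suppressed_scripts(path):
        """Map primary language subtags to their Suppress-Script value, if any."""
        return {
            r["Subtag"][0]: r["Suppress-Script"][0]
            for r in parse_registry(path)
            if r.get("Type") == ["language"] and "Suppress-Script" in r
        }

    # e.g. suppressed_scripts("language-subtag-registry").get("en") -> 'Latn'

For "en" this yields 'Latn', i.e. exactly the kind of tag where a written/spoken distinction keyed on a script subtag would normally have nothing to work with.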

On Sat, Nov 18, 2017 at 2:33 PM, Gunnar Hellström <gunnar.hellstrom@omnitor.se> wrote:
Thanks Bernard for pushing for closing the last open issue.
On 2017-11-18 at 19:46, Bernard Aboba wrote:

At this point, only a single issue (43) remains open on draft-ietf-slim-negotiating-human-language:
https://trac.ietf.org/trac/slim/ticket/43

This relates to the modality of a language indication.

Currently, Gunnar has suggested a modification to the text of Section 5.4 in order to address the issue:
https://mailarchive.ietf.org/arch/msg/slim/A4b6Wpgh0Z0zpXKqpwF9bfdW35g


Can WG participants review this suggested change, so that we can determine how to move forward?

Currently, Section 5.4 states that:

   The problem of knowing which language tags are signed and which are not is out of scope of this document.
I earlier thought that an application needed to look into the language subtag Description to find the word "sign" there in the text string. That is not a good solution. But when studying the topic again in RFC 5646 I found that there is a consistent machine-implementable way to assess whether a language subtag is for a sign language.
Therefore I included this text in the latest proposal for section 5.4 at the link that Bernard provided:

"

A specific sign language can be identified by its existence in the IANA

registry of language subtags according to BCP 47 [RFC5646] , and finding

that the language subtag is found at least in two entries in the

registry, once with the Type field "language" and once with the Type

field "extlang" combined with the Prefix field value "sgn".
"

So that should be the response to Dale's request for an easy way to decide whether a language tag is for a sign language.

Worse is the next topic in issue 43: assessing whether a language tag refers to the spoken or the written modality of a language.
My wording proposal starts with the obvious cases: a non-signed language tag in audio media is spoken, and a non-signed language tag in text media is written. But for other cases, like video, message, application, or multiplexed media, other indications must be used to understand the intended modality. The proposed text mentions a few, and leaves it to applications to decide which mechanisms to use for such cases. I wish that for the ambiguous cases we could use the script subtag -zxxx to indicate spoken modality and a real script subtag to indicate written modality, even on language subtags where script subtags are suppressed, because that would satisfy issue 43 nicely and make section 5.4 much shorter and clearer. But we have had resistance against that solution.

The proposed text might be a bit long and detailed. I am prepared to agree on a shortened version if there are any proposals. I think, though, that content for 5.4 along the lines of my proposal is what satisfies issue 43 and also the recent comments that section 5.4 is too restrictive.

/Gunnar





_______________________________________________
SLIM mailing list
SLIM@ietf.org
https://www.ietf.org/mailman/listinfo/slim



--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288